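import re
from sys import stdin

text = stdin.read()

# A surname: a capitalized word that may contain apostrophes, backticks or hyphens.
author = r"(?:[A-Z][A-Za-z'`-]+)"
etal = r"(?:et al\.?)"
# Further authors: ", Smith", " and Smith", " & Smith" or " et al."
additional = rf"(?:,? (?:(?:and |& )?{author}|{etal}))"
year_num = r"(?:19|20)[0-9][0-9]"
page_num = r"(?:, p\.? [0-9]+)?"  # always optional
# The year follows a comma ("Green, 2010") or sits in parentheses ("James (2020)").
year = rf"(?:, *{year_num}{page_num}| *\({year_num}{page_num}\))"
# The lookahead keeps sentence-initial "Although"/"Also" from being read as a surname.
regex = rf"\b(?!(?:Although|Also)\b){author}{additional}*{year}"

# Every group is non-capturing, so findall returns whole matches.
matches = re.findall(regex, text)
# dict.fromkeys drops duplicates; sorted() orders the citations alphabetically.
matches = sorted(dict.fromkeys(matches))

print("\n".join(matches))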
stdin
Although James (2020) recognized blablabla, Smith et al. (2020) found mimimi. 
Those inconsistent results are a sign of lalala (Green, 2010; Grimm, 1990). 
Also James (2020) ...
stdout
Green, 2010
Grimm, 1990
James (2020)
Smith et al. (2020)
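
The stdin sample above does not exercise the optional page number or longer author lists. A minimal sketch of how those cases might be checked, assuming the same pattern is wrapped in a helper; `find_citations`, the upper-case constant names and the sample sentence are hypothetical and not part of the original script:

```python
import re

# Same building blocks as the script above, written as raw strings.
AUTHOR = r"(?:[A-Z][A-Za-z'`-]+)"
ETAL = r"(?:et al\.?)"
ADDITIONAL = rf"(?:,? (?:(?:and |& )?{AUTHOR}|{ETAL}))"
YEAR_NUM = r"(?:19|20)[0-9][0-9]"
PAGE_NUM = r"(?:, p\.? [0-9]+)?"
YEAR = rf"(?:, *{YEAR_NUM}{PAGE_NUM}| *\({YEAR_NUM}{PAGE_NUM}\))"
CITATION = re.compile(rf"\b(?!(?:Although|Also)\b){AUTHOR}{ADDITIONAL}*{YEAR}")


def find_citations(text):
    """Return the unique citations in text, sorted alphabetically (hypothetical helper)."""
    return sorted(dict.fromkeys(CITATION.findall(text)))


# Made-up sample: two authors with a page number, three authors, and a repeat.
sample = (
    "Smith and Jones (2019, p. 42) disagree with Lee, Park & Kim, 2005. "
    "Also Smith and Jones (2019, p. 42) ..."
)
for citation in find_citations(sample):
    print(citation)
# Lee, Park & Kim, 2005
# Smith and Jones (2019, p. 42)
```

Compiling the pattern once and returning a list keeps the matching logic reusable outside a stdin-driven script. The repeated citation after "Also" is found again but removed by the deduplication step.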