import re

# Sample text mixing several reference-number styles.
s = """some text before Expedien: 1-21-212-16-26 some random text
Reference RE9833 of all sentences.
abc
123
456
something blah blah Ref.:
tramite 1234567
Ref.:
some junk Expedien N° 18-00777 # some new content
some text Expedien N°18-0022995 # some garbled content"""

# Keywords that may introduce an identifier.
my_list = ['Ref.:', 'Reference', 'tramite', 'Expediente', 'Expediente No', 'Expedien N°', 'Exp.No', 'Expedien']

# A keyword not preceded by a word character, optional non-word separators,
# then hyphen-separated chunks of optional uppercase letters followed by digits.
rx = r'(?<!\w)({})\W*([A-Z]*\d+(?:-+[A-Z]*\d+)*)'.format('|'.join(map(re.escape, my_list)))
print(re.findall(rx, s))
stdout
[('Expedien', '1-21-212-16-26'), ('Reference', 'RE9833'), ('tramite', '1234567'), ('Expedien N°', '18-00777'), ('Expedien N°', '18-0022995')]
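
The same pattern can be wrapped in a small helper that groups the identifiers by the keyword that introduced them. This is only a sketch: the function name `extract_ids`, the longest-first sorting of keywords, and the shortened sample string are assumptions made for illustration and are not part of the original snippet.

```python
import re
from collections import defaultdict


def extract_ids(text, keywords):
    """Group identifiers found in `text` by the keyword that precedes them."""
    # Longest keywords first, so 'Expedien N°' is tried before its prefix 'Expedien'.
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = '|'.join(map(re.escape, ordered))
    pattern = re.compile(
        r'(?<!\w)({})\W*([A-Z]*\d+(?:-+[A-Z]*\d+)*)'.format(alternation)
    )
    grouped = defaultdict(list)
    for match in pattern.finditer(text):
        grouped[match.group(1)].append(match.group(2))
    return dict(grouped)


# Hypothetical, shortened sample built from the same kinds of references.
sample = "Reference RE9833, Expedien N° 18-00777 and Expedien: 1-21-212-16-26"
result = extract_ids(sample, ['Reference', 'Expedien', 'Expedien N°'])
print(result)
```

Returning a dictionary keeps every identifier for a keyword together, which may be easier to consume than the flat list of tuples produced by `re.findall`.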