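import re

text = "Some text, containing some repeating words. It contains words that are repeating"
# Strip <br>, <br/> and <br /> tags (a greedy r'<br.*>' would also delete
# everything up to the last '>' on the line), then extract runs of letters
# and lowercase them.
words = [w.lower() for w in re.findall(r'[a-zA-Z]+', re.sub(r'<br\s*/?>', ' ', text))]
print(words)       # every word in order, duplicates included
print(set(words))  # distinct words; set iteration order is arbitrary
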
stdout
['some', 'text', 'containing', 'some', 'repeating', 'words', 'it', 'contains', 'words', 'that', 'are', 'repeating']
{'text', 'contains', 'some', 'are', 'that', 'words', 'it', 'containing', 'repeating'}
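The set above loses the original word order, and it does not say which words actually repeat. Assuming the goal is to find the repeated words (suggested by the sample text, not stated), here is a minimal sketch. `dict.fromkeys` keeps first-occurrence order when deduplicating, and `collections.Counter` tallies each word. The helper name `words_in` is hypothetical, and replacing `<br>` with a space rather than an empty string is an assumption meant to keep "foo<br>bar" from being read as one word.

```python
import re
from collections import Counter


def words_in(text):
    # Hypothetical helper: replace <br>, <br/>, <br /> with a space so the
    # words on either side stay separate, then lowercase every run of letters.
    cleaned = re.sub(r'<br\s*/?>', ' ', text, flags=re.IGNORECASE)
    return [w.lower() for w in re.findall(r'[a-zA-Z]+', cleaned)]


text = "Some text, containing some repeating words. It contains words that are repeating"
words = words_in(text)

# Deduplicate while keeping first-occurrence order (set() does not).
unique_in_order = list(dict.fromkeys(words))

# Count occurrences and keep only the words seen more than once.
counts = Counter(words)
repeated = [w for w, n in counts.items() if n > 1]

print(unique_in_order)  # ['some', 'text', 'containing', 'repeating', 'words', 'it', 'contains', 'that', 'are']
print(repeated)         # ['some', 'repeating', 'words']
```

Because `Counter` is a `dict` subclass, it preserves insertion order (Python 3.7+), so `repeated` lists words in the order they first appear.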