import re

links_to_exclude = ['cnn.com', 'nytimes.com']
results = [
    'https://f...content-available-to-author-only...o.bar',
    'https://m...content-available-to-author-only...n.com/2018/08/21/technology/facebook-disinformation-iran-russia/index.html',
    'https://w...content-available-to-author-only...n.com/videos/politics/2018/08/22/carl-bernstein-worse-than-watergate-egregious-trump-newday-sot-vpx.cnn',
    'https://w...content-available-to-author-only...s.com/2018/08/13/us/politics/peter-strzok-fired-fbi.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=first-column-region&region=top-news&WT.nav=top-news',
]

for result in results:
    print("URL: " + result)
    for link in links_to_exclude:
        # re.escape keeps the dot in the domain from matching any character.
        regex = r'((http[s]?|ftp):\/)?\/?([^:\/\s]+)?({})\/([^\/]+)'.format(re.escape(link))
        if re.search(regex, result):
            print('  Matches: ' + link)
        else:
            print('  Does not match: ' + link)

Result: Success (0.06s, 65300KB)
stdin: empty
stdout:
URL: https://f...content-available-to-author-only...o.bar
  Does not match: cnn.com
  Does not match: nytimes.com
URL: https://m...content-available-to-author-only...n.com/2018/08/21/technology/facebook-disinformation-iran-russia/index.html
  Matches: cnn.com
  Does not match: nytimes.com
URL: https://w...content-available-to-author-only...n.com/videos/politics/2018/08/22/carl-bernstein-worse-than-watergate-egregious-trump-newday-sot-vpx.cnn
  Matches: cnn.com
  Does not match: nytimes.com
URL: https://w...content-available-to-author-only...s.com/2018/08/13/us/politics/peter-strzok-fired-fbi.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=first-column-region&region=top-news&WT.nav=top-news
  Does not match: cnn.com
  Matches: nytimes.com
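
The regex above looks for the excluded domain anywhere before a slash, so a host that merely ends in the same characters would probably match as well. As an alternative, here is a minimal sketch that compares the parsed hostname instead of using a regex. The helper name `is_excluded` and the `example.com` URL are hypothetical and are not part of the paste above.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def is_excluded(url, excluded_domains):
    """Return the excluded domain the URL's host belongs to, or None.

    Hypothetical helper: not part of the original paste.
    """
    host = (urlparse(url).hostname or '').lower()
    for domain in excluded_domains:
        # Accept the domain itself or any subdomain of it.
        if host == domain or host.endswith('.' + domain):
            return domain
    return None


# Hypothetical example URL, used only for illustration.
print(is_excluded('https://edition.example.com/a/b', ['example.com']))
```

Comparing hostnames also avoids false positives when the domain only appears in the path or query string.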