<?php
// Replace every link, bare URL or e-mail address in $html that does not match
// $whitelist (a regex alternation of allowed domains) with " (SPAM) ".
function killSpam($html, $whitelist){

    // Pass 1: whole <a ...>...</a> elements. The whitelist is tested against the
    // entire tag, so a whitelisted domain anywhere in the tag keeps the link.
    // ["\'] replaces the original ["|\'], which also accepted "|" as a quote character.
    preg_match_all('%(<(?:\s+)?a.*?href=["\'](.*?)["\'].*?>(.*?)<(?:\s+)?/(?:\s+)?a(?:\s+)?>)%sm', $html, $match, PREG_PATTERN_ORDER);
    for ($i = 0; $i < count($match[1]); $i++) {
        if(!preg_match("/$whitelist/", $match[1][$i])){
            // Quote the "%" delimiter too, so a URL containing "%" cannot break the pattern.
            $html = preg_replace("%" . preg_quote($match[1][$i], "%") . "%", " (SPAM) ", $html);
        }
    }

    // Pass 2: bare or quoted URLs (http, https, ftp, file, www., ftp.) and e-mail
    // addresses that remain in the text after pass 1.
    preg_match_all('/(\b(?:(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[A-Z0-9+&@#\/%?=~_|$!:,.;-]*[A-Z0-9+&@#\/%=~_|$-]|((?:mailto:)?[A-Z0-9._%+-]+@[A-Z0-9._%-]+\.[A-Z]{2,6})\b)|"(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[^"\r\n]+"|\'(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[^\'\r\n]+\')/i', $html, $match2, PREG_PATTERN_ORDER);

    for ($i = 0; $i < count($match2[1]); $i++) {
        if(!preg_match("/$whitelist/", $match2[1][$i])){
            $spamsite = $match2[1][$i];
            $html = preg_replace("%" . preg_quote($spamsite, "%") . "%", " (SPAM) ", $html);
        }
    }

    return $html;

}


$html = <<<LOB
<p>Hello world, thanks to <a href="http://m...content-available-to-author-only...e.com/about" rel="nofollow">http://mywebsite/about</a> I learned a lot. I found
you on <a href="http://w...content-available-to-author-only...g.com" rel="nofollow">http://w...content-available-to-author-only...g.com</a>, <a href="https://google.com/search" rel="nofollow">https://google.com/search</a> and on some <a href="http://w...content-available-to-author-only...e.com" rel="nofollow">www.spamwebsite.com/refid=spammer2< /a >. www.spamme.com, http://m...content-available-to-author-only...m.com/?aff=122, http://c...content-available-to-author-only...r.com/?money=22 and spam@email.com, file://spamfile.com/file.txt ftp://s...content-available-to-author-only...p.com/file.exe </p>
LOB;


//USAGE

// The whitelist is interpolated into "/.../", so it must not contain an unescaped "/".
$whitelist = "(google\.com|yahoo\.com|bing\.com|nicesite\.com|mywebsite\.com)";

$noSpam = killSpam($html, $whitelist);

echo $noSpam;
Success 0.02s 24448KB
stdout
 <p>Hello world, thanks to <a href="http://m...content-available-to-author-only...e.com/about" rel="nofollow"> (SPAM) </a> I learned a lot. I found
  you on <a href="http://w...content-available-to-author-only...g.com" rel="nofollow">http://w...content-available-to-author-only...g.com</a>, <a href="https://google.com/search" rel="nofollow">https://google.com/search</a> and on some  (SPAM) .  (SPAM) ,  (SPAM) ,  (SPAM)  and  (SPAM) ,  (SPAM)   (SPAM)  </p>