<?php

// Sample markup: tags nested inside attribute values, plus bare <> and < >.
$html=<<<EOD
<div class='container clickable' data-param='{"footer"<div>Bye</div>","info":"We win"}'>
<img src='a.jpg' />
</div>
<a href='a.html'>The A</a>
<span></span>
<span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
<a></a>
</span>
<span data-param-tag='<p>How are you</p>'>Thanks</span><div data-tag="%[[[[[[[[[<span class='link'>It is difficult now</span>]]]]]]]]]%">wow</div>
<div id="<div class="<span id="<span id="aa" data-role="then">How</span>" data-param="<span id="finish">we</span>">Thanks</span>">Whence</div>">
Finally, even though you have <> or < >, they won't be captured; only HTML tags not enclosed in double or single quotes are.
</div>
EOD;

// Pattern, piece by piece:
//   (?!<\s*>)      skip empty brackets such as <> or < >
//   \<  ...  \>    a literal opening and closing angle bracket
//   (?>[^<>]+)     an atomic run of characters that are not angle brackets
//   (?R)           recursion into the whole pattern, for a nested <...>
$tags = array();
if (preg_match_all('~(?!<\s*>)\<(?:(?>[^<>]+)|(?R))*\>~', $html, $matchall, PREG_SET_ORDER)) {
    foreach ($matchall as $m) {
        $tags[] = $m[0]; // $m[0] is the full match, i.e. one outermost tag
    }
}
print_r($tags);
stdout
Array
(
    [0] => <div class='container clickable' data-param='{"footer"<div>Bye</div>","info":"We win"}'>
    [1] => <img src='a.jpg' />
    [2] => </div>
    [3] => <a href='a.html'>
    [4] => </a>
    [5] => <span>
    [6] => </span>
    [7] => <span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
    [8] => <a>
    [9] => </a>
    [10] => </span>
    [11] => <span data-param-tag='<p>How are you</p>'>
    [12] => </span>
    [13] => <div data-tag="%[[[[[[[[[<span class='link'>It is difficult now</span>]]]]]]]]]%">
    [14] => </div>
    [15] => <div id="<div class="<span id="<span id="aa" data-role="then">How</span>" data-param="<span id="finish">we</span>">Thanks</span>">Whence</div>">
    [16] => </div>
)