PHP - Regex for finding a URL
<a href="http://newday.com/song.mp3">first link</a> <div id="right_song"> <div style="font-size:15px;"><b>pitbull ft. chris brown - pitbull feat. chris brown - international love mp3</b></div> <div style="clear:both;"></div> <div style="float:left;"> <div style="float:left; height:27px; font-size:13px; padding-top:2px;"> <div style="float:left;"> <a href="http://secondurl.com/thisoneshouldonlyoutput" rel="nofollow" target="_blank" style="color:green;">second link</a></div>
I want to output only the second link from this HTML using preg_match_all. My current regex looks like this:
preg_match_all("/\<a.+?href=(\"|')(?!javascript:|#)(.+?)\.mp3(\"|')/i", $html, $urlmatches);
This works fine, but two links are output. I want only the second one to be output, without the .mp3 extension. Please help me.
Description
This regex will:
- match the first anchor tag after
<div id="right_song">
that has an href attribute whose value ends in .mp3
- avoid many of the edge cases that make matching HTML text with a regular expression difficult.
<div\sid="right_song">.*?<a(?=\s|>)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\shref=(['"]?)(.*?\.mp3)\1(?:\s|\/>|>))(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>.*?<\/a>
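The pattern is dense, so here is a commented breakdown as a minimal sketch. It assumes PCRE extended mode (the x flag) so the pieces can be spread over several lines. The `$attr` helper variable, the `~` delimiter, and the sample input are illustrative choices, not part of the original answer.

```php
<?php
// Hypothetical input: one decoy link before the container, one target inside it.
$html = <<<'HTML'
<a href="http://newday.com/song.mp3">first link</a>
<div id="right_song">
  <a href="http://secondurl.com/thisoneshouldonlyoutput.mp3" rel="nofollow">second link</a>
</div>
HTML;

// $attr is a helper name introduced here (not in the original answer).
// It consumes one piece of a tag: any character other than ">" or "=",
// or "=" followed by a single-quoted, double-quoted, or unquoted value.
$attr = '(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)';

$pattern = '~
    <div\sid="right_song">    # the container div
    .*?                       # anything up to the next candidate tag
    <a(?=\s|>)                # an anchor tag name followed by space or ">"
    (?=                       # lookahead: find a real href inside this tag
        ' . $attr . '*?       # skip whole attributes, including quoted values
        \shref=([\'"]?)       # group 1: optional opening quote
        (.*?\.mp3)            # group 2: href value ending in .mp3
        \1(?:\s|/>|>)         # matching close quote, then end of attribute
    )
    ' . $attr . '*>           # consume the rest of the opening tag
    .*?</a>                   # through the closing tag
~isx';

if (preg_match($pattern, $html, $m)) {
    echo "quote: {$m[1]}\n";
    echo "href:  {$m[2]}\n";
}
```

Because quoted attribute values are consumed as a unit, an `href=` that appears inside another attribute's value is never tested as the real href.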
Example
Sample text
Note the difficult edge cases in the second anchor tag: the string href="bad.mp3" is
nested inside an attribute value; there is a JavaScript greater-than sign >
inside that value; and the real href attribute comes after it (the pattern also accepts an href value without quotes).
<a href="http://newday.com/song.mp3">first link</a> <div id="right_song"> <div style="font-size:15px;"><b>pitbull ft. chris brown - pitbull feat. chris brown - international love mp3</b></div> <div style="clear:both;"></div> <div style="float:left;"> <div style="float:left; height:27px; font-size:13px; padding-top:2px;"> <div style="float:left;"> <a onmouseover=' href="bad.mp3" ; if ( 6 > x ) {funrotate(href); } ; ' href="http://secondurl.com/thisoneshouldonlyoutput.mp3">first link</a> </div>
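To see why the nested string matters, the sketch below runs the simple pattern from the question against the edge-case anchor alone. The input is a shortened copy of the sample text; the variable names are illustrative.

```php
<?php
// Edge-case anchor from the sample text, reduced to the container and one tag.
$html = <<<'HTML'
<div id="right_song"> <a onmouseover=' href="bad.mp3" ; if ( 6 > x ) {funrotate(href); } ; ' href="http://secondurl.com/thisoneshouldonlyoutput.mp3">first link</a> </div>
HTML;

// The simple pattern from the question: it takes the first "href=" it finds
// after "<a", even when that text sits inside another attribute's value.
$naive = "/\<a.+?href=(\"|')(?!javascript:|#)(.+?)\.mp3(\"|')/i";

preg_match_all($naive, $html, $m);
print_r($m[2]); // the decoy value "bad" is captured instead of the real URL
```

The longer pattern avoids this by stepping over each complete attribute value before it looks for href.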
Code
<?php
$sourcestring = "your source string";
// Flags: i = case-insensitive, m = multiline anchors, s = dot matches newlines, x = ignore whitespace in the pattern
preg_match('/<div\sid="right_song">.*?<a(?=\s|>)(?=(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*?\shref=([\'"]?)(.*?\.mp3)\1(?:\s|\/>|>))(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*>.*?<\/a>/imsx', $sourcestring, $matches);
echo "<pre>" . print_r($matches, true) . "</pre>";
?>
Match
Group 0 gets the text from <div through and including the full matching anchor tag
Group 1 gets the opening quote around the href value, which is referenced later as \1
Group 2 gets the href value
[0] => <div id="right_song"> <div style="font-size:15px;"><b>pitbull ft. chris brown - pitbull feat. chris brown - international love mp3</b></div> <div style="clear:both;"></div> <div style="float:left;"> <div style="float:left; height:27px; font-size:13px; padding-top:2px;"> <div style="float:left;"> <a onmouseover=' href="bad.mp3" ; if ( 6 > x ) {funrotate(href); } ; ' href="http://secondurl.com/thisoneshouldonlyoutput.mp3">first link</a>
[1] => "
[2] => http://secondurl.com/thisoneshouldonlyoutput.mp3
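The question asked for the URL without the .mp3 extension, while group 2 above still includes it. One possible way to handle that, sketched here as an assumption rather than as part of the original answer, is to strip the suffix after matching. The variable names are illustrative.

```php
<?php
// Group 2 as printed in the output above.
$href = 'http://secondurl.com/thisoneshouldonlyoutput.mp3';

// Remove a trailing ".mp3" (case-insensitive); other strings pass through unchanged.
$withoutExtension = preg_replace('/\.mp3$/i', '', $href);

echo $withoutExtension, "\n";
```

An alternative with the same effect should be to move the extension outside the capture group in the pattern itself, writing `(.*?)\.mp3\1` instead of `(.*?\.mp3)\1`, so that group 2 never contains it.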