php - Regex for finding url


<a href="http://newday.com/song.mp3">first link</a> <div id="right_song">          <div style="font-size:15px;"><b>pitbull ft. chris brown - pitbull feat. chris brown - international love mp3</b></div>          <div style="clear:both;"></div>  <div style="float:left;">      <div style="float:left; height:27px; font-size:13px; padding-top:2px;">          <div style="float:left;">      <a href="http://secondurl.com/thisoneshouldonlyoutput" rel="nofollow" target="_blank" style="color:green;">second link</a></div>';  

I want to pull only the second link out of this HTML using preg_match_all. My current regex looks like this:

preg_match_all("/\<a.+?href=(\"|')(?!javascript:|#)(.+?)\.mp3(\"|')/i", $html, $urlmatches); 

This works, but two links are output. I want only the second one in the output, without the .mp3 extension. Please help me.

description

This regex will:

  • match the first anchor tag after <div id="right_song"> that has an href attribute whose value ends in .mp3
  • avoid many of the edge cases that make matching HTML text with a regular expression difficult

<div\sid="right_song">.*?<a(?=\s|>)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\shref=(['"]?)(.*?\.mp3)\1(?:\s|\/>|>))(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>.*?<\/a>
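One way to read the pattern is to assemble it from named pieces. The following is only a minimal sketch, not the answerer's code: the variable names $step, $pattern and $html are assumptions for illustration, the sample HTML is a shortened variant of the markup in the question with .mp3 added to the second link, and the m and x flags are left out because this version contains no line anchors or literal whitespace.

```php
<?php
// One "step" inside an opening tag: either a plain character that is not
// ">" or "=", or an "=" followed by a single-quoted, double-quoted, or
// unquoted attribute value. A ">" hidden inside a quoted value is skipped
// as part of that value instead of ending the tag.
$step = '(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)';

$pattern = '/<div\sid="right_song">.*?'   // start at the container div
    . '<a(?=\s|>)'                        // the tag name must be exactly "a"
    . '(?=' . $step . '*?\shref=([\'"]?)(.*?\.mp3)\1(?:\s|\/>|>))' // find a real href ending in .mp3
    . $step . '*>'                        // consume the rest of the opening tag
    . '.*?<\/a>/is';                      // then everything up to the closing tag

// Shortened, hypothetical sample: one link outside the div, one inside it.
$html = '<a href="http://newday.com/song.mp3">first link</a> '
    . '<div id="right_song"><div style="float:left;">'
    . '<a rel="nofollow" href="http://secondurl.com/thisoneshouldonlyoutput.mp3">second link</a>'
    . '</div></div>';

$found = preg_match($pattern, $html, $m);

echo $m[1], PHP_EOL; // the quote character around the href value
echo $m[2], PHP_EOL; // the href value of the link inside the div
```

Because the match has to begin at the div, the link that precedes it is never considered, even though its href also ends in .mp3.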


example

sample text

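Note the difficult edge cases in the second anchor tag: the string href="bad.mp3" is nested inside an attribute value; there is a JavaScript greater-than sign > inside that value; and the real href attribute comes only after that value (the pattern also accepts an href value written without quotes).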
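To see why these edge cases matter, the sketch below runs two patterns against that anchor tag. The $naive pattern is a hypothetical example of a simpler approach and does not come from the question or the answer; $robust is just the anchor portion of the answer's regex. The variable names are assumptions for illustration.

```php
<?php
// The tricky anchor tag, in a nowdoc so both quote styles stay intact.
$tag = <<<'HTML'
<a onmouseover=' href="bad.mp3" ; if ( 6 > x ) {funrotate(href); } ; ' href="http://secondurl.com/thisoneshouldonlyoutput.mp3">first link</a>
HTML;

// Hypothetical naive pattern: assumes the first ">" ends the tag and that
// any href="..." before it is the real attribute.
$naive = '/<a[^>]*href=["\']([^"\']+\.mp3)["\']/i';
preg_match($naive, $tag, $naiveMatch);

// Anchor portion of the answer's pattern: quoted attribute values are
// skipped as whole units, so text inside onmouseover='...' is ignored.
$robust = '/<a(?=\s|>)(?=(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*?\shref=([\'"]?)(.*?\.mp3)\1(?:\s|\/>|>))/is';
preg_match($robust, $tag, $robustMatch);

echo $naiveMatch[1], PHP_EOL;  // bad.mp3
echo $robustMatch[2], PHP_EOL; // http://secondurl.com/thisoneshouldonlyoutput.mp3
```

The naive pattern stops at the > in "6 > x" and picks up the href text nested inside the onmouseover value, while the attribute-aware pattern reaches the real href.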

<a href="http://newday.com/song.mp3">first link</a> <div id="right_song">          <div style="font-size:15px;"><b>pitbull ft. chris brown - pitbull feat. chris brown - international love mp3</b></div>          <div style="clear:both;"></div>  <div style="float:left;">      <div style="float:left; height:27px; font-size:13px; padding-top:2px;">          <div style="float:left;">  <a onmouseover=' href="bad.mp3" ; if ( 6 > x ) {funrotate(href); } ; ' href="http://secondurl.com/thisoneshouldonlyoutput.mp3">first link</a> </div> 

code

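<?php
// HTML to search; replace this placeholder with the real page source.
$sourcestring = "your source string";

// Flags: i = case-insensitive, m = multiline anchors, s = dot also matches
// newlines, x = literal whitespace inside the pattern is ignored.
preg_match('/<div\sid="right_song">.*?<a(?=\s|>)(?=(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*?\shref=([\'"]?)(.*?\.mp3)\1(?:\s|\/>|>))(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*>.*?<\/a> /imsx', $sourcestring, $matches);

// $matches[0] is the full match, [1] the quote character, [2] the href value.
echo "<pre>" . print_r($matches, true) . "</pre>";
?>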

match

Group 0 gets the text from <div through to and including the full matching anchor tag.
Group 1 gets the opening quote around the href value, which is referenced later in the pattern as \1.
Group 2 gets the href value.

[0] => <div id="right_song">          <div style="font-size:15px;"><b>pitbull ft. chris brown - pitbull feat. chris brown - international love mp3</b></div>          <div style="clear:both;"></div>  <div style="float:left;">      <div style="float:left; height:27px; font-size:13px; padding-top:2px;">          <div style="float:left;">  <a onmouseover=' href="bad.mp3" ; if ( 6 > x ) {funrotate(href); } ; ' href="http://secondurl.com/thisoneshouldonlyoutput.mp3">first link</a>
[1] => "
[2] => http://secondurl.com/thisoneshouldonlyoutput.mp3
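If the goal is the URL without its .mp3 extension, as the question seems to ask, the captured href value can be trimmed afterwards. This is a small sketch under that assumption; the $matches array is filled in by hand from the output shown here, and $withoutExtension is a name chosen for illustration.

```php
<?php
// Hand-written stand-in for the $matches array printed in the output
// (group 0 is omitted for brevity).
$matches = [
    1 => '"',
    2 => 'http://secondurl.com/thisoneshouldonlyoutput.mp3',
];

// Remove a trailing ".mp3" (case-insensitive) from the captured href value.
$withoutExtension = preg_replace('/\.mp3$/i', '', $matches[2]);

echo $withoutExtension, PHP_EOL; // http://secondurl.com/thisoneshouldonlyoutput
```

Doing this as a separate step keeps the matching pattern unchanged.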
