string - Python's "re" module not working?


I'm using Python's "re" module as follows:

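import re

# get() is the asker's own helper (not shown); it is assumed to return the page's HTML as a string
request = get("http://www.allmusic.com/album/warning-mw0000106792")
print(re.findall('<hgroup>(.*?)</hgroup>', request))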

All I'm doing is getting the HTML of this site and looking for this particular snippet of code:

<hgroup>
    <h3 class="album-artist">
        <a href="http://www.allmusic.com/artist/green-day-mn0000154544">green day</a>
    </h3>
    <h2 class="album-title">
        warning
    </h2>
</hgroup>

However, it continues to print an empty array. Why is this? Why can't re.findall find the snippet?

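The HTML you are parsing spans multiple lines. By default, . does not match a newline, so the pattern can never get from <hgroup> to </hgroup>. You need to pass the re.DOTALL flag to findall, like this: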

print(re.findall('<hgroup>(.*?)</hgroup>', request, re.DOTALL))

This allows . to match newlines as well, and returns the correct output.
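To see the difference without fetching the page, here is a minimal sketch that runs the same pattern against the snippet from the question, pasted in as a string. It assumes Python 3, and the names html, pattern, without_flag and with_flag are only for illustration.

```python
import re

# Sample markup copied from the question; note that it spans several lines.
html = """<hgroup>
    <h3 class="album-artist">
        <a href="http://www.allmusic.com/artist/green-day-mn0000154544">green day</a>
    </h3>
    <h2 class="album-title">
        warning
    </h2>
</hgroup>"""

pattern = '<hgroup>(.*?)</hgroup>'

# Without re.DOTALL, "." cannot cross the newline after <hgroup>,
# so the pattern never reaches </hgroup> and nothing matches.
without_flag = re.findall(pattern, html)

# With re.DOTALL, "." also matches "\n", so the whole block is captured.
with_flag = re.findall(pattern, html, re.DOTALL)

print(without_flag)    # empty list
print(len(with_flag))  # one captured block
```

re.S is a shorter alias for re.DOTALL, and the same behaviour can be switched on inside the pattern itself with the inline flag (?s).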

@jsalonen Right, of course, parsing HTML with regex is a tricky problem. However, in small cases like this one-off script, I'd say it's acceptable.
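For anything beyond a one-off, a parser avoids the newline issue entirely. The following is only a sketch of that alternative, assuming Python 3 and the standard library's html.parser module. HgroupTextParser and the sample string are hypothetical names made up for this example, not something from the question.

```python
from html.parser import HTMLParser


class HgroupTextParser(HTMLParser):
    """Hypothetical helper: collects non-blank text found inside <hgroup>."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <hgroup> elements we are currently inside
        self.texts = []  # stripped text chunks found inside <hgroup>

    def handle_starttag(self, tag, attrs):
        if tag == "hgroup":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "hgroup" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.texts.append(text)


# Same markup as in the question, pasted in as a string.
sample = """<hgroup>
    <h3 class="album-artist">
        <a href="http://www.allmusic.com/artist/green-day-mn0000154544">green day</a>
    </h3>
    <h2 class="album-title">
        warning
    </h2>
</hgroup>"""

parser = HgroupTextParser()
parser.feed(sample)
parser.close()
hgroup_texts = parser.texts
print(hgroup_texts)  # ['green day', 'warning']
```

This gives the artist and album title directly, without depending on how the markup is split across lines.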

