string - Python's "re" module not working?
I'm using Python's "re" module as follows:
request = get("http://www.allmusic.com/album/warning-mw0000106792")
print re.findall('<hgroup>(.*?)</hgroup>', request)
All I'm doing is getting the HTML of this site and looking for this particular snippet of code:
<hgroup>
    <h3 class="album-artist">
        <a href="http://www.allmusic.com/artist/green-day-mn0000154544">green day</a>
    </h3>
    <h2 class="album-title">
        warning
    </h2>
</hgroup>
However, it continues to print an empty array. Why is this? Why can't re.findall find this snippet?
The HTML you are parsing spans multiple lines. You need to pass the re.DOTALL flag to findall, like this:
print re.findall('<hgroup>(.*?)</hgroup>', request, re.DOTALL)
This allows . to match newlines, and returns the correct output.
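To see the difference the flag makes, here is a minimal sketch that runs the same pattern with and without it. It is written for Python 3, and the `html` string is a hypothetical stand-in for the downloaded page (the real markup on the site may differ), so no network request is needed.

```python
import re

# Hypothetical stand-in for the downloaded page; the real markup may differ.
html = """<hgroup>
<h3 class="album-artist">
<a href="http://www.allmusic.com/artist/green-day-mn0000154544">green day</a>
</h3>
<h2 class="album-title">
warning
</h2>
</hgroup>"""

pattern = r'<hgroup>(.*?)</hgroup>'

# Without a flag, "." does not match "\n", so the pattern cannot
# span the line breaks between the opening and closing tags.
without_flag = re.findall(pattern, html)

# With re.DOTALL, "." matches any character including "\n".
with_flag = re.findall(pattern, html, re.DOTALL)

print(without_flag)    # prints []
print(len(with_flag))  # prints 1
```

The flag can also be written as re.S, or set inside the pattern itself with the inline form (?s).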
@jsalonen Right, of course, parsing HTML with regex is a tricky problem. However, in small cases like this one-off script, I'd say it's acceptable.
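For anything beyond a one-off script, a parser-based approach might look roughly like the sketch below. It uses Python 3's standard-library html.parser; the class name HgroupText and the sample string are made up for illustration and are not part of the original question.

```python
from html.parser import HTMLParser


class HgroupText(HTMLParser):
    """Hypothetical helper: collects text found inside <hgroup> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <hgroup> elements we are currently inside
        self.chunks = []  # non-empty text pieces seen inside <hgroup>

    def handle_starttag(self, tag, attrs):
        if tag == "hgroup":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "hgroup" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.chunks.append(text)


# Hypothetical sample; the real page may be structured differently.
sample = """<hgroup>
<h3 class="album-artist">
<a href="http://www.allmusic.com/artist/green-day-mn0000154544">green day</a>
</h3>
<h2 class="album-title">
warning
</h2>
</hgroup>"""

parser = HgroupText()
parser.feed(sample)
parser.close()
print(parser.chunks)  # prints ['green day', 'warning']
```

Unlike the regex, this does not depend on how the markup is split across lines.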