python - Regex multiline: How to grab a portion of a page source
Sorry if this question has been asked before; I find the Python regex documentation quite hard to understand due to the lack of examples. I want to grab a block of the page source so it can be parsed again later. For example:
<div id="viewed"><div class="shortstory-block">
<div class="shortstoey-block-image">
<a href="...."><img src="/uploads/posts/cov.jpg" alt="instance 1"/></a>
<span class="format"><a href="http://www..../">something</a></span>
</div>
<a href="http://....."><span class="shortstory-block-title" style="text-decoration:none !important;"> </span> </a>
</div><div class="shortstory-block">
<div class="shortstoey-block-image">
<a href="...."><img src="/uploads/posts/cov.jpg" alt="something 2"/></a>
<span class="format"><a href="http://www.website/xfsearch/smth/">something</a></span>
</div>
<a href="http://web.html"><span class="shortstory-block-title" style="text-decoration:none !important;"> </span> </a>
</div>
(* x times)
<div id="rated">....
I have the page source in a variable (html_source), and I want to store in another variable the block of code between <div id="viewed"> and <div id="rated">. I want to grab everything in between, regardless of any \n or \r characters found between the two markers.
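For illustration, a minimal check of why the line breaks matter. In Python's re module, "." matches any character except "\n" by default (a lone "\r" is matched), so a pattern spanning lines needs the re.DOTALL flag. The string below is a made-up stand-in for html_source, not the real page.

```python
import re

# Made-up stand-in for html_source, containing a "\r\n" and a "\n" line break.
sample = '<div id="viewed">\r\n<p>block</p>\n<div id="rated">'

pattern = r'<div id="viewed">(.*?)<div id="rated">'

# Without re.DOTALL, "." cannot match "\n", so the search fails.
without = re.search(pattern, sample)
print(without)  # None

# With re.DOTALL, "." also matches "\n", so the whole block is captured.
m = re.search(pattern, sample, flags=re.DOTALL)
print(repr(m.group(1)))  # '\r\n<p>block</p>\n'
```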
Can anyone point me in the right direction (i.e. the regex expression)?
Thanks in advance.
if indeed trying find between 2 elements of text can use following regex:
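import re

with open('yourfile') as fin:
    page_source = fin.read()

# Escape the markers so characters such as '.' or '(' are matched literally
start_text = re.escape('<div id="viewed">')
until_text = re.escape('<div id="rated">')
# re.DOTALL makes "." match newlines too, so the block may span several lines
match_text = re.search('{}(.*?){}'.format(start_text, until_text),
                       page_source, flags=re.DOTALL)
if match_text:
    print(match_text.group(1))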
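A sketch of the same approach wrapped in a reusable function, applied directly to a string variable like the html_source from the question instead of a file. The function name extract_between and the shortened html_source value are hypothetical, made up for this example; the follow-up re.findall call only shows one possible "parse it again" step on the extracted block.

```python
import re


def extract_between(html_source, start_text, until_text):
    """Return the text between start_text and until_text, or None.

    Hypothetical helper. Both markers are escaped so they are matched
    literally, and re.DOTALL lets the captured group span line breaks.
    """
    pattern = '{}(.*?){}'.format(re.escape(start_text), re.escape(until_text))
    match = re.search(pattern, html_source, flags=re.DOTALL)
    return match.group(1) if match else None


# Hypothetical, shortened page source with Windows-style line endings.
html_source = (
    '<div id="viewed"><div class="shortstory-block">\r\n'
    '<img src="/uploads/posts/cov.jpg" alt="instance 1"/>\r\n'
    '</div><div class="shortstory-block">\r\n'
    '<img src="/uploads/posts/cov.jpg" alt="something 2"/>\r\n'
    '</div> <div id="rated">...'
)

block = extract_between(html_source, '<div id="viewed">', '<div id="rated">')

# The extracted block can then be searched again, e.g. for the alt texts.
alts = re.findall(r'alt="([^"]*)"', block)
print(alts)  # ['instance 1', 'something 2']
```

The non-greedy `.*?` stops at the first `<div id="rated">`; a greedy `.*` would run to the last occurrence if the end marker appears more than once. For anything more involved than cutting out a block, an HTML parser is generally more robust than regular expressions.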