python - Regex multiline - How to grab a portion of a page source


Sorry if this question has been brought up before; I find the Python regex documentation quite hard to understand due to the lack of examples. I want to grab a block of a page source so that it can be parsed again later. For example:

    <div id="viewed"><div class="shortstory-block">
        <div class="shortstoey-block-image">
            <a href="...."><img src="/uploads/posts/cov.jpg" alt="instance 1"/></a>
            <span class="format"><a href="http://www..../">something</a></span>
        </div>
        <a href="http://....."><span class="shortstory-block-title" style="text-decoration:none !important;">
            </span>
        </a>
    </div><div class="shortstory-block">
        <div class="shortstoey-block-image">
            <a href="...."><img src="/uploads/posts/cov.jpg" alt="something 2"/></a>
            <span class="format"><a href="http://www.website/xfsearch/smth/">something</a></span>
        </div>
        <a href="http://web.html"><span class="shortstory-block-title" style="text-decoration:none !important;">
            </span>
        </a>
    </div>   (* x times)
    <div id="rated">....

I have the page source in a variable (html_source), and I want to define a variable holding the block of code between div id="viewed" and div id="rated". I want to grab everything, regardless of any \n or \r that appears between those two markers.

Can anyone point me in the right direction (the regex expression)?

Thanks in advance.

if indeed trying find between 2 elements of text can use following regex:

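    import re

    with open('yourfile') as fin:
        page_source = fin.read()

    # Escape the delimiters so they are matched literally
    start_text = re.escape('<div id="viewed">')
    until_text = re.escape('<div id="rated">')
    # re.DOTALL lets '.' match \n as well; '.*?' stops at the first closing delimiter
    match_text = re.search('{}(.*?){}'.format(start_text, until_text), page_source, flags=re.DOTALL)
    if match_text:
        print(match_text.group(1))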
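
Since the question already has the source in html_source rather than in a file, here is a minimal sketch of the same approach wrapped in a helper. The function name extract_between and the sample string are made up for illustration and are not from the original page; the sample deliberately mixes \r\n and \n line breaks to show that re.DOTALL handles both.

```python
import re

def extract_between(source, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    # re.escape makes the delimiters literal; '.*?' is non-greedy.
    pattern = re.escape(start) + r'(.*?)' + re.escape(end)
    match = re.search(pattern, source, flags=re.DOTALL)
    return match.group(1) if match else None

# Hypothetical stand-in for html_source, with mixed line endings.
html_source = (
    '<div id="viewed"><div class="shortstory-block">\r\n'
    '  <a href="#">first</a>\n'
    '</div>\n'
    '<div id="rated">other</div>'
)

block = extract_between(html_source, '<div id="viewed">', '<div id="rated">')
if block is not None:
    print(block)
```

The helper returns None when either marker is missing, so the caller can tell "no match" apart from an empty block.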
