BeautifulSoup Python nested text
I want to obtain the text "some text" nested within tags like this:

    <tr>
      <td>cme globex</td>
      <td colspan="4"> some text <a target="_blank" href="http://...">view rollover dates</a> </td>
    </tr>

I do .findAll('tr') first, then some_tr.findAll('td', colspan=4) second, and then some_td.find(text=True). Is there a more efficient way to do this? Is there a way to keep traversing through the tags and find the text?
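For reference, the three-step approach described in the question could look roughly like the sketch below. It assumes bs4 with the built-in "html.parser" backend and the snippet with the cell text written as "some text"; the names texts and first_text are illustrative only and not from the original post.

```python
from bs4 import BeautifulSoup

html = """<tr> <td>cme globex</td> <td colspan="4"> some text <a target="_blank" href="http://...">view rollover dates</a> </td> </tr>"""

soup = BeautifulSoup(html, "html.parser")

texts = []
# Step 1: every row; step 2: cells with colspan="4"; step 3: first text node in the cell.
for some_tr in soup.find_all("tr"):
    for some_td in some_tr.find_all("td", colspan="4"):
        first_text = some_td.find(string=True)
        if first_text is not None:
            texts.append(first_text.strip())

print(texts)
```

This works, but each step is a separate search, which is what the answers try to shorten.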
You can use XPath expressions with lxml:

    import lxml.html

    html = """<tr> <td>cme globex</td> <td colspan="4"> some text <a target="_blank" href="http://...">view rollover dates</a> </td> </tr>"""

    # Parse the fragment and select the text nodes directly inside the colspan="4" cell.
    tree = lxml.html.fromstring(html)
    print(tree.xpath('//tr/td[@colspan="4"]/text()'))

Not sure if this is exactly what you're after...
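Another way might be to find the anchor links with the text "view rollover dates" and take the preceding element...

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, 'html.parser')

    # previous_element is whatever was parsed immediately before the <a> tag,
    # which here is the text node in front of the link.
    for a in soup.find_all('a', text='view rollover dates'):
        print(a.previous_element)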
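If the goal is to reach the cell in a single step, a CSS selector may also do the job. This is a minimal sketch, assuming bs4 4.7 or later (where select_one is backed by the soupsieve package) and the question's snippet with the cell text written as "some text"; the names cell and direct_text are illustrative.

```python
from bs4 import BeautifulSoup

html = """<tr> <td>cme globex</td> <td colspan="4"> some text <a target="_blank" href="http://...">view rollover dates</a> </td> </tr>"""

soup = BeautifulSoup(html, "html.parser")

# Jump straight to the colspan="4" cell instead of chaining several searches.
cell = soup.select_one('tr > td[colspan="4"]')

# Keep only text nodes that are direct children of the cell,
# which leaves out the link text inside <a>.
direct_text = [
    s.strip()
    for s in cell.find_all(string=True, recursive=False)
    if s.strip()
]

print(direct_text)
```

Restricting the search to direct children is a design choice here: it separates the cell's own text from the text of the nested link.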