BeautifulSoup Python nested text

- January 15, 2011

I wanted to obtain the text "some text" nested within tags like this:

<tr>
    <td>cme globex</td>
    <td colspan="4">
    text
    <a target="_blank"" href="http://...>view rollover dates</a>
    </td>
</tr>

I call .findAll('tr') first, then some_tr.findAll('td', colspan=4) second, and finally some_td.find(text=True). Is there a more efficient way to do this? Is there a way to keep traversing through the tags and find the text?

You can use XPath expressions with lxml:

import lxml.html

# The row from the question, kept as posted
html = """<tr>
    <td>cme globex</td>
    <td colspan="4">
    text
    <a target="_blank"" href="http://...">view rollover dates</a>
    </td>
</tr>"""

tree = lxml.html.fromstring(html)
# Select the direct text nodes of the cell with colspan="4"
print(tree.xpath('//tr/td[@colspan="4"]/text()'))

Not sure that's exactly what you're after...
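The XPath expression returns every text node that is a direct child of the matching cell, including the whitespace-only nodes around the link. A minimal sketch of tidying that result, assuming lxml is installed and using a simplified copy of the markup (the `<table>` wrapper and the `http://example.com/` href are placeholders, not part of the original page):

```python
import lxml.html

# Simplified copy of the row from the question; the href is a placeholder.
html = """<table><tr>
    <td>cme globex</td>
    <td colspan="4">
    text
    <a target="_blank" href="http://example.com/">view rollover dates</a>
    </td>
</tr></table>"""

tree = lxml.html.fromstring(html)

# text() yields each direct text node of the cell, whitespace included.
raw_nodes = tree.xpath('//tr/td[@colspan="4"]/text()')

# Strip each node and drop the ones that were only whitespace.
cell_text = [node.strip() for node in raw_nodes if node.strip()]
print(cell_text)  # ['text']
```

The link text is not included because `text()` only looks at the cell's own text nodes, not at text inside child elements.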

Another way might be to find the anchor links with the text "view rollover dates" and take the preceding element...

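from bs4 import BeautifulSoup

# html is the same string as in the lxml example
soup = BeautifulSoup(html, 'html.parser')
# Newer bs4 releases also accept string= in place of text=
for a in soup.find_all('a', text='view rollover dates'):
    # The element parsed just before the link is the text node we want
    print(a.previous_element)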
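As for the "keep traversing" part of the question, BeautifulSoup can also reach the cell in one step with a CSS selector instead of chained findAll calls. A minimal sketch, assuming a bs4 release that provides `select_one` and the `string=` argument, and again using simplified markup with a placeholder href:

```python
from bs4 import BeautifulSoup

# Simplified copy of the row from the question; the href is a placeholder.
html = """<table><tr>
    <td>cme globex</td>
    <td colspan="4">
    text
    <a target="_blank" href="http://example.com/">view rollover dates</a>
    </td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")

# One selector replaces the chained findAll('tr') / findAll('td') calls.
cell = soup.select_one('tr > td[colspan="4"]')

# recursive=False keeps only the cell's own text nodes, skipping the link text.
own_text = [
    s.strip()
    for s in cell.find_all(string=True, recursive=False)
    if s.strip()
]
print(own_text)  # ['text']
```

Whether this is actually faster than the chained calls depends on the document size and parser; the main gain is that the traversal is expressed in a single selector.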
