regex - How to capture this optional multiline string?


How can I capture an optional group (I mean one that spans multiple lines)?

In the screenshot, the green group is the optional group.

The red line marks the start of a new segment (the same pattern repeats).

My pattern:

(\t{2}<idx:entry name="dic">\r\n)(\t{4}<idx:orth>)(.+\r\n)(\t{4}<idx:infl>[^</idx:infl>]+)? 

Any idea how to capture an optional group that doesn't have a fixed length?

Try this:

\s*<idx:entry name="dic">\s*<idx:orth>[^<]*\s*(<idx:infl>[\s\S]*?</idx:infl>)?

Whitespace between tags is generally not significant in XML, so you shouldn't have to specify the exact number of tabs and line breaks in the regex. Use \s to match whitespace (this includes spaces, tabs, and line breaks).

Everything between the parentheses () is captured, and you can access the group using \1 or $1, depending on the regex engine.
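To illustrate the capture group, here is a minimal Python sketch. It uses a variant of the pattern in which the `<idx:infl>` group is marked optional with a trailing `?` and its body is matched with `[\s\S]*?`, so it can span several lines without a dot-matches-newline flag. The sample text, the `<idx:iform>` element, and the names `SAMPLE`, `ENTRY`, and `find_entries` are assumptions made up for the example, not taken from the question.

```python
import re

# Hypothetical sample data: the second entry has no <idx:infl> block.
SAMPLE = (
    '\t\t<idx:entry name="dic">\r\n'
    '\t\t\t\t<idx:orth>run\r\n'
    '\t\t\t\t<idx:infl>\r\n'
    '\t\t\t\t\t<idx:iform value="ran"/>\r\n'
    '\t\t\t\t</idx:infl>\r\n'
    '\t\t\t\t</idx:orth>\r\n'
    '\t\t</idx:entry>\r\n'
    '\t\t<idx:entry name="dic">\r\n'
    '\t\t\t\t<idx:orth>cat\r\n'
    '\t\t\t\t</idx:orth>\r\n'
    '\t\t</idx:entry>\r\n'
)

ENTRY = re.compile(
    r'<idx:entry name="dic">\s*'         # entry tag, then any whitespace
    r'<idx:orth>([^<]*)'                 # group 1: text up to the next tag
    r'(<idx:infl>[\s\S]*?</idx:infl>)?'  # group 2: optional, may span lines
)

def find_entries(text):
    """Return (headword, infl_block_or_None) for every entry found."""
    return [(m.group(1).strip(), m.group(2)) for m in ENTRY.finditer(text)]

for word, infl in find_entries(SAMPLE):
    print(word, "has inflections" if infl else "has no inflections")
```

When the optional group does not take part in a match, Python reports it as `None`, which makes it easy to tell the two kinds of entry apart.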

However, when parsing XML it's a better idea to use a proper DOM parser with XPath.
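As a rough sketch of the parser route, the following uses Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath. The namespace URI `urn:example:idx`, the `<root>` wrapper, the `<idx:iform>` children, and the nesting of `<idx:infl>` inside `<idx:orth>` are assumptions for illustration. The real file would need to declare the `idx` prefix, and its actual URI should be used instead.

```python
import xml.etree.ElementTree as ET

# Assumed namespace mapping; replace the URI with the one the real file declares.
NS = {"idx": "urn:example:idx"}

# Hypothetical document: the second entry has no <idx:infl> element.
SAMPLE = """<root xmlns:idx="urn:example:idx">
  <idx:entry name="dic">
    <idx:orth>run
      <idx:infl>
        <idx:iform value="ran"/>
      </idx:infl>
    </idx:orth>
  </idx:entry>
  <idx:entry name="dic">
    <idx:orth>cat</idx:orth>
  </idx:entry>
</root>"""

def read_entries(xml_text):
    """Return (headword, [inflected forms]) for each dic entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for orth in root.findall(".//idx:entry[@name='dic']/idx:orth", NS):
        word = (orth.text or "").strip()
        infl = orth.find("idx:infl", NS)  # None when the block is absent
        forms = [] if infl is None else [
            form.get("value") for form in infl.findall("idx:iform", NS)
        ]
        entries.append((word, forms))
    return entries

print(read_entries(SAMPLE))
```

Here the optional block is simply an element that may or may not be present, so no special handling of tabs, line breaks, or variable length is needed.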

