Remove contents between script and style tags in Objective-C
Alright, I'm working on a web crawler that can take webpages and convert them into passages of text. To remove the tags themselves, I found this on Stack Overflow:
    - (NSString *)stripTags:(NSString *)str {
        NSMutableString *ms = [NSMutableString stringWithCapacity:[str length]];
        NSScanner *scanner = [NSScanner scannerWithString:str];
        [scanner setCharactersToBeSkipped:nil];
        NSString *s = nil;
        while (![scanner isAtEnd]) {
            [scanner scanUpToString:@"<" intoString:&s];
            if (s != nil)
                [ms appendString:s];
            [scanner scanUpToString:@">" intoString:NULL];
            if (![scanner isAtEnd])
                [scanner setScanLocation:[scanner scanLocation] + 1];
            s = nil;
        }
        return ms;
    }
This works; however, it only removes the tags themselves, not the contents between the script and style tags (I don't want the contents between all tags removed, as that would result in an empty string).
Is there a way I can have the contents of the script and style tags removed as well?
Thanks a lot in advance.
Edit:
I have tried changing the code to:
    - (NSString *)stripTags:(NSString *)str {
        NSMutableString *ms = [NSMutableString stringWithCapacity:[str length]];
        NSScanner *scanner = [NSScanner scannerWithString:str];
        [scanner setCharactersToBeSkipped:nil];
        NSString *s = nil;
        while (![scanner isAtEnd]) {
            [scanner scanUpToString:@"<script" intoString:&s];
            if (s != nil)
                [ms appendString:s];
            [scanner scanUpToString:@"script>" intoString:NULL];
            if (![scanner isAtEnd])
                [scanner setScanLocation:[scanner scanLocation] + 1];
            [scanner scanUpToString:@"<" intoString:&s];
            if (s != nil)
                [ms appendString:s];
            [scanner scanUpToString:@">" intoString:NULL];
            if (![scanner isAtEnd])
                [scanner setScanLocation:[scanner scanLocation] + 1];
            s = nil;
        }
        return ms;
    }
But the scripts and CSS are still being included.
You can edit the scanner code so that it checks the tags. If the tag is one you want to remove, you can scan to the closing tag and discard that string. If not, you can store / append the string.
Read to the tag start (<), then read the tag so you can check what it is. Then read to the tag close and either drop it or save it.
A start (typed inline and not tested in any way):
    NSString *t = nil;
    while (![scanner isAtEnd]) {
        // Keep the text in front of the next tag.
        [scanner scanUpToString:@"<" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        // Read the tag itself; t starts at the '<' and stops before the '>'.
        [scanner scanUpToString:@">" intoString:&t];
        if ([t hasPrefix:@"<tagtoignore"]) {
            // Drop the tag's contents by skipping ahead to the next '<'.
            [scanner scanUpToString:@"<" intoString:NULL];
            s = nil;
            t = nil;
            continue;
        }
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation] + 1];
        s = nil;
        t = nil;
    }
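For reference, the idea of checking each tag and discarding everything up to its closing tag could be filled out roughly as follows. This is only a sketch under a few assumptions: the names `OpensBlock` and `StripTagsAndBlocks` are made up for illustration, it is written as a plain C function rather than the `stripTags:` method so it compiles on its own, and it skips ahead to the matching `</script` or `</style` rather than to the next `<`, because script bodies may themselves contain `<`. It does not handle comments, CDATA, or attribute values that contain `>`.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if tag (e.g. @"<script type=...") opens element `name`.
static BOOL OpensBlock(NSString *tag, NSString *name) {
    NSString *prefix = [@"<" stringByAppendingString:name];
    if ([tag length] < [prefix length]) return NO;   // also covers tag == nil
    NSString *head = [tag substringToIndex:[prefix length]];
    if ([head caseInsensitiveCompare:prefix] != NSOrderedSame) return NO;
    if ([tag length] == [prefix length]) return YES;
    // Reject longer names that merely start with `name`.
    unichar next = [tag characterAtIndex:[prefix length]];
    return ![[NSCharacterSet alphanumericCharacterSet] characterIsMember:next];
}

// Hypothetical function: strips all tags and also drops the contents of
// <script>...</script> and <style>...</style>.
static NSString *StripTagsAndBlocks(NSString *str) {
    NSArray *blockTags = @[@"script", @"style"];
    NSMutableString *ms = [NSMutableString stringWithCapacity:[str length]];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    [scanner setCharactersToBeSkipped:nil];
    [scanner setCaseSensitive:NO];   // so </SCRIPT> also closes <script>

    while (![scanner isAtEnd]) {
        NSString *text = nil;
        NSString *tag = nil;

        // Keep the text in front of the next tag.
        if ([scanner scanUpToString:@"<" intoString:&text])
            [ms appendString:text];

        // Read the tag (without its '>') and step past the '>'.
        [scanner scanUpToString:@">" intoString:&tag];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation] + 1];

        // For script/style, discard everything up to the closing tag.
        // The closing tag itself is consumed as an ordinary tag next time round.
        for (NSString *name in blockTags) {
            if (OpensBlock(tag, name)) {
                NSString *closing = [NSString stringWithFormat:@"</%@", name];
                [scanner scanUpToString:closing intoString:NULL];
                break;
            }
        }
    }
    return ms;
}
```

Called on a small page such as `<style>p{}</style><p>Hello</p> <b>world</b>`, this should return `Hello world`. If a script or style element is never closed, the rest of the input is dropped.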