perl - How to remove lines from a list which can be found within other longer lines in the list?
I have a file, list.txt, like this:
cat
bear
tree
catfish
fish
bear
I need to delete any line that is found somewhere else in the document, either as a duplicate line or within a longer line. E.g., the lines "bear" and "bear" are the same, so one of them is deleted; "cat" can be found within "catfish", so "cat" is deleted. The output would look like this:
catfish
tree
bear
How can I delete duplicate lines, including lines found within longer lines in the list?
So far, I have this:
#!/bin/bash
touch list.tmp
while read -r line
do
    found="$(grep -c $line list.tmp)"
    if [ "$found" -eq "1" ]
    then
        echo $line >> list.tmp
        echo $line" added"
    else
        echo "not added."
    fi
done < list.txt
If O(n^2) doesn't bother you:
#!/usr/bin/env perl

use strict;
use warnings;
use List::MoreUtils qw{any};

my @words;
for my $word (
    # Unique input lines, longest first
    sort {length $b <=> length $a}
    do {
        my %words;
        my @words = <>;
        chomp @words;
        @words{@words} = ();
        keys %words;
    }
)
{
    # Keep the word unless an already kept (longer) word contains it
    push @words, $word unless do {
        my $re = qr/\Q$word/;
        any {m/$re/} @words;
    };
}

print "$_\n" for @words;
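The same quadratic idea can also be sketched in shell, closer to the attempt in the question. This is only a sketch, not part of the Perl answer: the function name dedupe_substrings and the temporary file are made up for illustration, and it assumes awk, sort, cut, grep and mktemp are available. It processes the longest lines first and keeps a line only if no already kept line contains it as a fixed string.

```shell
# Hypothetical helper: print the lines of file "$1" that are neither
# duplicates nor contained in another line of the same file.
dedupe_substrings() {
    local kept
    kept="$(mktemp)"    # scratch file holding the lines kept so far

    # Prefix each line with its length, sort longest first, drop the prefix.
    awk '{ print length($0) "\t" $0 }' "$1" | sort -rn | cut -f2- |
    while IFS= read -r line; do
        # -F: match as a fixed string, -q: no output,
        # --: protect against lines that start with "-".
        if ! grep -qF -- "$line" "$kept"; then
            printf '%s\n' "$line" >> "$kept"
        fi
    done

    cat "$kept"
    rm -f "$kept"
}
```

Called as dedupe_substrings list.txt on the sample file, it should print catfish, tree and bear, one per line; the order among lines of equal length depends on how sort breaks ties.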
If you want O(n log n), you have to use some sort of trie approach. For example, using a suffix tree:
#!/usr/bin/env perl

use strict;
use warnings;
use Tree::Suffix;

my $tree = Tree::Suffix->new();

my @words;
for my $word (
    # Unique input lines, longest first
    sort {length $b <=> length $a}
    do {
        my %words;
        my @words = <>;
        chomp @words;
        @words{@words} = ();
        keys %words;
    }
)
{
    # Keep the word unless it occurs inside a word already in the tree
    unless ($tree->find($word)){
        push @words, $word;
        $tree->insert($word);
    };
}

print "$_\n" for @words;