perl - How to remove lines from a list which can be found within other longer lines in the list?


I have a file, list.txt, like this:

    cat
    bear
    tree
    catfish
    fish
    bear

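I need to delete lines that can be found somewhere else in the document, either as a duplicate line or within a longer line. For example, the two "bear" lines are the same, so one of them should be deleted. "cat" can be found within "catfish", so "cat" should be deleted (and "fish" for the same reason). The output should look like this: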

    catfish
    tree
    bear

How can I delete duplicate lines, including lines that are found within longer lines in the list?

So far, I have this:

    #!/bin/bash
    touch list.tmp
    while read -r line
    do
        found="$(grep -c $line list.tmp)"
        if [ "$found" -eq "1" ]
        then
            echo $line >> list.tmp
            echo $line" added"
        else
            echo "not added."
        fi
    done < list.txt
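The following is a minimal sketch of how the loop above could be repaired; it is not part of the original question. As written, the loop has three problems:

- It reads list.txt top to bottom, so "cat" is checked before "catfish" has been kept.
- It only adds a line when `grep -c` reports exactly one match, which is the opposite of the intended check.
- The unquoted `$line` is passed to grep as a regular expression.

The sketch fixes these by feeding the loop unique lines sorted longest first and testing with `grep -qF` for a literal match. It assumes one entry per line. For demonstration only, it creates the question's sample list.txt if none exists.

```shell
#!/bin/bash
# Sketch: the question's read/grep loop, fixed so that longer lines are seen first.
# For demonstration, create the question's sample list.txt only if it is missing.
[ -f list.txt ] || printf '%s\n' cat bear tree catfish fish bear > list.txt

: > list.tmp   # start from an empty output file

# Drop exact duplicates, prefix each line with its length, sort longest first,
# then strip the length prefix again.
awk '!seen[$0]++ { print length($0) "\t" $0 }' list.txt |
    sort -k1,1nr |
    cut -f2- |
    while IFS= read -r line; do
        # -F: match $line literally (not as a regex); -q: only set the exit status.
        # Keep the line only if no already-kept (longer) line contains it.
        if ! grep -qF -- "$line" list.tmp; then
            printf '%s\n' "$line" >> list.tmp
            echo "$line added"
        else
            echo "$line not added."
        fi
    done
```

With the sample input, list.tmp ends up holding catfish, bear and tree. Each new line is grepped against everything kept so far, so this is still quadratic in the number of lines.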

If O(n^2) doesn't bother you:

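    #!/usr/bin/env perl

    use strict;
    use warnings;
    use List::MoreUtils qw{any};

    my @words;    # lines kept so far, longest first
    for my $word (
        # unique input lines, sorted longest first
        sort {length $b <=> length $a}
        do {
            my %words;
            my @words = <>;
            chomp @words;
            @words{@words} = ();    # hash slice drops exact duplicates
            keys %words;
        }
    ) {
        # keep $word only if no already-kept (longer) line contains it;
        # \Q makes the pattern match $word literally
        push @words, $word unless do {
            my $re = qr/\Q$word/;
            any {m/$re/} @words;
        };
    }

    print "$_\n" for @words;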
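The same quadratic check can also be written without Perl. The following is a minimal sketch in awk, called from bash, assuming one entry per line. Instead of sorting by length, it compares every distinct line against every other distinct line with `index()`, which does a literal substring search, so no `\Q`-style escaping is needed. The function name `drop_contained_lines` is hypothetical.

```shell
#!/bin/bash
# Sketch: pairwise O(n^2) substring check in a single awk process.
# Reads lines on stdin; prints each distinct line that is not contained
# in some other distinct line, in original input order.
drop_contained_lines() {
    awk '
        !seen[$0]++ { line[++n] = $0 }   # remember each distinct line once
        END {
            for (i = 1; i <= n; i++) {
                keep = 1
                # index() is a literal substring search, so no regex escaping is needed
                for (j = 1; j <= n; j++) {
                    if (j != i && index(line[j], line[i]) > 0) { keep = 0; break }
                }
                if (keep) print line[i]
            }
        }'
}

printf '%s\n' cat bear tree catfish fish bear | drop_contained_lines
```

Nothing is sorted here, so the surviving lines keep their input order (bear, tree, catfish for the sample). The Perl version prints them longest first.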

If you want O(n log n), you'll have to use some sort of trie approach. Here is an example using a suffix tree:

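    #!/usr/bin/env perl

    use strict;
    use warnings;
    use Tree::Suffix;

    # suffix tree holding the lines kept so far
    my $tree = Tree::Suffix->new();

    my @words;
    for my $word (
        # unique input lines, sorted longest first
        sort {length $b <=> length $a}
        do {
            my %words;
            my @words = <>;
            chomp @words;
            @words{@words} = ();    # hash slice drops exact duplicates
            keys %words;
        }
    ) {
        # keep $word only if it does not occur inside any line already in the tree
        unless ($tree->find($word)) {
            push @words, $word;
            $tree->insert($word);
        }
    }

    print "$_\n" for @words;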
