How to push unique elements read from a file using a regex into an array (Perl)
Here is my file:

    heaven
    heavenly
    heavenns
    abc
    heavenns
    heavennly

According to my code, only heavenns and heavennly should be pushed to @myarr, and each of them should be in the array only once. How can I do that?
    my $regx = "heavenn\+";
    my $tmp = $regx;
    $tmp =~ s/[\\]//g;
    $regx = $tmp;
    print("\nnow regex:", $regx);
    my $file = "myfilename.txt";
    my @myarr;
    open my $fh, "<", $file;
    while ( my $line = <$fh> ) {
        if ($line =~ /$regx/) {
            print $line;
            push (@myarr, $line);
        }
    }
    print ("\nmylist:", @myarr);   # prints heavenns 2 times, and heavennly
This is Perl, so there's more than one way to do it (TMTOWTDI). Here's one of them:
    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $regex = "heavenn+";
    my $rx = qr/$regex/;
    print "regex: $regex\n";

    my $file = "myfilename.txt";
    my %list;
    my @myarr;

    # $! holds the reason the open failed
    open my $fh, "<", $file or die "failed to open $file: $!";
    while ( my $line = <$fh> ) {
        if ($line =~ $rx) {
            print $line;
            $list{$line}++;    # the hash keys are unique, so duplicates collapse
        }
    }
    push @myarr, sort keys %list;
    print "mylist: @myarr\n";
Sample output:

    regex: heavenn+
    heavenns
    heavenns
    heavennly
    mylist: heavennly
     heavenns
The sort isn't necessary (but it presents the data in a sane order). You could instead add items to the array only when the count in $list{$line} is 0. You could chomp the input lines to remove the newline. Etc.
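A minimal sketch of the "push only when the count is 0" variant, combined with chomp. It assumes the lines are already in a list (the demo uses in-memory data rather than reading myfilename.txt), and the sub name `unique_matches` is hypothetical. It keeps the first-seen order instead of sorting.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns each matching line once, in the order
# in which it first appears, with the trailing newline removed.
sub unique_matches {
    my ($rx, @lines) = @_;
    my %seen;
    my @found;
    for my $line (@lines) {
        chomp(my $copy = $line);    # drop the newline so "x\n" and "x" compare equal
        next unless $copy =~ $rx;
        # $seen{$copy}++ returns the old count (0 the first time), so the push
        # happens only on the first sighting of each line
        push @found, $copy unless $seen{$copy}++;
    }
    return @found;
}

# Demo on in-memory data shaped like the question's file.
my @lines = map { "$_\n" } qw(heaven heavenly heavenns abc heavenns heavennly);
my @myarr = unique_matches(qr/heavenn+/, @lines);
print "mylist: @myarr\n";    # mylist: heavenns heavennly
```

Because the hash is only used as a "seen" marker, the array never receives a duplicate, and no newline ends up inside the printed list.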
What if I want to push only particular words? For example, if the file is 1. "heavenns hello" 2. "heavenns hi", "3.heavennly good", how do I print only 'heavenns' and 'heavennly'?
Then you have to arrange to capture the word only. That means refining the regex. Assuming you want heavenn at the start of a word, and don't mind what alphabetic characters come after that, then:
    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $regex = '\b(heavenn[a-zA-Z]*)\b';  # Single quotes necessary!
    my $rx = qr/$regex/;
    print "regex: $regex\n";

    my $file = "myfilename.txt";
    my %list;
    my @myarr;

    open my $fh, "<", $file or die "failed to open $file: $!";
    while ( my $line = <$fh> ) {
        if ($line =~ $rx) {
            print $line;
            $list{$1}++;    # $1 is the captured word, without the newline
        }
    }
    push @myarr, sort keys %list;
    print "mylist: @myarr\n";
Data file:

    1. "heavenns hello"
    2. "heavenns hi",
    "3.heavennly good". d
    heaven
    heavenly
    heavenns
    abc
    heavenns
    heavennly
Output:

    regex: \b(heavenn[a-zA-Z]*)\b
    1. "heavenns hello"
    2. "heavenns hi",
    "3.heavennly good". d
    heavenns
    heavenns
    heavennly
    mylist: heavennly heavenns
Note that the names in the list no longer include newlines.
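The `if` version records only the first matching word on each line, so a line such as "heavenns abc heavennly" contributes only heavenns. A minimal sketch of one alternative, using m//g in list context to collect every captured word on the line; the sub name `words_starting_heavenn` is hypothetical and the input is in-memory sample data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: collect every word starting with "heavenn" on each
# line, not just the first one. In list context, m//g with one capture
# group returns all of the captured substrings.
sub words_starting_heavenn {
    my (@lines) = @_;
    my $rx = qr/\b(heavenn[a-zA-Z]*)\b/;
    my %list;
    for my $line (@lines) {
        my @words = $line =~ /$rx/g;
        $list{$_}++ for @words;    # hash keys stay unique
    }
    return sort keys %list;
}

my @myarr = words_starting_heavenn(
    qq{1. "heavenns hello"\n},
    "heavenns abc heavennly\n",
);
print "mylist: @myarr\n";    # mylist: heavennly heavenns
```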
After chat:

This version takes the regex from the command line. The script invocation is:

    perl script.pl -p 'regex' [file ...]

It reads standard input if no file is specified on the command line (better than having a fixed input file name, by a large margin). It looks for multiple occurrences of the specified regex on each line, and the regex can be preceded or followed (or both) by 'word characters' as specified by \w.
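    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Getopt::Std;

    my %opts;
    getopts('p:', \%opts) or die "Usage: $0 [-p 'regex']\n";

    # Default pattern; -p overrides it.
    my $regex_base = 'heavenn';
    #$regex_base = $ARGV[0] if defined $ARGV[0];
    $regex_base = $opts{p} if defined $opts{p};

    # Allow word characters before and after the pattern, and capture the whole word.
    my $regex = '\b(\w*' . ${regex_base} . '\w*)\b';
    my $rx = qr/$regex/;
    print "regex: $regex (compiled form: $rx)\n";

    my %list;
    my @myarr;
    # <> reads the files named on the command line, or standard input if there are none.
    while (my $line = <>) {
        # m//g in scalar context finds each match on the line in turn.
        while ($line =~ m/$rx/g) {
            print $line;
            $list{$1}++;
            #$line =~ s///;
        }
    }
    push @myarr, sort keys %list;
    print "matched words: @myarr\n";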
Given the input file:

    1. "heavenns hello"
    2. "heavenns hi",
    "3.heavennly good". d
    heavennsy! heavennnly output equally heavennnnly input!
    unheavenly host. heavens! heaves yacht!
    heaven
    heavens
    heavenly
    heavenns
    abc
    heavenns
    heavennly
you can get outputs such as:

    $ perl script.pl -p 'e\w*?ly' myfilename.txt
    regex: \b(\w*e\w*?ly\w*)\b (compiled form: (?^:\b(\w*e\w*?ly\w*)\b))
    "3.heavennly good". d
    heavennsy! heavennnly output equally heavennnnly input!
    heavennsy! heavennnly output equally heavennnnly input!
    heavennsy! heavennnly output equally heavennnnly input!
    unheavenly host. heavens! heaves yacht!
    heavenly
    heavennly
    matched words: equally heavenly heavennly heavennnly heavennnnly unheavenly
    $ perl script.pl myfilename.txt
    regex: \b(\w*heavenn\w*)\b (compiled form: (?^:\b(\w*heavenn\w*)\b))
    1. "heavenns hello"
    2. "heavenns hi",
    "3.heavennly good". d
    heavennsy! heavennnly output equally heavennnnly input!
    heavennsy! heavennnly output equally heavennnnly input!
    heavennsy! heavennnly output equally heavennnnly input!
    heavenns
    heavenns
    heavennly
    matched words: heavennly heavennnly heavennnnly heavenns heavennsy
    $
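In that output, a line with three matching words is printed three times, because `print $line` sits inside the inner `while`. A minimal sketch of the same \b(\w* ... \w*)\b wrapping as a reusable sub that records each matching line only once; the sub name `match_words` is hypothetical, and, as in the script, the fragment is interpolated as a regex, not escaped.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper mirroring the script's approach: wrap a regex fragment
# in \b(\w* ... \w*)\b and collect every whole word it matches.
# Unlike the script, each matching line is recorded once, however many
# words on it match.
sub match_words {
    my ($base, @lines) = @_;
    my $rx = qr/@{[ '\b(\w*' . $base . '\w*)\b' ]}/;
    my (%list, @matched_lines);
    for my $line (@lines) {
        my $hits = 0;
        while ($line =~ m/$rx/g) {    # scalar-context //g: one match per iteration
            $list{$1}++;
            $hits++;
        }
        push @matched_lines, $line if $hits;
    }
    return ([sort keys %list], \@matched_lines);
}

my @input = (
    "heavennsy! heavennnly output equally heavennnnly input!\n",
    "unheavenly host. heavens! heaves yacht!\n",
    "heavenns\n",
);
my ($words, $lines) = match_words('heavenn', @input);
print $_ for @$lines;
print "matched words: @$words\n";    # matched words: heavennnly heavennnnly heavenns heavennsy
```

Counting the hits and printing the line after the inner loop is one way to avoid the repeated lines; the word list is the same as the script's.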