Table of Contents: The Regular Expression Grammar; Rules For Matching; Storing Regexes; Using Regexes; Regex Memory; Interpolating into Regexes; Evaluated Replacements; Extended Regexes; Greed; Tips |
the quantifier... | equivalent to... |
---|---|
sub-pattern+ | sub-pattern{1,} |
sub-pattern* | sub-pattern{0,} |
sub-pattern? | sub-pattern{0,1} |
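
The equivalences in the table can be checked directly. The following is a minimal sketch; the sample strings and the helper name `same_match` are illustrative assumptions, not part of the notes. It runs each shorthand quantifier and its brace form against the same inputs and confirms that they agree.

```perl
use strict;
use warnings;

# Pairs of [shorthand form, brace form], following the table above.
my @pairs = (
    [ qr/^ab+c$/, qr/^ab{1,}c$/  ],
    [ qr/^ab*c$/, qr/^ab{0,}c$/  ],
    [ qr/^ab?c$/, qr/^ab{0,1}c$/ ],
);

# Hypothetical helper: true if both regexes give the same verdict on $str.
sub same_match {
    my ($rgx_a, $rgx_b, $str) = @_;
    my $a_hit = ($str =~ $rgx_a) ? 1 : 0;
    my $b_hit = ($str =~ $rgx_b) ? 1 : 0;
    return $a_hit == $b_hit;
}

my $all_agree = 1;
foreach my $pair (@pairs) {
    foreach my $str (qw(ac abc abbc abbbc)) {
        $all_agree = 0 unless same_match(@$pair, $str);
    }
}
print $all_agree ? "all pairs agree\n" : "mismatch found\n";
```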

    if ($filename =~ m/^(\/?([^\/]+\/)*)([^\/]+)(\.[^\/]+)?$/) {
        # it is an absolute filename!
    }

    if ($filename =~ m{^(/?([^/]+/)*)([^/]+)(\.[^/]+)?$}) {
        # it is an absolute filename!
    }


    my @strings = ("I am Bobby", "I AM BOBBY", "i am bobby");

    foreach my $string (@strings) {
        print " case-sensitive: ";
        print "\"$string\" " . ($string =~ /Bob/ ? "matches" : "doesn't match") . " /Bob/\n";
        print "case-insensitive: ";
        print "\"$string\" " . ($string =~ /Bob/i ? "matches" : "doesn't match") . " /Bob/i\n\n";
    }

case-sensitive: "I am Bobby" matches /Bob/ case-insensitive: "I am Bobby" matches /Bob/i case-sensitive: "I AM BOBBY" doesn't match /Bob/ case-insensitive: "I AM BOBBY" matches /Bob/i case-sensitive: "i am bobby" doesn't match /Bob/ case-insensitive: "i am bobby" matches /Bob/i |
if ($zip_data =~ m/^\d{5}(-\d{4})?$/) { print "Valid ZIP/ZIP+4 code\n"; } |

    my $rgx_zip = qr/^\d{5}(-\d{4})?$/;

    if ($zip_data =~ m/$rgx_zip/) {
        print "Valid ZIP/ZIP+4 code\n";
    }

    my @strings = qw(apple banana cucumber durian potato);
    foreach my $string (@strings) {
        if ($string =~ m/^([bcdfghjklmnpqrstvwxyz][aeiouy])+$/) {
            print "\"$string\" matches CVCV...CV\n";
        }
        else {
            print "\"$string\" does not match\n";
        }
    }

"apple" does not match "banana" matches CVCV...CV "cucumber" does not match "durian" does not match "potato" matches CVCV...CV |

    my $string = "this 47 string 390 has numbers 000 embedded in it.";
    my $match_count = 0;
    while (1) {
        if ($string =~ m/\d+/g) {
            print "Found match ending before position " . pos($string) . "\n";
            ++$match_count;
        }
        else {
            print "No more matches!\n";
            last;
        }
    }
    print "Found $match_count total matches.\n";

Found match ending before position 7 Found match ending before position 18 Found match ending before position 34 No more matches! Found 3 total matches. |
my $string = "Bahama mama"; print "Original String: \"$string\"\n"; $string =~ s/ma/pop/; print "Modified String: \"$string\"\n"; |
Original String: "Bahama mama" Modified String: "Bahapop mama" |

    my $string = "Bahama mama";
    print "Original String: \"$string\"\n";
    my $repl_count = ($string =~ s/ma/pop/g);
    print "Modified String: \"$string\"\n";
    print " - made $repl_count replacements\n";

Original String: "Bahama mama" Modified String: "Bahapop poppop" - made 3 replacements |
my $orig_string = "Bahama mama"; (my $mod_string = $orig_string) =~ s/ma$/pop/; print "Original String: \"$orig_string\"\n"; print "Modified String: \"$mod_string\"\n"; |
Original String: "Bahama mama" Modified String: "Bahama mapop" |
"constant string" =~ s/a/b/g; |
Can't modify constant item in substitution (s///) at subst_const.pl line 1, near "s/a/b/g;" Execution of subst_const.pl aborted due to compilation errors. |
(my $string = "This is a non-lvalue") =~ s/a/x/g; print $string; |
This is x non-lvxlue |

    use Data::Dumper;

    my $string = "Johnson, Bill,345-F24-134A , x1457 ,Nashua";
    my @fields = split ",", $string;
    print Dumper \@fields;

$VAR1 = [ 'Johnson', ' Bill', '345-F24-134A ', ' x1457 ', 'Nashua' ]; |
Using a regex as the delimiter yields much better results:
my $string = "Johnson, Bill,345-F24-134A , x1457 ,Nashua"; my @fields = split /\s*,\s*/, $string; print Dumper \@fields; |
$VAR1 = [ 'Johnson', 'Bill', '345-F24-134A', 'x1457', 'Nashua' ]; |
my ($key, $value) = split /\s*=\s*/, read_config_line(), 2; |
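
The third argument to `split` caps the number of fields, which matters when the value itself can contain the delimiter. Here is a small sketch of that effect; the configuration line is a made-up sample standing in for whatever `read_config_line()` would return.

```perl
use strict;
use warnings;

# Hypothetical configuration line; note that the value contains an "=".
my $line = "query = name=Bob";

# With a limit of 2, only the first "=" acts as a delimiter.
my ($key, $value) = split /\s*=\s*/, $line, 2;

# Without the limit, the value is split again at the second "=".
my @unlimited = split /\s*=\s*/, $line;

print "key=[$key] value=[$value]\n";
print "unlimited split yields " . scalar(@unlimited) . " fields\n";
```

With the limit in place the value keeps its embedded `=`; without it the line falls apart into three fields.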
my @chars = split //, "yee-ha!"; print Dumper \@chars; |
$VAR1 = [ 'y', 'e', 'e', '-', 'h', 'a', '!' ]; |

    my $str = "phone number is 603-432-8696.";
    my $rgx_phone = qr/(\d{3})-(\d{3})-(\d{4})/;

    print "Searching the string \"$str\"\n";
    if ($str =~ $rgx_phone) {
        print "Found a phone number. Submatches are [$1], [$2], [$3].\n";
    }
    else {
        print "Didn't find a phone number\n";
    }

Searching the string "phone number is 603-432-8696." Found a phone number. Submatches are [603], [432], [8696]. |
my $string = "Julie ate a peach."; my @matches = ($string =~ m/([aeiou])([aeiou])/); print Dumper \@matches; |
$VAR1 = [ 'i', 'e' ]; |

    my $rgx_zip_code = qr/^(\d{5})(-\d{4})?$/;

    my $zip_code = "03060-1234";
    my $zip_code_base = ($zip_code =~ m/$rgx_zip_code/)[0];

    print "base of ZIP code \"$zip_code\" is \"$zip_code_base\"\n";

base of ZIP code "03060-1234" is "03060" |
my $string = "Julie ate a peach."; my @matches = ($string =~ m/([aeiou])([aeiou])/g); print Dumper \@matches; |
$VAR1 = [ 'i', 'e', 'e', 'a' ]; |
my $string = "Julie ate a peach."; my @matches = ($string =~ m/[aeiou]{2}/g); print Dumper \@matches; |
$VAR1 = [ 'ie', 'ea' ]; |

    my $str = 'name = Mr. John Smith';
    my $rgx_name = qr/[A-Z][a-z]+\.\s+([A-Z][a-z]+)\s+([A-Z][a-z]+)/;

    print "string before: \"$str\"\n";
    $str =~ s/$rgx_name/$2, $1/;
    print "string after: \"$str\"\n";

string before: "name = Mr. John Smith" string after: "name = Smith, John" |

    my $text_ok = "This is the hour of our discontent";
    my $text_repeat = "This is the the hour of our discontent";
    my $rgx_repeat_words = qr/\b([a-z]+)\s+\1/i;

    foreach my $text ($text_ok, $text_repeat) {
        print "- searching for repeated words in string:\n";
        print " \"$text\"\n";
        if ($text =~ m/$rgx_repeat_words/) {
            print " found repeated word \"$1\"\n";
        }
        else {
            print " no repeated words found\n";
        }
    }

- searching for repeated words in string: "This is the hour of our discontent" no repeated words found - searching for repeated words in string: "This is the the hour of our discontent" found repeated word "the" |

    my @strings = qw(800-123-4567 987-6543 90210-1000);
    my $rgx = qr/^((?<area_code>\d{3})-)?(?<number>\d{3}-\d{4})$/;

    foreach my $str (@strings) {
        print "parsing \"$str\"\n";
        if ($str =~ $rgx) {
            print " - Number was $+{number}\n";
            print " - Area code was " . ($+{area_code} // "not present") . "\n";
        }
        else {
            print " - Not a phone number!\n";
        }
    }

parsing "800-123-4567" - Number was 123-4567 - Area code was 800 parsing "987-6543" - Number was 987-6543 - Area code was not present parsing "90210-1000" - Not a phone number! |

    my @names = ("Mr. John Smith", "Bill Jackson", "Mrs. White", "Bill");
    my $rgx_lastname = qr/^(?:[A-Z][a-z]+\.)?\s*(?:[A-Z][a-z]+)?\s+([A-Z][a-z]+)/;

    foreach my $name (@names) {
        if ($name =~ m/$rgx_lastname/) {
            print "For the name \"$name\", got last name \"$1\"\n";
        }
        else {
            print "The name \"$name\" did not match\n";
        }
    }

For the name "Mr. John Smith", got last name "Smith" For the name "Bill Jackson", got last name "Jackson" For the name "Mrs. White", got last name "White" The name "Bill" did not match |

    # Capture groups, numbered by their opening parenthesis:
    #   $1  the label, if present
    #   $2  the LAST set of digits matched, with its trailing dash if it had one
    #   $3  the LAST set of digits matched
    #   $4  the whole run of Xs or Ys (it encloses $5 and $6)
    #   $5  one X, if any are present
    #   $6  one or more Ys, if any are present
    #   $7  a trailing dot, or defined-but-empty if the dot is absent
    #
    # Note: the x modifier means spaces are ignored; we will discuss this later.
    my $rgx_digit_cluster_str = qr/ ^ (\w+:)? ( (\d+)-? )* ( (X)+ | (Y+) ) (\.?) $ /x;

    my @test_strings = ("123-456-789X", "98765-4321-XXX.", "A:123-456-789Y",
                        "A:98765-4321-YYY.", "A:X", "123--456");

    foreach my $test_string (@test_strings) {
        print "TEST STRING: \"$test_string\"... ";
        if ($test_string =~ m/$rgx_digit_cluster_str/) {
            print "matches\n";
            printf " - \$1 is %-18s", (defined($1) ? "\"$1\"" : "undefined");
            printf " - \$4 is %-18s\n", (defined($4) ? "\"$4\"" : "undefined");
            printf " - \$2 is %-18s", (defined($2) ? "\"$2\"" : "undefined");
            printf " - \$5 is %-18s\n", (defined($5) ? "\"$5\"" : "undefined");
            printf " - \$3 is %-18s", (defined($3) ? "\"$3\"" : "undefined");
            printf " - \$6 is %-18s\n", (defined($6) ? "\"$6\"" : "undefined");
            printf " - \$7 is %-18s\n\n", (defined($7) ? "\"$7\"" : "undefined");
        }
        else {
            print "doesn't match\n\n";
        }
    }

TEST STRING: "123-456-789X"... matches - $1 is undefined - $4 is "X" - $2 is "789" - $5 is "X" - $3 is "789" - $6 is undefined - $7 is "" TEST STRING: "98765-4321-XXX."... matches - $1 is undefined - $4 is "XXX" - $2 is "4321-" - $5 is "X" - $3 is "4321" - $6 is undefined - $7 is "." TEST STRING: "A:123-456-789Y"... matches - $1 is "A:" - $4 is "Y" - $2 is "789" - $5 is undefined - $3 is "789" - $6 is "Y" - $7 is "" TEST STRING: "A:98765-4321-YYY."... matches - $1 is "A:" - $4 is "YYY" - $2 is "4321-" - $5 is undefined - $3 is "4321" - $6 is "YYY" - $7 is "." TEST STRING: "A:X"... matches - $1 is "A:" - $4 is "X" - $2 is undefined - $5 is "X" - $3 is undefined - $6 is undefined - $7 is "" TEST STRING: "123--456"... doesn't match |

    sub print_match_vars {
        print "\n --> ";
        print defined $1 ? "\$1=\"$1\"; " : "\$1 is undefined; ";
        print defined $2 ? "\$2=\"$2\"; " : "\$2 is undefined; ";
        print defined $3 ? "\$3=\"$3\"\n" : "\$3 is undefined\n";
    }

    my $string = "abcdefghijklmnopqrstuvwxyz";

    print "The pattern match variables start out undefined:";
    print_match_vars();

    print "After a successful match, those used in the pattern are defined: ";
    print $string =~ m/a(.+)x(.+)z/ ? "(match)" : "(no match)";
    print_match_vars();

    print "After a failed match, they are unchanged: ";
    print $string =~ m/([0-9]+)/ ? "(match)" : "(no match)";
    print_match_vars();

    print "After a successful match, they are set again based on the pattern: ";
    print $string =~ m/([^aeiou])/ ? "(match)" : "(no match)";
    print_match_vars();

    print "After a successful match, they are all undefined: ";
    print $string =~ m/abc/ ? "(match)" : "(no match)";
    print_match_vars();

The pattern match variables start out undefined: --> $1 is undefined; $2 is undefined; $3 is undefined After a successful match, those used in the pattern are defined: (match) --> $1="bcdefghijklmnopqrstuvw"; $2="y"; $3 is undefined After a failed match, they are unchanged: (no match) --> $1="bcdefghijklmnopqrstuvw"; $2="y"; $3 is undefined After a successful match, they are set again based on the pattern: (match) --> $1="b"; $2 is undefined; $3 is undefined After a successful match, they are all undefined: (match) --> $1 is undefined; $2 is undefined; $3 is undefined |

    # The next line sets $1, $2, and $3. All other match variables are unset.
    if ($code =~ m/^(\d+) LET ([A-Z]+) = (.+)$/) {

        my ($line_num, $var_name, $expression) = ($1, $2, $3);

        # The next line obviously sets the match variables
        # * assuming the match succeeds
        if ($expression =~ m/^(\d+|[A-Z]+) ([-+*\/]) (\d+|[A-Z]+)$/) {

            my ($operand1, $operator, $operand2) = ($1, $2, $3);

            # The next line NON-OBVIOUSLY resets the match variables, EVEN THOUGH IT CONTAINS NO PARENS
            # * assuming the match succeeds
            $operand1 = $symtab{$operand1} if ($operand1 =~ m/^[A-Z]+$/);
            # ditto
            $operand2 = $symtab{$operand2} if ($operand2 =~ m/^[A-Z]+$/);

            # The next line MIGHT reset the match variables, depending on the condition
            # Discussion question: when will the match variables be reset and when not?
            if ($operator eq "/" and $operand2 =~ m/^0+$/) {
                warn "Division by zero!\n";
            }
            else {
                # The next line SNEAKILY resets match variables because--and you can't
                # tell from here!!--the evaluate_expr() function uses a regex
                my $expr_value = evaluate_expr($operand1, $operand2, $operator);
            }

        }
    }

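
One common way to sidestep these hazards is to take the captures from a list-context match and store them in lexical variables at once, so that later matches cannot disturb them. The sketch below assumes a BASIC-like input line in the spirit of the listing above; the sample line is invented for illustration.

```perl
use strict;
use warnings;

my $code = "10 LET X = A + 2";   # sample input, assumed for illustration

# In list context a match returns its captures, so they can be copied
# into lexicals in the same statement that performs the match.
my ($line_num, $var_name, $expression) =
    $code =~ m/^(\d+) LET ([A-Z]+) = (.+)$/
    or die "not a LET statement\n";

# This later match has no parentheses; when it succeeds it resets $1, $2, $3 ...
my $is_simple_name = ($var_name =~ m/^[A-Z]+$/) ? 1 : 0;

# ... but the lexical copies are unaffected.
print "line $line_num: $var_name = $expression\n";
print "\$1 is now ", (defined $1 ? "\"$1\"" : "undefined"), "\n";
```

The second `print` reports `$1` as undefined, while `$line_num`, `$var_name` and `$expression` still hold the values from the first match.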

    my $a_plus = "A+";

    my @students = ("Bill: A", "Jill: B+", "Will: N/A", "Gil: A+", "Phil: F", "Fran: A-");

    foreach my $student (@students) {
        if ($student =~ m/$a_plus/) {
            print "$student\n";
        }
    }

Bill: A Will: N/A Gil: A+ Fran: A- |

    my $a_plus = "A+";

    my @students = ("Bill: A", "Jill: B+", "Will: N/A", "Gil: A+", "Phil: F", "Fran: A-");

    foreach my $student (@students) {
        if ($student =~ m/\Q$a_plus\E/) {
            print "$student\n";
        }
    }

Gil: A+ |

    my $rgx_zip = qr/\d{5}(-\d{4})?/;
    my $rgx_phone = qr/\d{3}-\d{3}-\d{4}/;
    my $rgx_idnum = qr/[A-Z]{2}\d{5}-[A-Z]{3}/;

    if ($data =~ m/^ZIP=($rgx_zip), PH=($rgx_phone), ID=($rgx_idnum)$/) {
        print "Data is valid!\n";
    }

$rgx_email = qr/^($rgx_user)@($rgx_host)$/; |
$rgx_host = qr/(?:$rgx_dns_name|$rgx_ipaddr)/; $rgx_user = qr/(?:[a-zA-Z0-9._-]+)/; $rgx_email = qr/^($rgx_user)@($rgx_host)$/; |

    $rgx_octet = qr/(?:\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])/;
    $rgx_ipaddr = qr/(?:(?:$rgx_octet\.){3}$rgx_octet)/;
    $rgx_dns_tld = qr/(?:[a-zA-Z]{2,4})/;
    $rgx_dns_comp = qr/(?:[a-zA-Z0-9]+[a-zA-Z0-9-]*[a-zA-Z0-9]+|[a-zA-Z0-9]+)/;
    $rgx_dns_name = qr/(?:(?:$rgx_dns_comp\.)+$rgx_dns_tld)/;
    $rgx_host = qr/(?:$rgx_dns_name|$rgx_ipaddr)/;
    $rgx_user = qr/(?:[a-zA-Z0-9._-]+)/;
    $rgx_email = qr/^($rgx_user)@($rgx_host)$/;

$rgx_email = qr/^([a-zA-Z0-9._-]+)@((?:(?:(?:[a-zA-Z0-9]+[a-zA-Z0-9-]*[a-zA-Z0-9]+|[a-zA-Z0-9]+)\.)+(?:[a-zA-Z]{2,4}))|(?:(?:(?:\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])\.){3}(?:\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])))$/; |
my $rgx1 = qr/a?b/; my $rgx2 = qr/c|d/; my $rgx_combined = qr/^$rgx1$rgx2$/; |
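
A regex built with `qr//` is interpolated as a self-contained group, so the alternation in `$rgx2` stays inside its own piece of the combined pattern. The following sketch contrasts that with pasting the same text in by hand; `$rgx_pasted` and the test strings are illustrative additions, not from the notes.

```perl
use strict;
use warnings;

my $rgx1 = qr/a?b/;
my $rgx2 = qr/c|d/;
my $rgx_combined = qr/^$rgx1$rgx2$/;

# For contrast: the same pieces typed in as plain text, with no grouping.
# Here the "|" splits the whole pattern into "^a?bc" or "d$".
my $rgx_pasted = qr/^a?bc|d$/;

foreach my $str (qw(abc bd ad xd)) {
    printf "%-3s combined: %-3s pasted: %s\n", $str,
        ($str =~ $rgx_combined ? "yes" : "no"),
        ($str =~ $rgx_pasted   ? "yes" : "no");
}
```

The combined regex accepts only `abc` and `bd`, whereas the pasted version also accepts `ad` and `xd`, because any string ending in `d` satisfies its second alternative.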

    my @months = qw(January February March April May June July
                    August September October November December);

    my $month_abbrs = join "|", map { uc substr $_, 0, 3 } @months;

    my $rgx_date = qr/^([0-9]{4})-($month_abbrs)-([0-9]{2})$/i;

    print "Regex to match dates is /$rgx_date/\n";

Regex to match dates is /(?i-xsm:^([0-9]{4})-(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-([0-9]{2})$)/ |

    my $rgx_1st_word = qr/^(\w+)\b/;
    my $sentence = "Call me Ishmael";

    print "string before modification:\n";
    print "$sentence\n\n";

    my $mod_sentence_without = $sentence;
    $mod_sentence_without =~ s/$rgx_1st_word/uc $1/;
    print "without the e modifier:\n";
    print "$mod_sentence_without\n\n";

    my $mod_sentence_with = $sentence;
    $mod_sentence_with =~ s/$rgx_1st_word/uc $1/e;
    print "with the e modifier:\n";
    print "$mod_sentence_with\n";

string before modification: Call me Ishmael without the e modifier: uc Call me Ishmael with the e modifier: CALL me Ishmael |
$rgx_int = qr/^([-+])?(\d+)(?:[eE](\d{1,3}))?$/; |

    $rgx_int = qr/
        ^               # beginning of string

        (               # begin capture group 1: sign
            [-+]        # plus or minus
        )               # end capture group 1
        ?               # sign is optional

        (               # begin capture group 2: coefficient
            \d+         # one or more digits
        )               # end capture group 2

        (?:             # group (non-capturing)
            [eE]        # literal e or E
            (           # begin capture group 3: exponent
                \d{1,3} # one to three digits
            )           # end capture group 3
        )
        ?               # exponent is optional

        $               # end of string
    /x;

    my $html_line = 'Here is a <B>bold</B> thing and an <I>italic</I> thing';
    print "HTML is: \"$html_line\"\n";
    print "(Trying to strip HTML tags)\n";
    $html_line =~ s/<.+>//g;
    print "TEXT is: \"$html_line\"\n";

HTML is: "Here is a <B>bold</B> thing and an <I>italic</I> thing" (Trying to strip HTML tags) TEXT is: "Here is a thing" |

    my $html_line = 'Here is a <B>bold</B> thing and an <I>italic</I> thing';
    print "HTML is: \"$html_line\"\n";
    print "(Trying to strip HTML tags)\n";
    $html_line =~ s/<.+?>//g;
    print "TEXT is: \"$html_line\"\n";

HTML is: "Here is a <B>bold</B> thing and an <I>italic</I> thing" (Trying to strip HTML tags) TEXT is: "Here is a bold thing and an italic thing" |
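
A non-greedy quantifier is not the only way to keep a match from running too far. As an alternative sketch (not taken from the notes), a negated character class states outright that a tag body may not contain `>`:

```perl
use strict;
use warnings;

my $html_line = 'Here is a <B>bold</B> thing and an <I>italic</I> thing';

# [^>]+ cannot run past the closing ">", so each tag is matched on its own
# even though the quantifier is still greedy.
(my $text = $html_line) =~ s/<[^>]+>//g;

print "TEXT is: \"$text\"\n";
```

Both approaches yield the same text here. Neither is a real HTML parser, so this is only suitable for simple, well-behaved input like the sample line.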
my $string = "This is some text."; print "Original String: \"$string\"\n"; $string =~ tr/aeiou/AEIOU/; print "Modified String: \"$string\"\n"; |
Original String: "This is some text." Modified String: "ThIs Is sOmE tExt." |
my $string = "This is some text."; print "Original String: \"$string\"\n"; $string =~ tr/aeiou/!/; print "Modified String: \"$string\"\n"; |
Original String: "This is some text." Modified String: "Th!s !s s!m! t!xt." |
my $string = "This is some text."; print "Original String: \"$string\"\n"; $string =~ tr/aeiou/?/c; print "Modified String: \"$string\"\n"; |
Original String: "This is some text." Modified String: "??i??i???o?e??e???" |

    my $string = "This   string has    some  weird      spacing.   ";
    print "Original String: \"$string\"\n";
    $string =~ tr/ / /s;    # collapse runs of spaces
    print "Modified String: \"$string\"\n";

Original String: "This   string has    some  weird      spacing.   " Modified String: "This string has some weird spacing. " |
my $string = "This is some text."; print "Original String: \"$string\"\n"; $string =~ tr/aeioubcdfghjklmnpqrstvwxyz/AEIOU/d; print "Modified String: \"$string\"\n"; |
Original String: "This is some text." Modified String: "TI I OE E." |
my $string = "This is some text."; my $count = ($string =~ tr/aeiou/aeiou/); print "String \"$string\" contains $count vowels.\n"; |
String "This is some text." contains 5 vowels. |
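
When `tr` is used only for counting, the replacement list can be left empty, and the `c` modifier turns the count into "everything except these characters". A brief sketch, reusing the sample string from above:

```perl
use strict;
use warnings;

my $string = "This is some text.";

# An empty replacement list leaves the string unchanged and just counts.
my $vowels = ($string =~ tr/aeiou//);

# Adding /c counts every character that is NOT in the search list.
my $others = ($string =~ tr/aeiou//c);

print "$vowels vowels, $others other characters\n";
```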
Modifier | Effect | m// | s/// | qr// | Bad Mnemonic? |
---|---|---|---|---|---|
/i | match case-Insensitively | yes | yes | yes | no |
/s | include \n in the dot class (force string to be handled as a Single line) | yes | yes | yes | yes |
/m | ^ can match after, and $ before, an embedded newline (force string to be handled as Multiple lines) | yes | yes | yes | yes |
/x | eXtended regex (can include comments and whitespace) | yes | yes | yes | no |
/o | only compile pattern Once | yes | yes | – | no |
/g | match/substitute Globally | yes | yes | – | no |
/c | Continue after a failed global match | with /g | – | – | no |
/e | Evaluate replacement as expression | – | yes | – | no |
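
Since the mnemonics for `/s` and `/m` are easy to mix up, a short demonstration may help. This is a minimal sketch with a made-up two-line string; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /s the dot stops at a newline; with /s it can cross it.
my $dot_plain = ($text =~ /first.+second/)  ? 1 : 0;
my $dot_s     = ($text =~ /first.+second/s) ? 1 : 0;

# Without /m, ^ anchors only at the start of the string; with /m it also
# matches just after each embedded newline.
my @starts_plain = ($text =~ /^(\w+)/g);
my @starts_m     = ($text =~ /^(\w+)/mg);

print "dot: plain=$dot_plain, /s=$dot_s\n";
print "line starts: plain=@starts_plain; /m=@starts_m\n";
```

In short, `/s` changes what the dot matches, while `/m` changes where the anchors match; the two are independent and can be combined.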
if ("abc" eq lc substr($string, 1+index($string, " "), 3)) { ... |
if ($string =~ m/^[^ ]* ABC/i) { ... |
These notes are © 2007-2016 by Jeremy Holland. All rights reserved.