<html><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
    <title>Regular Expressions</title>
    <style type="text/css" media="all">
      /* Style sheet information */

      BODY {
        font-family: Arial;
        font-size: 18px;
      }

      TT {
        color: #C00000;
        font-family: Courier New;
        font-weight: bold;
      }

      TT.fname {
        color: #000000;
        font-family: Courier New;
        font-weight: bold;
        font-style: normal;
        font-size: 20px;
      }

      TT.soln {
        color: #008040;
        font-family: Courier New;
        font-weight: bold;
      }

      PRE {
        color: #C00000;
        margin: 0px;
        padding: 0px;
        border: 0px;
        font-family: Courier New;
        font-weight: bold;
      }

      TABLE {
        font-size: 18px;
      }

      I {
        color: #0000FF;
      }

      UL, OL {
        margin-top: 0px;
        margin-bottom: 0px;
      }

      UL LI                          { margin-top: 12px; list-style-type: square; }
      OL LI                          { margin-top: 12px; list-style-type: decimal; }
      UL LI UL LI                    { margin-top:  9px; list-style-type: disc; }
      UL LI OL LI                    { margin-top:  9px; list-style-type: decimal; }
      OL LI UL LI                    { margin-top:  9px; list-style-type: square; }
      OL LI OL LI                    { margin-top:  9px; list-style-type: upper-alpha; }
      UL LI UL LI UL LI              { margin-top:  6px; list-style-type: circle; }
      UL LI UL LI OL LI              { margin-top:  6px; list-style-type: decimal; }
      UL LI OL LI UL LI              { margin-top:  6px; list-style-type: square; }
      UL LI OL LI OL LI              { margin-top:  6px; list-style-type: lower-alpha; }
      OL LI UL LI UL LI              { margin-top:  6px; list-style-type: disc; }
      OL LI UL LI OL LI              { margin-top:  6px; list-style-type: decimal; }
      OL LI OL LI UL LI              { margin-top:  6px; list-style-type: square; }
      OL LI OL LI OL LI              { margin-top:  6px; list-style-type: lower-roman; }
      UL LI UL LI UL LI UL LI        { margin-top:  3px; list-style-type: square; }
      UL LI UL LI UL LI OL LI        { margin-top:  3px; list-style-type: decimal; }
      UL LI UL LI OL LI UL LI        { margin-top:  3px; list-style-type: square; }
      UL LI UL LI OL LI OL LI        { margin-top:  3px; list-style-type: lower-alpha; }
      UL LI OL LI UL LI UL LI        { margin-top:  3px; list-style-type: disc; }
      UL LI OL LI UL LI OL LI        { margin-top:  3px; list-style-type: decimal; }
      UL LI OL LI OL LI UL LI        { margin-top:  3px; list-style-type: square; }
      UL LI OL LI OL LI OL LI        { margin-top:  3px; list-style-type: lower-roman; }
      OL LI UL LI UL LI UL LI        { margin-top:  3px; list-style-type: circle; }
      OL LI UL LI UL LI OL LI        { margin-top:  3px; list-style-type: decimal; }
      OL LI UL LI OL LI UL LI        { margin-top:  3px; list-style-type: square; }
      OL LI UL LI OL LI OL LI        { margin-top:  3px; list-style-type: lower-alpha; }
      OL LI OL LI UL LI UL LI        { margin-top:  3px; list-style-type: disc; }
      OL LI OL LI UL LI OL LI        { margin-top:  3px; list-style-type: decimal; }
      OL LI OL LI OL LI UL LI        { margin-top:  3px; list-style-type: square; }
      OL LI OL LI OL LI OL LI        { margin-top:  3px; list-style-type: upper-alpha; }
      UL LI UL LI UL LI UL LI UL LI  { margin-top:  0px; list-style-type: disc; }
      UL LI UL LI UL LI UL LI OL LI  { margin-top:  0px; list-style-type: decimal; }
      UL LI UL LI UL LI OL LI UL LI  { margin-top:  0px; list-style-type: square; }
      UL LI UL LI UL LI OL LI OL LI  { margin-top:  0px; list-style-type: lower-alpha; }
      UL LI UL LI OL LI UL LI UL LI  { margin-top:  0px; list-style-type: disc; }
      UL LI UL LI OL LI UL LI OL LI  { margin-top:  0px; list-style-type: decimal; }
      UL LI UL LI OL LI OL LI UL LI  { margin-top:  0px; list-style-type: square; }
      UL LI UL LI OL LI OL LI OL LI  { margin-top:  0px; list-style-type: lower-roman; }
      UL LI OL LI UL LI UL LI UL LI  { margin-top:  0px; list-style-type: circle; }
      UL LI OL LI UL LI UL LI OL LI  { margin-top:  0px; list-style-type: decimal; }
      UL LI OL LI UL LI OL LI UL LI  { margin-top:  0px; list-style-type: square; }
      UL LI OL LI UL LI OL LI OL LI  { margin-top:  0px; list-style-type: lower-alpha; }
      UL LI OL LI OL LI UL LI UL LI  { margin-top:  0px; list-style-type: disc; }
      UL LI OL LI OL LI UL LI OL LI  { margin-top:  0px; list-style-type: decimal; }
      UL LI OL LI OL LI OL LI UL LI  { margin-top:  0px; list-style-type: square; }
      UL LI OL LI OL LI OL LI OL LI  { margin-top:  0px; list-style-type: upper-alpha; }
      OL LI UL LI UL LI UL LI UL LI  { margin-top:  0px; list-style-type: square; }
      OL LI UL LI UL LI UL LI OL LI  { margin-top:  0px; list-style-type: decimal; }
      OL LI UL LI UL LI OL LI UL LI  { margin-top:  0px; list-style-type: square; }
      OL LI UL LI UL LI OL LI OL LI  { margin-top:  0px; list-style-type: lower-alpha; }
      OL LI UL LI OL LI UL LI UL LI  { margin-top:  0px; list-style-type: disc; }
      OL LI UL LI OL LI UL LI OL LI  { margin-top:  0px; list-style-type: decimal; }
      OL LI UL LI OL LI OL LI UL LI  { margin-top:  0px; list-style-type: square; }
      OL LI UL LI OL LI OL LI OL LI  { margin-top:  0px; list-style-type: lower-roman; }
      OL LI OL LI UL LI UL LI UL LI  { margin-top:  0px; list-style-type: circle; }
      OL LI OL LI UL LI UL LI OL LI  { margin-top:  0px; list-style-type: decimal; }
      OL LI OL LI UL LI OL LI UL LI  { margin-top:  0px; list-style-type: square; }
      OL LI OL LI UL LI OL LI OL LI  { margin-top:  0px; list-style-type: lower-alpha; }
      OL LI OL LI OL LI UL LI UL LI  { margin-top:  0px; list-style-type: disc; }
      OL LI OL LI OL LI UL LI OL LI  { margin-top:  0px; list-style-type: decimal; }
      OL LI OL LI OL LI OL LI UL LI  { margin-top:  0px; list-style-type: square; }
      OL LI OL LI OL LI OL LI OL LI  { margin-top:  0px; list-style-type: upper-roman; }

      .soln {
        color: #20A040;
      }

      A {
        color: green;
        font-weight: bold;
        text-decoration: none;
      }

      A:hover {
        background: yellow;
        text-decoration: underline;
      }

      A.goto {
        color: black;
        text-decoration: none;
      }

      A.goto:hover {
        background: white;
      }

      TABLE, IMG {
        margin-top: 8px;
        margin-bottom: 8px;
        margin-left: 24px;
      }

      IMG {
        page-break-inside: avoid;
        display: block;
        border: 1px solid #000000;
      }

      IMG.no_border {
        page-break-inside: avoid;
        border: none;
      }

      TABLE.inline {
        background: #808080;
      }

      TABLE.soln {
        background: #20A040;
      }

      TD.inline {
        background: #FFFFFF;
        padding: 6px;
        text-align: left;
        vertical-align: top;
      }

      TH.inline {
        background: #F0F0F0;
        padding: 6px;
        font-weight: bold;
        text-align: left;
        vertical-align: top;
      }

      TABLE.show_text {
        background: #000000;
      }

      TD.show_text {
        background: #F0F0F0;
        padding: 8px;
      }

      TD.show_text_header {
        padding: 2px;
        padding-left: 8px;
        color: #000000;
        background: #FFFFF0;
        font-weight: normal;
      }

      PRE.show_text {
        color: #000000;
      }

      TABLE.show_php {
        background: #000000;
      }

      TABLE.show_php_soln {
        background: #20A040;
      }

      TD.show_php {
        background: #F0F0F0;
        padding: 8px;
      }

      TD.show_php_soln {
        background: #F8FFF8;
        padding: 8px;
      }

      TD.show_php_header {
        padding: 2px;
        padding-left: 8px;
        color: #000000;
        background: #FFFFF0;
        font-weight: normal;
      }

      PRE.show_php {

      }

      PRE.show_php_soln {
        color: #20A040;
      }

      TABLE.exec_php {
        background: #000000;
      }

      TD.exec_php {
        padding: 8px;
        background: #FFFFFF;
      }

      TD.exec_php_header {
        padding: 2px;
        padding-left: 8px;
        color: #000000;
        background: #FFFFF0;
        font-weight: normal;
      }

      PRE.exec_php {
        color: #000080;
      }

      TABLE.render_php {
        background: #000000;
      }

      TD.render_php {
        padding: 8px;
        font-family: times;
        background: #FFFFFF;
      }

      TD.render_php_header {
        padding: 2px;
        padding-left: 8px;
        color: #000000;
        background: #FFFFF0;
        font-weight: normal;
      }

      TABLE.show_html {
        background: #000000;
      }

      TABLE.show_html_soln {
        background: #20A040;
      }

      TD.show_html {
        background: #F0F0F0;
        padding: 8px;
      }

      TD.show_html_soln {
        background: #F8FFF8;
        padding: 8px;
      }

      TD.show_html_header {
        padding: 2px;
        padding-left: 8px;
        color: #000000;
        background: #FFFFF0;
        font-weight: normal;
      }

      PRE.show_html {
        color: #000000;
      }

      PRE.show_html_soln {
        color: #20A040;
      }

      TABLE.render_html {
        background: #000000;
      }

      TD.render_html {
        padding: 8px;
        font-family: times;
        background: #FFFFFF;
      }

      TD.render_html_header {
        padding: 2px;
        padding-left: 8px;
        color: #000000;
        background: #FFFFF0;
        font-weight: normal;
      }

      TABLE.show_perl {
        background: #000000;
      }

      TABLE.show_perl_soln {
        background: #20A040;
      }

      TD.show_perl {
        background: #F0F0F0;
        padding: 8px;
      }

      TD.show_perl_soln {
        background: #F8FFF8;
        padding: 8px;
      }

      TD.show_perl_header {
        padding: 2px;
        padding-left: 8px;
        color: #000000;
        background: #FFFFF0;
        font-weight: normal;
      }

      PRE.show_perl {

      }

      PRE.show_perl_soln {
        color: #20A040;
      }

      TABLE.exec_tcsh {
        background: #000000;
      }

      TABLE.exec_tcsh_soln {
        background: #20A040;
      }

      TABLE.exec_perl {
        background: #000000;
      }

      TABLE.exec_perl_soln {
        background: #20A040;
      }

      TD.exec_tcsh {
        background: #FFFFFF;
        padding: 8px;
      }

      TD.exec_tcsh_soln {
        background: #FFFFFF;
        padding: 8px;
      }

      TD.exec_perl {
        background: #FFFFFF;
        padding: 8px;
      }

      TD.exec_perl_soln {
        background: #FFFFFF;
        padding: 8px;
      }

      TD.exec_tcsh_header {
        padding: 2px;
        padding-left: 8px;
        color: #000000;
        background: #FFFFF0;
        font-weight: normal;
      }

      TD.exec_perl_header {
        padding: 2px;
        padding-left: 8px;
        color: #000000;
        background: #FFFFF0;
        font-weight: normal;
      }

      PRE.exec_tcsh {
        color: #800080;
      }

      PRE.exec_tcsh_soln {
        color: #20A040;
      }

      PRE.exec_perl {
        color: #000080;
      }

      PRE.exec_perl_soln {
        color: #20A040;
      }

      DIV.copyright {
        color: #804040;
        font-style: italic;
        font-family: Verdana;
      }

      TABLE.navbar {
        margin-top: 0px;
        margin-bottom: 0px;
        margin-left: 0px;
        border: 2px solid #800000;
        background:#FFF0F0;
      }

      TD.navbar_left {
        font-weight: bold;
        padding: 4px;
      }

      TD.navbar_right {
        font-weight: bold;
        padding: 4px;
      }

      A.navbar {
        color: #000080;
        font-weight: bold;
        text-decoration: none;
        font-family: Verdana;
      }

      A.navbar:hover {
        font-weight: bold;
        text-decoration: underline;
        background: #FFF0F0;
      }

      H1 {
        font-family: Verdana;
        font-size: 36px;
        font-weight: bold;
      }

      H2 {
        page-break-before: auto;
        font-family: Verdana;
        font-size: 28px;
        font-weight: bold;
      }

      H3 {
        page-break-before: auto;
        font-family: Verdana;
        font-size: 24px;
        font-weight: bold;
      }

      H1.toc {
        font-size: 24px;
        margin-bottom: 0px;
        color: black;
        font-family: Verdana;
      }

      A.tocTITLE {
        font-size: 24px;
        color: navy;
        margin-left: 4px;
        margin-right: 12px;
      }

      A.tocTITLE:hover {
        text-decoration: none;
        background: white;
      }

      A.tocCHAPTER {
        font-size: 21px;
        color: navy;
        margin-left: 12px;
        margin-right: 12px;
      }

      A.tocCHAPTER:hover {
        text-decoration: none;
        background: white;
      }

      A.tocSECTION {
        font-size: 18px;
        color: navy;
        margin-left: 20px;
        margin-right: 12px;
      }

      A.tocSECTION:hover {
        text-decoration: none;
        background: white;
      }

      TD.toc {
        background: #E0F0FF;
        padding: 12px;
      }

      HR.toc {
        border: 2px solid black;
      }

      SPAN.linenum {
        color: #404040;
        font-family: Courier New;
        font-weight: normal;
        font-size: 16px;
      }

      DIV.quotebox {
        margin: 24px;
        padding: 6px;
        border: 1px solid black;
      }

      SPAN.quoteboxless {
        border: none;
        margin-top: 12px;
        margin-bottom: 12px;
        padding: 6px;
      }

      SPAN.quote {
        font-family: Cambria, Times;
        font-style: italic;
        font-weight: bold;
        font-size: 18px;
      }

      SPAN.quoteattr {
        font-family: Cambria, Times;
        font-style: italic;
        font-size: 18px;
      }

      SPAN.regex_match {
        background: #FFFF80;
        padding-left: 1px;
        padding-right: 1px;
        margin-left: 1px;
        margin-right: 1px;
        border-left: 1px dotted #404000;
        border-right: 1px dotted #404000;
      }

      SPAN.regex_match_l {
        background: #FFFF80;
        padding-left: 1px;
        padding-right: 1px;
        margin-left: 1px;
        margin-right: 0px;
        border-left: 1px dotted #404000;
        border-right: 0px dotted #404000;
      }

      SPAN.regex_match_m {
        background: #FFFF80;
        padding-left: 1px;
        padding-right: 1px;
        margin-left: 0px;
        margin-right: 0px;
        border-left: 1px dotted #404000;
        border-right: 1px dotted #404000;
      }

      SPAN.regex_match_r {
        background: #FFFF80;
        padding-left: 1px;
        padding-right: 1px;
        margin-left: 0px;
        margin-right: 1px;
        border-left: 0px dotted #404000;
        border-right: 1px dotted #404000;
      }

      SPAN.regex {
        padding-left: 1px;
        padding-right: 1px;
        margin-left: 1px;
        margin-right: 1px;
        border-left: 1px dotted #404000;
        border-right: 1px dotted #404000;
      }

      SPAN.regex_l {
        padding-left: 1px;
        padding-right: 1px;
        margin-left: 1px;
        margin-right: 0px;
        border-left: 1px dotted #404000;
        border-right: 0px dotted #404000;
      }

      SPAN.regex_m {
        padding-left: 1px;
        padding-right: 1px;
        margin-left: 0px;
        margin-right: 0px;
        border-left: 1px dotted #404000;
        border-right: 1px dotted #404000;
      }

      SPAN.regex_r {
        padding-left: 1px;
        padding-right: 1px;
        margin-left: 0px;
        margin-right: 1px;
        border-left: 0px dotted #404000;
        border-right: 1px dotted #404000;
      }

      INPUT, SELECT, TEXTAREA {
        border-top: 1px solid gray;
        border-left: 1px solid gray;
        border-right: 1px solid silver;
        border-bottom: 1px solid silver;
      }

      BUTTON {
        border-top: 1px solid silver;
        border-left: 1px solid silver;
        border-right: 1px solid gray;
        border-bottom: 1px solid gray;
      }
    </style>
  </head>
  <body>
    <h1><a class="goto" name="TITLE0001">Regular Expressions</a></h1>
    <h3><a class="goto" name="SECTION0001">Who am I?</a></h3>
    <ul>
      <li>Jeremy Holland (jeremy.holland@gmail.com)
      </li><li>I have been a professional programmer, mostly using Perl, since the Clinton administration.
      </li><li>I work for a semiconductor company in manufacturing operations.
      <ul>
        <li>We have production systems running SunOS 4.1.1 on a Sun 3/80, both of which were EOL'ed back when I was in junior high.
        </li><li>We also have servers running RHEL 8 on 24-core hardware.
        </li><li><u>Compatibility</u> is the name of my game.
      </li></ul>
    </li></ul>
    <p>

    <ul>
      <li>These notes are available at <a href="http://www.jeremyholland.info/rgx.html">http://www.jeremyholland.info/rgx.html</a>
      <ul>
        <li>Though not much else is...
      </li></ul>
    </li></ul>

    <p>
    <br>
<table><tbody><tr><td class="toc"><h1 class="toc">Table of Contents</h1><hr class="toc"><a class="tocCHAPTER" href="#CHAPTER0001">The Regular Expression Grammar</a><br><a class="tocCHAPTER" href="#CHAPTER0002">Rules For Matching</a><br><a class="tocCHAPTER" href="#CHAPTER0003">Storing Regexes</a><br><a class="tocCHAPTER" href="#CHAPTER0004">Using Regexes</a><br><a class="tocCHAPTER" href="#CHAPTER0005">Regex Memory</a><br><a class="tocCHAPTER" href="#CHAPTER0006">Interpolating into Regexes</a><br><a class="tocCHAPTER" href="#CHAPTER0007">Evaluated Replacements</a><br><a class="tocCHAPTER" href="#CHAPTER0008">Extended Regexes</a><br><a class="tocCHAPTER" href="#CHAPTER0009">Greed</a><br><a class="tocCHAPTER" href="#CHAPTER0010">Tips</a><br></td></tr></tbody></table><br></p><div class="quotebox"><span class="quote">Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. </span> <span class="quoteattr">— Jamie Zawinski </span></div>    This quote illustrates two things about Regular Expressions: their versatility and their complexity.
    <ul>
      <li><i>Regular expressions</i> (commonly called <i>regexes</i>) are one of Perl's most powerful features.
      </li><li>In reality, regular expressions are an entirely separate programming language embedded
          into Perl. They define a <i>grammar</i> by which strings can be recognized.
      </li><li>There are entire graduate-level courses on regular expressions—although these tend to focus
          on the regex theory, implementation, and optimization, not on practical use.
      </li><li>In this course, we will focus on practical use.
      <ul>
        <li>Regular expressions are primarily used for three things:
        <ol>
          <li>Matching strings of text against <i>patterns</i> (does this string look like a phone number?)
          </li><li>Parsing text to extract information (extract the area code from this phone number)
          </li><li>Altering strings by deleting or swapping out a
substring that matches a pattern (remove the area code to get the base
phone number)
        </li></ol>
        </li><li>Regexes are far more robust than simply using <tt>eq</tt> to test whether one string equals another,
            or using <tt>substr</tt> and <tt>index</tt> to pick apart a string.
      </li></ul>
      </li><li>Most modern programming languages have some support for regular expressions, but it is often "tacked on" in
          the form of external libraries. In some cases, only part of the regex feature set is implemented.
      </li><li>Perl's strength lies in how regular expressions are integrated into the language as part of the basic syntax
          and how Perl supports extremely advanced regular expressions.
      </li><li>Also, the regex engine in Perl is <b>highly</b> optimized. Therefore, working with string data using regular expressions
          tends to be very fast, even for complex operations.
    </li></ul>
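As a quick taste of all three uses, here is a minimal Perl sketch that matches, parses, and alters a phone number. It assumes a hypothetical US-style number written as <tt>"(512) 555-1234"</tt>, and it relies on a few pieces of syntax (the anchors <tt>^</tt> and <tt>$</tt>, <tt>\d</tt>, <tt>{3}</tt>, and capturing parentheses) that are covered later, so treat it as a preview rather than a recipe.

```perl
use strict;
use warnings;

# A hypothetical phone number in the US "(AAA) BBB-CCCC" layout.
my $phone = "(512) 555-1234";

# 1. Matching: does the whole string look like a phone number?
my $looks_like_phone = $phone =~ /^\(\d{3}\) \d{3}-\d{4}$/ ? 1 : 0;

# 2. Parsing: capture the three digits inside the parentheses.
my ($area_code) = $phone =~ /^\((\d{3})\)/;

# 3. Altering: copy the string, then delete the area code and the space after it.
(my $base = $phone) =~ s/^\(\d{3}\) //;

print "looks like a phone number: $looks_like_phone\n";   # 1
print "area code: $area_code\n";                         # 512
print "base number: $base\n";                            # 555-1234
```

The <tt>(my $base = $phone) =~ s/.../.../</tt> idiom copies the string before substituting, so <tt>$phone</tt> itself is left unchanged.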
    <h2><a class="goto" name="CHAPTER0001">The Regular Expression Grammar</a></h2>
    <h3><a class="goto" name="SECTION0002">The Basics</a></h3>
    <ul>
      <li>A regex is represented as a string of text, surrounded by slashes.
      <ul>
        <li>One or more <i>pattern modifiers</i> can appear after the final slash; these will be covered later.
      </li></ul>
      </li><li>Unless you use positional anchors (covered later), the pattern will match
          anywhere in the string.  Therefore, the pattern<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">Bob</span>/</tt><br>
          will match<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match">Bob</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match">Bob</span> is a guy"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"My name is <span class="regex_match">Bob</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"I am called <span class="regex_match">Bob</span>by"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"Let's see if <span class="regex_match">Bob</span> can go."</tt><br>
      </li><li>In fact, it will match <b>any</b> string that has <tt>"Bob"</tt> as a substring.
      </li><li>Note that regexes are (by default) <b>case-sensitive</b>.  Thus, the above pattern will
          <b>not</b> match any of:<br>
          <tt>&nbsp;&nbsp;&nbsp;"bob"</tt><br>
          <tt>&nbsp;&nbsp;&nbsp;"BOB"</tt><br>
          <tt>&nbsp;&nbsp;&nbsp;"bob&nbsp;is&nbsp;a&nbsp;guy."</tt><br>
          <tt>&nbsp;&nbsp;&nbsp;"My&nbsp;name&nbsp;is&nbsp;bob"</tt><br>
          <tt>&nbsp;&nbsp;&nbsp;"I&nbsp;am&nbsp;called&nbsp;bobby"</tt><br>
          <tt>&nbsp;&nbsp;&nbsp;"Let's&nbsp;see&nbsp;if&nbsp;BOB&nbsp;can&nbsp;go."</tt><br>
    </li></ul>
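A minimal sketch of the two points above: <tt>/Bob/</tt> matches wherever <tt>"Bob"</tt> appears as a substring, and it does not match other capitalizations. The helper name <tt>has_bob</tt> is hypothetical, chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the string contains "Bob" anywhere, else 0.
sub has_bob {
    my ($text) = @_;
    return $text =~ /Bob/ ? 1 : 0;
}

# Matching is case-sensitive: only the first three strings contain "Bob".
for my $s ("Bob is a guy", "My name is Bob", "I am called Bobby",
           "bob is a guy.", "BOB", "Let's see if BOB can go.") {
    printf "%-28s %s\n", "\"$s\"", has_bob($s) ? "matches" : "no match";
}
```

Adding the <tt>i</tt> pattern modifier (<tt>/Bob/i</tt>) makes the match case-insensitive; modifiers are covered later.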
    <h3><a class="goto" name="SECTION0003">Quantifiers</a></h3>
    <ul>
      <li>So far, we haven't replicated any functionality beyond <tt>index</tt>, but regular expressions can be far more complex.
          The next layer of functionality is <i>quantifiers</i>.  These allow a <i>sub-pattern</i> (for example, a string or character) to match multiple times in a specific place.
      </li><li>Use the <tt>+</tt> quantifier to match a sub-pattern <b>one or more times</b>:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">(iss)+</span>/</tt><br>
          will match any string that has the string <tt>"iss"</tt> in it one or more times:<br>
<tt>&nbsp;&nbsp;&nbsp;"K<span class="regex_match">iss</span>immee"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"M<span class="regex_match">ississ</span>ippi"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"m<span class="regex_match">iss</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"M<span class="regex_match">iss</span> Mississippi"</tt><br>
          but will not match any string that does not have <tt>"iss"</tt> in it.
      <ul>
        <li>Note in the final example that only the first <tt>"iss"</tt> was part of the match.
      </li></ul>
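To see exactly which part of each string <tt>/(iss)+/</tt> matched, this minimal sketch uses Perl's built-in <tt>$&amp;</tt> variable, which holds the text matched by the most recent successful match. The helper name <tt>iss_run</tt> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the text matched by /(iss)+/, or undef if no match.
# $& holds the text matched by the most recent successful match.
sub iss_run {
    my ($text) = @_;
    return $text =~ /(iss)+/ ? $& : undef;
}

for my $s ("Kissimmee", "Mississippi", "miss", "Miss Mississippi", "Texas") {
    my $run = iss_run($s);
    if (defined $run) {
        print "\"$s\": matched \"$run\"\n";
    } else {
        print "\"$s\": no match\n";
    }
}
```

On Perls older than 5.20, mentioning <tt>$&amp;</tt> anywhere in a program slows down every regex match, so it is best reserved for small demonstrations like this one.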
      </li><li>Note that a quantifier applies only to the single sub-pattern immediately before it, which (unless you group several characters with parentheses) is a single character.
      <ul>
        <li>In other words,
<tt>/<span class="regex_l">a</span><span class="regex">b+</span>/</tt>
            matches strings like
<tt>"x<span class="regex_match_l">a</span><span class="regex_match">b</span>x"</tt>, <tt>"x<span class="regex_match_l">a</span><span class="regex_match">bbb</span>x"</tt>, and <tt>"x<span class="regex_match_l">a</span><span class="regex_match">bbbbb</span>x"</tt>
            (as well as, of course, strings like
<tt>"x<span class="regex_match_l">a</span><span class="regex_match">b</span>ababababx"</tt>).
        </li><li>You can create a sub-pattern with parentheses:
<tt>/<span class="regex">(ab)+</span>/</tt>
            matches strings like
<tt>"x<span class="regex_match">ab</span>x"</tt>, <tt>"x<span class="regex_match">abab</span>x"</tt>, and <tt>"x<span class="regex_match">ababababab</span>x"</tt>
            (and
<tt>"x<span class="regex_match">abab</span>xababxababx"</tt>).
      </li></ul>
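The difference in what the quantifier applies to is easy to see by running both patterns over the same strings. This minimal sketch again uses <tt>$&amp;</tt> to report the matched text; the helper names <tt>single_b</tt> and <tt>grouped_ab</tt> are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: what /ab+/ matches ("a" then one or more "b"s).
sub single_b {
    my ($text) = @_;
    return $text =~ /ab+/ ? $& : undef;
}

# Hypothetical helper: what /(ab)+/ matches (the pair "ab" repeated).
sub grouped_ab {
    my ($text) = @_;
    return $text =~ /(ab)+/ ? $& : undef;
}

for my $s ("xabbbx", "xabababx") {
    my $one = single_b($s);
    my $grp = grouped_ab($s);
    print "\"$s\": /ab+/ matched \"$one\", /(ab)+/ matched \"$grp\"\n";
}
```

For <tt>"xabbbx"</tt> the two patterns match <tt>"abbb"</tt> and <tt>"ab"</tt>; for <tt>"xabababx"</tt> they match <tt>"ab"</tt> and <tt>"ababab"</tt>.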
      </li><li>Combine the quantifier with other sub-patterns for more sophisticated matches:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very )+</span><span class="regex_r">nice day</span>/</tt><br>
          will match<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very </span><span class="regex_match_r">nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very very </span><span class="regex_match_r">nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very very very </span><span class="regex_match_r">nice day</span>"</tt><br>
          and so forth (as well as strings like
<tt>"I hope you <span class="regex_match_l">have a </span><span class="regex_match_m">very </span><span class="regex_match_r">nice day</span> today!"</tt>),
          but will not match any of the following:<br>
          <tt>&nbsp;&nbsp;&nbsp;"have&nbsp;a&nbsp;nice&nbsp;day"</tt><br>
          <tt>&nbsp;&nbsp;&nbsp;"have&nbsp;a&nbsp;very&nbsp;very&nbsp;"</tt><br>
          <tt>&nbsp;&nbsp;&nbsp;"very&nbsp;very&nbsp;very&nbsp;nice&nbsp;day"</tt><br>
          <tt>&nbsp;&nbsp;&nbsp;"very&nbsp;very&nbsp;very&nbsp;"</tt><br>
          In each case at least one of the pieces is missing, and <b>all three sub-patterns must be matched</b> for the overall regex to match.
      <ul>
        <li>The string <tt>"nice day very have a "</tt> also fails to match, because the sub-parts are all there, but <b>not in order</b>.
        </li><li>The string <tt>"have a completely very good and nice day"</tt> fails to match, because the sub-parts are all there, but <b>not adjacently</b>.
      </li></ul>
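The matching and non-matching cases above can be checked with a short sketch; the helper name <tt>very_nice</tt> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the string contains "have a ", then one or more
# "very ", then "nice day", all adjacent and in that order; otherwise 0.
sub very_nice {
    my ($text) = @_;
    return $text =~ /have a (very )+nice day/ ? 1 : 0;
}

my @examples = (
    "have a very very nice day",
    "I hope you have a very nice day today!",
    "have a nice day",                          # no "very " at all
    "very very very nice day",                  # missing "have a "
    "nice day very have a ",                    # pieces out of order
    "have a completely very good and nice day", # pieces not adjacent
);

for my $s (@examples) {
    my $label = very_nice($s) ? "match   " : "no match";
    print "$label  \"$s\"\n";
}
```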
      </li><li>Notice that the space is inside the parentheses.  Regexes are very sensitive to whitespace and position, so be careful!
      <ul>
        <li>For example, consider the pattern<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very)+</span><span class="regex_r"> nice day</span>/</tt><br>
            Note the slight difference from the pattern above: the space is now outside the quantifier.
            The strings this pattern actually matches are:<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very</span><span class="regex_match_r"> nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">veryvery</span><span class="regex_match_r"> nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">veryveryvery</span><span class="regex_match_r"> nice day</span>"</tt><br>
        </li><li>Similarly, consider the pattern<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very )+</span><span class="regex_r"> nice day</span>/</tt><br>
            Here there is a space both inside and outside the quantifier.
            The strings this matches are:<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very </span><span class="regex_match_r">&nbsp;nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very very </span><span class="regex_match_r">&nbsp;nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very very very </span><span class="regex_match_r">&nbsp;nice day</span>"</tt><br>
            Note the double spaces. One is used to match the quantified sub-pattern; the other is
            needed to match the space outside the quantifier.
      </li></ul>
      </li><li>Use the <tt>*</tt> operator to match a sub-pattern
          <b>zero or more times</b>:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">(iss)*</span>/</tt><br>
          will still match<br>
<tt>&nbsp;&nbsp;&nbsp;"K<span class="regex_match">iss</span>immee"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"M<span class="regex_match">ississ</span>ippi"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"m<span class="regex_match">iss</span>"</tt><br>
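          and will <b>also</b> match strings with zero occurrences of <tt>"iss"</tt> in them.  In fact, since a match is tried at the leftmost position first, this pattern matches the empty string at the very start of <b>every</b> string, including the three above (the highlighting just marks where <tt>"iss"</tt> occurs).  This is obviously not very useful by itself (because every string has zero or more occurrences of <tt>"iss"</tt> in it), but when used in combination with other sub-patterns...<br>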
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very )*</span><span class="regex_r">nice day</span>/</tt><br>
          will match strings containing<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very </span><span class="regex_match_r">nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very very </span><span class="regex_match_r">nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very very very </span><span class="regex_match_r">nice day</span>"</tt><br>
          etc., and this time will also match:<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m"></span><span class="regex_match_r">nice day</span>"</tt><br>
          but will still not match any of the following:<br>
          <tt>&nbsp;&nbsp;&nbsp;"have&nbsp;a&nbsp;very&nbsp;very&nbsp;"</tt><br>
          <tt>&nbsp;&nbsp;&nbsp;"very&nbsp;very&nbsp;very&nbsp;nice&nbsp;day"</tt><br>
          <tt>&nbsp;&nbsp;&nbsp;"very&nbsp;very&nbsp;very&nbsp;"</tt><br>
      <ul>
        <li>Again, be careful of spacing:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very)*</span><span class="regex_r"> nice day</span>/</tt><br>
            matches<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very</span><span class="regex_match_r"> nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">veryvery</span><span class="regex_match_r"> nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">veryveryvery</span><span class="regex_match_r"> nice day</span>"</tt><br>
            and<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m"></span><span class="regex_match_r">&nbsp;nice day</span>"</tt>
            (note the double spaces)
      </li></ul>
      </li><li>Use the <tt>?</tt> operator to match a sub-pattern <b>zero or one times</b>:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very )?</span><span class="regex_r">nice day</span>/</tt><br>
          will match strings containing<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m"></span><span class="regex_match_r">nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very </span><span class="regex_match_r">nice day</span>"</tt><br>
          and nothing else.
      <ul>
        <li>Final warning regarding spacing:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very)?</span><span class="regex_r"> nice day</span>/</tt><br>
            matches<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very</span><span class="regex_match_r">&nbsp;nice day</span>"</tt><br>
            and<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m"></span><span class="regex_match_r">&nbsp;nice day</span>"</tt>
            (again, note the double spaces)
      </li></ul>
      </li><li>Use braces for arbitrary quantification
      <ul>
        <li><tt>sub-pattern{count}</tt> matches exactly <tt>count</tt> instances of <tt>sub-pattern</tt>
        </li><li><tt>sub-pattern{min,max}</tt> matches between <tt>min</tt> and <tt>max</tt> instances of <tt>sub-pattern</tt>, inclusive
        </li><li><tt>sub-pattern{min,}</tt> matches at least <tt>min</tt> instances of <tt>sub-pattern</tt>
        </li><li><tt>sub-pattern{0,max}</tt> matches at most <tt>max</tt> instances of <tt>sub-pattern</tt> (or none)
        </li><li><tt>sub-pattern{1,max}</tt> matches at most <tt>max</tt> instances of <tt>sub-pattern</tt> (but at least one)
        </li><li>Examples:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very ){3}</span><span class="regex_r">nice day</span>/</tt><br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very ){0,3}</span><span class="regex_r">nice day</span>/</tt><br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very ){1,5}</span><span class="regex_r">nice day</span>/</tt><br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very ){3,5}</span><span class="regex_r">nice day</span>/</tt><br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very ){2,}</span><span class="regex_r">nice day</span>/</tt><br>
        </li><li>Note these can all be represented without braces:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">have a </span><span class="regex_r">(very )</span><span class="regex_r">(very )</span><span class="regex_r">(very )</span><span class="regex_r">nice day</span>/</tt><br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">have a </span><span class="regex_r">(very )?</span><span class="regex_r">(very )?</span><span class="regex_r">(very )?</span><span class="regex_r">nice day</span>/</tt><br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">have a </span><span class="regex_r">(very )</span><span class="regex_r">(very )?</span><span class="regex_r">(very )?</span><span class="regex_r">(very )?</span><span class="regex_r">(very )?</span><span class="regex_r">nice day</span>/</tt><br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">have a </span><span class="regex_r">(very )</span><span class="regex_r">(very )</span><span class="regex_r">(very )</span><span class="regex_r">(very )?</span><span class="regex_r">(very )?</span><span class="regex_r">nice day</span>/</tt><br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">have a </span><span class="regex_r">(very )</span><span class="regex_r">(very )+</span><span class="regex_r">nice day</span>/</tt><br>
            But the braces are much easier to interpret.
        </li><li>Also, note that the <tt>+</tt>, <tt>*</tt>, and <tt>?</tt> quantifiers can be represented
            using braces:
            <table class="inline">
            <tbody><tr><th class="inline">the quantifier...</th><th class="inline">equivalent to...</th></tr>
            <tr><td class="inline"><tt>sub-pattern+</tt></td><td class="inline"><tt>sub-pattern{1,}</tt></td></tr>
            <tr><td class="inline"><tt>sub-pattern*</tt></td><td class="inline"><tt>sub-pattern{0,}</tt></td></tr>
            <tr><td class="inline"><tt>sub-pattern?</tt></td><td class="inline"><tt>sub-pattern{0,1}</tt></td></tr>
            </tbody></table>
            but again, the simple quantifiers are easier to understand in most cases.
        <ul>
          <li>And
<tt>/<span class="regex">(subpattern)+</span>/</tt>
              is equivalent to
<tt>/<span class="regex_l">(subpattern)</span><span class="regex">(subpattern)*</span>/</tt>
        </li></ul>
        </li><li>Remember that patterns can match anywhere in the string, so the pattern
<tt>/<span class="regex">x{3}</span>/</tt>
            will match any
            string with three <tt>x</tt>s in a row, including strings with <b>four</b> <tt>x</tt>s in a row (or more).
        <ul>
          <li>Typically quantifiers are used with positional anchors (see below) or are adjacent to other sub-patterns
              that limit the match.
          </li><li>For example, the pattern
<tt>/<span class="regex_l">a</span><span class="regex_m">x{3}</span><span class="regex_r">a</span>/</tt>
              will match strings containing
<tt>"<span class="regex_match_l">a</span><span class="regex_match_m">xxx</span><span class="regex_match_r">a</span>"</tt>
              (exactly three <tt>x</tt>s surrounded by <tt>a</tt>s); four <tt>x</tt>s surrounded by <tt>a</tt>s (<tt>axxxxa</tt>) won't match.
        </li></ul>
      </li></ul>
    </li></ul>
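The quantifiers above are easy to experiment with.  The sketch below (the helper <tt>matches</tt>, the labels, and the test strings are made up for illustration) runs several of this section's patterns against strings containing zero to four <tt>"very "</tt>s and prints which ones match:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $string contains a match for $regex, else 0.
sub matches {
    my ($regex, $string) = @_;
    return $string =~ $regex ? 1 : 0;
}

# Patterns from this section, each paired with a short label.
my @patterns = (
    [ '(very )+'     => qr/have a (very )+nice day/ ],
    [ '(very )*'     => qr/have a (very )*nice day/ ],
    [ '(very )?'     => qr/have a (very )?nice day/ ],
    [ '(very ){3}'   => qr/have a (very ){3}nice day/ ],
    [ '(very ){2,3}' => qr/have a (very ){2,3}nice day/ ],
);

my @strings = (
    'have a nice day',
    'have a very nice day',
    'have a very very nice day',
    'have a very very very nice day',
    'have a very very very very nice day',
);

foreach my $pair (@patterns) {
    my ($label, $regex) = @$pair;
    print "$label\n";
    foreach my $string (@strings) {
        printf "    %-38s %s\n", qq{"$string"},
            matches($regex, $string) ? 'matches' : "doesn't match";
    }
}
```

Note that <tt>(very ){3}</tt> rejects the four-<tt>"very"</tt> string only because the literal <tt>"have a "</tt> in front of it pins down where the repetition starts; on its own, <tt>/x{3}/</tt> still matches <tt>"axxxxa"</tt>.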
    <h3><a class="goto" name="SECTION0004">Alternation</a></h3>
    <ul>
      <li><i>Alternation</i> allows you to choose one of a list of possible choices.
      </li><li>Use a vertical bar (<tt>|</tt>) for alternation:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">man|bear|pig</span>/</tt><br>
          will match any string containing <tt>"man"</tt>, <tt>"bear"</tt>, or <tt>"pig"</tt>.
      </li><li>Note that
<tt>/<span class="regex">abc|xyz</span>/</tt>
          is equivalent to<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">(abc)|(xyz)</span>/</tt><br>
          <b>not</b><br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">(ab)</span><span class="regex_m">(c|x)</span><span class="regex_r">(yz)</span>/</tt><br>
      </li><li>Spaces and alternation don't interact "intuitively":
<tt>/<span class="regex">A B|C D</span>/</tt>
          is the same as<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">(A B)|(C D)</span>/</tt><br>
          <b>not</b><br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">(A )</span><span class="regex_m">(B|C)</span><span class="regex_r">( D)</span>/</tt><br>
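      </li><li>This may not be what you expect, given the earlier example
<tt>/<span class="regex_l">a</span><span class="regex">b+</span>/</tt>
          (which is equivalent to
<tt>/<span class="regex_l">(a)</span><span class="regex">(b+)</span>/</tt>).
      <ul>
        <li>The explanation has to do with the precedence of operators in regexes: quantifiers bind most tightly, then sequencing (sub-patterns placed next to each other), and alternation binds least tightly of all.  So in <tt>/A B|C D/</tt> the bar splits the whole pattern into <tt>A B</tt> and <tt>C D</tt>, while in <tt>/ab+/</tt> the <tt>+</tt> applies only to the <tt>b</tt>.
        </li><li>When using alternation and spaces (and in general when using regexes), parentheses are recommended for clarity.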
        </li><li>In practical use, something like:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">I live in </span><span class="regex">(ME|NH|VT|MA|RI|CT)</span>/</tt><br>
            will match statements entered by people who live in New England.
      </li></ul>
    </li></ul>
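To see the precedence rule in action, the sketch below tests the ungrouped <tt>/A B|C D/</tt> against the grouped <tt>/A (B|C) D/</tt> on a handful of strings (the helper <tt>yes_no</tt> and the sample strings are illustrative only):

```perl
use strict;
use warnings;

# Hypothetical helper: "yes" if $string contains a match for $regex, else "no".
sub yes_no {
    my ($regex, $string) = @_;
    return $string =~ $regex ? 'yes' : 'no';
}

my $loose   = qr/A B|C D/;    # same as /(A B)|(C D)/
my $grouped = qr/A (B|C) D/;  # alternation limited to the middle letter

printf "%-8s %-14s %s\n", 'string', '/A B|C D/', '/A (B|C) D/';
foreach my $string ('A B', 'C D', 'A B D', 'A C D', 'A D') {
    printf "%-8s %-14s %s\n", qq{"$string"},
        yes_no($loose, $string), yes_no($grouped, $string);
}
```

The grouped pattern accepts only <tt>"A B D"</tt> and <tt>"A C D"</tt>; the ungrouped one also accepts <tt>"A B"</tt> and <tt>"C D"</tt> on their own, because the bar splits the whole pattern in two.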
    <h3><a class="goto" name="SECTION0005">Character Classes</a></h3>
    <ul>
      <li>Several symbols represent a <i>character class</i>, or any of a group of characters:
      <ul>
        <li><tt>\s</tt> means "any space character" (in practical use, this is spaces, tabs, and newlines)
        </li><li><tt>\S</tt> means "any non-space character"
        </li><li><tt>\d</tt> means "any digit"
        </li><li><tt>\D</tt> means "any non-digit"
        </li><li><tt>\w</tt> means "any <i>word character</i> (letters, numbers, underscores)"
        </li><li><tt>\W</tt> means "any non-word character"
        </li><li><tt>.</tt> (a dot) means "any single character" except newline (unless you use the <tt>/s</tt> pattern modifier, in which case it matches a newline as well; see below).
      </li></ul>
      </li><li>To define a custom character class, use brackets, and possibly dashes:<br>
      <ul>
        <li>A single character means "this character is part of the class".
        </li><li>Two characters separated by a dash mean "all the characters (ASCIIbetically) between the two, inclusive".
        </li><li>The class can contain multiple characters and ranges, which are simply placed adjacently.
        </li><li>Examples:
        <ul>
          <li><tt>[13579]</tt> matches any single odd digit.
          </li><li><tt>[a-m]</tt> matches any lowercase letter between <tt>a</tt> and <tt>m</tt>, while
          </li><li><tt>[A-M]</tt> matches any uppercase letter in the same range.
          </li><li><tt>[A-Ma-m]</tt> matches either
        </li></ul>
      </li></ul>
      </li><li>A <i>complement</i> class matches any character <b>except</b> those specified.
          To specify a complement class, use a caret ("<tt>^</tt>") as the first character inside the brackets.<br>
          Examples:
      <ul>
        <li>The character class<br>
            <tt>&nbsp;&nbsp;&nbsp;[^A-Z]</tt><br>
            matches anything <b>except</b> an uppercase letter.<br>
        </li><li>The character class<br>
            <tt>&nbsp;&nbsp;&nbsp;[^aeiouAEIOU]</tt><br>
            matches anything <b>except</b> a vowel.
      </li></ul>
      </li><li>Character classes can include the "named" classes:<br>
          <tt>&nbsp;&nbsp;&nbsp;[\dA-F]</tt><br>
          matches any (uppercase) hex digit. The class<br>
          <tt>&nbsp;&nbsp;&nbsp;[^\d\s]</tt><br>
          matches any character besides digits and whitespace.
      </li><li>Be careful when mixing uppercase and lowercase letters.
      <ul>
        <li>For example,<br>
            <tt>&nbsp;&nbsp;&nbsp;[A-Za-z]</tt><br>
            matches anything from <tt>A</tt> to <tt>Z</tt> or <tt>a</tt> to <tt>z</tt>, or in other words, any
            letter character.
        </li><li>However<br>
            <tt>&nbsp;&nbsp;&nbsp;[A-z]</tt><br>
            matches any character between <tt>A</tt> (ASCII value 65) and <tt>z</tt> (ASCII value 122), which includes
            all upper- and lower-case letters <b>but also includes</b> the six punctuation characters that fall between <tt>Z</tt> and <tt>a</tt> (<tt>[ \ ] ^ _ `</tt>), which you probably did not intend to be part of the character class.
      </li></ul>
      </li><li>Discussion question: what characters are in the class
          <tt>[^\W_]</tt>?
      </li><li>No matter how many characters are in a class, a class
          matches only one character (any one of the characters in the class). So the pattern<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">[ABCDEFGHIJKLMNOPQRSTUVWXYZ]</span>/</tt><br>
          despite having 26 characters in it, only matches a single uppercase letter.
      <ul>
        <li>If you want to match more than one of a class, you must replicate the class, as in<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">[ABCDEFGHIJKLMNOPQRSTUVWXYZ]</span><span class="regex">[ABCDEFGHIJKLMNOPQRSTUVWXYZ]</span>/</tt><br>
            which matches two uppercase letters.
        </li><li>You may also (preferably) use quantification (see above).
      </li></ul>
      </li><li>Classes can be used with repetition and alternation operators.  Example:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">\d{5}</span><span class="regex">(-\d{4})?</span>/</tt><br>
          matches a five-digit ZIP code, optionally followed by a hyphen and a four-digit ZIP+4 extension.<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">[aeiou]+</span>/</tt><br>
          matches one or more vowels.
      </li><li>Another example:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_l">.*</span><span class="regex">day</span>/</tt><br>
          (remember that dot is a character class) matches<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">nice </span><span class="regex_match_r">day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very nice </span><span class="regex_match_r">day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">Tues</span><span class="regex_match_r">day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">rotten </span><span class="regex_match_r">day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">Ms3!a 54dPO2$ -)eEq1</span><span class="regex_match_r">day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span class="regex_match_r">day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"yo, dude, <span class="regex_match_l">have a </span><span class="regex_match_m"></span><span class="regex_match_r">day</span>!!!"</tt><br>
      </li><li>Classes are more efficient than alternation, so the pattern<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">([aeiou])+</span>/</tt><br>
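          will generally match faster than the pattern<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex">(a|e|i|o|u)+</span>/</tt><br>
          even though they are functionally equivalent: a class is checked against a character in a single step, while alternation is, conceptually, tried one choice at a time.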
    </li></ul>
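Two of the points above can be checked mechanically.  The sketch below lists every character between <tt>A</tt> and <tt>z</tt> that <tt>[A-z]</tt> accepts but <tt>[A-Za-z]</tt> does not, and then applies the ZIP pattern to a few made-up strings.  Unlike the example above, the ZIP pattern here is anchored with <tt>^</tt> and <tt>$</tt> (covered in the next section) so that extra digits are rejected:

```perl
use strict;
use warnings;

# Every character from 'A' (ASCII 65) through 'z' (ASCII 122) that [A-z] accepts...
my @a_to_z = grep { /[A-z]/ } map(chr, ord('A') .. ord('z'));

# ...minus the ones that are actually letters.
my @not_letters = grep { !/[A-Za-z]/ } @a_to_z;
print "[A-z] also matches: @not_letters\n";

# A class matches exactly one character; quantify it to match more.
my $zip = qr/^\d{5}(-\d{4})?$/;   # anchored so that extra digits are rejected
foreach my $candidate ('02134', '02134-1234', '2134', '021345') {
    printf "%-12s %s\n", $candidate, $candidate =~ $zip ? 'ZIP' : 'not a ZIP';
}
```

The first line of output lists the six characters <tt>[ \ ] ^ _ `</tt>; of the ZIP candidates, only <tt>02134</tt> and <tt>02134-1234</tt> are accepted.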
    <h3><a class="goto" name="SECTION0006">Anchors</a></h3>
    <ul>
      <li>Anchors match conditions in the string but <b>do not match characters.</b>
      </li><li><tt>^</tt> matches at the beginning of the string (or at the beginning of any "line", if you use the <tt>/m</tt> pattern modifier and the string contains embedded newlines)
      </li><li><tt>$</tt> matches at the end of the string, or <b>just before a newline at the end of the string (if present)</b>; with the <tt>/m</tt> modifier it also matches at the end of any "line"
      </li><li>The pattern
<tt>/<span class="regex_l">^</span><span class="regex">hi</span>/</tt>
          will match the strings<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_m"></span><span class="regex_match_r">hi</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_m"></span><span class="regex_match_r">hi</span> there"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_m"></span><span class="regex_match_r">hi</span>story"</tt><br>
          but will not match<br>
          <tt>&nbsp;&nbsp;&nbsp;"this"</tt><br>
          <tt>&nbsp;&nbsp;&nbsp;"I&nbsp;said&nbsp;hi"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;</tt>
          <tt>" hi"</tt> (note the space)<br>
<tt>&nbsp;&nbsp;&nbsp;</tt>
          <tt>"Hillside"</tt> (note the upper-case)<br>
      </li><li>Similarly, the pattern
<tt>/<span class="regex_l">st</span><span class="regex">$</span>/</tt>
          will match the strings<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_m">st</span><span class="regex_match_r"></span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"mo<span class="regex_match_m">st</span><span class="regex_match_r"></span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"mnopqr<span class="regex_match_m">st</span><span class="regex_match_r"></span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"Come here Fa<span class="regex_match_m">st</span><span class="regex_match_r"></span>"</tt><br>
          but will not match<br>
          <tt>&nbsp;&nbsp;&nbsp;"state"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;</tt>
          <tt>"st "</tt> (note the space)<br>
<tt>&nbsp;&nbsp;&nbsp;</tt>
          <tt>"Come here FAST"</tt> (note the upper-case)<br>
      </li><li>Using them together:  Remember, patterns can match anywhere in the string being matched.  For example, the pattern<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">have a </span><span class="regex_m">(very )+</span><span class="regex_r">nice day</span>/</tt><br>
          will match<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very </span><span class="regex_match_r">nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very very </span><span class="regex_match_r">nice day</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l">have a </span><span class="regex_match_m">very very very </span><span class="regex_match_r">nice day</span>"</tt><br>
          etc., but will also match<br>
<tt>&nbsp;&nbsp;&nbsp;"DO NOT <span class="regex_match_l">have a </span><span class="regex_match_m">very </span><span class="regex_match_r">nice day</span>, you bonehead"</tt><br>
          which is slightly different in tone...  Add anchors to correct this:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">^</span><span class="regex_l">have a </span><span class="regex_m">(very )+</span><span class="regex_r">nice day</span><span class="regex_r">$</span>/</tt><br>
          will now match only strings such as<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l"></span><span class="regex_match_m">have a </span><span class="regex_match_m">very </span><span class="regex_match_m">nice day</span><span class="regex_match_r"></span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l"></span><span class="regex_match_m">have a </span><span class="regex_match_m">very very </span><span class="regex_match_m">nice day</span><span class="regex_match_r"></span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l"></span><span class="regex_match_m">have a </span><span class="regex_match_m">very very very </span><span class="regex_match_m">nice day</span><span class="regex_match_r"></span>"</tt><br>
          and so forth<br>
      </li><li>Note that <tt>$</tt> matches the end of the string, <b>before the newline</b> if any.
      <ul>
        <li>For example, the pattern
<tt>/<span class="regex_l">^</span><span class="regex_m">hello</span><span class="regex_r">$</span>/</tt>
            matches both
<tt>"<span class="regex_match_l"></span><span class="regex_match_m">hello</span><span class="regex_match_r"></span>"</tt>
            and
<tt>"<span class="regex_match_l"></span><span class="regex_match_m">hello</span><span class="regex_match_r"></span>\n"</tt>
            (the latter being similar to what you would read in from a file).
      </li></ul>
      </li><li>Word boundaries:
      <ul>
        <li>Consider the string <tt>"ab_c .xyz!!"</tt> (why you are considering this particular string is left to the imagination). Where are the "words"?
        </li><li><tt>\b</tt> matches at a word boundary (between a <tt>\w</tt> character and a <tt>\W</tt> character, or at the beginning/end of a string if the first/last character is a word character):<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_l">ab_c</span><span class="regex_l"> .</span><span class="regex_l">xyz</span><span class="regex_l">!!</span>"</tt>
        </li><li><tt>\B</tt> matches at a non-word-boundary (between two <tt>\w</tt> characters or between two <tt>\W</tt> characters, or at the beginning/end of a string if the first/last character is a non-word character):<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_r">a</span><span class="regex_r">b</span><span class="regex_r">_</span><span class="regex_r">c </span><span class="regex_r">.x</span><span class="regex_r">y</span><span class="regex_r">z!</span><span class="regex_r">!</span>"</tt>
        </li><li>As you recall, the pattern
<tt>/<span class="regex">it</span>/</tt>
            will match any string with the letters <tt>"it"</tt> in it, including:<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match">it</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"sp<span class="regex_match">it</span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match">it</span>ch"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"fr<span class="regex_match">it</span>ter"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"this is <span class="regex_match">it</span>!"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"is <span class="regex_match">it</span> here?"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"h<span class="regex_match">it</span> it"</tt>
            (Only one needs to match for the whole string to match.)<br>
            If you are looking for strings with the word <tt>"it"</tt> by itself, use the <tt>\b</tt> anchor:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">\b</span><span class="regex_m">it</span><span class="regex_r">\b</span>/</tt><br>
            will match only the following from the above list:<br>
<tt>&nbsp;&nbsp;&nbsp;"<span class="regex_match_l"></span><span class="regex_match_m">it</span><span class="regex_match_r"></span>"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"this is <span class="regex_match_l"></span><span class="regex_match_m">it</span><span class="regex_match_r"></span>!"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"is <span class="regex_match_l"></span><span class="regex_match_m">it</span><span class="regex_match_r"></span> here?"</tt><br>
<tt>&nbsp;&nbsp;&nbsp;"hit <span class="regex_match_l"></span><span class="regex_match_m">it</span><span class="regex_match_r"></span>"</tt><br>
            Alternatively, <br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">\B</span><span class="regex_m">it</span><span class="regex_r">\B</span>/</tt><br>
            will match only the following from the above list:<br>
<tt>&nbsp;&nbsp;&nbsp;"fr<span class="regex_match_l"></span><span class="regex_match_m">it</span><span class="regex_match_r"></span>ter"</tt><br>
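            This is because, in every other string on the list, <tt>"it"</tt> is at the beginning or end of the string or next to a non-word character; those positions are word boundaries, where <tt>\B</tt> <b>does not match</b>.
            To match strings containing words with the letters <tt>"it"</tt> but not the word <tt>"it"</tt> by itself, you can use:<br>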
            <tt>&nbsp;&nbsp;&nbsp;/(it\B)|(\Bit)/</tt><br>
      </li></ul>
      </li><li>An important point: anchors do <b>not</b> match characters. They match <b>positions</b> or <b>conditions</b> within the string.
    </li></ul>
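The <tt>"it"</tt> examples above can be run as a small script.  This sketch applies each pattern to the same list of strings and prints the ones that match, then shows <tt>$</tt> matching just before a trailing newline.  The variable names are arbitrary:

```perl
use strict;
use warnings;

my @strings = ('it', 'spit', 'itch', 'fritter', 'this is it!',
               'is it here?', 'hit it');

# Each pattern from the discussion above, paired with a printable label.
my @patterns = (
    [ '/it/'            => qr/it/ ],
    [ '/\bit\b/'        => qr/\bit\b/ ],
    [ '/\Bit\B/'        => qr/\Bit\B/ ],
    [ '/(it\B)|(\Bit)/' => qr/(it\B)|(\Bit)/ ],
);

foreach my $pair (@patterns) {
    my ($label, $regex) = @$pair;
    my @hits = grep { $_ =~ $regex } @strings;
    print "$label matches: ", join(', ', @hits), "\n";
}

# $ matches just before a trailing newline, as in a line read from a file.
print "\"hello\\n\" matches /^hello\$/\n" if "hello\n" =~ /^hello$/;
```

The <tt>/\Bit\B/</tt> line lists only <tt>fritter</tt>, while <tt>/(it\B)|(\Bit)/</tt> picks out <tt>spit</tt>, <tt>itch</tt>, <tt>fritter</tt>, and <tt>hit it</tt>.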
    <h3><a class="goto" name="SECTION0007">Escaped Characters</a></h3>
    <ul>
      <li>To literally match one of the characters in the regex grammar (or a slash), escape it with a backslash.
      </li><li>Example 1:<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_l">^</span><span class="regex_l">\$</span><span class="regex_l">[a-zA-Z_]</span><span class="regex_l">\w*</span><span class="regex_l">\[</span><span class="regex_l">[0-9]+</span><span class="regex_m">\]</span><span class="regex_r">$</span>/</tt><br>
          matches legal Perl array lookups (with a numeric index), such as
<tt><span class="regex_match_l"></span><span class="regex_match_m">$</span><span class="regex_match_l">f</span><span class="regex_match_l">oo</span><span class="regex_match_l">[</span><span class="regex_match_l">3</span><span class="regex_match_m">]</span><span class="regex_match_r"></span></tt>.
      <ul>
        <li>Note the backslash before the first dollar sign (because
it's meant literally) and the lack of backslash before the second one
(because it's matching the end of the string).
        </li><li>Note also which brackets are escaped (literal) and which are defining character classes.
      </li></ul>
      </li><li>Example 2:<br>
          <tt>&nbsp;&nbsp;&nbsp;/^\d+\.\d+\.\d+\.\d+$/</tt><br>
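          matches strings that look like IPv4 addresses, such as <tt>192.168.0.1</tt> (you might then want to check that each part is between 0 and 255, since the pattern also accepts <tt>999.999.999.999</tt>).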
      </li><li>Inside a character class, you do not need to escape anything except <tt>[</tt>, <tt>]</tt>, <tt>/</tt>, and <tt>\</tt>.
      <ul>
        <li>You don't need to escape <tt>-</tt>; instead put it as the first (after any initial <tt>^</tt>) or last character in the class and Perl will figure out what you mean.
        </li><li>You don't need to escape <tt>^</tt>; just put it anywhere besides first.
        </li><li>Bonus discussion question: what character(s) do the following classes match?<br>
            <tt>&nbsp;&nbsp;&nbsp;[--^]&nbsp;&nbsp;[^-^]&nbsp;&nbsp;[-^]</tt><br>
            <tt>&nbsp;&nbsp;&nbsp;[^--^]&nbsp;&nbsp;[^-]&nbsp;&nbsp;[^^]</tt>
      </li></ul>
      </li><li>If your pattern contains slashes, you have to escape them. If your pattern contains a lot of slashes, this can get ugly:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">if ($filename =~ m/^\/([^\/]+\/)*([^\/]+)(\.[^\/]+)?$/) {
  # it is an absolute filename!
}
</pre></td></tr></tbody></table>
      <ul>
        <li>In this case, you may use braces to delimit your pattern, which means you don't
            need to escape the slashes:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">if ($filename =~ m{^/([^/]+/)*([^/]+)(\.[^/]+)?$}) {
  # it is an absolute filename!
}
</pre></td></tr></tbody></table>
        <ul>
          <li>OK, maybe this isn't <b>clear</b>, but it's clear<b>ER</b>.
        </li></ul>
      </li></ul>
    </li></ul>
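The check suggested above (that each part of an IPv4 address is between 0 and 255) can be done after the match.  The sketch below is one way to do it, assuming a plain dotted-quad string; <tt>is_ipv4</tt> is a hypothetical helper name, and in list context the capturing parentheses return the four numbers.  It also shows what goes wrong if the dots are not escaped:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $ip is four dot-separated numbers, each 0..255.
sub is_ipv4 {
    my ($ip) = @_;
    # \. is an escaped (literal) dot; the parentheses capture the four numbers.
    my @parts = $ip =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/
        or return 0;
    # Reject the address if any part is larger than 255.
    return 0 if grep { $_ > 255 } @parts;
    return 1;
}

foreach my $candidate ('192.168.0.1', '256.1.1.1', '1.2.3', '1a2b3c4') {
    printf "%-12s %s\n", $candidate, is_ipv4($candidate) ? 'valid' : 'invalid';
}

# With unescaped dots, "." means "any character", so this pattern accepts 1a2b3c4.
print "unescaped dots accept 1a2b3c4\n" if '1a2b3c4' =~ /^\d+.\d+.\d+.\d+$/;
```

This only checks the dotted-quad form; like the original pattern, it also accepts a single trailing newline, because <tt>$</tt> matches just before one.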
    <h3><a class="goto" name="SECTION0008">Case Sensitivity</a></h3>
    <ul>
      <li>As mentioned, regular expressions are case-sensitive by default.
      </li><li>It is always possible to write a regex that ignores case, by using
          <tt>[a-zA-Z]</tt> for a letter instead of <tt>[a-z]</tt> or <tt>[A-Z]</tt> and by
          specifying literal text as <tt>[Pp][Ee][Rr][Ll]</tt>.
      </li><li>However, if you have long strings of literal text, this is very cumbersome.
      </li><li>Instead, use the <tt>/i</tt> modifier to cause the regex to match case-insensitively.
      </li><li>Just add an <tt>i</tt> after the regex, like this: <tt>/^PERL$/i;</tt>.
      </li><li>Example: the code
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my @strings = ("I am Bobby", "I AM BOBBY", "i am bobby");
<span class="linenum">2</span>
<span class="linenum">3</span> foreach my $string (@strings) {
<span class="linenum">4</span>   print "  case-sensitive: ";
<span class="linenum">5</span>   print "\"$string\" " . ($string =~ /Bob/ ? "matches" : "doesn't match") . " /Bob/\n";
<span class="linenum">6</span>   print "case-insensitive: ";
<span class="linenum">7</span>   print "\"$string\" " . ($string =~ /Bob/i ? "matches" : "doesn't match") . " /Bob/i\n\n";
<span class="linenum">8</span> }
</pre></td></tr></tbody></table>
          outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">  case-sensitive: "I am Bobby" matches /Bob/
case-insensitive: "I am Bobby" matches /Bob/i

  case-sensitive: "I AM BOBBY" doesn't match /Bob/
case-insensitive: "I AM BOBBY" matches /Bob/i

  case-sensitive: "i am bobby" doesn't match /Bob/
case-insensitive: "i am bobby" matches /Bob/i
</pre></td></tr></tbody></table>
    </li></ul>
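As a quick check that <tt>/i</tt> does the same job as spelling out both cases with classes, this sketch compares the two approaches on a few sample words (the word list is arbitrary):

```perl
use strict;
use warnings;

my $by_class = qr/^[Pp][Ee][Rr][Ll]$/;  # spell out both cases by hand
my $by_flag  = qr/^perl$/i;             # let /i do the work

foreach my $word (qw(perl Perl PERL pErL pearl)) {
    my $class_result = $word =~ $by_class ? 'matches' : 'no match';
    my $flag_result  = $word =~ $by_flag  ? 'matches' : 'no match';
    printf "%-6s %-9s %s\n", $word, $class_result, $flag_result;
}
```

Both columns agree for every word; the <tt>/i</tt> version is simply much shorter to write.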
    <h2><a class="goto" name="CHAPTER0002">Rules For Matching</a></h2>
    <ul>
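      <li>In addition to the regex grammar itself, remember two things and you won't go wrong:
      <ol>
        <li>Patterns always try to match the <b>LEFTMOST</b> substring they can, and at that spot they usually match the <b>LONGEST</b> one.
        <ul>
          <li>In other words, if a pattern can match in two places in a string, it always matches as far to the left as possible.
          </li><li>At that leftmost spot, the quantifiers <tt>+</tt>, <tt>*</tt>, <tt>?</tt>, and <tt>{}</tt> are <i>greedy</i>: each matches as many repetitions as it can while still letting the rest of the pattern match, so the match is usually the longest possible one.
          </li><li>The exception is alternation: Perl tries the choices from left to right and uses the first one that lets the whole pattern match, so <tt>/ab|abcd/</tt> matches only <tt>"ab"</tt> in the string <tt>"abcd"</tt>.
        </li></ul>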
        </li><li>When a pattern is matched against a string, the string
must match all sub-patterns (be they individual characters, anchors,
parenthetical
            expressions, alternations, or quantifications) <b>IN-ORDER, ADJACENTLY</b>.
      </li></ol>
    </li></ul>
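The sketch below shows the leftmost rule using a hypothetical <tt>show_match</tt> helper, which reads Perl's <tt>@-</tt> and <tt>@+</tt> arrays (the start and end offsets of the last successful match) to report what was matched and where.  The last two lines illustrate one case worth knowing: at a given position, Perl's alternation uses the first alternative that succeeds, which is not necessarily the longest.

```perl
use strict;
use warnings;

# Hypothetical helper: report what $regex matched in $string, and where.
# $-[0] and $+[0] hold the start and end offsets of the last successful match.
sub show_match {
    my ($regex, $string) = @_;
    return 'no match' unless $string =~ $regex;
    my ($start, $end) = ($-[0], $+[0]);
    return sprintf('matched "%s" at offset %d',
                   substr($string, $start, $end - $start), $start);
}

# Leftmost wins: the match starts at offset 1, although one could also start at 4.
print show_match(qr/(abc)+/, 'xabcabc'), "\n";   # matched "abcabc" at offset 1
# Leftmost beats longest: "aaa" comes before the longer "aaaaa".
print show_match(qr/a+/, 'baaab aaaaa'), "\n";   # matched "aaa" at offset 1
# Alternation takes the first choice that works, not the longest one.
print show_match(qr/ab|abcd/, 'abcd'), "\n";     # matched "ab" at offset 0
print show_match(qr/abcd|ab/, 'abcd'), "\n";     # matched "abcd" at offset 0
```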
    <h2><a class="goto" name="CHAPTER0003">Storing Regexes</a></h2>
    <ul>
      <li>A regular expression can be stored in a scalar variable, and Perl provides a special quote-like
          operator for this: <tt>qr//</tt>, which compiles the pattern and returns a regex object that can be used wherever a pattern is expected.
      <ul>
        <li>So this:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">if ($zip_data =~ m/^\d{5}(-\d{4})?$/) {
  print "Valid ZIP/ZIP+4 code\n";
}
</pre></td></tr></tbody></table>
            is the same as this:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my $rgx_zip = qr/^\d{5}(-\d{4})?$/;
<span class="linenum">2</span>
<span class="linenum">3</span> if ($zip_data =~ m/$rgx_zip/) {
<span class="linenum">4</span>   print "Valid ZIP/ZIP+4 code\n";
<span class="linenum">5</span> }
</pre></td></tr></tbody></table>
      </li></ul>
      </li><li>This is helpful because it is a form of encapsulation: regexes are defined away from where they are used.
      </li><li>It is especially beneficial as a form of code re-use: storing regexes that will be used in different places throughout
          your program.
      </li><li>Finally, it makes it easier to create commented, multi-line regexes using the <tt>/x</tt>
          modifier (see below); using <tt>/x</tt> <i>in-place</i> is unwieldy.
    </li></ul>
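    <p>As a sketch of these benefits (assuming the same ZIP/ZIP+4 format as above; the variable names are illustrative only), small <tt>qr//</tt> pieces can be defined once and then interpolated into a larger pattern written with <tt>/x</tt>, so that it can carry comments:</p>

```perl
use strict;
use warnings;

# Small building blocks, each compiled once with qr//.
my $rgx_zip5 = qr/\d{5}/;
my $rgx_zip4 = qr/-\d{4}/;

# qr// objects can be interpolated into larger patterns. The /x modifier
# lets the combined pattern span several lines and carry comments.
my $rgx_zip = qr/
    ^
    $rgx_zip5           # five-digit base
    ( $rgx_zip4 )?      # optional +4 suffix
$/x;

my @valid = grep { $_ =~ $rgx_zip } ("03060", "03060-1234", "0306", "03060-12");
print "valid: @valid\n";
```

    <p>Only <tt>03060</tt> and <tt>03060-1234</tt> pass. Keep <tt>$</tt> and <tt>@</tt> out of <tt>/x</tt> comments, since Perl interpolates variables before the comments are stripped.</p>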
    <h2><a class="goto" name="CHAPTER0004">Using Regexes</a></h2>
    There are three primary operations in Perl that use regular expressions: matching, substituting, and splitting.
    <h3><a class="goto" name="SECTION0009">Matching Using Regexes</a></h3>
    <ul>
      <li>The most basic regex operation is determining whether a string matches a given pattern. This is done using the
          <tt>=~</tt> operator (pronounced "matches") and the <tt>m/pattern/</tt> construct.
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my @strings = qw(apple banana cucumber durian potato);
<span class="linenum">2</span> foreach my $string (@strings) {
<span class="linenum">3</span>   if ($string =~ m/^([bcdfghjklmnpqrstvwxyz][aeiouy])+$/) {
<span class="linenum">4</span>     print "\"$string\" matches CVCV...CV\n";
<span class="linenum">5</span>   }
<span class="linenum">6</span>   else {
<span class="linenum">7</span>     print "\"$string\" does not match\n";
<span class="linenum">8</span>   }
<span class="linenum">9</span> }
</pre></td></tr></tbody></table>
          outputs:
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">"apple" does not match
"banana" matches CVCV...CV
"cucumber" does not match
"durian" does not match
"potato" matches CVCV...CV
</pre></td></tr></tbody></table>
      <ul>
        <li>Note the <tt>m</tt> in front of the pattern. It is optional when the pattern is delimited by slashes (and required with any other delimiters, such as <tt>m{pattern}</tt>), but it's best practice to <b>always</b>
            include it.
      </li></ul>
      </li><li>In scalar (typically Boolean) context, the pattern match operation returns a true value if the string matches
          the pattern and false if not.
      </li><li>What <tt>m//</tt> returns in list context is covered later.
      </li><li>The "does not match" operator, <tt>!~</tt>, is simply the inverse of <tt>=~</tt>. It is primarily used in Boolean context.
      </li><li>Note that <tt>m//</tt> works on any string, including non-<i>lvalues</i> such as a function's return value. Therefore, you can use <tt>m//</tt>
          to test whether a function returned a string that matches a certain pattern, or to test whether a constant string matches a (presumably
          variable) pattern.
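          <p>For example (a sketch; <tt>current_user</tt> is a hypothetical function standing in for any sub that returns a string), <tt>m//</tt> and <tt>!~</tt> can be applied to a return value or to a constant string:</p>

```perl
use strict;
use warnings;

# Hypothetical stand-in for any function that returns a string.
sub current_user { return "bobby" }

# m// works directly on a function's return value ...
my $is_bob = (current_user() =~ m/^bob/) ? 1 : 0;

# ... and on a constant string, with the (variable) pattern stored in a scalar.
my $rgx_user = qr/^[a-z]+$/;
my $guest_ok = ("guest" =~ $rgx_user) ? 1 : 0;

# !~ is true exactly when =~ would be false.
my $bad_name = ("Guest 1" !~ $rgx_user) ? 1 : 0;

print "is_bob=$is_bob guest_ok=$guest_ok bad_name=$bad_name\n";
```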
      </li><li>If you use the "global match" modifier ("<tt>/g</tt>") in scalar context, each match remembers where it left off. Matching the same pattern against the same string
          repeatedly does a <i>progressive match</i>.
      <ul>
        <li>The code<br>
            <tt>&nbsp;&nbsp;&nbsp;"abcdefghijklmno"&nbsp;=~&nbsp;m/[aeiou]/g</tt>
            (match one vowel)<br>
            will match four substrings of the string if called repeatedly:<br>
<tt>&nbsp;&nbsp;&nbsp;<span class="regex_match">a</span>bcd<span class="regex_match">e</span>fgh<span class="regex_match">i</span>jklmn<span class="regex_match">o</span></tt>
        </li><li>The code<br>
            <tt>&nbsp;&nbsp;&nbsp;"abcdefghijklmno"&nbsp;=~&nbsp;m/[aeiou]./g</tt>
            (match a vowel followed by another character)<br>
            will match three substrings:<br>
<tt>&nbsp;&nbsp;&nbsp;<span class="regex_match">ab</span>cd<span class="regex_match">ef</span>gh<span class="regex_match">ij</span>klmno</tt>
        <ul>
          <li>It can't match where the "<tt>o</tt>" is, because there is no character after it.
        </li></ul>
        </li><li>The code<br>
            <tt>&nbsp;&nbsp;&nbsp;"abcdefghijklmno"&nbsp;=~&nbsp;m/.[aeiou]/g</tt>
            (match a character followed by a vowel)<br>
            will match three substrings:<br>
<tt>&nbsp;&nbsp;&nbsp;abc<span class="regex_match">de</span>fg<span class="regex_match">hi</span>jklm<span class="regex_match">no</span></tt>
        <ul>
          <li>It can't match where the "<tt>a</tt>" is because there is no character preceding it.
        </li></ul>
        </li><li>The code<br>
            <tt>&nbsp;&nbsp;&nbsp;"abcdefghijklmno"&nbsp;=~&nbsp;m/^[aeiou]/g</tt>
            (match a vowel at the beginning of the string)<br>
            will only match one substring:<br>
<tt>&nbsp;&nbsp;&nbsp;<span class="regex_match">a</span>bcdefghijklmno</tt>
            <br>
            as will the code<br>
            <tt>&nbsp;&nbsp;&nbsp;"abcdefghijklmno"&nbsp;=~&nbsp;m/[aeiou]$/g</tt>
            (match a vowel at the end of the string)<br>
            which matches:<br>
<tt>&nbsp;&nbsp;&nbsp;abcdefghijklmn<span class="regex_match">o</span></tt>
        <ul>
          <li>These can't match any of the other vowels because they aren't at the
              beginning/end of the string.
          </li><li>It is almost always pointless to use a progressive match that
              includes anchors like <tt>^</tt> and <tt>$</tt> <b>unless</b> it is a string with
              embedded newlines and the regex has the <tt>/m</tt> modifier.
        </li></ul>
        </li><li>The code<br>
            <tt>&nbsp;&nbsp;&nbsp;"abcdefghijklmno"&nbsp;=~&nbsp;m/[aeiou].{3}/g</tt>
            (match a vowel followed by three characters)<br>
            will match three substrings:<br>
<tt>&nbsp;&nbsp;&nbsp;<span class="regex_match">abcd</span><span class="regex_match">efgh</span><span class="regex_match">ijkl</span>mno</tt>
        <ul>
          <li>But the code<br>
              <tt>&nbsp;&nbsp;&nbsp;"abcdefghijklmno"&nbsp;=~&nbsp;m/[aeiou].{4}/g</tt>
              (match a vowel followed by <b>four</b> characters)<br>
              will only match two substrings:<br>
<tt>&nbsp;&nbsp;&nbsp;<span class="regex_match">abcde</span>fgh<span class="regex_match">ijklm</span>no</tt>
          <ul>
            <li>It can't match where the "<tt>e</tt>" is because the leave-off point has already progressed beyond it (the "<tt>e</tt>" was one of the four characters
                following the "<tt>a</tt>").
          </li></ul>
        </li></ul>
      </li></ul>
      <br>
      A code-based example:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum"> 1</span> my $string = "this 47 string 390 has numbers 000 embedded in it.";
<span class="linenum"> 2</span> my $match_count = 0;
<span class="linenum"> 3</span> while (1) {
<span class="linenum"> 4</span>   if ($string =~ m/\d+/g) {
<span class="linenum"> 5</span>     print "Found match ending before position " . pos($string) . "\n";
<span class="linenum"> 6</span>     ++$match_count;
<span class="linenum"> 7</span>   }
<span class="linenum"> 8</span>   else {
<span class="linenum"> 9</span>     print "No more matches!\n";
<span class="linenum">10</span>     last;
<span class="linenum">11</span>   }
<span class="linenum">12</span> }
<span class="linenum">13</span> print "Found $match_count total matches.\n";
</pre></td></tr></tbody></table>
      outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Found match ending before position 7
Found match ending before position 18
Found match ending before position 34
No more matches!
Found 3 total matches.
</pre></td></tr></tbody></table>
      <ul>
        <li>Note that the location where the next global match will start is accessible using the <tt>pos</tt> function.
        </li><li>When using progressive matching, beware positional anchors, especially absolute anchors, in the pattern.
        </li><li>Also be aware that a progressive match silently skips any non-matching text between matches. That is harmless in
            the example above, where we only want the digits, but it matters for patterns that must account for every character of the string.
      </li></ul>
    </li></ul>
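    <p>One way to check the highlighted progressive matches from earlier is to run each pattern in a <tt>while</tt> loop, as in the code-based example. This sketch borrows capturing parentheses and <tt>$1</tt> (covered under Regex Memory) to record the text of each match:</p>

```perl
use strict;
use warnings;

my $s = "abcdefghijklmno";

# The patterns from the progressive-match examples above.
my @patterns = ('[aeiou]', '[aeiou].', '.[aeiou]', '^[aeiou]',
                '[aeiou]$', '[aeiou].{3}', '[aeiou].{4}');

my %found;
foreach my $pattern (@patterns) {
    my @matches;
    # Scalar-context m//g resumes at pos($s). The capturing parentheses
    # make $1 hold the text of each match. When the match finally fails,
    # pos($s) is reset, so the next pattern starts from the beginning.
    while ($s =~ m/($pattern)/g) {
        push @matches, $1;
    }
    $found{$pattern} = \@matches;
    printf "%-16s %s\n", "m/$pattern/g", "@matches";
}
```

    <p>Each output line should list the same substrings highlighted above, for example <tt>a e i o</tt> for <tt>m/[aeiou]/g</tt> and <tt>abcde ijklm</tt> for <tt>m/[aeiou].{4}/g</tt>.</p>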
    <h3><a class="goto" name="SECTION0010">Substitution Using Regexes</a></h3>
    <ul>
      <li>Another common regex operation is substituting. This is accomplished using the <tt>s/pattern/replacement/</tt> construct with the <tt>=~</tt> operator.
          A simple example: the code
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">my $string = "Bahama mama";
print "Original String: \"$string\"\n";
$string =~ s/ma/pop/;
print "Modified String: \"$string\"\n";
</pre></td></tr></tbody></table>
          which outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Original String: "Bahama mama"
Modified String: "Bahapop mama"
</pre></td></tr></tbody></table>
      </li><li><tt>s///</tt> returns the number of substitutions made, regardless of context. If it made none, it returns false (a value that is both <tt>""</tt> and <tt>0</tt>).
      <ul>
        <li>Therefore it doesn't make much sense to call <tt>s///</tt> in list context.
        </li><li>In Boolean context, this helpfully evaluates to "true" if the substitution succeeded and "false" otherwise.
        </li><li>Without the "global" modifier ("<tt>/g</tt>", as with <tt>m//</tt>), at most one substitution is made, so the value is always false or 1. With <tt>/g</tt>, <tt>s///</tt>
            finds and replaces ALL matching substrings with the replacement:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my $string = "Bahama mama";
<span class="linenum">2</span> print "Original String: \"$string\"\n";
<span class="linenum">3</span> my $repl_count = ($string =~ s/ma/pop/g);
<span class="linenum">4</span> print "Modified String: \"$string\"\n";
<span class="linenum">5</span> print "  - made $repl_count replacements\n";
</pre></td></tr></tbody></table>
            outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Original String: "Bahama mama"
Modified String: "Bahapop poppop"
  - made 3 replacements
</pre></td></tr></tbody></table>
      </li></ul>
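          <p>Because the return value is a count that is false when nothing was replaced, <tt>s///</tt> can drive an <tt>if</tt> directly. A minimal sketch with made-up input:</p>

```perl
use strict;
use warnings;

my @lines = ("Bahama mama", "nothing to change");
my @report;
foreach my $line (@lines) {
    # s/// returns how many replacements it made (false if none), and
    # foreach aliases $line to each array element, so @lines is modified too.
    if (my $count = ($line =~ s/ma/pop/g)) {
        push @report, "$count replacement(s): \"$line\"";
    }
    else {
        push @report, "no replacements: \"$line\"";
    }
}
print "$_\n" for @report;
```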
      </li><li>Note that <tt>s///</tt> is destructive. If you want to keep the original string, you must make a copy first.
          You can copy and substitute in one statement (called "substitution <i>en-passant</i>") like this:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">my $orig_string = "Bahama mama";
(my $mod_string = $orig_string) =~ s/ma$/pop/;
print "Original String: \"$orig_string\"\n";
print "Modified String: \"$mod_string\"\n";
</pre></td></tr></tbody></table>
          which outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Original String: "Bahama mama"
Modified String: "Bahama mapop"
</pre></td></tr></tbody></table>
      <ul>
        <li>A side effect of <tt>s///</tt>'s destructiveness is that unlike <tt>m//</tt>, it can only be applied to lvalues. You <b>cannot</b>
            use <tt>s///</tt> on constant strings, the return values of subroutine calls, etc. Code like this:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">"constant string" =~ s/a/b/g;
</pre></td></tr></tbody></table>
            blows up like this:
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Can't modify constant item in substitution (s///) at subst_const.pl line 1, near "s/a/b/g;"
Execution of subst_const.pl aborted due to compilation errors.
</pre></td></tr></tbody></table>
        </li><li>Not that such an operation would be useful anyway: without a variable to hold it, the modified string would be thrown away.
        </li><li>Substitution en-passant works when the original value is a non-lvalue:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">(my $string = "This is a non-lvalue") =~ s/a/x/g;
print $string;
</pre></td></tr></tbody></table>
            outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">This is x non-lvxlue</pre></td></tr></tbody></table>
      </li></ul>
      </li><li>The <tt>replacement</tt> is treated as a normal
double-quoted string. Therefore you can interpolate variables into it,
as well as hash lookups, array indexes, etc.
    </li></ul>
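    <p>A minimal sketch of interpolation in the replacement (the data and the placeholder words are made up):</p>

```perl
use strict;
use warnings;

# Made-up data for illustration.
my @towns      = ("Nashua", "Concord");
my %state_name = ("NH", "New Hampshire");
my $state      = "NH";

my $text = "Offices in TOWN1 and TOWN2, STATE.";

# The replacement part of s/// is interpolated like a double-quoted string,
# so array elements and hash lookups work there.
$text =~ s/TOWN1/$towns[0]/;
$text =~ s/TOWN2/$towns[1]/;
$text =~ s/STATE/$state_name{$state}/;

print "$text\n";
```

    <p>This prints <tt>Offices in Nashua and Concord, New Hampshire.</tt></p>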
    <h3><a class="goto" name="SECTION0011">Split Using Regexes</a></h3>
    <ul>
      <li>We have already seen the <tt>split</tt> function called with a single character as its first argument, used as the delimiter.
      </li><li>However, the first argument to <tt>split</tt> can be a regular expression. That regex is matched repeatedly
          in the target string (never overlapping), and the text between the matches is
          returned as a list of substrings.
      </li><li>So for example, using just a single-character delimiter:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use Data::Dumper;

my $string = "Johnson, Bill,345-F24-134A , x1457 ,Nashua";
my @fields = split ",", $string;

print Dumper \@fields;
</pre></td></tr></tbody></table>
          outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">$VAR1 = [
          'Johnson',
          ' Bill',
          '345-F24-134A ',
          ' x1457 ',
          'Nashua'
        ];
</pre></td></tr></tbody></table>
      <ul>
        <li>All the useless spaces around the commas are preserved.
      </li></ul>
      <p>
      Using a regex as the delimiter yields much better results:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use Data::Dumper;

my $string = "Johnson, Bill,345-F24-134A , x1457 ,Nashua";
my @fields = split /\s*,\s*/, $string;

print Dumper \@fields;
</pre></td></tr></tbody></table>
      outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">$VAR1 = [
          'Johnson',
          'Bill',
          '345-F24-134A',
          'x1457',
          'Nashua'
        ];
</pre></td></tr></tbody></table>
      </p></li><li>Another common example is to use
          <tt>split /\s+/</tt> to split a string on one or more whitespace characters, including spaces and tabs.
      </li><li>Note that the first argument to <tt>split</tt> is the one place where these notes leave off the <tt>m</tt> in front of the slashes.
          It is obvious from the call to <tt>split</tt> that the first argument is a regex.
      </li><li><tt>split</tt> is not destructive and can be applied to non-lvalues:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">my ($key, $value) = split /\s*=\s*/, read_config_line(), 2;
</pre></td></tr></tbody></table>
      </li><li>If you give <tt>split</tt> an empty regex, it returns an array of the individual characters from the original string:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use Data::Dumper;

my @chars = split //, "yee-ha!";
print Dumper \@chars;
</pre></td></tr></tbody></table>
          prints
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">$VAR1 = [
          'y',
          'e',
          'e',
          '-',
          'h',
          'a',
          '!'
        ];
</pre></td></tr></tbody></table>
    </li></ul>
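    <p>Two more sketches with made-up input: splitting on runs of whitespace with <tt>split /\s+/</tt>, and passing <tt>split</tt> a third argument to limit how many fields it returns (the same form as the <tt>read_config_line()</tt> example above):</p>

```perl
use strict;
use warnings;

# Runs of spaces and tabs all count as one delimiter.
my $line  = "alpha  beta\tgamma \t delta";
my @words = split /\s+/, $line;
print join("|", @words), "\n";

# An optional third argument limits the number of fields returned;
# the last field keeps the rest of the string, delimiters and all.
my ($key, $value) = split /\s*=\s*/, "path = /usr/local = default", 2;
print "key=[$key] value=[$value]\n";
```

    <p>The first line prints <tt>alpha|beta|gamma|delta</tt>; the second keeps <tt>/usr/local = default</tt> together as the value.</p>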
    <h2><a class="goto" name="CHAPTER0005">Regex Memory</a></h2>
    <ul>
      <li>As seen, parentheses are used in regexes for <i>grouping</i> but they also have
          <i>memory</i>.
      </li><li>What is matched inside the parentheses is <i>captured</i>, and can be accessed
          after the match.
      </li><li>Use the <i>match variable</i> <tt>$1</tt> to access the first <i>sub-match</i>, i.e., what was matched by the first set of parentheses (counting opening parentheses from the left).
          <tt>$2</tt> contains what was matched by the second set of parentheses, and so on.
      <ul>
        <li>Perl does provide <tt>$10</tt>, <tt>$11</tt>, and so on, but if you need to go higher than <tt>$9</tt>, you should consider splitting the string
            up into chunks first (perhaps by using capturing parentheses) and then further
            processing the chunks using smaller regexes.
        </li><li>The match variables are 1-indexed (<tt>$1</tt> holds the first sub-match) because <tt>$0</tt> already means something
            totally different: the name of the running program.
      </li></ul>
      </li><li>Example:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum"> 1</span> my $str = "phone number is 603-432-8696.";
<span class="linenum"> 2</span> my $rgx_phone = qr/(\d{3})-(\d{3})-(\d{4})/;
<span class="linenum"> 3</span>
<span class="linenum"> 4</span> print "Searching the string \"$str\"\n";
<span class="linenum"> 5</span> if ($str =~ $rgx_phone) {
<span class="linenum"> 6</span>   print "Found a phone number. Submatches are [$1], [$2], [$3].\n";
<span class="linenum"> 7</span> }
<span class="linenum"> 8</span> else {
<span class="linenum"> 9</span>   print "Didn't find a phone number\n";
<span class="linenum">10</span> }
</pre></td></tr></tbody></table>
          outputs:
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Searching the string "phone number is 603-432-8696."
Found a phone number. Submatches are [603], [432], [8696].
</pre></td></tr></tbody></table>
      </li><li>If the overall pattern doesn't match, the match variables will not contain meaningful information.
      <ul>
        <li><b>Make sure the match succeeded before using the match variables!</b>
      </li></ul>
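          <p>A minimal sketch of why this matters: a failed match leaves the match variables holding whatever the last <b>successful</b> match set them to, so stale values can look perfectly plausible.</p>

```perl
use strict;
use warnings;

my $rgx_phone = qr/(\d{3})-(\d{3})-(\d{4})/;

my $first  = ("call 603-432-8696" =~ $rgx_phone);   # succeeds, sets $1 to "603"
my $second = ("no number here"    =~ $rgx_phone);   # fails

# The failed match does not reset the match variables: $1 still holds "603"
# from the earlier, successful match, which is why you must check first.
my $stale = $1;
print "second match ", ($second ? "succeeded" : "failed"), ", yet \$1 is \"$stale\"\n";
```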
      </li><li>If a match is performed in list context, it returns the match variables in a list (e.g. <tt>($1, $2, $3)</tt>), or an empty list if the match fails.
      <ul>
        <li>For example, the code
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use Data::Dumper;

my $string = "Julie ate a peach.";
my @matches = ($string =~ m/([aeiou])([aeiou])/);
print Dumper \@matches;
</pre></td></tr></tbody></table>
            outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">$VAR1 = [
          'i',
          'e'
        ];
</pre></td></tr></tbody></table>
        </li><li>This is another time when using array slices on a returned value can be handy:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my $rgx_zip_code = qr/^(\d{5})(-\d{4})?$/;
<span class="linenum">2</span>
<span class="linenum">3</span> my $zip_code = "03060-1234";
<span class="linenum">4</span> my $zip_code_base = ($zip_code =~ m/$rgx_zip_code/)[0];
<span class="linenum">5</span>
<span class="linenum">6</span> print "base of ZIP code \"$zip_code\" is \"$zip_code_base\"\n";
</pre></td></tr></tbody></table>
            outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">base of ZIP code "03060-1234" is "03060"
</pre></td></tr></tbody></table>
        <ul>
          <li>Note that <tt>$1</tt> corresponds to element <b>zero</b>, <tt>$2</tt> to element <b>one</b>, etc.
        </li></ul>
        </li><li>If the <tt>/g</tt> (global) modifier is used, in list context the match returns all sub-matches in the entire string as it matches repeatedly:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use Data::Dumper;

my $string = "Julie ate a peach.";
my @matches = ($string =~ m/([aeiou])([aeiou])/g);
print Dumper \@matches;
</pre></td></tr></tbody></table>
            outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">$VAR1 = [
          'i',
          'e',
          'e',
          'a'
        ];
</pre></td></tr></tbody></table>
        </li><li>Finally, if the pattern doesn't contain any capturing parentheses, evaluating a
            non-global match in list context isn't very useful (a successful match just returns the list <tt>(1)</tt>). A global match, however,
            returns all the substrings that matched:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use Data::Dumper;

my $string = "Julie ate a peach.";
my @matches = ($string =~ m/[aeiou]{2}/g);
print Dumper \@matches;
</pre></td></tr></tbody></table>
            outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">$VAR1 = [
          'ie',
          'ea'
        ];
</pre></td></tr></tbody></table>
      </li></ul>
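      <p>A common idiom, sketched here with the phone-number pattern from earlier, combines the list-context return value with a test for success: a list assignment used as a condition is true when the match returned at least one element.</p>

```perl
use strict;
use warnings;

my $rgx_phone = qr/(\d{3})-(\d{3})-(\d{4})/;
my @results;

foreach my $str ("phone number is 603-432-8696.", "no phone number here") {
    # A list assignment used as a condition is true when the right-hand side
    # returned at least one element, i.e. when the match succeeded.
    if (my ($area, $exchange, $line) = ($str =~ $rgx_phone)) {
        push @results, "$area/$exchange/$line";
    }
    else {
        push @results, "no match";
    }
}
print "$_\n" for @results;
```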
      </li><li>You can also use the match variables in the replacement string.
      <ul>
        <li>Example:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my $str = 'name = Mr. John Smith';
<span class="linenum">2</span> my $rgx_name = qr/[A-Z][a-z]+\.\s+([A-Z][a-z]+)\s+([A-Z][a-z]+)/;
<span class="linenum">3</span>
<span class="linenum">4</span> print "string before: \"$str\"\n";
<span class="linenum">5</span> $str =~ s/$rgx_name/$2, $1/;
<span class="linenum">6</span> print "string after: \"$str\"\n";
</pre></td></tr></tbody></table>
            outputs:
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">string before: "name = Mr. John Smith"
string after: "name = Smith, John"
</pre></td></tr></tbody></table>
      </li></ul>
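      <p>Another sketch, with made-up dates: rearranging <tt>YYYY-MM-DD</tt> dates into <tt>MM/DD/YYYY</tt> using <tt>/g</tt> and an en-passant copy. It uses braces as the <tt>s///</tt> delimiters so the slashes in the replacement need no escaping.</p>

```perl
use strict;
use warnings;

my $text = "Released 2009-03-15, patched 2009-04-02.";

# Copy en passant, then rewrite every YYYY-MM-DD date as MM/DD/YYYY.
# Braces are used as delimiters so the slashes in the replacement need no escaping.
(my $us_dates = $text) =~ s{(\d{4})-(\d{2})-(\d{2})}{$2/$3/$1}g;

print "$us_dates\n";
```

      <p>This prints <tt>Released 03/15/2009, patched 04/02/2009.</tt> while <tt>$text</tt> keeps its original dates.</p>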
      </li><li>Finally, you can even use the sub-matches later in the same pattern by referring
          to them as <tt>\1</tt>, <tt>\2</tt>, etc.
      <ul>
        <li>Example:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum"> 1</span> my $text_ok = "This is the hour of our discontent";
<span class="linenum"> 2</span> my $text_repeat = "This is the the hour of our discontent";
<span class="linenum"> 3</span> my $rgx_repeat_words = qr/\b([a-z]+)\s+\1/i;
<span class="linenum"> 4</span>
<span class="linenum"> 5</span> foreach my $text ($text_ok, $text_repeat) {
<span class="linenum"> 6</span>   print "- searching for repeated words in string:\n";
<span class="linenum"> 7</span>   print "    \"$text\"\n";
<span class="linenum"> 8</span>   if ($text =~ m/$rgx_repeat_words/) {
<span class="linenum"> 9</span>     print "  found repeated word \"$1\"\n";
<span class="linenum">10</span>   }
<span class="linenum">11</span>   else {
<span class="linenum">12</span>     print "  no repeated words found\n";
<span class="linenum">13</span>   }
<span class="linenum">14</span> }
</pre></td></tr></tbody></table>
            outputs:
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">- searching for repeated words in string:
    "This is the hour of our discontent"
  no repeated words found
- searching for repeated words in string:
    "This is the the hour of our discontent"
  found repeated word "the"
</pre></td></tr></tbody></table>
        </li><li>This is known as <i>backreferencing</i>.
      </li></ul>
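      <p>Backreferences are also handy for finding doubled characters. This sketch (the input word is arbitrary) combines <tt>\1</tt> with a global match in list context, and shows that <tt>\1</tt> must match the same <i>text</i> the group captured, not just anything its sub-pattern could match.</p>

```perl
use strict;
use warnings;

my $word = "bookkeeper";

# (\w) captures one character and \1 demands that same character again,
# so the pattern finds doubled letters. With /g in list context, the match
# returns the captured letter from every doubled pair.
my @doubled = ($word =~ m/(\w)\1/g);
print "doubled letters in \"$word\": @doubled\n";

# A backreference matches the same TEXT, not the same pattern:
# "ab" has two word characters, but not the same one twice.
my $same_text = ("ab" =~ m/(\w)\1/) ? "matches" : "does not match";
print "\"ab\" $same_text\n";
```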
      </li><li>If you are using Perl 5.10 or higher, you can also use <i>named capture buffers</i>.
          This stores the results of captures by name. After your match, you can access the sub-matches
          via the fantastically-named <tt>%+</tt> hash. So the code:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum"> 1</span> my @strings = qw(800-123-4567 987-6543 90210-1000);
<span class="linenum"> 2</span> my $rgx = qr/^((?&lt;area_code&gt;\d{3})-)?(?&lt;number&gt;\d{3}-\d{4})$/;
<span class="linenum"> 3</span>
<span class="linenum"> 4</span> foreach my $str (@strings) {
<span class="linenum"> 5</span>   print "parsing \"$str\"\n";
<span class="linenum"> 6</span>   if ($str =~ $rgx) {
<span class="linenum"> 7</span>     print "  - Number was $+{number}\n";
<span class="linenum"> 8</span>     print "  - Area code was " . ($+{area_code} // "not present") . "\n";
<span class="linenum"> 9</span>   }
<span class="linenum">10</span>   else {
<span class="linenum">11</span>     print "   - Not a phone number!\n";
<span class="linenum">12</span>   }
<span class="linenum">13</span> }
</pre></td></tr></tbody></table>
          outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">parsing "800-123-4567"
  - Number was 123-4567
  - Area code was 800
parsing "987-6543"
  - Number was 987-6543
  - Area code was not present
parsing "90210-1000"
   - Not a phone number!
</pre></td></tr></tbody></table>
      <ul>
        <li>To use named buffers as backreferences, use the <tt>\g{}</tt> notation. For example, a simple regex to detect
            repeated words in text might look like<br>
            <tt>&nbsp;&nbsp;&nbsp;/\b(?&lt;word&gt;[a-z]+)&nbsp;\g{word}\b/</tt>
      </li></ul>
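      <p>A runnable version of that repeated-word check, as a sketch (Perl 5.10 or later). It spells the named group with the quote form <tt>(?'word'...)</tt>, which perlre documents as equivalent to <tt>(?&lt;word&gt;...)</tt>, and uses <tt>\s+</tt> rather than a single space between the words:</p>

```perl
use strict;
use warnings;
use 5.010;   # named captures and \g{...} need Perl 5.10 or later

# (?'word'...) is the quote-delimited spelling of a named capture;
# \g{word} must match the same text that the "word" buffer captured.
my $rgx_repeat = qr/\b(?'word'[a-z]+)\s+\g{word}\b/i;

my @found;
foreach my $text ("This is the hour of our discontent",
                  "This is the the hour of our discontent") {
    if ($text =~ $rgx_repeat) {
        push @found, $+{word};
        print "repeated word: \"$+{word}\"\n";
    }
    else {
        print "no repeated words in \"$text\"\n";
    }
}
```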
      </li><li>Named capture buffers can help write more readable code, and if you are using Perl 5.10+ on all of your
          systems, you should consider using them. However, there are vast swaths of Perl out there that
          use the numbered buffers, so make sure you understand them as well.
      </li><li>If you want to group a sub-pattern for clarity or to force precedence, but don't want to
          capture it (an action also known as <i>clustering</i>), you can use <i>non-capturing</i> parentheses.
      <ul>
        <li>They are specified as <tt>(?: ... )</tt>.
        </li><li>They group like ordinary parentheses, but their contents are not captured.
        </li><li>For example:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum"> 1</span> my @names = ("Mr. John Smith", "Bill Jackson", "Mrs. White", "Bill");
<span class="linenum"> 2</span> my $rgx_lastname = qr/^(?:[A-Z][a-z]+\.)?\s*(?:[A-Z][a-z]+)?\s+([A-Z][a-z]+)/;
<span class="linenum"> 3</span>
<span class="linenum"> 4</span> foreach my $name (@names) {
<span class="linenum"> 5</span>   if ($name =~ m/$rgx_lastname/) {
<span class="linenum"> 6</span>     print "For the name \"$name\", got last name \"$1\"\n";
<span class="linenum"> 7</span>   }
<span class="linenum"> 8</span>   else {
<span class="linenum"> 9</span>     print "The name \"$name\" did not match\n";
<span class="linenum">10</span>   }
<span class="linenum">11</span> }
</pre></td></tr></tbody></table>
            outputs:
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">For the name "Mr. John Smith", got last name "Smith"
For the name "Bill Jackson", got last name "Jackson"
For the name "Mrs. White", got last name "White"
The name "Bill" did not match
</pre></td></tr></tbody></table>
        <ul>
          <li>Note in this example, even though there are three sets of parentheses,
              only one set captures anything. This simplifies subsequent operations.
        </li></ul>
        </li><li>Non-capturing parentheses are usually good to use unless you know you will be capturing.
        <ul>
          <li>However, they make a regular expression harder to read because they
              are more complicated. For this reason, these notes (and many Perl programmers)
              use regular parentheses unless it's important for parentheses NOT to capture.
        </li></ul>
      </li></ul>
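      <p>To see the difference directly, this sketch applies the last-name pattern from the example above to a single name, once with ordinary parentheses and once as written, and compares what each returns in list context:</p>

```perl
use strict;
use warnings;

my $name = "Mr. John Smith";

# Same pattern twice: first with plain (capturing) parentheses,
# then with the two optional groups made non-capturing.
my @all_groups = ($name =~ m/^([A-Z][a-z]+\.)?\s*([A-Z][a-z]+)?\s+([A-Z][a-z]+)/);
my @last_only  = ($name =~ m/^(?:[A-Z][a-z]+\.)?\s*(?:[A-Z][a-z]+)?\s+([A-Z][a-z]+)/);

print "capturing:     @all_groups\n";
print "non-capturing: @last_only\n";
```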
      </li><li>If you include <i>capturing parentheses</i> in a regex passed to <tt>split</tt>, what they match will be returned,
          interspersed with the delimited fields. It's usually best to use non-capturing
          parentheses in regexes passed to <tt>split</tt> unless you need to know what the delimiters were.
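          <p>A minimal sketch with a made-up phrase: here the parentheses are needed to group the alternation, and whether they capture decides whether the delimiters show up in the result.</p>

```perl
use strict;
use warnings;

my $phrase = "red and green or blue";

# Capturing parentheses: split also returns each delimiter word it matched ...
my @with_delims = split /\s+(and|or)\s+/, $phrase;

# ... while non-capturing parentheses group the alternation without capturing.
my @fields_only = split /\s+(?:and|or)\s+/, $phrase;

print join(",", @with_delims), "\n";
print join(",", @fields_only), "\n";
```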
      </li><li>What a match variable contains depends on how its corresponding parenthetical
          expression figures into the overall match.
      <ul>
        <li>Under most circumstances, the match variable will contain the part of the string its sub-pattern matched in the overall match.
        <ul>
          <li>Note that if its subpattern <u>contains</u> a quantifier that allows zero matches (e.g. <tt>?</tt>, <tt>*</tt>, or <tt>{0,}</tt>), the match
              variable may be defined but empty (<tt>""</tt>).
        </li></ul>
        </li><li>If a sub-pattern matches multiple times because it (or a sub-match that contains it) is followed by a quantifier,
            the match variable associated with that expression will contain the <b>last thing that matched</b>.
        </li><li>If a sub-pattern is not included in the match (for instance, because it is in an alternative that
            wasn't taken, or <u>is followed by</u> a quantifier that allows zero matches), then the associated match
            variable will be undefined.
        </li><li>For example, the code
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum"> 1</span> #                                 $1 will store the label, if present
<span class="linenum"> 2</span> #                                 |
<span class="linenum"> 3</span> #                                 |       $2 will store the LAST set of
<span class="linenum"> 4</span> #                                 |       digits (and optional dash) matched.
<span class="linenum"> 5</span> #                                 |       |
<span class="linenum"> 6</span> #                                 |       | $3 will store the LAST set of
<span class="linenum"> 7</span> #                                 |       | digits matched
<span class="linenum"> 8</span> #                                 |       | |
<span class="linenum"> 9</span> #                                 |       | |          $4 holds all the Xs or Ys
<span class="linenum">10</span> #                                 |       | |          |
<span class="linenum">11</span> my $rgx_digit_cluster_str = qr/ ^ (\w+:)? ( (\d+)-? )* ( (X)+ | (Y+) ) (\.?) $ /x;
<span class="linenum">12</span> #                                                        |      |      |
<span class="linenum">13</span> #                     $5 contains one X if any are present      |      |
<span class="linenum">14</span> #                                                               |      |
<span class="linenum">15</span> #                  $6 contains one or more Ys, if any are present      |
<span class="linenum">16</span> #                                                                      |
<span class="linenum">17</span> #   $7 contains a trailing dot or is defined-but-empty if such is absent
<span class="linenum">18</span> #
<span class="linenum">19</span> # note the x modifier means spaces are ignored; we will discuss this later.
<span class="linenum">20</span>
<span class="linenum">21</span> my @test_strings = ("123-456-789X", "98765-4321-XXX.", "A:123-456-789Y",
<span class="linenum">22</span>                      "A:98765-4321-YYY.", "A:X", "123--456");
<span class="linenum">23</span>
<span class="linenum">24</span> foreach my $test_string (@test_strings) {
<span class="linenum">25</span>   print "TEST STRING: \"$test_string\"... ";
<span class="linenum">26</span>   if ($test_string =~ m/$rgx_digit_cluster_str/) {
<span class="linenum">27</span>     print "matches\n";
<span class="linenum">28</span>     printf "  - \$1 is %-18s",     (defined($1) ? "\"$1\"" : "undefined");
<span class="linenum">29</span>     printf "  - \$4 is %-18s\n",   (defined($4) ? "\"$4\"" : "undefined");
<span class="linenum">30</span>     printf "  - \$2 is %-18s",     (defined($2) ? "\"$2\"" : "undefined");
<span class="linenum">31</span>     printf "  - \$5 is %-18s\n",   (defined($5) ? "\"$5\"" : "undefined");
<span class="linenum">32</span>     printf "  - \$3 is %-18s",     (defined($3) ? "\"$3\"" : "undefined");
<span class="linenum">33</span>     printf "  - \$6 is %-18s\n",   (defined($6) ? "\"$6\"" : "undefined");
<span class="linenum">34</span>     printf "  - \$7 is %-18s\n\n", (defined($7) ? "\"$7\"" : "undefined");
<span class="linenum">35</span>   }
<span class="linenum">36</span>   else {
<span class="linenum">37</span>     print "doesn't match\n\n";
<span class="linenum">38</span>   }
<span class="linenum">39</span> }
</pre></td></tr></tbody></table>
            outputs:
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">TEST STRING: "123-456-789X"... matches
  - $1 is undefined           - $4 is "X"
  - $2 is "789"               - $5 is "X"
  - $3 is "789"               - $6 is undefined
  - $7 is ""

TEST STRING: "98765-4321-XXX."... matches
  - $1 is undefined           - $4 is "XXX"
  - $2 is "4321-"             - $5 is "X"
  - $3 is "4321"              - $6 is undefined
  - $7 is "."

TEST STRING: "A:123-456-789Y"... matches
  - $1 is "A:"                - $4 is "Y"
  - $2 is "789"               - $5 is undefined
  - $3 is "789"               - $6 is "Y"
  - $7 is ""

TEST STRING: "A:98765-4321-YYY."... matches
  - $1 is "A:"                - $4 is "YYY"
  - $2 is "4321-"             - $5 is undefined
  - $3 is "4321"              - $6 is "YYY"
  - $7 is "."

TEST STRING: "A:X"... matches
  - $1 is "A:"                - $4 is "X"
  - $2 is undefined           - $5 is "X"
  - $3 is undefined           - $6 is undefined
  - $7 is ""

TEST STRING: "123--456"... doesn't match
</pre></td></tr></tbody></table>
      </li></ul>
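      </li><li>Match variables persist until the next <i>successful</i> pattern match, at which
          point they are all reset,
          <b><u>even if the successfully-matched pattern doesn't contain any capturing parentheses!</u></b>
          Every numbered variable without a corresponding group in the new pattern becomes <tt>undef</tt>ined.
          (They are also dynamically scoped: when the enclosing block or subroutine ends, they revert to the
          values they had before it was entered.)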
      <ul>
        <li>Wait, what??
        </li><li>An example is in order. The code:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum"> 1</span> sub print_match_vars {
<span class="linenum"> 2</span>   print "\n  --&gt; ";
<span class="linenum"> 3</span>   print defined $1 ? "\$1=\"$1\"; " : "\$1 is undefined; ";
<span class="linenum"> 4</span>   print defined $2 ? "\$2=\"$2\"; " : "\$2 is undefined; ";
<span class="linenum"> 5</span>   print defined $3 ? "\$3=\"$3\"\n" : "\$3 is undefined\n";
<span class="linenum"> 6</span> }
<span class="linenum"> 7</span>
<span class="linenum"> 8</span> my $string = "abcdefghijklmnopqrstuvwxyz";
<span class="linenum"> 9</span>
<span class="linenum">10</span> print "The pattern match variables start out undefined:";
<span class="linenum">11</span> print_match_vars();
<span class="linenum">12</span>
<span class="linenum">13</span> print "After a successful match, those used in the pattern are defined: ";
<span class="linenum">14</span> print $string =~ m/a(.+)x(.+)z/ ? "(match)" : "(no match)";
<span class="linenum">15</span> print_match_vars();
<span class="linenum">16</span>
<span class="linenum">17</span> print "After a failed match, they are unchanged: ";
<span class="linenum">18</span> print $string =~ m/([0-9]+)/ ? "(match)" : "(no match)";
<span class="linenum">19</span> print_match_vars();
<span class="linenum">20</span>
<span class="linenum">21</span> print "After a successful match, they are set again based on the pattern: ";
<span class="linenum">22</span> print $string =~ m/([^aeiou])/ ? "(match)" : "(no match)";
<span class="linenum">23</span> print_match_vars();
<span class="linenum">24</span>
<span class="linenum">25</span> print "After a successful match, they are all undefined: ";
<span class="linenum">26</span> print $string =~ m/abc/ ? "(match)" : "(no match)";
<span class="linenum">27</span> print_match_vars();
</pre></td></tr></tbody></table>
            outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">The pattern match variables start out undefined:
  --&gt; $1 is undefined; $2 is undefined; $3 is undefined
After a successful match, those used in the pattern are defined: (match)
  --&gt; $1="bcdefghijklmnopqrstuvw"; $2="y"; $3 is undefined
After a failed match, they are unchanged: (no match)
  --&gt; $1="bcdefghijklmnopqrstuvw"; $2="y"; $3 is undefined
After a successful match, they are set again based on the pattern: (match)
  --&gt; $1="b"; $2 is undefined; $3 is undefined
After a successful match, they are all undefined: (match)
  --&gt; $1 is undefined; $2 is undefined; $3 is undefined
</pre></td></tr></tbody></table>
        <ul>
          <li>Notice also that I did not pass <tt>$1</tt>, <tt>$2</tt>, etc. to the <tt>print_match_vars()</tt> subroutine.
              This works because the pattern match variables are <b>global</b> variables: a called subroutine sees
              the results of its caller's most recent successful match.
          </li><li>It is better to pass needed match variables to a subroutine, because otherwise it is very unclear
              what's going on from the point of view of the subroutine (and its maintainer).
        </li></ul>
        </li><li>Because the match variables tend to get reset, it is a good idea to store them in other variables <b>immediately</b>.
        </li><li>This has the advantageous side effect of making your code much clearer, especially when passing them around.
        </li><li>Example:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum"> 1</span> # The next line sets $1, $2, and $3. All other match variables are unset.
<span class="linenum"> 2</span> if ($code =~ m/^(\d+) LET ([A-Z]+) = (.+)$/) {
<span class="linenum"> 3</span>
<span class="linenum"> 4</span>   my ($line_num, $var_name, $expression) = ($1, $2, $3);
<span class="linenum"> 5</span>
<span class="linenum"> 6</span>   # The next line obviously sets the match variables
<span class="linenum"> 7</span>   #   * assuming the match succeeeds
<span class="linenum"> 8</span>   if ($expression =~ m/^(\d+|[A-Z]+) ([-+*\/]) (\d+|[A-Z]+)$/) {
<span class="linenum"> 9</span>
<span class="linenum">10</span>     my ($operand1, $operator, $operand2) = ($1, $2, $3);
<span class="linenum">11</span>
<span class="linenum">12</span>     # The next line NON-OBVIOUSLY resets the match variables, EVEN THOUGH IT CONTAINS NO PARENS
<span class="linenum">13</span>     #   * assuming the match succeeds
<span class="linenum">14</span>     $operand1 = $symtab{$operand1} if ($operand1 =~ m/^[A-Z]+$/);
<span class="linenum">15</span>     # ditto
<span class="linenum">16</span>     $operand2 = $symtab{$operand2} if ($operand2 =~ m/^[A-Z]+$/);
<span class="linenum">17</span>
<span class="linenum">18</span>     # The next line MIGHT reset the match variables, depending on the condition
<span class="linenum">19</span>     #   Discussion question: when will the match variables be reset and when not?
<span class="linenum">20</span>     if ($operator eq "/" and $operand2 =~ m/^0+$/) {
<span class="linenum">21</span>       warn "Division by zero!\n";
<span class="linenum">22</span>     }
<span class="linenum">23</span>     else {
<span class="linenum">24</span>       # Despite appearances, this call does NOT reset our match variables, even
<span class="linenum">25</span>       # if evaluate_expr() uses a regex internally: match variables are
<span class="linenum">26</span>       # dynamically scoped, so ours are restored when the subroutine returns.
<span class="linenum">27</span>       my $expr_value = evaluate_expr($operand1, $operand2, $operator);
<span class="linenum">28</span>     }
<span class="linenum">29</span>
<span class="linenum">30</span>   }
<span class="linenum">31</span> }
</pre></td></tr></tbody></table>
      </li></ul>
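      </li><li>A final note on the match variables and when they get (re)set: remember the difference between
          "not used in the match" and "used in the match but matched no characters". A capture group that did
          not take part in the match leaves its variable <tt>undef</tt>ined, while one that took part but matched
          nothing sets it to the empty string (compare <tt>$2</tt> and <tt>$7</tt> in the <tt>"A:X"</tt> output above).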
    </li></ul>
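    <ul>
      <li>The rules above (copying captures right away, <tt>undef</tt> versus the empty string, and match
          variables reverting when a subroutine returns) can be checked with a short script. This is only a
          sketch: the subroutine names <tt>describe()</tt> and <tt>inner_match()</tt> and the test strings are
          made up for illustration. It also uses the fact that a successful match in list context returns its
          captures, which is a convenient way to copy them into your own variables immediately.
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use strict;
use warnings;

# Hypothetical helper: show a value, or "undef" if it has none.
sub describe {
    my ($value) = @_;
    return defined $value ? "\"$value\"" : "undef";
}

# 1. Copy captures immediately: in list context a match returns them.
#    Group 2 does not take part in this match, so it comes back undef;
#    group 3 takes part but matches no characters, so it is "".
my ($word, $digits, $dot) = "abc" =~ m/^([a-z]+)(?:-(\d+))?(\.?)$/;
print "word=", describe($word), " digits=", describe($digits),
      " dot=", describe($dot), "\n";

# 2. A match inside a called subroutine does not leak out: match
#    variables are dynamically scoped, so the caller's $1 comes back
#    when the subroutine returns.
sub inner_match {
    "xyz" =~ m/(y)/;
    my $inner_1 = $1;    # "y", but only while we are inside this sub
    return $inner_1;
}

my ($from_sub, $after_call);
if ("abc" =~ m/(b)/) {
    $from_sub   = inner_match();
    $after_call = $1;    # "b" again, not "y"
}
print "inside sub: ", describe($from_sub),
      "; caller after the call: ", describe($after_call), "\n";
</pre></td></tr></tbody></table>
          It should print <tt>word="abc" digits=undef dot=""</tt> and then
          <tt>inside sub: "y"; caller after the call: "b"</tt>.
      </li>
    </ul>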
    <h2><a class="goto" name="CHAPTER0006">Interpolating into Regexes</a></h2>
    <ul>
      <li>A regex interpolates variables just as a double-quoted string does. Therefore,
          the regex defined by<br>
          <tt>&nbsp;&nbsp;&nbsp;my&nbsp;$str&nbsp;=&nbsp;"abc";</tt><br>
          <tt>&nbsp;&nbsp;&nbsp;my&nbsp;$rgx&nbsp;=&nbsp;qr/123$str/;</tt><br>
          is <tt>/123abc/</tt>.
      <ul>
        <li>Be aware, however, that any special characters in the string interpolated <u><b>are</b> treated as special characters</u>.
            All interpolation is done first and then the final string is
            interpreted by the regex engine.
        </li><li>Therefore, consider the following code:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my $a_plus = "A+";
<span class="linenum">2</span>
<span class="linenum">3</span> my @students = ("Bill: A", "Jill: B+", "Will: N/A", "Gil: A+", "Phil: F", "Fran: A-");
<span class="linenum">4</span>
<span class="linenum">5</span> foreach my $student (@students) {
<span class="linenum">6</span>   if ($student =~ m/$a_plus/) {
<span class="linenum">7</span>     print "$student\n";
<span class="linenum">8</span>   }
<span class="linenum">9</span> }
</pre></td></tr></tbody></table>
            This outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Bill: A
Will: N/A
Gil: A+
Fran: A-
</pre></td></tr></tbody></table>
        </li><li>This is (presumably) not correct. The <tt>+</tt> in the string <tt>$a_plus</tt> is interpreted
            as the "one or more" quantifier, not a literal plus sign, so the pattern matches any string containing an "<tt>A</tt>".
        </li><li>Fortunately, Perl provides the <tt>\Q</tt> ... <tt>\E</tt> escapes, which <i>quote metacharacters</i>
            (backslash-escape them) in everything between them. Use them to surround interpolated strings that you
            want treated as literal text in the regex.
        </li><li>Consider the following code:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my $a_plus = "A+";
<span class="linenum">2</span>
<span class="linenum">3</span> my @students = ("Bill: A", "Jill: B+", "Will: N/A", "Gil: A+", "Phil: F", "Fran: A-");
<span class="linenum">4</span>
<span class="linenum">5</span> foreach my $student (@students) {
<span class="linenum">6</span>   if ($student =~ m/\Q$a_plus\E/) {
<span class="linenum">7</span>     print "$student\n";
<span class="linenum">8</span>   }
<span class="linenum">9</span> }
</pre></td></tr></tbody></table>
            This outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Gil: A+
</pre></td></tr></tbody></table>
      </li></ul>
    </li></ul>
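    <ul>
      <li>The escaping done by <tt>\Q</tt> ... <tt>\E</tt> is also available as Perl's built-in <tt>quotemeta</tt>
          function, which backslash-escapes every character that is not a "word" character. A small sketch (the
          student list is illustrative data), which also shows that <tt>\Q</tt> stops at <tt>\E</tt>, so the anchors
          around it are still regex syntax:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use strict;
use warnings;

my $a_plus = "A+";

# quotemeta() returns its argument with all non-word characters
# backslash-escaped: the same transformation that \Q...\E applies.
my $escaped = quotemeta($a_plus);
print "quotemeta(\"$a_plus\") is \"$escaped\"\n";

my @students = ("Gil: A+", "Jill: B+", "Bill: A", "Will: AA");
foreach my $student (@students) {
    # ^, \w+, and $ are regex syntax; only the grade is taken literally.
    print "$student\n" if $student =~ m/^\w+: \Q$a_plus\E$/;
}
</pre></td></tr></tbody></table>
          Only <tt>Gil: A+</tt> is printed. Without <tt>\Q</tt> ... <tt>\E</tt>, this anchored pattern would instead
          accept <tt>Bill: A</tt> and <tt>Will: AA</tt> and reject <tt>Gil: A+</tt>.
      </li>
    </ul>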
    <h3><a class="goto" name="SECTION0012">Building Complex Regexes</a></h3>
    <ul>
      <li>Interpolation is especially useful because it lets you interpolate sub-regexes defined with <tt>qr//</tt>
          into larger regexes, which makes complex regexes much easier to build.
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my $rgx_zip = qr/\d{5}(-\d{4})?/;
<span class="linenum">2</span> my $rgx_phone = qr/\d{3}-\d{3}-\d{4}/;
<span class="linenum">3</span> my $rgx_idnum = qr/[A-Z]{2}\d{5}-[A-Z]{3}/;
<span class="linenum">4</span>
<span class="linenum">5</span> if ($data =~ m/^ZIP=($rgx_zip), PH=($rgx_phone), ID=($rgx_idnum)$/) {
<span class="linenum">6</span>   print "Data is valid!\n";
<span class="linenum">7</span> }
</pre></td></tr></tbody></table>
      <ul>
        <li>There are two distinct methods for creating large complex regexes
            from small simple building blocks.
        </li><li>Building regexes from the <i>top down</i> means creating one large regex
            that matches parts of the large string in a very general way. Then parts of this
            "overview" regex can be refined until the overall regex matches the string
            with the level of detail needed.
        <ul>
          <li>Start by writing the complex regex in terms of sub-regexes you have not defined yet, using their names as placeholders.
          </li><li>Then define the sub-regexes, possibly with more sub-regex placeholders if the sub-regex is itself complex.
          </li><li>Continue until you have built a complex regular expression much more easily than by doing it all at once.
          </li><li>For example, a regular expression to define any possible email address
              would be highly complex.  So instead, start with the following:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">$rgx_email = qr/^($rgx_user)@($rgx_host)$/;
</pre></td></tr></tbody></table>
          </li><li>Then define each of the two sub-regexes:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">$rgx_host = qr/(?:$rgx_dns_name|$rgx_ipaddr)/;
$rgx_user = qr/(?:[a-zA-Z0-9._-]+)/;

$rgx_email = qr/^($rgx_user)@($rgx_host)$/;
</pre></td></tr></tbody></table>
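          </li><li>Keep going until you have defined all sub-levels. Because <tt>qr//</tt> interpolates variables at the
              moment it is evaluated, the finished code must assign each sub-regex <b>before</b> any regex that uses it,
              i.e. roughly in the reverse of the order in which you designed them: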
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my $rgx_octet = qr/(?:\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])/;
<span class="linenum">2</span> my $rgx_ipaddr = qr/(?:(?:$rgx_octet\.){3}$rgx_octet)/;
<span class="linenum">3</span> my $rgx_dns_tld = qr/(?:[a-zA-Z]{2,4})/;
<span class="linenum">4</span> my $rgx_dns_comp = qr/(?:[a-zA-Z0-9]+[a-zA-Z0-9-]*[a-zA-Z0-9]+|[a-zA-Z0-9]+)/;
<span class="linenum">5</span> my $rgx_dns_name = qr/(?:(?:$rgx_dns_comp\.)+$rgx_dns_tld)/;
<span class="linenum">6</span> my $rgx_host = qr/(?:$rgx_dns_name|$rgx_ipaddr)/;
<span class="linenum">7</span> my $rgx_user = qr/(?:[a-zA-Z0-9._-]+)/;
<span class="linenum">8</span> my $rgx_email = qr/^($rgx_user)@($rgx_host)$/;
</pre></td></tr></tbody></table>
          </li><li>In addition to how much easier it was to construct, note how much easier this is to read than if it were all combined
              into one monstrous regular expression (the <tt>/x</tt> modifier, covered later, lets it span several lines):
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">$rgx_email = qr/^([a-zA-Z0-9._-]+)@((?:(?:(?:[a-zA-Z0-9]+[a-zA-Z0-9-]*
                [a-zA-Z0-9]+|[a-zA-Z0-9]+)\.)+(?:[a-zA-Z]{2,4}))|
                (?:(?:(?:\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])\.){3}
                (?:\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])))$/x;
</pre></td></tr></tbody></table>
        </li></ul>
        </li><li>Building regexes from the <i>bottom up</i> means creating small regexes that match
            bits of the string, then combining them into larger regexes until you have one
            regex that should match the entire string. In other words, you define the pieces you know you
            will need first, then glue them together.
        <ul>
          <li>The end result is usually similar; the distinction is in the method of construction.
          </li><li>In practice, you will often come at a complex regex from both sides at the same time.
        </li></ul>
        </li><li>There are a couple of caveats to remember when using interpolation to build complex regular expressions:
        <ul>
          <li>It is often a good idea to surround every sub-regex with (non-capturing) parentheses in order to prevent
              unexpected interactions between adjacent interpolations. For example, consider two sub-regexes kept as
              plain strings:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">my $rgx1 = "a?b";
my $rgx2 = "c|d";
my $rgx_combined = qr/^$rgx1$rgx2$/;
</pre></td></tr></tbody></table>
              The overall regex is <tt>/^a?bc|d$/</tt>, which matches any string that starts with "<tt>abc</tt>" or "<tt>bc</tt>" or ends with "<tt>d</tt>".
          <ul>
            <li>However, the author <b>probably</b> meant to match "<tt>abc</tt>", "<tt>abd</tt>", "<tt>bc</tt>", or "<tt>bd</tt>".
            </li><li>The regex to do that would be <tt>/^(a?b)(c|d)$/</tt>.
            </li><li>That is what you get if each sub-regex is enclosed in (capturing or non-capturing) parentheses.
            </li><li>Sub-regexes created with <tt>qr//</tt> get this protection automatically: when interpolated, each one is
                wrapped in a non-capturing group such as <tt>(?^:c|d)</tt> (printed as <tt>(?-xism:c|d)</tt> by Perls older
                than 5.14), so the problem mainly bites when sub-regexes are ordinary strings, e.g. ones built with <tt>join</tt>.
          </li></ul>
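          </li><li>Remember that positional anchors (beginning and end of string) are interpreted in the context of
              the overall regex: a <tt>^</tt> in an interpolated sub-regex still means "start of the whole string", not
              "start of this piece". Placing positional anchors in sub-regexes will therefore not normally do what you want.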
          </li><li>Capturing parentheses are numbered left to right in the overall interpolated regex. It is often a good
              idea to avoid capturing parentheses in sub-regexes that are designed to be interpolated
              (non-capturing parentheses are fine).
        </li></ul>
      </li></ul>
    </li></ul>
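    <ul>
      <li>The grouping caveat is easy to see by combining the same two sub-patterns once as plain strings and once
          as <tt>qr//</tt> objects. A minimal sketch (the names and test strings are only for illustration), relying on
          Perl wrapping each interpolated <tt>qr//</tt> object in a non-capturing group:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use strict;
use warnings;

# Sub-patterns as plain strings: no grouping is added on interpolation.
my $str1 = "a?b";
my $str2 = "c|d";
my $rgx_from_strings = qr/^$str1$str2$/;    # effectively /^a?bc|d$/

# The same sub-patterns as qr// objects: each one is wrapped in a
# non-capturing group when interpolated.
my $qr1 = qr/a?b/;
my $qr2 = qr/c|d/;
my $rgx_from_qrs = qr/^$qr1$qr2$/;          # effectively /^(?:a?b)(?:c|d)$/

foreach my $test ("abcx", "xyzd", "bd") {
    print "$test: strings ",
          ($test =~ $rgx_from_strings ? "match" : "no match"),
          ", qr objects ",
          ($test =~ $rgx_from_qrs ? "match" : "no match"), "\n";
}
</pre></td></tr></tbody></table>
          The string-built regex accepts all three test strings (the first two only through its stray
          <tt>^a?bc</tt> or <tt>d$</tt> alternative), while the <tt>qr//</tt>-built one accepts only <tt>bd</tt>.
      </li>
    </ul>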
    <h3><a class="goto" name="SECTION0013">Constructing Regexes Programmatically</a></h3>
    <ul>
      <li>Other string operations such as <tt>join</tt> and concatenation work on regexes as well.
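      </li><li>This allows you to create very long and cumbersome regexes in code using some
          data structure as a base (perhaps even using user-supplied data, which you should quote with
          <tt>\Q</tt> ... <tt>\E</tt> or <tt>quotemeta</tt> unless users are meant to supply regex syntax).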
      </li><li>Here is a simple example:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my @months = qw(January February March April May June July
<span class="linenum">2</span>                 August September October November December);
<span class="linenum">3</span>
<span class="linenum">4</span> my $month_abbrs = join "|", map { uc substr $_, 0, 3 } @months;
<span class="linenum">5</span>
<span class="linenum">6</span> my $rgx_date = qr/^([0-9]{4})-($month_abbrs)-([0-9]{2})$/i;
<span class="linenum">7</span>
<span class="linenum">8</span> print "Regex to match dates is /$rgx_date/\n";
</pre></td></tr></tbody></table>
          which outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Regex to match dates is /(?i-xsm:^([0-9]{4})-(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-([0-9]{2})$)/
</pre></td></tr></tbody></table>
      </li><li>Ignore the "<tt>(?i-xsm:</tt> ... <tt>)</tt>" for the moment (Perl 5.14 and later print it as
          "<tt>(?^i:</tt> ... <tt>)</tt>"). No worries, we'll come back to it.
      </li><li>Remember that regexes can be passed to and returned from subroutines.
      </li><li>Discussion question: what's wrong with the above date-matching regex?
    </li></ul>
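    <ul>
      <li>Here is a sketch that combines these ideas: a subroutine builds a regex from a list with <tt>join</tt>,
          quotes each element with the built-in <tt>quotemeta</tt> so it is matched literally, and returns the
          result of <tt>qr//</tt>. The name <tt>rgx_any_of()</tt> and the grade list are hypothetical.
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use strict;
use warnings;

# Hypothetical helper: build a regex matching exactly one of the given
# strings. quotemeta makes each one literal, join glues them into an
# alternation, and qr// hands back a compiled regex object.
sub rgx_any_of {
    my @choices = @_;
    my $alternatives = join "|", map { quotemeta($_) } @choices;
    return qr/^(?:$alternatives)$/;
}

my $rgx_grade = rgx_any_of("A+", "A", "A-");

foreach my $grade ("A+", "AA", "A-", "B+") {
    print "$grade: ", ($grade =~ $rgx_grade ? "valid" : "invalid"), "\n";
}
</pre></td></tr></tbody></table>
          It should report <tt>A+</tt> and <tt>A-</tt> as valid and <tt>AA</tt> and <tt>B+</tt> as invalid; without
          <tt>quotemeta</tt>, the <tt>+</tt> would act as a quantifier and <tt>AA</tt> would be accepted.
      </li>
    </ul>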
    <h2><a class="goto" name="CHAPTER0007">Evaluated Replacements</a></h2>
    <ul>
      <li><tt>s///</tt> supports "evaluated replacement". If the <tt>/e</tt> modifier is used on the substitution,
          the replacement part is evaluated as a Perl expression and its result is substituted for the matched text.
          For example:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum"> 1</span> my $rgx_1st_word = qr/^(\w+)\b/;
<span class="linenum"> 2</span> my $sentence = "Call me Ishmael";
<span class="linenum"> 3</span>
<span class="linenum"> 4</span> print "string before modification:\n";
<span class="linenum"> 5</span> print "$sentence\n\n";
<span class="linenum"> 6</span>
<span class="linenum"> 7</span> my $mod_sentence_without = $sentence;
<span class="linenum"> 8</span> $mod_sentence_without =~ s/$rgx_1st_word/uc $1/;
<span class="linenum"> 9</span> print "without the e modifier:\n";
<span class="linenum">10</span> print "$mod_sentence_without\n\n";
<span class="linenum">11</span>
<span class="linenum">12</span> my $mod_sentence_with = $sentence;
<span class="linenum">13</span> $mod_sentence_with =~ s/$rgx_1st_word/uc $1/e;
<span class="linenum">14</span> print "with the e modifier:\n";
<span class="linenum">15</span> print "$mod_sentence_with\n";
</pre></td></tr></tbody></table>
          outputs:
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">string before modification:
Call me Ishmael

without the e modifier:
uc Call me Ishmael

with the e modifier:
CALL me Ishmael
</pre></td></tr></tbody></table>
    </li></ul>
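    <ul>
      <li><tt>/e</tt> is often combined with <tt>/g</tt> to compute a different replacement for every match. A
          minimal sketch (the sample string is made up), which also uses the <tt>(my $copy = $orig) =~ s///</tt>
          idiom so the original string is left untouched:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use strict;
use warnings;

my $order = "3 apples, 12 pears, 7 plums";

# Copy first, then substitute on the copy. /g replaces every run of
# digits; /e evaluates "$1 * 2" as Perl code and uses the result.
(my $doubled = $order) =~ s/(\d+)/$1 * 2/ge;

# Without /e the replacement is just an interpolated string.
(my $literal = $order) =~ s/(\d+)/$1 * 2/g;

print "$doubled\n";    # 6 apples, 24 pears, 14 plums
print "$literal\n";    # 3 * 2 apples, 12 * 2 pears, 7 * 2 plums
</pre></td></tr></tbody></table>
      </li>
    </ul>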
    <h2><a class="goto" name="CHAPTER0008">Extended Regexes</a></h2>
    <ul>
      <li>By using the <tt>/x</tt> modifier, you can insert whitespace and comments.
      </li><li>This makes regular expressions much easier to read and maintain.
      </li><li>For example, consider the regular expression
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">$rgx_int = qr/^([-+])?(\d+)(?:[eE](\d{1,3}))?$/;
</pre></td></tr></tbody></table>
      </li><li>Using extended regexes, this regex can be written as
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum"> 1</span> $rgx_int = qr/
<span class="linenum"> 2</span>                ^               # beginning of string
<span class="linenum"> 3</span>
<span class="linenum"> 4</span>                (               # begin capture $1: sign
<span class="linenum"> 5</span>                  [-+]          #   plus or minus
<span class="linenum"> 6</span>                )               # end capture $1
<span class="linenum"> 7</span>                ?               # sign is optional
<span class="linenum"> 8</span>
<span class="linenum"> 9</span>                (               # capture $2: coefficient
<span class="linenum">10</span>                  \d+           #   one or more digits
<span class="linenum">11</span>                )               # end capture $2
<span class="linenum">12</span>
<span class="linenum">13</span>                (?:             # begin non-capturing group: exponent
<span class="linenum">14</span>                  [eE]          #   literal e or E
<span class="linenum">15</span>                  (             #   begin capture $3: exponent digits
<span class="linenum">16</span>                    \d{1,3}     #     one to three digits
<span class="linenum">17</span>                  )             #   end capture $3
<span class="linenum">18</span>                )               # end group
<span class="linenum">19</span>                ?               # exponent is optional
<span class="linenum">20</span>
<span class="linenum">21</span>                $               # end of string
<span class="linenum">22</span>              /x;
</pre></td></tr></tbody></table>
      </li><li>These regular expressions are equivalent, but the second is easier to comprehend.
      <ul>
        <li>You almost always want to use <tt>qr//</tt> and store a regex in a variable when
            using <tt>/x</tt>; it is very cumbersome (though allowed) to use extended regexes
            in-place.
      </li></ul>
      </li><li>To include literal spaces or hash characters in an extended regex, escape them or include them in a character class.
    </li></ul>
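    <ul>
      <li>A minimal sketch of both techniques (the pattern and test strings are invented for illustration): the
          space before the hash is escaped with a backslash, the hash itself is escaped, and the space after it is
          put in a character class, which <tt>/x</tt> leaves alone. Note also that a comment inside the pattern must
          not contain the pattern's delimiter (here <tt>/</tt>), or Perl will end the regex early.
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use strict;
use warnings;

# Matches strings like "Item # 42". Under /x, unescaped whitespace is
# ignored and an unescaped # starts a comment, so the literal space and
# hash need a backslash or a character class.
my $rgx_item = qr/
    ^ Item          # the word "Item"
    \ \#            # a literal space, then a literal hash
    [ ]             # another literal space, written as a class
    (\d+)           # capture group 1: the item number
    $               # end of string
/x;

foreach my $test ("Item # 42", "Item #42", "Item42") {
    if ($test =~ $rgx_item) {
        print "\"$test\" matches, item number $1\n";
    }
    else {
        print "\"$test\" doesn't match\n";
    }
}
</pre></td></tr></tbody></table>
          Only <tt>"Item # 42"</tt> matches (capturing <tt>42</tt>); the other two lack one or both of the literal spaces.
      </li>
    </ul>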
    <h2><a class="goto" name="CHAPTER0009">Greed</a></h2>
    <ul>
      <li>The repetition operators <tt>*</tt>, <tt>+</tt>, and <tt>?</tt> are <i>greedy</i>.
      <ul>
        <li>This means they match as much of the string as possible while still letting the overall pattern match.  Example:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my $html_line = 'Here is a &lt;B&gt;bold&lt;/B&gt; thing and an &lt;I&gt;italic&lt;/I&gt; thing';
<span class="linenum">2</span> print "HTML is: \"$html_line\"\n";
<span class="linenum">3</span> print "(Trying to strip HTML tags)\n";
<span class="linenum">4</span> $html_line =~ s/&lt;.+&gt;//g;
<span class="linenum">5</span> print "TEXT is: \"$html_line\"\n";
</pre></td></tr></tbody></table>
            outputs:
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">HTML is: "Here is a &lt;B&gt;bold&lt;/B&gt; thing and an &lt;I&gt;italic&lt;/I&gt; thing"
(Trying to strip HTML tags)
TEXT is: "Here is a  thing"
</pre></td></tr></tbody></table>
        </li><li>Oh no!  What happened?
        <ul>
          <li>The pattern<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_match_l">&lt;</span><span class="regex_match_l">.+</span><span class="regex_match">&gt;</span>/</tt><br>
              is three sub-patterns:
          <ul>
            <li>subpat1: (<tt>&lt;</tt>) Match a literal "&lt;"
            </li><li>subpat2: (<tt>.+</tt>) Match one or more characters
            </li><li>subpat3: (<tt>&gt;</tt>) Match a literal "&gt;"
          </li></ul>
          </li><li>On the first pass, subpat1 matches the "<tt>&lt;</tt>" in "<tt>&lt;B&gt;</tt>"
          </li><li>The pattern match continues from that point with subpat2, which is "<tt>.+</tt>".  This is
              intended to match everything inside the HTML tag (the "<tt>B</tt>" in the first case, the "<tt>/B</tt>" in
              the second), but instead it matches as much as possible, including the "<tt>&gt;</tt>" ending the
              "<tt>&lt;B&gt;</tt>" tag, the entire "<tt>&lt;/B&gt;</tt>" tag, the "<tt>&lt;I&gt;</tt>" tag and the
              "<tt>&lt;/I</tt>" towards the end, plus all the text in between.
          </li><li>Then subpat3 tries to match. Because "<tt>.+</tt>" first consumed everything up to the end of the string, the
              engine backtracks, giving back one character at a time, until subpat3 can match the last "<tt>&gt;</tt>" in the
              string (the one after "<tt>&lt;/I</tt>").
          </li><li>All three sub-patterns have matched so the overall pattern is a match.
          <ul>
            <li>However, the text actually matched within the string was:
                "<tt>&lt;B&gt;bold&lt;/B&gt; thing and an &lt;I&gt;italic&lt;/I&gt;</tt>".
            </li><li>Therefore when the substitution is done, much more of the string is removed than was intended.
          </li></ul>
        </li></ul>
      </li></ul>
      </li><li>To avoid greed, use the <tt>+?</tt>, <tt>*?</tt>, and <tt>??</tt>
          <i>minimal match</i> (non-greedy) operators instead. They allow the same numbers of repetitions as
          <tt>+</tt>, <tt>*</tt>, and <tt>?</tt> (one or more, zero or more, zero or one), but they gobble up the
          minimum necessary for the overall pattern to match:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl"><span class="linenum">1</span> my $html_line = 'Here is a &lt;B&gt;bold&lt;/B&gt; thing and an &lt;I&gt;italic&lt;/I&gt; thing';
<span class="linenum">2</span> print "HTML is: \"$html_line\"\n";
<span class="linenum">3</span> print "(Trying to strip HTML tags)\n";
<span class="linenum">4</span> $html_line =~ s/&lt;.+?&gt;//g;
<span class="linenum">5</span> print "TEXT is: \"$html_line\"\n";
</pre></td></tr></tbody></table>
          outputs:
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">HTML is: "Here is a &lt;B&gt;bold&lt;/B&gt; thing and an &lt;I&gt;italic&lt;/I&gt; thing"
(Trying to strip HTML tags)
TEXT is: "Here is a bold thing and an italic thing"
</pre></td></tr></tbody></table>
      <ul>
        <li>That's what we intended.
        </li><li>Another approach is to explicitly exclude the closing "<tt>&gt;</tt>" from what the middle of the pattern may match, with a pattern like<br>
<tt>&nbsp;&nbsp;&nbsp;/<span class="regex_match_l">&lt;</span><span class="regex_match_l">[^&gt;]+</span><span class="regex_match">&gt;</span>/</tt><br>
            This pattern has three sub-patterns like the previous one, but with one critical difference:
        <ul>
          <li>subpat1: (<tt>&lt;</tt>) Match a literal "&lt;"
          </li><li>subpat2: (<tt>[^&gt;]+</tt>) Match one or more characters, <b><u>other than a literal "&gt;"</u></b>
          </li><li>subpat3: (<tt>&gt;</tt>) Match a literal "&gt;"
        </li></ul>
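        </li><li>Whether to use a non-greedy quantifier or a negated character class like this one is largely
            a matter of taste, but the two are not always equivalent: <tt>[^&gt;]+</tt> can match newlines while
            <tt>.</tt> cannot (unless the <tt>/s</tt> modifier is used), and <tt>.+?</tt> will still extend past a
            "<tt>&gt;</tt>" if that is the only way for the rest of the pattern to match.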
      </li></ul>
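      </li><li>Greed is important when thinking about the "longest leftmost" rule: a match always starts as far left
          in the string as possible, and greed only decides how long the match is from that starting point. A
          non-greedy quantifier therefore gives the shortest match <i>at the leftmost position</i>, not necessarily
          the shortest match anywhere in the string.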
    </li></ul>
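    <ul>
      <li>A short sketch separating the two effects (the test strings are invented for illustration): greed decides
          where a match ends, while the leftmost rule decides where it starts.
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">use strict;
use warnings;

# Where the match ENDS depends on greed...
my ($greedy)  = "a1b2b" =~ m/(a.*b)/;     # "a1b2b": .* takes as much as it can
my ($minimal) = "a1b2b" =~ m/(a.*?b)/;    # "a1b":   .*? takes as little as it can

# ...but where it STARTS does not: the leftmost possible start wins,
# so even the non-greedy a+? has to take all three a's here.
my ($leftmost) = "xaaab" =~ m/(a+?b)/;    # "aaab", not "ab"

print "greedy:   $greedy\n";
print "minimal:  $minimal\n";
print "leftmost: $leftmost\n";
</pre></td></tr></tbody></table>
      </li>
    </ul>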
    <h3><a class="goto" name="SECTION0014">Transliteration</a></h3>
    <ul>
      <li>Transliteration is not a regex operation, but because it looks like one, it's always discussed alongside regexes.
      <ul>
        <li>Carrying on the tradition here!
      </li></ul>
      </li><li>A transliteration operation simply searches for instances of a list of characters in a string and replaces
          them with the corresponding character in a second list, using syntax like "<tt>$string =~ tr/SRCH/REPL/;</tt>".
      </li><li>For example:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">my $string = "This is some text.";
print "Original String: \"$string\"\n";
$string =~ tr/aeiou/AEIOU/;
print "Modified String: \"$string\"\n";
</pre></td></tr></tbody></table>
          outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Original String: "This is some text."
Modified String: "ThIs Is sOmE tExt."
</pre></td></tr></tbody></table>
      </li><li>Like <tt>s///</tt>, the <tt>tr///</tt> operation is destructive: it modifies the string it is bound to in place.
      </li><li>Note that if the replacement set has fewer characters in it than the search
          set, the last character in the replacement set is replicated as necessary.
      <ul>
        <li>If the replacement set only contains one character, it will have the effect of
            replacing all elements in the search set with that character:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">my $string = "This is some text.";
print "Original String: \"$string\"\n";
$string =~ tr/aeiou/!/;
print "Modified String: \"$string\"\n";
</pre></td></tr></tbody></table>
            outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Original String: "This is some text."
Modified String: "Th!s !s s!m! t!xt."
</pre></td></tr></tbody></table>
        <ul>
          <li>This is more clearly accomplished with <tt>s/[aeiou]/!/g</tt>.
        </li></ul>
      </li></ul>
      </li><li>The <tt>tr///</tt> construct also accepts modifiers, the most useful of which are:
      <ul>
        <li><tt>/c</tt> means the search set is the <i>complement</i> of the given characters,
            i.e. every character <b>not</b> listed. This is most useful with
            a one-character replacement set:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">my $string = "This is some text.";
print "Original String: \"$string\"\n";
$string =~ tr/aeiou/?/c;
print "Modified String: \"$string\"\n";
</pre></td></tr></tbody></table>
            outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Original String: "This is some text."
Modified String: "??i??i???o?e??e???"
</pre></td></tr></tbody></table>
        <ul>
          <li>This is more clearly accomplished with <tt>s/[^aeiou]/?/g</tt>.
        </li></ul>
        </li><li><tt>/s</tt> "squashes" runs of replaceable characters with one replacement instead of replacing them one-for-one:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">my $string = "This string  has some   weird  spacing.  ";
print "Original String: \"$string\"\n";
$string =~ tr/ / /s; # collapse strings of spaces
print "Modified String: \"$string\"\n";
</pre></td></tr></tbody></table>
            outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Original String: "This string  has some   weird  spacing.  "
Modified String: "This string has some weird spacing. "
</pre></td></tr></tbody></table>
        <ul>
          <li>This is more easily accomplished with <tt>s/ +/ /g;</tt>
        </li></ul>
        </li><li><tt>/d</tt> deletes characters in the search set but not in the replacement set:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">my $string = "This is some text.";
print "Original String: \"$string\"\n";
$string =~ tr/aeioubcdfghjklmnpqrstvwxyz/AEIOU/d;
print "Modified String: \"$string\"\n";
</pre></td></tr></tbody></table>
            outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">Original String: "This is some text."
Modified String: "TI I OE E."
</pre></td></tr></tbody></table>
        <ul>
          <li>With <tt>/d</tt> and an empty replacement set, every character in the search set is deleted, but this is more
              clearly accomplished (<u>are you sensing a theme?</u>) with something like <tt>s/[aeiou]//g</tt>.
        </li></ul>
      </li></ul>
      </li><li><tt>tr</tt> returns the number of characters it replaced or deleted, so it provides an easy way to count
          the occurrences of a specific character or set of characters, if given a replacement
          set that is the same as the search set (an empty replacement set without <tt>/d</tt> means the same thing):
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">my $string = "This is some text.";
my $count = ($string =~ tr/aeiou/aeiou/);
print "String \"$string\" contains $count vowels.\n";
</pre></td></tr></tbody></table>
          outputs
<table class="exec_perl" width="75%"><tbody><tr class="exec_perl"><td class="exec_perl"><pre class="exec_perl">String "This is some text." contains 5 vowels.
</pre></td></tr></tbody></table>
      <ul>
        <li>This is also easily accomplished with <tt>s/([aeiou])/$1/g;</tt>, because <tt>s///g</tt> returns the number of substitutions it made.
      </li></ul>
      </li><li>You can also do something like<br>
          <tt>&nbsp;&nbsp;&nbsp;$count&nbsp;=&nbsp;grep&nbsp;{&nbsp;m/[aeiou]/&nbsp;}&nbsp;split&nbsp;//,&nbsp;$string;</tt><br>
          which has the additional benefit of being non-destructive,
          so (unlike <tt>tr///</tt> or <tt>s///</tt>) it can be used on constant and literal strings.
      </li><li>If you need a non-destructive <tt>tr///</tt>, you can apply it en passant to a copy of the string, as with <tt>s///</tt>.
      </li><li>For reasons of historical reverence, <tt>y///</tt> is a synonym for <tt>tr///</tt>; the two are identical in every respect.
      <ul>
        <li>Stick with <tt>tr///</tt>.
        </li><li>Better yet, stick with <tt>s///</tt>, which can almost always do what you need.
        <ul>
          <li>I have been a professional Perl programmer since 1999. I have used <tt>tr///</tt> exactly zero times.
          </li><li>But maybe you will find it useful. Truth is stranger than fiction.
        </li></ul>
      </li></ul>
    </li></ul>
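The counting and non-destructive idioms above can be pulled together in one short script. This is a minimal sketch; the variable names are illustrative only, and the last line uses the <tt>/r</tt> modifier, which assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $string = "This is some text.";

# Counting: an empty replacement list (without /d) repeats the search list,
# so nothing changes and tr/// just returns how many vowels it saw.
my $vowels = ($string =~ tr/aeiou//);

# The same count without tr///: grep in scalar context returns a count.
my $by_grep = grep { m/[aeiou]/ } split //, $string;

# En passant: copy the string first, then transliterate only the copy.
(my $shouted = $string) =~ tr/a-z/A-Z/;

# Perl 5.14+ only: /r returns a modified copy, so even a literal is allowed.
# /s squashes each run of spaces down to a single space.
my $squashed = "too    many   spaces" =~ tr/ //sr;

print "Vowels (tr):   $vowels\n";
print "Vowels (grep): $by_grep\n";
print "Original:      \"$string\"\n";
print "Copy:          \"$shouted\"\n";
print "Squashed:      \"$squashed\"\n";
```

Note that <tt>$string</tt> is printed unchanged at the end: only the copy was modified.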
    <h3><a class="goto" name="SECTION0015">Summary of Pattern Modifiers</a></h3>
    Several regex modifiers have been mentioned so far. Here is a list
of the important ones. For more information, see the chapter "Pattern
Matching" in <u>Programming Perl</u>.
    <p>
    <table class="inline">
    <tbody><tr><th class="inline">Modifier</th><th class="inline">Effect</th><th class="inline"><tt>m//</tt></th><th class="inline"><tt>s///</tt></th><th class="inline"><tt>qr//</tt></th><th class="inline">Bad Mnemonic?</th></tr>
    <tr><td class="inline"><tt>/i</tt></td><td class="inline">match case-<u><b>I</b></u>nsensitively</td><td class="inline">yes</td><td class="inline">yes</td><td class="inline">yes</td><td class="inline">no</td></tr>
    <tr><td class="inline"><tt>/s</tt></td><td class="inline">include <tt>\n</tt> in the dot class<br>(force string to be handled as a <u><b>S</b></u>ingle line)</td><td class="inline">yes</td><td class="inline">yes</td><td class="inline">yes</td><td class="inline">yes</td></tr>
    <tr><td class="inline"><tt>/m</tt></td><td class="inline"><tt>^</tt> can match just after, and <tt>$</tt> just before, an embedded newline<br>(force string to be handled as <u><b>M</b></u>ultiple lines)</td><td class="inline">yes</td><td class="inline">yes</td><td class="inline">yes</td><td class="inline">yes</td></tr>
    <tr><td class="inline"><tt>/x</tt></td><td class="inline">e<u><b>X</b></u>tended regex (can include comments and whitespace)</td><td class="inline">yes</td><td class="inline">yes</td><td class="inline">yes</td><td class="inline">no</td></tr>
    <tr><td class="inline"><tt>/o</tt></td><td class="inline">only compile pattern <u><b>O</b></u>nce</td><td class="inline">yes</td><td class="inline">yes</td><td class="inline">yes</td><td class="inline">no</td></tr>
    <tr><td class="inline"><tt>/g</tt></td><td class="inline">match/substitute <u><b>G</b></u>lobally</td><td class="inline">yes</td><td class="inline">yes</td><td class="inline">–</td><td class="inline">no</td></tr>
    <tr><td class="inline"><tt>/c</tt></td><td class="inline"><u><b>C</b></u>ontinue after a failed global match</td><td class="inline">with <tt>/g</tt></td><td class="inline">–</td><td class="inline">–</td><td class="inline">no</td></tr>
    <tr><td class="inline"><tt>/e</tt></td><td class="inline"><u><b>E</b></u>valuate replacement as expression</td><td class="inline">–</td><td class="inline">yes</td><td class="inline">–</td><td class="inline">no</td></tr>
    </tbody></table>
    </p><ul>
      <li>Remember when we printed out a regex that had been stored with <tt>qr//</tt> and it looked like<br>
          <tt>&nbsp;&nbsp;&nbsp;/(?i-xsm:^([0-9]{4})-(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-([0-9]{2})$)/</tt>
      <ul>
        <li>The  "<tt>(?i-xsm:</tt> ... <tt>)</tt>" indicates that for that part of the pattern, the <tt>/i</tt> modifier is in effect,
            and the <tt>/x</tt>, <tt>/s</tt>, and <tt>/m</tt> modifiers are not in effect.
        </li><li>These are known as <i>cloistered pattern modifiers</i> and it's possible for different parts of the regex to have different
            modifiers, either by specifying them literally using this construction, or by interpolating patterns stored
            with <tt>qr//</tt> and given different options where they are defined.
        </li><li>Whoa, look at the time!
      </li></ul>
    </li></ul>
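Here is a small sketch exercising most of the modifiers in the table on one multi-line string. The sample text and the <tt>$rgx_</tt> names are made up for illustration; the last part shows a <tt>/i</tt> given to <tt>qr//</tt> staying attached to that piece when it is interpolated into a larger pattern, as described for the <tt>(?i-xsm:</tt> ... <tt>)</tt> form above.

```perl
use strict;
use warnings;

my $text = "alpha 1\nBeta 22\ngamma 333\n";

# /m lets ^ match after each embedded newline, /g finds every match,
# and /i lets [a-z] match the capital B in "Beta".
my @names = $text =~ m/^([a-z]+)/mgi;

# /s lets . match "\n", so this match runs from line 1 into line 2.
my ($span)  = $text =~ m/(1.B)/s;
my $no_span = $text =~ m/1.B/;    # false: without /s, . will not match "\n"

# /x ignores whitespace and comments inside the pattern.
my $rgx_number = qr/
    ( \d+ )    # one or more digits, captured
/x;

# /e evaluates the replacement as Perl code: every number is doubled.
(my $doubled = $text) =~ s/$rgx_number/$1 * 2/ge;

# The /i on qr// travels with that piece only when it is interpolated.
my $rgx_month  = qr/jan/i;
my $cloistered = "2009-JAN-12" =~ m/^\d{4}-$rgx_month-\d{2}$/;
my $plain      = "2009-JAN-12" =~ m/^\d{4}-jan-\d{2}$/;

print "Names:   @names\n";
print "Doubled: $doubled";
print "Interpolated qr//i matched: ", ($cloistered ? "yes" : "no"), "\n";
print "Plain lowercase matched:    ", ($plain ? "yes" : "no"), "\n";
```

Printing <tt>$rgx_month</tt> itself shows the modifiers in the stringified pattern; newer Perls write it as <tt>(?^i:jan)</tt> rather than the <tt>(?i-xsm:jan)</tt> form shown above.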
    <h2><a class="goto" name="CHAPTER0010">Tips</a></h2>
    <ul>
      <li>Remember the rules for matching:
      <ul>
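        <li>Patterns always match at the <b>LEFTMOST</b> possible position, and greedy quantifiers make the match as <b>LONG</b> as they can.
            (Alternatives are tried left to right, though, so <tt>m/ab|abcd/</tt> matches only <tt>"ab"</tt> in <tt>"abcd"</tt>.)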
        </li><li>When a pattern is matched against a string, the string must match all sub-patterns <b>IN-ORDER, ADJACENTLY</b>.
      </li></ul>
      </li><li>Develop regular expressions one piece at a time, such as from the inside to the outside, and break them up if needed.
      <ul>
        <li>Use interpolation to make regular expressions easier to understand.
        </li><li>Use a top-down or bottom-up method to develop complex regexes.
      </li></ul>
      </li><li>If you use match variables, remember that <b>all</b> plain parentheses in a regex memorize things; use <tt>(?:</tt>...<tt>)</tt> to group without memorizing.
      <ul>
        <li>This is particularly important to remember when interpolating regexes into one another.
      </li></ul>
      </li><li>Remember that the <tt>*</tt>, <tt>+</tt>, and <tt>?</tt> quantifiers are greedy.
      </li><li>Remember that word boundaries and other anchors match conditions, <b>not</b> characters.
      </li><li>Don't reinvent the wheel: if you need a regex for URLs, email addresses,
          U.S. postal addresses, or anything else common but complicated, search CPAN and the
          Web for a solution before you develop your own.
      </li><li>Keep regexes as simple as possible.
      </li><li>Keep each regex focused on a single job.
      </li><li>Give clear names to regular expressions that are stored in variables.
      <ul>
        <li>Generally they should start with the <tt>$rgx</tt> prefix or something similar.
        </li><li>Give them names based on what they are supposed to do, not how they
            are supposed to do it.
        <ul>
          <li>Call it <tt>$rgx_zipcode</tt>, not <tt>$rgx_5digits</tt>
        </li></ul>
      </li></ul>
    </li></ul>
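A short sketch of several of these tips in action: building a pattern bottom-up from named pieces, keeping the capture groups lined up with <tt>(?:</tt>...<tt>)</tt>, greediness, leftmost alternation, and word boundaries. All <tt>$rgx_</tt> names and sample strings are hypothetical.

```perl
use strict;
use warnings;

# Building blocks named for WHAT they match, not how.
my $rgx_year  = qr/[0-9]{4}/;
my $rgx_month = qr/(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)/;
my $rgx_day   = qr/[0-9]{2}/;

# $rgx_month groups with (?:...), so it adds no capture of its own and
# the three parentheses here stay $1, $2, $3 = year, month, day.
my $rgx_date = qr/^($rgx_year)-($rgx_month)-($rgx_day)$/;
my @parts = "2009-JAN-12" =~ $rgx_date;

# Greedy vs. non-greedy.
my $quoted   = 'say "hi" and "bye"';
my ($greedy) = $quoted =~ m/"(.*)"/;     # runs to the LAST quote
my ($lazy)   = $quoted =~ m/"(.*?)"/;    # stops at the first closing quote

# Leftmost wins, and alternatives are tried in order: not the longest.
my ($alt) = "abcd" =~ m/(ab|abcd)/;

# \b matches a condition (a word boundary), not a character.
(my $swapped = "cat catalog") =~ s/\bcat\b/dog/g;

print "Date parts: @parts\n";
print "Greedy: $greedy | Lazy: $lazy | Alternation: $alt\n";
print "Swapped: $swapped\n";
```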
    <h3><a class="goto" name="SECTION0016">When <u>not</u> to use Regular Expressions</a></h3>
    <ul>
      <li>Regexes are not the best solution for <b>every</b> problem involving interpreting a string.
      <ul>
        <li>They are very powerful, but they are often more difficult to use than some alternatives.
        </li><li>The regex engine is highly optimized, but Perl's built-in string functions (like <tt>substr</tt> and <tt>index</tt>) are
            even more highly optimized, so regexes, while very efficient, are not always
            more efficient than combinations of string functions.
      </li></ul>
      </li><li>Therefore, don't use regexes to:
      <ul>
        <li>Determine if a string is exactly the same as another string (use <tt>eq</tt>).
        </li><li>Determine if a string is exactly the same as another string, disregarding case (use <tt>eq</tt> with <tt>uc</tt>).
        </li><li>Determine if a string contains a given substring (use <tt>index</tt>, with <tt>uc</tt> for case-insensitive search).
        </li><li>Determine if a string has a given prefix or suffix (use <tt>index</tt>/<tt>rindex</tt> or <tt>eq</tt> with <tt>substr</tt>, again perhaps with <tt>uc</tt>).
        </li><li>Chop up a string by length (use <tt>substr</tt> or <tt>unpack</tt>).
      </li></ul>
      </li><li>In general, if the problem can be solved with one or more string functions, it will probably be faster and easier to do so.
      <ul>
        <li>But don't use some tortured combination of multiple string functions when a simple regex will work.
        </li><li>For example, if you want to find the substring of a string bounded by angle-brackets,
            you could say<br>
            <tt>&nbsp;&nbsp;&nbsp;$start&nbsp;=&nbsp;index&nbsp;$string,&nbsp;"&lt;";</tt><br>
            <tt>&nbsp;&nbsp;&nbsp;$stop&nbsp;=&nbsp;index&nbsp;$string,&nbsp;"&gt;",&nbsp;$start;</tt><br>
            <tt>&nbsp;&nbsp;&nbsp;$substring&nbsp;=&nbsp;substr&nbsp;$string,&nbsp;$start&nbsp;+&nbsp;1,&nbsp;$stop&nbsp;-&nbsp;$start&nbsp;-&nbsp;1;</tt><br>
            But you would be better off saying:<br>
            <tt>&nbsp;&nbsp;&nbsp;$substring&nbsp;=&nbsp;($string&nbsp;=~&nbsp;m/&lt;(.*?)&gt;/)[0];</tt>
        <ul>
          <li>Although the regex solution looks more complicated at first blush, once
              you understand regexes it's much simpler.
        </li></ul>
        </li><li>Example 2: suppose you want to determine whether the three characters immediately following the
            first space in a string are "abc" (case-insensitive)*. The string-functions version:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">if ("abc" eq lc substr($string, 1+index($string, " "), 3)) {
  ...
</pre></td></tr></tbody></table>
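            is probably faster than the regex version:
<table class="show_perl" width="75%"><tbody><tr class="show_perl"><td class="show_perl"><pre class="show_perl">if ($string =~ m/^[^ ]* ABC/i) {
  ...
</pre></td></tr></tbody></table>
            but the regex version is easier to understand (again, once you are familiar with regexes). It is also more correct:
            if the string contains no space at all, <tt>index</tt> returns -1, so the string-functions version quietly checks the
            first three characters of the string instead, while the regex simply fails to match.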
        <ul>
          <li>* <b>WHY</b> you would want to do this is beyond the scope of this class...
        </li></ul>
        </li><li>In general, if parsing a string with string functions takes two or more lines, and especially
            if it takes a loop or conditional, use a regex.
      </li></ul>
    </li></ul>
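The string-function alternatives listed above look like this in practice. This is a sketch with made-up sample strings; <tt>abc_after_first_space</tt> and <tt>abc_by_regex</tt> are hypothetical helper names, the first being Example 2's <tt>substr</tt> check with a guard for the no-space case.

```perl
use strict;
use warnings;

my $string = "Subject: Hello world";

# Contains a substring? index returns -1 when it is absent.
my $has_hello = index($string, "Hello") != -1;

# Case-insensitive contains: upper-case both sides first.
my $has_hello_i = index(uc($string), uc("hELLo")) != -1;

# Prefix and suffix checks with substr (a negative offset counts from the end).
my ($prefix, $suffix) = ("Subject:", "world");
my $starts = substr($string, 0, length($prefix)) eq $prefix;
my $ends   = substr($string, -length($suffix)) eq $suffix;

# Fixed-width fields with unpack: "A4 A2 A2" = 4, 2, and 2 characters.
my ($year, $mon, $day) = unpack "A4 A2 A2", "20090112";

# Example 2 with substr, guarded: without the guard, index's -1 becomes
# position 0 and the check silently looks at the START of the string.
sub abc_after_first_space {
    my ($s) = @_;
    my $pos = index($s, " ");
    return 0 if $pos == -1;
    return lc(substr($s, $pos + 1, 3)) eq "abc" ? 1 : 0;
}

# The regex version for comparison.
sub abc_by_regex {
    my ($s) = @_;
    return $s =~ m/^[^ ]* abc/i ? 1 : 0;
}

# The unguarded original wrongly succeeds on a string with no space.
my $unguarded = ("abc" eq lc substr("abcdef", 1 + index("abcdef", " "), 3));

print "Fields: $year / $mon / $day\n";
print "Guarded: ", abc_after_first_space("abcdef"), " Regex: ", abc_by_regex("abcdef"), "\n";
```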
    <h3><a class="goto" name="SECTION0017">Good Grief, Make It Stop!</a></h3>
    <ul>
      <li>This information on regular expressions, detailed as it may be, is only the beginning.
          There are <b>many</b> more elements in the regex syntax that we have not covered here:
      <ul>
        <li>Non-destructive substitution and transliteration (since Perl 5.14)
        </li><li>Matches and character classes based on <i>Unicode</i>
        </li><li>Additional anchors (<tt>\G</tt>, <tt>\A</tt>, <tt>\Z</tt>, and <tt>\z</tt>)
        </li><li>Recursive regexes
        </li><li>Cloistered pattern modifiers
        </li><li><i>Possessive</i> quantifiers
        </li><li>Relative backreferences
        </li><li><i>Lookahead</i> and <i>lookbehind assertions</i>
        <ul>
          <li>Positive and negative
        </li></ul>
        </li><li><i>Non-backtracking</i> sub-patterns
        </li><li><i>Match-time</i> code evaluation and pattern interpolation
        <ul>
          <li>Evaluated code to generate sub-patterns
          </li><li>Recursive patterns
          </li><li>Conditional interpolation
        </li></ul>
        </li><li>Tracing and optimizing the regex engine
        </li><li>And more!
      </li></ul>
    </li></ul>
    <h3><a class="goto" name="SECTION0018">Answers to discussion questions:</a></h3>
    <ul>
      <li>What characters are in the class <tt>[^\W_]</tt>?
      <ul>
        <li>This class contains everything <b>except</b> "non-word characters and underscores".
            In other words, it contains word characters except underscores.
        </li><li>So for ASCII text it's equivalent to <tt>[A-Za-z0-9]</tt> (for Unicode strings, <tt>\w</tt> also includes non-ASCII letters and digits).
      </li></ul>
      </li><li>What character(s) do the following classes match?
      <ul>
        <li><tt>[^-]</tt> contains everything but a dash
        </li><li><tt>[-^]</tt> contains a dash and a caret
        </li><li><tt>[^-^]</tt> contains everything except a dash and a caret
        </li><li><tt>[--^]</tt> contains characters from a dash to a caret (includes digits, uppercase letters, and more punctuation)
        </li><li><tt>[^--^]</tt> contains everything except characters from a dash to a caret (including lowercase letters, whitespace, and some punctuation)
        </li><li><tt>[^^]</tt>  contains everything but a caret
        </li><li><b>Please don't use any of these!</b>
      </li></ul>
      </li><li>When will the match variables be reset (and when not) when the conditional expression<br>
          <tt>&nbsp;&nbsp;&nbsp;$operator&nbsp;eq&nbsp;"/"&nbsp;and&nbsp;$operand2&nbsp;!~&nbsp;m/^0+$/</tt><br>
          is evaluated?
      <ul>
        <li>This expression is a candidate for short-circuit Boolean evaluation: if the first part (<tt>$operator eq "/"</tt>) is true,
            the second part (the regex match) will be evaluated and the match variables will be reset <u>on a successful match</u>.
            If the first part is false, the expression is already false and the second part will not be evaluated;
            hence the pattern match variables will <b>not</b> be reset,  even if the match would have succeeded.
      </li></ul>
      </li><li>What's wrong with the above date-matching regex:<br>
          <tt>&nbsp;&nbsp;&nbsp;/(?i-xsm:^([0-9]{4})-(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-([0-9]{2})$)/</tt>
      <ul>
        <li>Presumably the case handling is <b>not</b> what's wrong. The <tt>(?i-xsm:</tt> shows the regex was specifically constructed with <tt>/i</tt>, so it matches month abbreviations in any case.
        </li><li>The regex is intended to match strings like <tt>"2009-JAN-12"</tt>, but look carefully at the year and day-of-month parts. They just match strings of 4 and 2 digits, respectively.
        <ul>
          <li>So strings like <tt>"0000-JAN-99"</tt> will match.
          </li><li>This is almost certainly not what was intended.
          </li><li>It turns out that a regex to match only valid dates is hilariously terrible, even if you don't worry about leap years.
          <ul>
            <li>Pop quiz: was 2000 a leap year?
          </li></ul>
          </li><li>Discussion question: what's a better way to write a date validator?
        </li></ul>
      </li></ul>
      </li><li>What's a better way to write a date validator?
      <ul>
        <li>Why does this question sound so familiar?
        </li><li>There are several modules on CPAN to do this.
        </li><li>A "roll-your-own" approach would be to use a "naïve"
            regex like the one above, and then pass the (appropriately massaged) submatches to the <tt>timelocal</tt> function (from the core <tt>Time::Local</tt> module) inside an <tt>eval</tt> to see if it <tt>die</tt>s, in which case it's an invalid date.
      </li></ul>
    </li></ul>
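A minimal sketch of that roll-your-own validator, assuming the core <tt>Time::Local</tt> module's <tt>timelocal</tt>, which croaks when the day is out of range for the month (leap years included). <tt>is_valid_date</tt> and <tt>%month_index</tt> are hypothetical names, and the year range is deliberately not checked here.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Map month abbreviations to timelocal's 0-based month numbers.
my @months = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);
my %month_index;
@month_index{@months} = (0 .. 11);

# The "naive" regex from above: shape only, no calendar knowledge.
my $rgx_date = qr/^([0-9]{4})-(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-([0-9]{2})$/i;

sub is_valid_date {
    my ($string) = @_;
    my ($year, $mon, $day) = $string =~ $rgx_date or return 0;

    # timelocal(sec, min, hour, mday, mon, year) dies on an impossible day;
    # noon is used so a daylight-saving gap can never get in the way.
    my $ok = eval {
        timelocal(0, 0, 12, $day, $month_index{ uc $mon }, $year);
        1;
    };
    return $ok ? 1 : 0;
}

for my $date ("2009-JAN-12", "0000-JAN-99", "2000-FEB-29", "2001-FEB-29") {
    print "$date: ", (is_valid_date($date) ? "valid" : "invalid"), "\n";
}
```

The regex does the cheap shape check and the library does the calendar arithmetic, which keeps the regex simple and single-purpose.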
<p>
</p><hr style="border-bottom: 2px solid #804040;">
  <table style="margin: 0px;" border="0">
    <tbody><tr>
      <td style="padding-right: 12px; border-right: 2px solid #804040;">
        <div class="copyright">
          This page was downloaded on<br>
          21-June-2016 at 12:18pm.<br>
        </div>
      </td>
      <td style="padding-left: 12px;">
        <div class="copyright">
          <div style="margin-bottom: 4px; margin-top: 4px;">These notes are © 2007-2016 by Jeremy Holland.  All rights reserved.</div>
          <b style="border: solid 2px #804040; background: #FFE0E0; color:#402020">&nbsp;NotesMaker&nbsp;</b> is © 2007-2016 by Jeremy Holland. All rights reserved.<br>
        </div>
      </td>
    </tr>
  </tbody></table>


</body></html>