hws/hw04.tex
changeset 943 5365ef60707e
parent 939 f85e784d3014
equal deleted inserted replaced
942:c82a45f48bfc 943:5365ef60707e
    20 \end{center}
    20 \end{center}
    21 
    21 
    22 there are several values for how these regular expressions can
    22 there are several values for how these regular expressions can
    23 recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case
    23 recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case
    24 \emph{all} the values and indicate which one is the POSIX value.
    24 \emph{all} the values and indicate which one is the POSIX value.
       
    25 
       
    26 \solution{
       
    27   1) There are only 2 values (writing $a$ for $Char(a)$ and so on)
       
    28 
       
    29   \begin{center}
       
    30   \begin{tabular}{l}
       
    31     $Sequ(Left(Sequ(a,b)),Left(Empty))$\\
       
    32     $Sequ(Right(a),Left(b))$\\ 
       
    33   \end{tabular}    
       
    34   \end{center}
    25   
    35   
       
    36   The first is the POSIX value because of the preference for $Left$.
       
    37 
       
    38   2) There are three ``main'' values, namely
       
    39 
       
    40   \begin{center}
       
    41   \begin{tabular}{l}
       
    42     $Stars\,[Left(Sequ(a,a)),Right(a)]$\\
       
    43     $Stars\,[Right(a), Left(Sequ(a,a))]$\\
       
    44     $Stars\,[Right(a), Right(a), Right(a)]$\\
       
    45   \end{tabular}    
       
    46   \end{center}
       
    47 
       
    48   Again the first one is the POSIX value, but if it just about all
       
    49   possible values, then there are in fact infinitely many values because
       
    50   the following 
       
    51 
       
    52   \begin{center}
       
    53   \begin{tabular}{l}
       
    54     $Stars\,[Left(Sequ(a,a)),Empty,Right(a)]$\\
       
    55     $Stars\,[Left(Sequ(a,a)),Empty,Empty,Right(a)]$\\
       
    56     $Stars\,[Left(Sequ(a,a)),Empty,Right(a), Empty]$, \ldots\\
       
    57   \end{tabular}    
       
    58   \end{center}
       
    59 
       
    60   are also values for this regex and the string $aaa$.
       
    61 }  
    26 
    62 
    27 \item If a regular expression $r$ does not contain any occurrence of $\ZERO$,  
    63 \item If a regular expression $r$ does not contain any occurrence of $\ZERO$,  
    28   is it possible for $L(r)$ to be empty? Explain why, or give a proof.
    64   is it possible for $L(r)$ to be empty? Explain why, or give a proof.
    29 
    65 
    30   \solution{
    66   \solution{
    31     The property to prove is
    67     No. The property to prove by induction is
    32 
    68 
    33     \begin{center}
    69     \begin{center}
    34     $P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$.  
    70     $P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$.  
    35   \end{center}
    71   \end{center}
    36 
    72 
    73   \item $(a / 3) * 3$
   109   \item $(a / 3) * 3$
    74   \end{itemize}
   110   \end{itemize}
    75 
   111 
    76   In case they can, can you give the corresponding token
   112   In case they can, can you give the corresponding token
    77   sequences.
   113   sequences.
       
   114 
       
   115   \solution{
       
   116   The first 2 are lexibile. The 3 one contains $/$ which is not an operator.
       
   117   }  
    78 
   118 
    79 \item Assume $r$ is nullable. Show that
   119 \item Assume $r$ is nullable. Show that
    80   \[ 1 + r + r\cdot r \;\equiv\; r\cdot r
   120   \[ 1 + r + r\cdot r \;\equiv\; r\cdot r
    81   \]
   121   \]
    82 
   122 
   126 
   166 
   127 \item Construct a regular expression that can validate passwords. A
   167 \item Construct a regular expression that can validate passwords. A
   128   password should be at least 8 characters long and consist of upper-
   168   password should be at least 8 characters long and consist of upper-
   129   and lower-case letters and digits. It should contain at least a
   169   and lower-case letters and digits. It should contain at least a
   130   single lower-case letter, at least a single upper-case letter and at
   170   single lower-case letter, at least a single upper-case letter and at
   131   least a single digit. If possible ise the intersection regular
   171   least a single digit. If possible use the intersection regular
   132   expression from CW1, written $\_\&\_$, the bounded regular
   172   expression from CW1, written $\_\&\_$, and the bounded regular
   133   expressions; you can also assume a regular expression written
   173   expressions; you can also assume a regular expression written
   134   \texttt{ALL} that can match any character.
   174   \texttt{ALL} that can match any character.
   135 
   175 
   136   \solution{
   176   \solution{
   137     You can build-up the different constraints separately and then
   177     You can build-up the different constraints separately and then
   142     $ALL^{\{8..\}}$ & \;\&\; & $(ALL^*\cdot [a-z]\cdot ALL^*)$\\
   182     $ALL^{\{8..\}}$ & \;\&\; & $(ALL^*\cdot [a-z]\cdot ALL^*)$\\
   143                    & \;\&\; & $(ALL^*\cdot [A-Z]\cdot ALL^*)$\\
   183                    & \;\&\; & $(ALL^*\cdot [A-Z]\cdot ALL^*)$\\
   144                    & \;\&\; & $(ALL^*\cdot [0-9]\cdot ALL^*)$\\
   184                    & \;\&\; & $(ALL^*\cdot [0-9]\cdot ALL^*)$\\
   145   \end{tabular}
   185   \end{tabular}
   146   \end{center}
   186   \end{center}
       
   187 
       
   188   $ALL$ could be represented as $\sim \ZERO$.
   147   }
   189   }
   148   
   190   
   149 \item Assume the delimiters for comments are
   191 \item Assume the delimiters for comments are
   150       \texttt{$\slash$*} and \texttt{*$\slash$}. Give a
   192       \texttt{$\slash$*} and \texttt{*$\slash$}. Give a
   151       regular expression that can recognise comments of the
   193       regular expression that can recognise comments of the
   159       not comment delimiters. (Hint: You can assume you are
   201       not comment delimiters. (Hint: You can assume you are
   160       already given a regular expression written \texttt{ALL},
   202       already given a regular expression written \texttt{ALL},
   161       that can recognise any character, and a regular
   203       that can recognise any character, and a regular
   162       expression \texttt{NOT} that recognises the complement
   204       expression \texttt{NOT} that recognises the complement
   163       of a regular expression.)
   205       of a regular expression.)
   164   
   206 
       
   207       \solution{
       
   208         \[/ * \sim (ALL^* * / ALL^*) * /\]
       
   209 
       
   210       The idea to make sure in between $/ *$ and $* /$ ar no strings that contain $* /$.  
       
   211       }
       
   212       
   165 \item Simplify the regular expression
   213 \item Simplify the regular expression
   166 
   214 
   167 \[
   215 \[
   168 (\ZERO \cdot (b \cdot c)) + 
   216 (\ZERO \cdot (b \cdot c)) + 
   169 ((\ZERO \cdot c) + \ONE)
   217 ((\ZERO \cdot c) + \ONE)
   170 \]
   218 \]
   171 
   219 
   172       Does simplification always preserve the meaning of a
   220       Does simplification always preserve the meaning of a
   173       regular expression? 
   221       regular expression?
       
   222 
       
   223       \solution{ Yes, simplification preserves the language. It
       
   224         simplifies to just $\ONE$. It should be remembered that the
       
   225         Brzozowski does not simplify under stars. This does not apply
       
   226         in this example, though.  }
   174      
   227      
   175 \item The Sulzmann \& Lu algorithm contains the function
   228 \item The Sulzmann \& Lu algorithm contains the function
   176       $mkeps$ which answers how a regular expression can match
   229       $mkeps$ which answers how a regular expression can match
   177       the empty string. What is the answer of $mkeps$ for the
   230       the empty string. What is the answer of $mkeps$ for the
   178       regular expressions:    
   231       regular expressions:    
   182   (\ZERO \cdot (b \cdot c)) + 
   235   (\ZERO \cdot (b \cdot c)) + 
   183   ((\ZERO \cdot c) + \ONE)\\
   236   ((\ZERO \cdot c) + \ONE)\\
   184     (a + \ONE) \cdot (\ONE + \ONE)\\
   237     (a + \ONE) \cdot (\ONE + \ONE)\\
   185     a^*
   238     a^*
   186   \end{array}
   239   \end{array}
   187   \]
   240 \]
       
   241 
       
   242 \solution{
       
   243   The values are
       
   244   \begin{center}
       
   245   \begin{tabular}{l}
       
   246     $Right(Right(Empty))$\\
       
   247     $Sequ(Right(\ONE),Left(\ONE))$\\
       
   248     $Stars\,[]$
       
   249   \end{tabular}  
       
   250   \end{center}
       
   251 
       
   252   The last one uses the rule that $mkeps$ for the star returns always $Star\,[]$.
       
   253   }
   188 
   254 
   189 \item What is the purpose of the record regular expression in
   255 \item What is the purpose of the record regular expression in
   190       the Sulzmann \& Lu algorithm?
   256   the Sulzmann \& Lu algorithm?
       
   257 
       
   258   \solution{
       
   259     It marks a part of a regular expression and can be used to extract the part of the
       
   260     string that is matched by this marked part of the regular expression.
       
   261   }
   191 
   262 
   192 \item Recall the functions \textit{nullable} and
   263 \item Recall the functions \textit{nullable} and
   193       \textit{zeroable}.  Define recursive functions
   264       \textit{zeroable}.  Define recursive functions
   194       \textit{atmostempty} (for regular expressions that match no
   265       \textit{atmostempty} (for regular expressions that match no
   195       string or only the empty string), \textit{somechars} (for
   266       string or only the empty string), \textit{somechars} (for
   250           i(r_1 + r_2) &\dn i(r_1) \vee i(r_2)\\
   321           i(r_1 + r_2) &\dn i(r_1) \vee i(r_2)\\
   251           i(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge i(r_2)) \;\vee\; (\neg z(r_2) \wedge i(r_1))\\
   322           i(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge i(r_2)) \;\vee\; (\neg z(r_2) \wedge i(r_1))\\
   252           i(r^*)  &\dn \neg a(r)
   323           i(r^*)  &\dn \neg a(r)
   253         \end{align}
   324         \end{align}
   254 
   325 
   255         Here the interesting bit is that as soon $r$ can match at least a single string, then $r^*$
   326         Here the interesting bit is that as soon $r$ can match at least a single non-empty string, then $r^*$
   256         will match infinitely many strings.
   327         will match infinitely many strings.
   257         }
   328         }
   258 
   329 
   259       
   330       
   260 \item There are two kinds of automata that are generated for 
   331 \item There are two kinds of automata that are generated for 
   262   the one in Python generate NFAs.  Explain what is the problem with such
   333   the one in Python generate NFAs.  Explain what is the problem with such
   263   NFAs and what is the reason why they use NFAs. (2) Regular expression
   334   NFAs and what is the reason why they use NFAs. (2) Regular expression
   264   engines like the one in Rust generate DFAs. Explain what is the
   335   engines like the one in Rust generate DFAs. Explain what is the
   265   problem with these regex engines and also what is the problem with $a^{\{1000\}}$
   336   problem with these regex engines and also what is the problem with $a^{\{1000\}}$
   266   in these engines.
   337   in these engines.
       
   338 
       
   339   \solution{
       
   340     Why they use NFAs? NFAs are of similar size as the regular expression (they do not explode
       
   341     for the basic regular expressions. Python regex library supports constructions like
       
   342     back-refernces which cannot be represented by DFAs (string matching with back-references
       
   343     can be NP. What is the problem with $a^{\{1000\}}$. When generating DFAs (and NFAs) for the
       
   344     bounded regular expressions, one has to make $n$ copies, which means their size can grow
       
   345     drastically for large counters.
       
   346   }
   267       
   347       
   268 %\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as
   348 %\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as
   269 %argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so 
   349 %argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so 
   270 %that it filters out all comments and whitespace from the result.
   350 %that it filters out all comments and whitespace from the result.
   271 
   351