diff -r c82a45f48bfc -r 5365ef60707e hws/hw04.tex --- a/hws/hw04.tex Fri Oct 13 23:49:34 2023 +0100 +++ b/hws/hw04.tex Sat Oct 21 09:09:09 2023 +0100 @@ -22,13 +22,49 @@ there are several values for how these regular expressions can recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case \emph{all} the values and indicate which one is the POSIX value. + +\solution{ + 1) There are only 2 values (writing $a$ for $Char(a)$ and so on) + + \begin{center} + \begin{tabular}{l} + $Sequ(Left(Sequ(a,b)),Left(Empty))$\\ + $Sequ(Right(a),Left(b))$\\ + \end{tabular} + \end{center} + The first is the POSIX value because of the preference for $Left$. + + 2) There are three ``main'' values, namely + + \begin{center} + \begin{tabular}{l} + $Stars\,[Left(Sequ(a,a)),Right(a)]$\\ + $Stars\,[Right(a), Left(Sequ(a,a))]$\\ + $Stars\,[Right(a), Right(a), Right(a)]$\\ + \end{tabular} + \end{center} + + Again the first one is the POSIX value, but if it just about all + possible values, then there are in fact infinitely many values because + the following + + \begin{center} + \begin{tabular}{l} + $Stars\,[Left(Sequ(a,a)),Empty,Right(a)]$\\ + $Stars\,[Left(Sequ(a,a)),Empty,Empty,Right(a)]$\\ + $Stars\,[Left(Sequ(a,a)),Empty,Right(a), Empty]$, \ldots\\ + \end{tabular} + \end{center} + + are also values for this regex and the string $aaa$. +} \item If a regular expression $r$ does not contain any occurrence of $\ZERO$, is it possible for $L(r)$ to be empty? Explain why, or give a proof. \solution{ - The property to prove is + No. The property to prove by induction is \begin{center} $P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$. @@ -76,6 +112,10 @@ In case they can, can you give the corresponding token sequences. + \solution{ + The first 2 are lexibile. The 3 one contains $/$ which is not an operator. + } + \item Assume $r$ is nullable. Show that \[ 1 + r + r\cdot r \;\equiv\; r\cdot r \] @@ -128,8 +168,8 @@ password should be at least 8 characters long and consist of upper- and lower-case letters and digits. It should contain at least a single lower-case letter, at least a single upper-case letter and at - least a single digit. If possible ise the intersection regular - expression from CW1, written $\_\&\_$, the bounded regular + least a single digit. If possible use the intersection regular + expression from CW1, written $\_\&\_$, and the bounded regular expressions; you can also assume a regular expression written \texttt{ALL} that can match any character. @@ -144,6 +184,8 @@ & \;\&\; & $(ALL^*\cdot [0-9]\cdot ALL^*)$\\ \end{tabular} \end{center} + + $ALL$ could be represented as $\sim \ZERO$. } \item Assume the delimiters for comments are @@ -161,7 +203,13 @@ that can recognise any character, and a regular expression \texttt{NOT} that recognises the complement of a regular expression.) - + + \solution{ + \[/ * \sim (ALL^* * / ALL^*) * /\] + + The idea to make sure in between $/ *$ and $* /$ ar no strings that contain $* /$. + } + \item Simplify the regular expression \[ @@ -170,7 +218,12 @@ \] Does simplification always preserve the meaning of a - regular expression? + regular expression? + + \solution{ Yes, simplification preserves the language. It + simplifies to just $\ONE$. It should be remembered that the + Brzozowski does not simplify under stars. This does not apply + in this example, though. } \item The Sulzmann \& Lu algorithm contains the function $mkeps$ which answers how a regular expression can match @@ -184,10 +237,28 @@ (a + \ONE) \cdot (\ONE + \ONE)\\ a^* \end{array} - \] +\] + +\solution{ + The values are + \begin{center} + \begin{tabular}{l} + $Right(Right(Empty))$\\ + $Sequ(Right(\ONE),Left(\ONE))$\\ + $Stars\,[]$ + \end{tabular} + \end{center} + + The last one uses the rule that $mkeps$ for the star returns always $Star\,[]$. + } \item What is the purpose of the record regular expression in - the Sulzmann \& Lu algorithm? + the Sulzmann \& Lu algorithm? + + \solution{ + It marks a part of a regular expression and can be used to extract the part of the + string that is matched by this marked part of the regular expression. + } \item Recall the functions \textit{nullable} and \textit{zeroable}. Define recursive functions @@ -252,7 +323,7 @@ i(r^*) &\dn \neg a(r) \end{align} - Here the interesting bit is that as soon $r$ can match at least a single string, then $r^*$ + Here the interesting bit is that as soon $r$ can match at least a single non-empty string, then $r^*$ will match infinitely many strings. } @@ -264,6 +335,15 @@ engines like the one in Rust generate DFAs. Explain what is the problem with these regex engines and also what is the problem with $a^{\{1000\}}$ in these engines. + + \solution{ + Why they use NFAs? NFAs are of similar size as the regular expression (they do not explode + for the basic regular expressions. Python regex library supports constructions like + back-refernces which cannot be represented by DFAs (string matching with back-references + can be NP. What is the problem with $a^{\{1000\}}$. When generating DFAs (and NFAs) for the + bounded regular expressions, one has to make $n$ copies, which means their size can grow + drastically for large counters. + } %\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as %argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so