--- a/hws/hw04.tex Fri Oct 13 23:49:34 2023 +0100
+++ b/hws/hw04.tex Sat Oct 21 09:09:09 2023 +0100
@@ -22,13 +22,49 @@
there are several values for how these regular expressions can
recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case
\emph{all} the values and indicate which one is the POSIX value.
+
+\solution{
+ 1) There are only 2 values (writing $a$ for $Char(a)$ and so on)
+
+ \begin{center}
+ \begin{tabular}{l}
+ $Sequ(Left(Sequ(a,b)),Left(Empty))$\\
+ $Sequ(Right(a),Left(b))$\\
+ \end{tabular}
+ \end{center}
+ The first is the POSIX value because of the preference for $Left$.
+
+ 2) There are three ``main'' values, namely
+
+ \begin{center}
+ \begin{tabular}{l}
+ $Stars\,[Left(Sequ(a,a)),Right(a)]$\\
+ $Stars\,[Right(a), Left(Sequ(a,a))]$\\
+ $Stars\,[Right(a), Right(a), Right(a)]$\\
+ \end{tabular}
+ \end{center}
+
+ Again the first one is the POSIX value, but if it just about all
+ possible values, then there are in fact infinitely many values because
+ the following
+
+ \begin{center}
+ \begin{tabular}{l}
+ $Stars\,[Left(Sequ(a,a)),Empty,Right(a)]$\\
+ $Stars\,[Left(Sequ(a,a)),Empty,Empty,Right(a)]$\\
+ $Stars\,[Left(Sequ(a,a)),Empty,Right(a), Empty]$, \ldots\\
+ \end{tabular}
+ \end{center}
+
+ are also values for this regex and the string $aaa$.
+}
\item If a regular expression $r$ does not contain any occurrence of $\ZERO$,
is it possible for $L(r)$ to be empty? Explain why, or give a proof.
\solution{
- The property to prove is
+ No. The property to prove by induction is
\begin{center}
$P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$.
@@ -76,6 +112,10 @@
In case they can, can you give the corresponding token
sequences.
+ \solution{
+ The first 2 are lexibile. The 3 one contains $/$ which is not an operator.
+ }
+
\item Assume $r$ is nullable. Show that
\[ 1 + r + r\cdot r \;\equiv\; r\cdot r
\]
@@ -128,8 +168,8 @@
password should be at least 8 characters long and consist of upper-
and lower-case letters and digits. It should contain at least a
single lower-case letter, at least a single upper-case letter and at
- least a single digit. If possible ise the intersection regular
- expression from CW1, written $\_\&\_$, the bounded regular
+ least a single digit. If possible use the intersection regular
+ expression from CW1, written $\_\&\_$, and the bounded regular
expressions; you can also assume a regular expression written
\texttt{ALL} that can match any character.
@@ -144,6 +184,8 @@
& \;\&\; & $(ALL^*\cdot [0-9]\cdot ALL^*)$\\
\end{tabular}
\end{center}
+
+ $ALL$ could be represented as $\sim \ZERO$.
}
\item Assume the delimiters for comments are
@@ -161,7 +203,13 @@
that can recognise any character, and a regular
expression \texttt{NOT} that recognises the complement
of a regular expression.)
-
+
+ \solution{
+ \[/ * \sim (ALL^* * / ALL^*) * /\]
+
+ The idea to make sure in between $/ *$ and $* /$ ar no strings that contain $* /$.
+ }
+
\item Simplify the regular expression
\[
@@ -170,7 +218,12 @@
\]
Does simplification always preserve the meaning of a
- regular expression?
+ regular expression?
+
+ \solution{ Yes, simplification preserves the language. It
+ simplifies to just $\ONE$. It should be remembered that the
+ Brzozowski does not simplify under stars. This does not apply
+ in this example, though. }
\item The Sulzmann \& Lu algorithm contains the function
$mkeps$ which answers how a regular expression can match
@@ -184,10 +237,28 @@
(a + \ONE) \cdot (\ONE + \ONE)\\
a^*
\end{array}
- \]
+\]
+
+\solution{
+ The values are
+ \begin{center}
+ \begin{tabular}{l}
+ $Right(Right(Empty))$\\
+ $Sequ(Right(\ONE),Left(\ONE))$\\
+ $Stars\,[]$
+ \end{tabular}
+ \end{center}
+
+ The last one uses the rule that $mkeps$ for the star returns always $Star\,[]$.
+ }
\item What is the purpose of the record regular expression in
- the Sulzmann \& Lu algorithm?
+ the Sulzmann \& Lu algorithm?
+
+ \solution{
+ It marks a part of a regular expression and can be used to extract the part of the
+ string that is matched by this marked part of the regular expression.
+ }
\item Recall the functions \textit{nullable} and
\textit{zeroable}. Define recursive functions
@@ -252,7 +323,7 @@
i(r^*) &\dn \neg a(r)
\end{align}
- Here the interesting bit is that as soon $r$ can match at least a single string, then $r^*$
+ Here the interesting bit is that as soon $r$ can match at least a single non-empty string, then $r^*$
will match infinitely many strings.
}
@@ -264,6 +335,15 @@
engines like the one in Rust generate DFAs. Explain what is the
problem with these regex engines and also what is the problem with $a^{\{1000\}}$
in these engines.
+
+ \solution{
+ Why they use NFAs? NFAs are of similar size as the regular expression (they do not explode
+ for the basic regular expressions. Python regex library supports constructions like
+ back-refernces which cannot be represented by DFAs (string matching with back-references
+ can be NP. What is the problem with $a^{\{1000\}}$. When generating DFAs (and NFAs) for the
+ bounded regular expressions, one has to make $n$ copies, which means their size can grow
+ drastically for large counters.
+ }
%\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as
%argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so