afl-material: comparison hws/hw04.tex

-:c82a45f48bfc
+:5365ef60707e
 \end{center}
 there are several values for how these regular expressions can
 recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case
 \emph{all} the values and indicate which one is the POSIX value.
+\solution{
+1) There are only 2 values (writing $a$ for $Char(a)$ and so on)
+\begin{center}
+\begin{tabular}{l}
+$Sequ(Left(Sequ(a,b)),Left(Empty))$\\
+$Sequ(Right(a),Right(b))$\\
+\end{tabular}
+\end{center}
+The first is the POSIX value because of the preference for $Left$.
+2) There are three ``main'' values, namely
+\begin{center}
+\begin{tabular}{l}
+$Stars\,[Left(Sequ(a,a)),Right(a)]$\\
+$Stars\,[Right(a), Left(Sequ(a,a))]$\\
+$Stars\,[Right(a), Right(a), Right(a)]$\\
+\end{tabular}
+\end{center}
+Again the first one is the POSIX value, but if the question is about all
+possible values, then there are in fact infinitely many values because
+the following
+\begin{center}
+\begin{tabular}{l}
+$Stars\,[Left(Sequ(a,a)),Empty,Right(a)]$\\
+$Stars\,[Left(Sequ(a,a)),Empty,Empty,Right(a)]$\\
+$Stars\,[Left(Sequ(a,a)),Empty,Right(a), Empty]$, \ldots\\
+\end{tabular}
+\end{center}
+are also values for this regex and the string $aaa$.
+}
 \item If a regular expression $r$ does not contain any occurrence of $\ZERO$,
 is it possible for $L(r)$ to be empty? Explain why, or give a proof.
 \solution{
-The property to prove is
+No. The property to prove by induction is
 \begin{center}
 $P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$.
 \end{center}
 \item $(a / 3) * 3$
 \end{itemize}
 In case they can, can you give the corresponding token
 sequences.
+\solution{
+The first two are lexable. The third one contains $/$, which is not an operator.
+}
 \item Assume $r$ is nullable. Show that
 \[ 1 + r + r\cdot r \;\equiv\; r\cdot r
 \]
 \item Construct a regular expression that can validate passwords. A
 password should be at least 8 characters long and consist of upper-
 and lower-case letters and digits. It should contain at least a
 single lower-case letter, at least a single upper-case letter and at
-least a single digit. If possible ise the intersection regular
+least a single digit. If possible use the intersection regular
-expression from CW1, written $\_\&\_$, the bounded regular
+expression from CW1, written $\_\&\_$, and the bounded regular
 expressions; you can also assume a regular expression written
 \texttt{ALL} that can match any character.
 \solution{
 You can build-up the different constraints separately and then
 $ALL^{\{8..\}}$ & \;\&\; & $(ALL^*\cdot [a-z]\cdot ALL^*)$\\
 & \;\&\; & $(ALL^*\cdot [A-Z]\cdot ALL^*)$\\
 & \;\&\; & $(ALL^*\cdot [0-9]\cdot ALL^*)$\\
 \end{tabular}
 \end{center}
+$ALL$ could be represented as $\sim \ZERO$.
 }
 \item Assume the delimiters for comments are
 \texttt{$\slash$*} and \texttt{*$\slash$}. Give a
 regular expression that can recognise comments of the
 not comment delimiters. (Hint: You can assume you are
 already given a regular expression written \texttt{ALL},
 that can recognise any character, and a regular
 expression \texttt{NOT} that recognises the complement
 of a regular expression.)
+\solution{
+\[/ * \sim (ALL^* * / ALL^*) * /\]
+The idea is to make sure that in between $/ *$ and $* /$ there are no strings that contain $* /$.
+}
 \item Simplify the regular expression
 \[
 (\ZERO \cdot (b \cdot c)) +
 ((\ZERO \cdot c) + \ONE)
 \]
 Does simplification always preserve the meaning of a
 regular expression?
+\solution{ Yes, simplification preserves the language. It
+simplifies to just $\ONE$. It should be remembered that the simplification
+function of the Brzozowski matcher does not simplify under stars. This does not apply
+in this example, though.  }
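+\solution{
+As a sketch, assuming the usual simplification rules $\ZERO \cdot r \equiv \ZERO$
+and $\ZERO + r \equiv r$, the steps are
+\[
+\begin{array}{lcl}
+(\ZERO \cdot (b \cdot c)) + ((\ZERO \cdot c) + \ONE)
+ & \equiv & \ZERO + (\ZERO + \ONE)\\
+ & \equiv & \ZERO + \ONE\\
+ & \equiv & \ONE
+\end{array}
+\]
+Each step replaces a subexpression by one with the same language.
+}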
 \item The Sulzmann \& Lu algorithm contains the function
 $mkeps$ which answers how a regular expression can match
 the empty string. What is the answer of $mkeps$ for the
 regular expressions:
 (\ZERO \cdot (b \cdot c)) +
 ((\ZERO \cdot c) + \ONE)\\
 (a + \ONE) \cdot (\ONE + \ONE)\\
 a^*
 \end{array}
 \]
+\solution{
+The values are
+\begin{center}
+\begin{tabular}{l}
+$Right(Right(Empty))$\\
+$Sequ(Right(Empty),Left(Empty))$\\
+$Stars\,[]$
+\end{tabular}
+\end{center}
+The last one uses the rule that $mkeps$ for the star always returns $Stars\,[]$.
+}
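+\solution{
+As a sketch for the second regular expression, assuming the usual clauses
+$mkeps(r_1 \cdot r_2) \dn Sequ(mkeps(r_1), mkeps(r_2))$ and
+$mkeps(r_1 + r_2) \dn Left(mkeps(r_1))$ if $r_1$ is nullable
+(otherwise $Right(mkeps(r_2))$), the calculation is
+\[
+\begin{array}{lcl}
+mkeps((a + \ONE) \cdot (\ONE + \ONE))
+ & = & Sequ(mkeps(a + \ONE),\, mkeps(\ONE + \ONE))\\
+ & = & Sequ(Right(mkeps(\ONE)),\, Left(mkeps(\ONE)))\\
+ & = & Sequ(Right(Empty),\, Left(Empty))
+\end{array}
+\]
+The first alternative goes $Right$ because $a$ is not nullable; the second
+goes $Left$ because its left branch $\ONE$ is already nullable.
+}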
 \item What is the purpose of the record regular expression in
 the Sulzmann \& Lu algorithm?
+\solution{
+It marks a part of a regular expression and can be used to extract the part of the
+string that is matched by this marked part of the regular expression.
+}
 \item Recall the functions \textit{nullable} and
 \textit{zeroable}.  Define recursive functions
 \textit{atmostempty} (for regular expressions that match no
 string or only the empty string), \textit{somechars} (for
 i(r_1 + r_2) &\dn i(r_1) \vee i(r_2)\\
 i(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge i(r_2)) \;\vee\; (\neg z(r_2) \wedge i(r_1))\\
 i(r^*)  &\dn \neg a(r)
 \end{align}
-Here the interesting bit is that as soon $r$ can match at least a single string, then $r^*$
+Here the interesting bit is that as soon as $r$ can match at least a single non-empty string, $r^*$
 will match infinitely many strings.
 }
 \item There are two kinds of automata that are generated for
 the one in Python generate NFAs.  Explain what is the problem with such
 NFAs and what is the reason why they use NFAs. (2) Regular expression
 engines like the one in Rust generate DFAs. Explain what is the
 problem with these regex engines and also what is the problem with $a^{\{1000\}}$
 in these engines.
+\solution{
+Why do they use NFAs? NFAs are of similar size as the regular expression (they do not explode
+for the basic regular expressions). Python's regex library also supports constructions like
+back-references, which cannot be represented by DFAs (string matching with back-references
+is NP-complete in general). What is the problem with $a^{\{1000\}}$? When generating DFAs (and NFAs) for the
+bounded regular expressions, one has to make $n$ copies, which means their size can grow
+drastically for large counters.
+}
 %\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as
 %argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so
 %that it filters out all comments and whitespace from the result.

changeset 943	5365ef60707e
parent 939	f85e784d3014