\documentclass{article}
\usepackage{../style}
\usepackage{../graphics}
\begin{document}
\section*{Homework 4}
\HEADER
\begin{enumerate}
\item Given the regular expressions
\begin{center}
\begin{tabular}{ll}
1) & $(ab + a)\cdot (\ONE + b)$\\
2) & $(aa + a)^*$\\
\end{tabular}
\end{center}
there are several values for how these regular expressions can
recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case
\emph{all} the values and indicate which one is the POSIX value.
\item If a regular expression $r$ does not contain any occurrence of $\ZERO$,
is it possible for $L(r)$ to be empty? Explain why, or give a proof.
\item Define the tokens and regular expressions for a language
consisting of numbers, left-parenthesis $($, right-parenthesis $)$,
identifiers and the operations $+$, $-$ and $*$. Can the following
strings in this language be lexed?
\begin{itemize}
\item $(a + 3) * b$
\item $)()++ -33$
\item $(a / 3) * 3$
\end{itemize}
In case they can, can you give the corresponding token
sequences.
\item Assume $r$ is nullable. Show that
\[ 1 + r + r\cdot r \;\equiv\; r\cdot r
\]
holds.
\item \textbf{(Deleted)} Assume that $s^{-1}$ stands for the operation of reversing a
string $s$. Given the following \emph{reversing} function on regular
expressions
\begin{center}
\begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l}
$rev(\ZERO)$ & $\dn$ & $\ZERO$\\
$rev(\ONE)$ & $\dn$ & $\ONE$\\
$rev(c)$ & $\dn$ & $c$\\
$rev(r_1 + r_2)$ & $\dn$ & $rev(r_1) + rev(r_2)$\\
$rev(r_1 \cdot r_2)$ & $\dn$ & $rev(r_2) \cdot rev(r_1)$\\
$rev(r^*)$ & $\dn$ & $rev(r)^*$\\
\end{tabular}
\end{center}
and the set
\begin{center}
$Rev\,A \dn \{s^{-1} \;|\; s \in A\}$
\end{center}
prove whether
\begin{center}
$L(rev(r)) = Rev (L(r))$
\end{center}
holds.
\item Assume the delimiters for comments are
\texttt{$\slash$*} and \texttt{*$\slash$}. Give a
regular expression that can recognise comments of the
form
\begin{center}
\texttt{$\slash$*~\ldots{}~*$\slash$}
\end{center}
where the three dots stand for arbitrary characters, but
not comment delimiters. (Hint: You can assume you are
already given a regular expression written \texttt{ALL},
that can recognise any character, and a regular
expression \texttt{NOT} that recognises the complement
of a regular expression.)
\item Simplify the regular expression
\[
(\ZERO \cdot (b \cdot c)) +
((\ZERO \cdot c) + \ONE)
\]
Does simplification always preserve the meaning of a
regular expression?
\item The Sulzmann \& Lu algorithm contains the function
$mkeps$ which answers how a regular expression can match
the empty string. What is the answer of $mkeps$ for the
regular expressions:
\[
\begin{array}{l}
(\ZERO \cdot (b \cdot c)) +
((\ZERO \cdot c) + \ONE)\\
(a + \ONE) \cdot (\ONE + \ONE)\\
a^*
\end{array}
\]
\item What is the purpose of the record regular expression in
the Sulzmann \& Lu algorithm?
\item Recall the functions \textit{nullable} and
\textit{zeroable}. Define recursive functions
\textit{atmostempty} (for regular expressions that match no
string or only the empty string), \textit{somechars} (for
regular expressions that match some non-empty string),
\textit{infinitestrings} (for regular expressions that can match
infinitely many strings).
\item There are two kinds of automata that are generated for
regular expression matching---DFAs and NFAs. (1) Regular expression engines like
the one in Python generate NFAs. Explain what is the problem with such
NFAs and what is the reason why they use NFAs. (2) Regular expression
engines like the one in Rust generate DFAs. Explain what is the
problem with these regex engines and also what is the problem with $a^{\{1000\}}$
in these engines.
%\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as
%argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so
%that it filters out all comments and whitespace from the result.
%\item (Optional) Modify the tokenizer in \texttt{regexp2.scala} so that it
%implements the \texttt{findAll} function. This function takes a regular
%expressions and a string, and returns all substrings in this string that
%match the regular expression.
\item \POSTSCRIPT
\end{enumerate}
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: