\documentclass{article}
\usepackage{../style}
\usepackage{../graphics}
\begin{document}
\section*{Homework 6}
\HEADER
\begin{enumerate}
\item (i) Give the regular expressions for lexing a language
consisting of whitespaces, identifiers (some letters
followed by digits), numbers, operations \texttt{=},
\texttt{<} and \texttt{>}, and the keywords \texttt{if},
\texttt{then} and \texttt{else}. (ii) Decide whether the
following strings can be lexed in this language?
\begin{enumerate}
\item \texttt{"if y4 = 3 then 1 else 3"}
\item \texttt{"if33 ifif then then23 else else 32"}
\item \texttt{"if x4x < 33 then 1 else 3"}
\end{enumerate}
In case they can, give the corresponding token sequences. (Hint:
Observe the maximal munch rule and priorities of your regular
expressions that make the process of lexing unambiguous.)
\item Suppose the grammar
\begin{center}
\begin{tabular}{lcl}
$E$ & $\rightarrow$ & $F \;|\; F \cdot * \cdot F \;|\; F \cdot \backslash \cdot F$\\
$F$ & $\rightarrow$ & $T \;|\; T \cdot \texttt{+} \cdot T \;|\; T \cdot \texttt{-} \cdot T$\\
$T$ & $\rightarrow$ & $num \;|\; \texttt{(} \cdot E \cdot \texttt{)}$\\
\end{tabular}
\end{center}
where $E$, $F$ and $T$ are non-terminals, $E$ is the starting symbol of the grammar, and $num$ stands for
a number token. Give a parse tree for the string \texttt{(3+3)+(2*3)}.
\item Define what it means for a grammar to be ambiguous. Give an example of
an ambiguous grammar.
\item Suppose boolean expressions are built up from
\begin{center}
\begin{tabular}{ll}
1.) & tokens for \texttt{true} and \texttt{false},\\
2.) & the infix operations \texttt{$\wedge$} and \texttt{$\vee$},\\
3.) & the prefix operation $\neg$, and\\
4.) & can be enclosed in parentheses.
\end{tabular}
\end{center}
(i) Give a grammar that can recognise such boolean expressions
and (ii) give a sample string involving all rules given in 1.-4.~that
can be parsed by this grammar.
\item Parsing combinators consist of atomic parsers, alternative
parsers, sequence parsers and semantic actions. What is the purpose
of (1) atomic parsers and of (2) semantic actions?
\item Parser combinators can directly be given a string as
input, without the need of a lexer. What are the
advantages to first lex a string and then feed a
sequence of tokens as input to the parser?
% Reason 1 you can filter out whitespaces and comments, which makes the grammar rules simpler. If you have to make sure that a whitespace comes after a variable say, then your parser rule for variables gets more complicated. Same with comments which do not contribute anything to the parser tree.
% Reason 2) The lexer can already classify tokens, for example as numbers, keywords or identifiers. This again makes the grammar rules more deterministic and as a result faster to parse.
% The point is that parser combinators can be used to process strings, but in case of compilers where whitespaces and comments need to be filtered out, the lexing phase is actually quite useful.
\item The injection function for sequence regular expressions is defined
by three clauses:
\begin{center}
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
$\inj\,(r_1 \cdot r_2)\,c\,\,Seq(v_1,v_2)$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\
$\inj\,(r_1 \cdot r_2)\,c\,\,\Left(Seq(v_1,v_2))$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\
$\inj\,(r_1 \cdot r_2)\,c\,\,Right(v)$ & $\dn$ & $Seq(\textit{mkeps}(r_1),\inj\,r_2\,c\,v)$\\
\end{tabular}
\end{center}
Explain why there are three cases in the injection function for sequence
regular expressions.
\item \POSTSCRIPT
\end{enumerate}
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: