\documentclass{article}+ −
\usepackage{../style}+ −
\usepackage{../graphics}+ −
\usepackage{../grammar}+ −
+ −
\begin{document}+ −
+ −
\section*{Homework 6}+ −
+ −
%\HEADER+ −
+ −
\begin{enumerate}+ −
\item (i) Give the regular expressions for lexing a language+ −
consisting of whitespaces, identifiers (some letters+ −
followed by digits), numbers, operations \texttt{=},+ −
\texttt{<} and \texttt{>}, and the keywords \texttt{if},+ −
\texttt{then} and \texttt{else}. (ii) Decide whether the+ −
following strings can be lexed in this language?+ −
+ −
\begin{enumerate}+ −
\item \texttt{"if y4 = 3 then 1 else 3"}+ −
\item \texttt{"if33 ifif then then23 else else 32"}+ −
\item \texttt{"if x4x < 33 then 1 else 3"}+ −
\end{enumerate}+ −
+ −
In case they can, give the corresponding token sequences. (Hint: + −
Observe the maximal munch rule and priorities of your regular+ −
expressions that make the process of lexing unambiguous.)+ −
+ −
\solution{+ −
a and b are ok; c is not matched because x4x is not an identifier according to the rules.+ −
}+ −
+ −
+ −
\item Suppose the grammar+ −
+ −
\begin{plstx}[margin=1cm] + −
:\meta{E} ::= \meta{F} \;|\; \meta{F} \cdot \texttt{*} \cdot \meta{F} \;|\; \meta{F} \cdot \backslash \cdot \meta{F}\\+ −
:\meta{F} ::= \meta{T} \;|\; \meta{T} \cdot \texttt{+} \cdot \meta{T} \;|\; \meta{T} \cdot \texttt{-} \cdot \meta{T}\\+ −
:\meta{T} ::= $num$ \;|\; \texttt{(} \cdot E \cdot \texttt{)}\\+ −
\end{plstx}+ −
+ −
where \meta{E}, \meta{F} and \meta{T} are non-terminals, \meta{E} is+ −
the starting symbol of the grammar, and $num$ stands for a number+ −
token. Give a parse tree for the string \texttt{(3+3)+(2*3)}. Note that+ −
\meta{F} and \meta{T} are ``exchanged'' in this grammar in comparison with+ −
the usual grammar for arithmetic expressions. What does this mean in terms+ −
of precedences of the operators?+ −
+ −
\solution{+ −
\begin{tikzpicture}[level distance=10mm, black,sibling distance=10mm]+ −
\node {\meta{E}}+ −
child[sibling distance=20mm] {node {\meta{F}} + −
child {node {\meta{T}}+ −
child[sibling distance=5mm] {node {(}}+ −
child[sibling distance=5mm] {node {\meta{E}}+ −
child {node {\meta{F}}+ −
child {node {\meta{T}} child {node {3}}}+ −
child {node {+}}+ −
child {node {\meta{T}} child {node {3}}}+ −
} + −
}+ −
child[sibling distance=5mm] {node {)}}+ −
}+ −
child {node {+}}+ −
child {node {\meta{T}}+ −
child[sibling distance=5mm] {node {(}}+ −
child[sibling distance=5mm] {node {\meta{E}}+ −
child {node {\meta{F}}+ −
child {node {\meta{T}} child {node {2}}}+ −
}+ −
child {node {*}}+ −
child {node {\meta{F}}+ −
child {node {\meta{T}} child {node {3}}}+ −
}+ −
}+ −
child[sibling distance=5mm] {node {)}}+ −
}+ −
};+ −
\end{tikzpicture}+ −
+ −
+ −
For the second part "1+2*3" will be parsed as (1+2)*3, meaning + and - bind+ −
tighter than * and $\backslash$+ −
}+ −
+ −
\item Define what it means for a grammar to be ambiguous. Give an example of+ −
an ambiguous grammar.+ −
+ −
\solution{+ −
Already the grammar+ −
\begin{plstx}[margin=1cm]+ −
: \meta{E} ::= \meta{E}\cdot +\cdot\meta{E} | num\\+ −
\end{plstx}+ −
+ −
is ambiguous because a string like "1+2+3" can be parsed+ −
as (1+2)+3 or 1+(2+3).+ −
}+ −
+ −
+ −
\item Suppose boolean expressions are built up from+ −
+ −
\begin{center}+ −
\begin{tabular}{ll}+ −
1.) & tokens for \texttt{true} and \texttt{false},\\+ −
2.) & the infix operations \texttt{$\wedge$} and \texttt{$\vee$},\\+ −
3.) & the prefix operation $\neg$, and\\+ −
4.) & can be enclosed in parentheses.+ −
\end{tabular}+ −
\end{center}+ −
+ −
(i) Give a grammar that can recognise such boolean expressions+ −
and (ii) give a sample string involving all rules given in 1.-4.~that + −
can be parsed by this grammar.+ −
+ −
\solution{+ −
The simplest grammar is+ −
+ −
\begin{plstx}[margin=1cm] + −
:\meta{B} ::= true \;|\; false \;|\;\meta{B} \cdot $\wedge$ \cdot \meta{B} \;|\; \meta{B} \cdot $\vee$ \cdot \meta{B}+ −
\;|\; $\neg$\meta{B} \;|\; (\cdot\meta{B}\cdot)\\+ −
\end{plstx}+ −
+ −
This is unfortunately a left-recursive grammar. Giving a not left-recursive one is a bit more involved.+ −
}+ −
+ −
\item Parsing combinators consist of atomic parsers, alternative+ −
parsers, sequence parsers and semantic actions. What is the purpose+ −
of (1) atomic parsers and of (2) map-parsers (also called semantic actions)?+ −
+ −
\solution{+ −
Atomic parsers look at a concrete prefix of the input, like num-tokens or identifiers.+ −
Map-parsers apply a function to the output of a parser. In this way you can transform+ −
the output from, for example, a string to an integer.+ −
}+ −
+ −
+ −
\item Parser combinators can directly be given a string as+ −
input, without the need of a lexer. What are the+ −
advantages to first lex a string and then feed a+ −
sequence of tokens as input to the parser?+ −
+ −
\solution{ Reason 1 you can filter out whitespaces and comments,+ −
which makes the grammar rules simpler. If you have to make+ −
sure that a whitespace comes after a variable say, then your+ −
parser rule for variables gets more complicated. Same with+ −
comments which do not contribute anything to the parse tree.+ −
+ −
Reason 2) The lexer can already classify tokens, for example+ −
as numbers, keywords or identifiers. This again makes the grammar+ −
rules more deterministic and as a result faster to parse.+ −
+ −
The point is that parser combinators can be used to process+ −
strings, but in case of compilers where whitespaces and+ −
comments need to be filtered out, the lexing phase is actually+ −
quite useful. }+ −
+ −
\item The injection function for sequence regular expressions is defined+ −
by three clauses:+ −
+ −
\begin{center}+ −
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}+ −
$\inj\,(r_1 \cdot r_2)\,c\,\,Seq(v_1,v_2)$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\+ −
$\inj\,(r_1 \cdot r_2)\,c\,\,\Left(Seq(v_1,v_2))$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\+ −
$\inj\,(r_1 \cdot r_2)\,c\,\,Right(v)$ & $\dn$ & $Seq(\textit{mkeps}(r_1),\inj\,r_2\,c\,v)$\\+ −
\end{tabular}+ −
\end{center}+ −
+ −
Explain why there are three cases in the injection function for sequence+ −
regular expressions.+ −
+ −
\solution{+ −
This is because the derivative of sequences can be of the form+ −
+ −
\begin{itemize}+ −
\item $(der\,c\,r_1)\cdot r_2$+ −
\item $(der\,c\,r_1)\cdot r_2 \;+\; der\,c\,r_2$ + −
\end{itemize}+ −
+ −
In the first case the value needs to be of the form $Seq$, in the second case $\Left$ or $Right$.+ −
Therefore 3 cases.+ −
}+ −
+ −
\item \POSTSCRIPT + −
\end{enumerate}+ −
+ −
\end{document}+ −
+ −
%%% Local Variables: + −
%%% mode: latex+ −
%%% TeX-master: t+ −
%%% End: + −