56
|
1 |
\documentclass{article}
|
292
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
2 |
\usepackage{../style}
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
3 |
\usepackage{../graphics}
|
954
|
4 |
\usepackage{../grammar}
|
56
|
5 |
|
|
6 |
\begin{document}
|
|
7 |
|
|
8 |
\section*{Homework 6}
|
|
9 |
|
916
|
10 |
%\HEADER
|
359
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
11 |
|
56
|
12 |
\begin{enumerate}
|
|
13 |
\item (i) Give the regular expressions for lexing a language
|
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
14 |
consisting of whitespaces, identifiers (some letters
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
15 |
followed by digits), numbers, operations \texttt{=},
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
16 |
\texttt{<} and \texttt{>}, and the keywords \texttt{if},
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
17 |
\texttt{then} and \texttt{else}. (ii) Decide whether the
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
18 |
following strings can be lexed in this language?
|
56
|
19 |
|
|
20 |
\begin{enumerate}
|
|
21 |
\item \texttt{"if y4 = 3 then 1 else 3"}
|
|
22 |
\item \texttt{"if33 ifif then then23 else else 32"}
|
|
23 |
\item \texttt{"if x4x < 33 then 1 else 3"}
|
|
24 |
\end{enumerate}
|
|
25 |
|
|
26 |
In case they can, give the corresponding token sequences. (Hint:
|
89
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
27 |
Observe the maximal munch rule and priorities of your regular
|
56
|
28 |
expressions that make the process of lexing unambiguous.)
|
|
29 |
|
955
|
30 |
\solution{
|
|
31 |
a and b are ok; c is not matched because x4x is not an identifier according to the rules.
|
|
32 |
}
|
|
33 |
|
|
34 |
|
56
|
35 |
\item Suppose the grammar
|
|
36 |
|
954
|
37 |
\begin{plstx}[margin=1cm]
|
|
38 |
:\meta{E} ::= \meta{F} \;|\; \meta{F} \cdot \texttt{*} \cdot \meta{F} \;|\; \meta{F} \cdot \backslash \cdot \meta{F}\\
|
|
39 |
:\meta{F} ::= \meta{T} \;|\; \meta{T} \cdot \texttt{+} \cdot \meta{T} \;|\; \meta{T} \cdot \texttt{-} \cdot \meta{T}\\
|
|
40 |
:\meta{T} ::= $num$ \;|\; \texttt{(} \cdot E \cdot \texttt{)}\\
|
|
41 |
\end{plstx}
|
56
|
42 |
|
954
|
43 |
where \meta{E}, \meta{F} and \meta{T} are non-terminals, \meta{E} is
|
|
44 |
the starting symbol of the grammar, and $num$ stands for a number
|
|
45 |
token. Give a parse tree for the string \texttt{(3+3)+(2*3)}. Note that
|
|
46 |
\meta{F} and \meta{T} are ``exchanged'' in this grammar in comparison with
|
|
47 |
the usual grammar for arithmetic expressions. What does this mean in terms
|
|
48 |
of precedences of the operators?
|
|
49 |
|
|
50 |
\solution{
|
955
|
51 |
\begin{tikzpicture}[level distance=10mm, black,sibling distance=10mm]
|
|
52 |
\node {\meta{E}}
|
|
53 |
child[sibling distance=20mm] {node {\meta{F}}
|
|
54 |
child {node {\meta{T}}
|
|
55 |
child[sibling distance=5mm] {node {(}}
|
|
56 |
child[sibling distance=5mm] {node {\meta{E}}
|
|
57 |
child {node {\meta{F}}
|
|
58 |
child {node {\meta{T}} child {node {3}}}
|
|
59 |
child {node {+}}
|
|
60 |
child {node {\meta{T}} child {node {3}}}
|
|
61 |
}
|
|
62 |
}
|
|
63 |
child[sibling distance=5mm] {node {)}}
|
|
64 |
}
|
|
65 |
child {node {+}}
|
|
66 |
child {node {\meta{T}}
|
|
67 |
child[sibling distance=5mm] {node {(}}
|
|
68 |
child[sibling distance=5mm] {node {\meta{E}}
|
|
69 |
child {node {\meta{F}}
|
|
70 |
child {node {\meta{T}} child {node {2}}}
|
|
71 |
}
|
|
72 |
child {node {*}}
|
|
73 |
child {node {\meta{F}}
|
|
74 |
child {node {\meta{T}} child {node {3}}}
|
|
75 |
}
|
|
76 |
}
|
|
77 |
child[sibling distance=5mm] {node {)}}
|
|
78 |
}
|
|
79 |
};
|
|
80 |
\end{tikzpicture}
|
|
81 |
|
|
82 |
|
954
|
83 |
For the second part "1+2*3" will be parsed as (1+2)*3, meaning + and - bind
|
|
84 |
tighter than * and $\backslash$
|
|
85 |
}
|
56
|
86 |
|
|
87 |
\item Define what it means for a grammar to be ambiguous. Give an example of
|
|
88 |
an ambiguous grammar.
|
|
89 |
|
954
|
90 |
\solution{
|
|
91 |
Already the grammar
|
|
92 |
\begin{plstx}[margin=1cm]
|
|
93 |
: \meta{E} ::= \meta{E}\cdot +\cdot\meta{E} | num\\
|
|
94 |
\end{plstx}
|
|
95 |
|
|
96 |
is ambiguous because a string like "1+2+3" can be parsed
|
|
97 |
as (1+2)+3 or 1+(2+3).
|
|
98 |
}
|
|
99 |
|
|
100 |
|
56
|
101 |
\item Suppose boolean expressions are built up from
|
|
102 |
|
|
103 |
\begin{center}
|
|
104 |
\begin{tabular}{ll}
|
|
105 |
1.) & tokens for \texttt{true} and \texttt{false},\\
|
|
106 |
2.) & the infix operations \texttt{$\wedge$} and \texttt{$\vee$},\\
|
|
107 |
3.) & the prefix operation $\neg$, and\\
|
|
108 |
4.) & can be enclosed in parentheses.
|
|
109 |
\end{tabular}
|
|
110 |
\end{center}
|
|
111 |
|
|
112 |
(i) Give a grammar that can recognise such boolean expressions
|
|
113 |
and (ii) give a sample string involving all rules given in 1.-4.~that
|
|
114 |
can be parsed by this grammar.
|
|
115 |
|
955
|
116 |
\solution{
|
|
117 |
The simplest grammar is
|
|
118 |
|
|
119 |
\begin{plstx}[margin=1cm]
|
|
120 |
:\meta{B} ::= true \;|\; false \;|\;\meta{B} \cdot $\wedge$ \cdot \meta{B} \;|\; \meta{B} \cdot $\vee$ \cdot \meta{B}
|
|
121 |
\;|\; $\neg$\meta{B} \;|\; (\cdot\meta{B}\cdot)\\
|
|
122 |
\end{plstx}
|
459
|
123 |
|
955
|
124 |
This is unfortunately a left-recursive grammar. Giving a not left-recursive one is a bit more involved.
|
|
125 |
}
|
721
|
126 |
|
647
|
127 |
\item Parsing combinators consist of atomic parsers, alternative
|
|
128 |
parsers, sequence parsers and semantic actions. What is the purpose
|
954
|
129 |
of (1) atomic parsers and of (2) map-parsers (also called semantic actions)?
|
647
|
130 |
|
954
|
131 |
\solution{
|
|
132 |
Atomic parsers look at a concrete prefix of the input, like num-tokens or identifiers.
|
|
133 |
Map-parsers apply a function to the output of a parser. In this way you can transform
|
|
134 |
the output from, for example, a string to an integer.
|
|
135 |
}
|
|
136 |
|
|
137 |
|
726
|
138 |
\item Parser combinators can directly be given a string as
|
|
139 |
input, without the need of a lexer. What are the
|
|
140 |
advantages to first lex a string and then feed a
|
|
141 |
sequence of tokens as input to the parser?
|
|
142 |
|
954
|
143 |
\solution{ Reason 1 you can filter out whitespaces and comments,
|
|
144 |
which makes the grammar rules simpler. If you have to make
|
|
145 |
sure that a whitespace comes after a variable say, then your
|
|
146 |
parser rule for variables gets more complicated. Same with
|
|
147 |
comments which do not contribute anything to the parse tree.
|
|
148 |
|
|
149 |
Reason 2) The lexer can already classify tokens, for example
|
|
150 |
as numbers, keywords or identifiers. This again makes the grammar
|
|
151 |
rules more deterministic and as a result faster to parse.
|
898
|
152 |
|
954
|
153 |
The point is that parser combinators can be used to process
|
|
154 |
strings, but in case of compilers where whitespaces and
|
|
155 |
comments need to be filtered out, the lexing phase is actually
|
|
156 |
quite useful. }
|
898
|
157 |
|
894
|
158 |
\item The injection function for sequence regular expressions is defined
|
|
159 |
by three clauses:
|
726
|
160 |
|
894
|
161 |
\begin{center}
|
|
162 |
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
|
|
163 |
$\inj\,(r_1 \cdot r_2)\,c\,\,Seq(v_1,v_2)$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\
|
|
164 |
$\inj\,(r_1 \cdot r_2)\,c\,\,\Left(Seq(v_1,v_2))$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\
|
|
165 |
$\inj\,(r_1 \cdot r_2)\,c\,\,Right(v)$ & $\dn$ & $Seq(\textit{mkeps}(r_1),\inj\,r_2\,c\,v)$\\
|
|
166 |
\end{tabular}
|
|
167 |
\end{center}
|
|
168 |
|
954
|
169 |
Explain why there are three cases in the injection function for sequence
|
|
170 |
regular expressions.
|
|
171 |
|
|
172 |
\solution{
|
|
173 |
This is because the derivative of sequences can be of the form
|
|
174 |
|
|
175 |
\begin{itemize}
|
|
176 |
\item $(der\,c\,r_1)\cdot r_2$
|
|
177 |
\item $(der\,c\,r_1)\cdot r_2 \;+\; der\,c\,r_2$
|
|
178 |
\end{itemize}
|
|
179 |
|
|
180 |
In the first case the value needs to be of the form $Seq$, in the second case $\Left$ or $Right$.
|
|
181 |
Therefore 3 cases.
|
|
182 |
}
|
894
|
183 |
|
459
|
184 |
\item \POSTSCRIPT
|
56
|
185 |
\end{enumerate}
|
|
186 |
|
|
187 |
\end{document}
|
|
188 |
|
|
189 |
%%% Local Variables:
|
|
190 |
%%% mode: latex
|
|
191 |
%%% TeX-master: t
|
|
192 |
%%% End:
|