author | Christian Urban <christian.urban@kcl.ac.uk> |
Wed, 29 May 2024 13:25:30 +0100 | |
changeset 960 | c7009356ddd8 |
parent 955 | 47acfd7f9096 |
permissions | -rw-r--r-- |
56 | 1 |
\documentclass{article} |
292
7ed2a25dd115
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
2 |
\usepackage{../style} |
7ed2a25dd115
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
3 |
\usepackage{../graphics} |
954 | 4 |
\usepackage{../grammar} |
56 | 5 |
|
6 |
\begin{document} |
|
7 |
||
8 |
\section*{Homework 6} |
|
9 |
||
916 | 10 |
%\HEADER |
359
db106e5b7c4d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
292
diff
changeset
|
11 |
|
56 | 12 |
\begin{enumerate} |
13 |
\item (i) Give the regular expressions for lexing a language |
|
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
14 |
consisting of whitespaces, identifiers (some letters |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
15 |
followed by digits), numbers, operations \texttt{=}, |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
16 |
\texttt{<} and \texttt{>}, and the keywords \texttt{if}, |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
17 |
\texttt{then} and \texttt{else}. (ii) Decide whether the |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
18 |
following strings can be lexed in this language? |
56 | 19 |
|
20 |
\begin{enumerate} |
|
21 |
\item \texttt{"if y4 = 3 then 1 else 3"} |
|
22 |
\item \texttt{"if33 ifif then then23 else else 32"} |
|
23 |
\item \texttt{"if x4x < 33 then 1 else 3"} |
|
24 |
\end{enumerate} |
|
25 |
||
26 |
In case they can, give the corresponding token sequences. (Hint: |
|
89
24adcc265f2e
typo
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
56
diff
changeset
|
27 |
Observe the maximal munch rule and priorities of your regular |
56 | 28 |
expressions that make the process of lexing unambiguous.) |
29 |
||
955 | 30 |
\solution{ |
31 |
a and b are ok; c is not matched because x4x is not an identifier according to the rules. |
|
32 |
} |
|
33 |
||
34 |
||
56 | 35 |
\item Suppose the grammar |
36 |
||
954 | 37 |
\begin{plstx}[margin=1cm] |
38 |
:\meta{E} ::= \meta{F} \;|\; \meta{F} \cdot \texttt{*} \cdot \meta{F} \;|\; \meta{F} \cdot \backslash \cdot \meta{F}\\ |
|
39 |
:\meta{F} ::= \meta{T} \;|\; \meta{T} \cdot \texttt{+} \cdot \meta{T} \;|\; \meta{T} \cdot \texttt{-} \cdot \meta{T}\\ |
|
40 |
:\meta{T} ::= $num$ \;|\; \texttt{(} \cdot E \cdot \texttt{)}\\ |
|
41 |
\end{plstx} |
|
56 | 42 |
|
954 | 43 |
where \meta{E}, \meta{F} and \meta{T} are non-terminals, \meta{E} is |
44 |
the starting symbol of the grammar, and $num$ stands for a number |
|
45 |
token. Give a parse tree for the string \texttt{(3+3)+(2*3)}. Note that |
|
46 |
\meta{F} and \meta{T} are ``exchanged'' in this grammar in comparison with |
|
47 |
the usual grammar for arithmetic expressions. What does this mean in terms |
|
48 |
of precedences of the operators? |
|
49 |
||
50 |
\solution{ |
|
955 | 51 |
\begin{tikzpicture}[level distance=10mm, black,sibling distance=10mm] |
52 |
\node {\meta{E}} |
|
53 |
child[sibling distance=20mm] {node {\meta{F}} |
|
54 |
child {node {\meta{T}} |
|
55 |
child[sibling distance=5mm] {node {(}} |
|
56 |
child[sibling distance=5mm] {node {\meta{E}} |
|
57 |
child {node {\meta{F}} |
|
58 |
child {node {\meta{T}} child {node {3}}} |
|
59 |
child {node {+}} |
|
60 |
child {node {\meta{T}} child {node {3}}} |
|
61 |
} |
|
62 |
} |
|
63 |
child[sibling distance=5mm] {node {)}} |
|
64 |
} |
|
65 |
child {node {+}} |
|
66 |
child {node {\meta{T}} |
|
67 |
child[sibling distance=5mm] {node {(}} |
|
68 |
child[sibling distance=5mm] {node {\meta{E}} |
|
69 |
child {node {\meta{F}} |
|
70 |
child {node {\meta{T}} child {node {2}}} |
|
71 |
} |
|
72 |
child {node {*}} |
|
73 |
child {node {\meta{F}} |
|
74 |
child {node {\meta{T}} child {node {3}}} |
|
75 |
} |
|
76 |
} |
|
77 |
child[sibling distance=5mm] {node {)}} |
|
78 |
} |
|
79 |
}; |
|
80 |
\end{tikzpicture} |
|
81 |
||
82 |
||
954 | 83 |
For the second part "1+2*3" will be parsed as (1+2)*3, meaning + and - bind |
84 |
tighter than * and $\backslash$ |
|
85 |
} |
|
56 | 86 |
|
87 |
\item Define what it means for a grammar to be ambiguous. Give an example of |
|
88 |
an ambiguous grammar. |
|
89 |
||
954 | 90 |
\solution{ |
91 |
Already the grammar |
|
92 |
\begin{plstx}[margin=1cm] |
|
93 |
: \meta{E} ::= \meta{E}\cdot +\cdot\meta{E} | num\\ |
|
94 |
\end{plstx} |
|
95 |
||
96 |
is ambiguous because a string like "1+2+3" can be parsed |
|
97 |
as (1+2)+3 or 1+(2+3). |
|
98 |
} |
|
99 |
||
100 |
||
56 | 101 |
\item Suppose boolean expressions are built up from |
102 |
||
103 |
\begin{center} |
|
104 |
\begin{tabular}{ll} |
|
105 |
1.) & tokens for \texttt{true} and \texttt{false},\\ |
|
106 |
2.) & the infix operations \texttt{$\wedge$} and \texttt{$\vee$},\\ |
|
107 |
3.) & the prefix operation $\neg$, and\\ |
|
108 |
4.) & can be enclosed in parentheses. |
|
109 |
\end{tabular} |
|
110 |
\end{center} |
|
111 |
||
112 |
(i) Give a grammar that can recognise such boolean expressions |
|
113 |
and (ii) give a sample string involving all rules given in 1.-4.~that |
|
114 |
can be parsed by this grammar. |
|
115 |
||
955 | 116 |
\solution{ |
117 |
The simplest grammar is |
|
118 |
||
119 |
\begin{plstx}[margin=1cm] |
|
120 |
:\meta{B} ::= true \;|\; false \;|\;\meta{B} \cdot $\wedge$ \cdot \meta{B} \;|\; \meta{B} \cdot $\vee$ \cdot \meta{B} |
|
121 |
\;|\; $\neg$\meta{B} \;|\; (\cdot\meta{B}\cdot)\\ |
|
122 |
\end{plstx} |
|
459 | 123 |
|
955 | 124 |
This is unfortunately a left-recursive grammar. Giving a not left-recursive one is a bit more involved. |
125 |
} |
|
721
e3c64f22dd31
added slides from Rochester
Christian Urban <christian.urban@kcl.ac.uk>
parents:
647
diff
changeset
|
126 |
|
647 | 127 |
\item Parsing combinators consist of atomic parsers, alternative |
128 |
parsers, sequence parsers and semantic actions. What is the purpose |
|
954 | 129 |
of (1) atomic parsers and of (2) map-parsers (also called semantic actions)? |
647 | 130 |
|
954 | 131 |
\solution{ |
132 |
Atomic parsers look at a concrete prefix of the input, like num-tokens or identifiers. |
|
133 |
Map-parsers apply a function to the output of a parser. In this way you can transform |
|
134 |
the output from, for example, a string to an integer. |
|
135 |
} |
|
136 |
||
137 |
||
726 | 138 |
\item Parser combinators can directly be given a string as |
139 |
input, without the need of a lexer. What are the |
|
140 |
advantages to first lex a string and then feed a |
|
141 |
sequence of tokens as input to the parser? |
|
142 |
||
954 | 143 |
\solution{ Reason 1 you can filter out whitespaces and comments, |
144 |
which makes the grammar rules simpler. If you have to make |
|
145 |
sure that a whitespace comes after a variable say, then your |
|
146 |
parser rule for variables gets more complicated. Same with |
|
147 |
comments which do not contribute anything to the parse tree. |
|
148 |
||
149 |
Reason 2) The lexer can already classify tokens, for example |
|
150 |
as numbers, keywords or identifiers. This again makes the grammar |
|
151 |
rules more deterministic and as a result faster to parse. |
|
898 | 152 |
|
954 | 153 |
The point is that parser combinators can be used to process |
154 |
strings, but in case of compilers where whitespaces and |
|
155 |
comments need to be filtered out, the lexing phase is actually |
|
156 |
quite useful. } |
|
898 | 157 |
|
894
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
158 |
\item The injection function for sequence regular expressions is defined |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
159 |
by three clauses: |
726 | 160 |
|
894
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
161 |
\begin{center} |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
162 |
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
163 |
$\inj\,(r_1 \cdot r_2)\,c\,\,Seq(v_1,v_2)$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\ |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
164 |
$\inj\,(r_1 \cdot r_2)\,c\,\,\Left(Seq(v_1,v_2))$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\ |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
165 |
$\inj\,(r_1 \cdot r_2)\,c\,\,Right(v)$ & $\dn$ & $Seq(\textit{mkeps}(r_1),\inj\,r_2\,c\,v)$\\ |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
166 |
\end{tabular} |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
167 |
\end{center} |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
168 |
|
954 | 169 |
Explain why there are three cases in the injection function for sequence |
170 |
regular expressions. |
|
171 |
||
172 |
\solution{ |
|
173 |
This is because the derivative of sequences can be of the form |
|
174 |
||
175 |
\begin{itemize} |
|
176 |
\item $(der\,c\,r_1)\cdot r_2$ |
|
177 |
\item $(der\,c\,r_1)\cdot r_2 \;+\; der\,c\,r_2$ |
|
178 |
\end{itemize} |
|
179 |
||
180 |
In the first case the value needs to be of the form $Seq$, in the second case $\Left$ or $Right$. |
|
181 |
Therefore 3 cases. |
|
182 |
} |
|
894
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
183 |
|
459 | 184 |
\item \POSTSCRIPT |
56 | 185 |
\end{enumerate} |
186 |
||
187 |
\end{document} |
|
188 |
||
189 |
%%% Local Variables: |
|
190 |
%%% mode: latex |
|
191 |
%%% TeX-master: t |
|
192 |
%%% End: |