author | Christian Urban <christian.urban@kcl.ac.uk> |
Sat, 11 Nov 2023 10:08:33 +0000 | |
changeset 954 | eda0ccf56c72 |
parent 916 | 10f834eb0a9e |
child 955 | 47acfd7f9096 |
permissions | -rw-r--r-- |
56 | 1 |
\documentclass{article} |
292
7ed2a25dd115
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
2 |
\usepackage{../style} |
7ed2a25dd115
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
3 |
\usepackage{../graphics} |
954 | 4 |
\usepackage{../grammar} |
56 | 5 |
|
6 |
\begin{document} |
|
7 |
||
8 |
\section*{Homework 6} |
|
9 |
||
916 | 10 |
%\HEADER |
359
db106e5b7c4d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
292
diff
changeset
|
11 |
|
56 | 12 |
\begin{enumerate} |
13 |
\item (i) Give the regular expressions for lexing a language |
|
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
14 |
consisting of whitespaces, identifiers (some letters |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
15 |
followed by digits), numbers, operations \texttt{=}, |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
16 |
\texttt{<} and \texttt{>}, and the keywords \texttt{if}, |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
17 |
\texttt{then} and \texttt{else}. (ii) Decide whether the |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
18 |
following strings can be lexed in this language? |
56 | 19 |
|
20 |
\begin{enumerate} |
|
21 |
\item \texttt{"if y4 = 3 then 1 else 3"} |
|
22 |
\item \texttt{"if33 ifif then then23 else else 32"} |
|
23 |
\item \texttt{"if x4x < 33 then 1 else 3"} |
|
24 |
\end{enumerate} |
|
25 |
||
26 |
In case they can, give the corresponding token sequences. (Hint: |
|
89
24adcc265f2e
typo
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
56
diff
changeset
|
27 |
Observe the maximal munch rule and priorities of your regular |
56 | 28 |
expressions that make the process of lexing unambiguous.) |
29 |
||
30 |
\item Suppose the grammar |
|
31 |
||
954 | 32 |
\begin{plstx}[margin=1cm] |
33 |
:\meta{E} ::= \meta{F} \;|\; \meta{F} \cdot \texttt{*} \cdot \meta{F} \;|\; \meta{F} \cdot \backslash \cdot \meta{F}\\ |
|
34 |
:\meta{F} ::= \meta{T} \;|\; \meta{T} \cdot \texttt{+} \cdot \meta{T} \;|\; \meta{T} \cdot \texttt{-} \cdot \meta{T}\\ |
|
35 |
:\meta{T} ::= $num$ \;|\; \texttt{(} \cdot E \cdot \texttt{)}\\ |
|
36 |
\end{plstx} |
|
56 | 37 |
|
954 | 38 |
where \meta{E}, \meta{F} and \meta{T} are non-terminals, \meta{E} is |
39 |
the starting symbol of the grammar, and $num$ stands for a number |
|
40 |
token. Give a parse tree for the string \texttt{(3+3)+(2*3)}. Note that |
|
41 |
\meta{F} and \meta{T} are ``exchanged'' in this grammar in comparison with |
|
42 |
the usual grammar for arithmetic expressions. What does this mean in terms |
|
43 |
of precedences of the operators? |
|
44 |
||
45 |
\solution{ |
|
46 |
For the second part "1+2*3" will be parsed as (1+2)*3, meaning + and - bind |
|
47 |
tighter than * and $\backslash$ |
|
48 |
} |
|
56 | 49 |
|
50 |
\item Define what it means for a grammar to be ambiguous. Give an example of |
|
51 |
an ambiguous grammar. |
|
52 |
||
954 | 53 |
\solution{ |
54 |
Already the grammar |
|
55 |
\begin{plstx}[margin=1cm] |
|
56 |
: \meta{E} ::= \meta{E}\cdot +\cdot\meta{E} | num\\ |
|
57 |
\end{plstx} |
|
58 |
||
59 |
is ambiguous because a string like "1+2+3" can be parsed |
|
60 |
as (1+2)+3 or 1+(2+3). |
|
61 |
} |
|
62 |
||
63 |
||
56 | 64 |
\item Suppose boolean expressions are built up from |
65 |
||
66 |
\begin{center} |
|
67 |
\begin{tabular}{ll} |
|
68 |
1.) & tokens for \texttt{true} and \texttt{false},\\ |
|
69 |
2.) & the infix operations \texttt{$\wedge$} and \texttt{$\vee$},\\ |
|
70 |
3.) & the prefix operation $\neg$, and\\ |
|
71 |
4.) & can be enclosed in parentheses. |
|
72 |
\end{tabular} |
|
73 |
\end{center} |
|
74 |
||
75 |
(i) Give a grammar that can recognise such boolean expressions |
|
76 |
and (ii) give a sample string involving all rules given in 1.-4.~that |
|
77 |
can be parsed by this grammar. |
|
78 |
||
459 | 79 |
|
721
e3c64f22dd31
added slides from Rochester
Christian Urban <christian.urban@kcl.ac.uk>
parents:
647
diff
changeset
|
80 |
|
647 | 81 |
\item Parsing combinators consist of atomic parsers, alternative |
82 |
parsers, sequence parsers and semantic actions. What is the purpose |
|
954 | 83 |
of (1) atomic parsers and of (2) map-parsers (also called semantic actions)? |
647 | 84 |
|
954 | 85 |
\solution{ |
86 |
Atomic parsers look at a concrete prefix of the input, like num-tokens or identifiers. |
|
87 |
Map-parsers apply a function to the output of a parser. In this way you can transform |
|
88 |
the output from, for example, a string to an integer. |
|
89 |
} |
|
90 |
||
91 |
||
726 | 92 |
\item Parser combinators can directly be given a string as |
93 |
input, without the need of a lexer. What are the |
|
94 |
advantages to first lex a string and then feed a |
|
95 |
sequence of tokens as input to the parser? |
|
96 |
||
954 | 97 |
\solution{ Reason 1 you can filter out whitespaces and comments, |
98 |
which makes the grammar rules simpler. If you have to make |
|
99 |
sure that a whitespace comes after a variable say, then your |
|
100 |
parser rule for variables gets more complicated. Same with |
|
101 |
comments which do not contribute anything to the parse tree. |
|
102 |
||
103 |
Reason 2) The lexer can already classify tokens, for example |
|
104 |
as numbers, keywords or identifiers. This again makes the grammar |
|
105 |
rules more deterministic and as a result faster to parse. |
|
898 | 106 |
|
954 | 107 |
The point is that parser combinators can be used to process |
108 |
strings, but in case of compilers where whitespaces and |
|
109 |
comments need to be filtered out, the lexing phase is actually |
|
110 |
quite useful. } |
|
898 | 111 |
|
894
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
112 |
\item The injection function for sequence regular expressions is defined |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
113 |
by three clauses: |
726 | 114 |
|
894
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
115 |
\begin{center} |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
116 |
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
117 |
$\inj\,(r_1 \cdot r_2)\,c\,\,Seq(v_1,v_2)$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\ |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
118 |
$\inj\,(r_1 \cdot r_2)\,c\,\,\Left(Seq(v_1,v_2))$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\ |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
119 |
$\inj\,(r_1 \cdot r_2)\,c\,\,Right(v)$ & $\dn$ & $Seq(\textit{mkeps}(r_1),\inj\,r_2\,c\,v)$\\ |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
120 |
\end{tabular} |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
121 |
\end{center} |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
122 |
|
954 | 123 |
Explain why there are three cases in the injection function for sequence |
124 |
regular expressions. |
|
125 |
||
126 |
\solution{ |
|
127 |
This is because the derivative of sequences can be of the form |
|
128 |
||
129 |
\begin{itemize} |
|
130 |
\item $(der\,c\,r_1)\cdot r_2$ |
|
131 |
\item $(der\,c\,r_1)\cdot r_2 \;+\; der\,c\,r_2$ |
|
132 |
\end{itemize} |
|
133 |
||
134 |
In the first case the value needs to be of the form $Seq$, in the second case $\Left$ or $Right$. |
|
135 |
Therefore 3 cases. |
|
136 |
} |
|
894
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
137 |
|
459 | 138 |
\item \POSTSCRIPT |
56 | 139 |
\end{enumerate} |
140 |
||
141 |
\end{document} |
|
142 |
||
143 |
%%% Local Variables: |
|
144 |
%%% mode: latex |
|
145 |
%%% TeX-master: t |
|
146 |
%%% End: |