| author | Christian Urban <christian.urban@kcl.ac.uk> | 
| Sun, 28 Sep 2025 18:57:33 +0100 | |
| changeset 994 | 95824be3d3f0 | 
| parent 954 | 4a7ed272d46e | 
| permissions | -rw-r--r-- | 
| 56 | 1  | 
\documentclass{article}
 | 
| 
292
 
7ed2a25dd115
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
2  | 
\usepackage{../style}
 | 
| 
 
7ed2a25dd115
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
3  | 
\usepackage{../graphics}
 | 
| 953 | 4  | 
\usepackage{../grammar}
 | 
| 56 | 5  | 
|
6  | 
\begin{document}
 | 
|
7  | 
||
8  | 
\section*{Homework 6}
 | 
|
9  | 
||
| 916 | 10  | 
%\HEADER  | 
| 
359
 
db106e5b7c4d
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
292 
diff
changeset
 | 
11  | 
|
| 56 | 12  | 
\begin{enumerate}
 | 
13  | 
\item (i) Give the regular expressions for lexing a language  | 
|
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
14  | 
consisting of whitespaces, identifiers (some letters  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
15  | 
      followed by digits), numbers, operations \texttt{=},
 | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
16  | 
      \texttt{<} and \texttt{>}, and the keywords \texttt{if},
 | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
17  | 
      \texttt{then} and \texttt{else}. (ii) Decide whether the
 | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
18  | 
following strings can be lexed in this language?  | 
| 56 | 19  | 
|
20  | 
\begin{enumerate}
 | 
|
21  | 
\item \texttt{"if y4 = 3 then 1 else 3"}
 | 
|
22  | 
\item \texttt{"if33 ifif then then23 else else 32"}
 | 
|
23  | 
\item \texttt{"if x4x < 33 then 1 else 3"}
 | 
|
24  | 
\end{enumerate}
 | 
|
25  | 
||
26  | 
In case they can, give the corresponding token sequences. (Hint:  | 
|
| 
89
 
24adcc265f2e
typo
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
56 
diff
changeset
 | 
27  | 
Observe the maximal munch rule and priorities of your regular  | 
| 56 | 28  | 
expressions that make the process of lexing unambiguous.)  | 
29  | 
||
| 954 | 30  | 
\solution{
 | 
31  | 
a and b are ok; c is not matched because x4x is not an identifier according to the rules.  | 
|
32  | 
}  | 
|
33  | 
||
34  | 
||
| 56 | 35  | 
\item Suppose the grammar  | 
36  | 
||
| 953 | 37  | 
\begin{plstx}[margin=1cm]    
 | 
38  | 
:\meta{E} ::= \meta{F} \;|\; \meta{F} \cdot \texttt{*} \cdot \meta{F} \;|\; \meta{F} \cdot \backslash \cdot \meta{F}\\
 | 
|
39  | 
:\meta{F} ::= \meta{T} \;|\; \meta{T} \cdot \texttt{+} \cdot \meta{T} \;|\; \meta{T} \cdot \texttt{-} \cdot \meta{T}\\
 | 
|
40  | 
:\meta{T} ::= $num$ \;|\; \texttt{(} \cdot E \cdot \texttt{)}\\
 | 
|
41  | 
\end{plstx}
 | 
|
| 56 | 42  | 
|
| 953 | 43  | 
where \meta{E}, \meta{F} and \meta{T} are non-terminals, \meta{E} is
 | 
44  | 
the starting symbol of the grammar, and $num$ stands for a number  | 
|
45  | 
token. Give a parse tree for the string \texttt{(3+3)+(2*3)}. Note that
 | 
|
46  | 
\meta{F} and \meta{T} are ``exchanged'' in this grammar in comparison with
 | 
|
47  | 
the usual grammar for arithmetic expressions. What does this mean in terms  | 
|
48  | 
of precedences of the operators?  | 
|
49  | 
||
50  | 
\solution{
 | 
|
| 954 | 51  | 
  \begin{tikzpicture}[level distance=10mm, black,sibling distance=10mm]
 | 
52  | 
  \node {\meta{E}}
 | 
|
53  | 
  child[sibling distance=20mm] {node {\meta{F}} 
 | 
|
54  | 
    child {node {\meta{T}}
 | 
|
55  | 
      child[sibling distance=5mm] {node {(}}
 | 
|
56  | 
      child[sibling distance=5mm]  {node {\meta{E}}
 | 
|
57  | 
        child {node {\meta{F}}
 | 
|
58  | 
          child {node {\meta{T}} child {node {3}}}
 | 
|
59  | 
          child {node {+}}
 | 
|
60  | 
          child {node {\meta{T}} child {node {3}}}
 | 
|
61  | 
}  | 
|
62  | 
}  | 
|
63  | 
      child[sibling distance=5mm]  {node {)}}
 | 
|
64  | 
}  | 
|
65  | 
    child {node {+}}
 | 
|
66  | 
    child {node {\meta{T}}
 | 
|
67  | 
      child[sibling distance=5mm] {node {(}}
 | 
|
68  | 
      child[sibling distance=5mm] {node {\meta{E}}
 | 
|
69  | 
        child {node {\meta{F}}
 | 
|
70  | 
          child {node {\meta{T}} child {node {2}}}
 | 
|
71  | 
}  | 
|
72  | 
        child {node {*}}
 | 
|
73  | 
        child {node {\meta{F}}
 | 
|
74  | 
          child {node {\meta{T}} child {node {3}}}
 | 
|
75  | 
}  | 
|
76  | 
}  | 
|
77  | 
      child[sibling distance=5mm] {node {)}}
 | 
|
78  | 
}  | 
|
79  | 
};  | 
|
80  | 
\end{tikzpicture}
 | 
|
81  | 
||
82  | 
||
| 953 | 83  | 
For the second part "1+2*3" will be parsed as (1+2)*3, meaning + and - bind  | 
84  | 
tighter than * and $\backslash$  | 
|
85  | 
}  | 
|
| 56 | 86  | 
|
87  | 
\item Define what it means for a grammar to be ambiguous. Give an example of  | 
|
88  | 
an ambiguous grammar.  | 
|
89  | 
||
| 953 | 90  | 
\solution{
 | 
91  | 
Already the grammar  | 
|
92  | 
  \begin{plstx}[margin=1cm]
 | 
|
93  | 
    : \meta{E} ::= \meta{E}\cdot +\cdot\meta{E} | num\\
 | 
|
94  | 
  \end{plstx}
 | 
|
95  | 
||
96  | 
is ambiguous because a string like "1+2+3" can be parsed  | 
|
97  | 
as (1+2)+3 or 1+(2+3).  | 
|
98  | 
}  | 
|
99  | 
||
100  | 
||
| 56 | 101  | 
\item Suppose boolean expressions are built up from  | 
102  | 
||
103  | 
\begin{center}
 | 
|
104  | 
\begin{tabular}{ll}
 | 
|
105  | 
1.) & tokens for \texttt{true} and \texttt{false},\\
 | 
|
106  | 
2.) & the infix operations \texttt{$\wedge$} and \texttt{$\vee$},\\
 | 
|
107  | 
3.) & the prefix operation $\neg$, and\\  | 
|
108  | 
4.) & can be enclosed in parentheses.  | 
|
109  | 
\end{tabular}
 | 
|
110  | 
\end{center}
 | 
|
111  | 
||
112  | 
(i) Give a grammar that can recognise such boolean expressions  | 
|
113  | 
and (ii) give a sample string involving all rules given in 1.-4.~that  | 
|
114  | 
can be parsed by this grammar.  | 
|
115  | 
||
| 954 | 116  | 
\solution{
 | 
117  | 
The simplest grammar is  | 
|
118  | 
||
119  | 
\begin{plstx}[margin=1cm]    
 | 
|
120  | 
  :\meta{B} ::= true \;|\; false \;|\;\meta{B} \cdot $\wedge$ \cdot \meta{B} \;|\; \meta{B} \cdot $\vee$ \cdot \meta{B}
 | 
|
121  | 
  \;|\; $\neg$\meta{B} \;|\; (\cdot\meta{B}\cdot)\\
 | 
|
122  | 
\end{plstx}
 | 
|
| 459 | 123  | 
|
| 954 | 124  | 
This is unfortunately a left-recursive grammar. Giving a not left-recursive one is a bit more involved.  | 
125  | 
}  | 
|
| 
721
 
e712943cff71
added slides from Rochester
 
Christian Urban <christian.urban@kcl.ac.uk> 
parents: 
647 
diff
changeset
 | 
126  | 
|
| 647 | 127  | 
\item Parsing combinators consist of atomic parsers, alternative  | 
128  | 
parsers, sequence parsers and semantic actions. What is the purpose  | 
|
| 953 | 129  | 
of (1) atomic parsers and of (2) map-parsers (also called semantic actions)?  | 
| 647 | 130  | 
|
| 953 | 131  | 
  \solution{
 | 
132  | 
Atomic parsers look at a concrete prefix of the input, like num-tokens or identifiers.  | 
|
133  | 
Map-parsers apply a function to the output of a parser. In this way you can transform  | 
|
134  | 
the output from, for example, a string to an integer.  | 
|
135  | 
}  | 
|
136  | 
||
137  | 
||
| 726 | 138  | 
\item Parser combinators can directly be given a string as  | 
139  | 
input, without the need of a lexer. What are the  | 
|
140  | 
advantages to first lex a string and then feed a  | 
|
141  | 
sequence of tokens as input to the parser?  | 
|
142  | 
||
| 953 | 143  | 
      \solution{ Reason 1 you can filter out whitespaces and comments,
 | 
144  | 
which makes the grammar rules simpler. If you have to make  | 
|
145  | 
sure that a whitespace comes after a variable say, then your  | 
|
146  | 
parser rule for variables gets more complicated. Same with  | 
|
147  | 
comments which do not contribute anything to the parse tree.  | 
|
148  | 
||
149  | 
Reason 2) The lexer can already classify tokens, for example  | 
|
150  | 
as numbers, keywords or identifiers. This again makes the grammar  | 
|
151  | 
rules more deterministic and as a result faster to parse.  | 
|
| 898 | 152  | 
|
| 953 | 153  | 
The point is that parser combinators can be used to process  | 
154  | 
strings, but in case of compilers where whitespaces and  | 
|
155  | 
comments need to be filtered out, the lexing phase is actually  | 
|
156  | 
quite useful. }  | 
|
| 898 | 157  | 
|
| 
894
 
4d5058706f1b
updatedHG: added solutions/cw5/fun_tokens.sc
 
Christian Urban <christian.urban@kcl.ac.uk> 
parents: 
726 
diff
changeset
 | 
158  | 
\item The injection function for sequence regular expressions is defined  | 
| 
 
4d5058706f1b
updatedHG: added solutions/cw5/fun_tokens.sc
 
Christian Urban <christian.urban@kcl.ac.uk> 
parents: 
726 
diff
changeset
 | 
159  | 
by three clauses:  | 
| 726 | 160  | 
|
| 
894
 
4d5058706f1b
updatedHG: added solutions/cw5/fun_tokens.sc
 
Christian Urban <christian.urban@kcl.ac.uk> 
parents: 
726 
diff
changeset
 | 
161  | 
\begin{center}
 | 
| 
 
4d5058706f1b
updatedHG: added solutions/cw5/fun_tokens.sc
 
Christian Urban <christian.urban@kcl.ac.uk> 
parents: 
726 
diff
changeset
 | 
162  | 
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
 | 
| 
 
4d5058706f1b
updatedHG: added solutions/cw5/fun_tokens.sc
 
Christian Urban <christian.urban@kcl.ac.uk> 
parents: 
726 
diff
changeset
 | 
163  | 
$\inj\,(r_1 \cdot r_2)\,c\,\,Seq(v_1,v_2)$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\  | 
| 
 
4d5058706f1b
updatedHG: added solutions/cw5/fun_tokens.sc
 
Christian Urban <christian.urban@kcl.ac.uk> 
parents: 
726 
diff
changeset
 | 
164  | 
$\inj\,(r_1 \cdot r_2)\,c\,\,\Left(Seq(v_1,v_2))$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\  | 
| 
 
4d5058706f1b
updatedHG: added solutions/cw5/fun_tokens.sc
 
Christian Urban <christian.urban@kcl.ac.uk> 
parents: 
726 
diff
changeset
 | 
165  | 
  $\inj\,(r_1 \cdot r_2)\,c\,\,Right(v)$ & $\dn$  & $Seq(\textit{mkeps}(r_1),\inj\,r_2\,c\,v)$\\
 | 
| 
 
4d5058706f1b
updatedHG: added solutions/cw5/fun_tokens.sc
 
Christian Urban <christian.urban@kcl.ac.uk> 
parents: 
726 
diff
changeset
 | 
166  | 
\end{tabular}
 | 
| 
 
4d5058706f1b
updatedHG: added solutions/cw5/fun_tokens.sc
 
Christian Urban <christian.urban@kcl.ac.uk> 
parents: 
726 
diff
changeset
 | 
167  | 
\end{center}
 | 
| 
 
4d5058706f1b
updatedHG: added solutions/cw5/fun_tokens.sc
 
Christian Urban <christian.urban@kcl.ac.uk> 
parents: 
726 
diff
changeset
 | 
168  | 
|
| 953 | 169  | 
Explain why there are three cases in the injection function for sequence  | 
170  | 
regular expressions.  | 
|
171  | 
||
172  | 
     \solution{
 | 
|
173  | 
This is because the derivative of sequences can be of the form  | 
|
174  | 
||
175  | 
       \begin{itemize}
 | 
|
176  | 
\item $(der\,c\,r_1)\cdot r_2$  | 
|
177  | 
\item $(der\,c\,r_1)\cdot r_2 \;+\; der\,c\,r_2$  | 
|
178  | 
       \end{itemize}
 | 
|
179  | 
||
180  | 
In the first case the value needs to be of the form $Seq$, in the second case $\Left$ or $Right$.  | 
|
181  | 
Therefore 3 cases.  | 
|
182  | 
}  | 
|
| 
894
 
4d5058706f1b
updatedHG: added solutions/cw5/fun_tokens.sc
 
Christian Urban <christian.urban@kcl.ac.uk> 
parents: 
726 
diff
changeset
 | 
183  | 
|
| 459 | 184  | 
\item \POSTSCRIPT  | 
| 56 | 185  | 
\end{enumerate}
 | 
186  | 
||
187  | 
\end{document}
 | 
|
188  | 
||
189  | 
%%% Local Variables:  | 
|
190  | 
%%% mode: latex  | 
|
191  | 
%%% TeX-master: t  | 
|
192  | 
%%% End:  |