author | Christian Urban <christian.urban@kcl.ac.uk> |
Sat, 09 Sep 2023 14:14:31 +0100 | |
changeset 916 | 10f834eb0a9e |
parent 898 | 45a48c47dcca |
child 954 | eda0ccf56c72 |
permissions | -rw-r--r-- |
56 | 1 |
\documentclass{article} |
292
7ed2a25dd115
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
2 |
\usepackage{../style} |
7ed2a25dd115
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
3 |
\usepackage{../graphics} |
56 | 4 |
|
5 |
\begin{document} |
|
6 |
||
7 |
\section*{Homework 6} |
|
8 |
||
916 | 9 |
%\HEADER |
359
db106e5b7c4d
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
292
diff
changeset
|
10 |
|
56 | 11 |
\begin{enumerate} |
12 |
\item (i) Give the regular expressions for lexing a language |
|
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
13 |
consisting of whitespaces, identifiers (some letters |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
14 |
followed by digits), numbers, operations \texttt{=}, |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
15 |
\texttt{<} and \texttt{>}, and the keywords \texttt{if}, |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
16 |
\texttt{then} and \texttt{else}. (ii) Decide whether the |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
17 |
following strings can be lexed in this language? |
56 | 18 |
|
19 |
\begin{enumerate} |
|
20 |
\item \texttt{"if y4 = 3 then 1 else 3"} |
|
21 |
\item \texttt{"if33 ifif then then23 else else 32"} |
|
22 |
\item \texttt{"if x4x < 33 then 1 else 3"} |
|
23 |
\end{enumerate} |
|
24 |
||
25 |
In case they can, give the corresponding token sequences. (Hint: |
|
89
24adcc265f2e
typo
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
56
diff
changeset
|
26 |
Observe the maximal munch rule and priorities of your regular |
56 | 27 |
expressions that make the process of lexing unambiguous.) |
28 |
||
29 |
\item Suppose the grammar |
|
30 |
||
31 |
\begin{center} |
|
32 |
\begin{tabular}{lcl} |
|
33 |
$E$ & $\rightarrow$ & $F \;|\; F \cdot * \cdot F \;|\; F \cdot \backslash \cdot F$\\ |
|
34 |
$F$ & $\rightarrow$ & $T \;|\; T \cdot \texttt{+} \cdot T \;|\; T \cdot \texttt{-} \cdot T$\\ |
|
35 |
$T$ & $\rightarrow$ & $num \;|\; \texttt{(} \cdot E \cdot \texttt{)}$\\ |
|
36 |
\end{tabular} |
|
37 |
\end{center} |
|
38 |
||
39 |
where $E$, $F$ and $T$ are non-terminals, $E$ is the starting symbol of the grammar, and $num$ stands for |
|
40 |
a number token. Give a parse tree for the string \texttt{(3+3)+(2*3)}. |
|
41 |
||
42 |
\item Define what it means for a grammar to be ambiguous. Give an example of |
|
43 |
an ambiguous grammar. |
|
44 |
||
45 |
\item Suppose boolean expressions are built up from |
|
46 |
||
47 |
\begin{center} |
|
48 |
\begin{tabular}{ll} |
|
49 |
1.) & tokens for \texttt{true} and \texttt{false},\\ |
|
50 |
2.) & the infix operations \texttt{$\wedge$} and \texttt{$\vee$},\\ |
|
51 |
3.) & the prefix operation $\neg$, and\\ |
|
52 |
4.) & can be enclosed in parentheses. |
|
53 |
\end{tabular} |
|
54 |
\end{center} |
|
55 |
||
56 |
(i) Give a grammar that can recognise such boolean expressions |
|
57 |
and (ii) give a sample string involving all rules given in 1.-4.~that |
|
58 |
can be parsed by this grammar. |
|
59 |
||
459 | 60 |
|
721
e3c64f22dd31
added slides from Rochester
Christian Urban <christian.urban@kcl.ac.uk>
parents:
647
diff
changeset
|
61 |
|
647 | 62 |
\item Parsing combinators consist of atomic parsers, alternative |
63 |
parsers, sequence parsers and semantic actions. What is the purpose |
|
64 |
of (1) atomic parsers and of (2) semantic actions? |
|
65 |
||
726 | 66 |
\item Parser combinators can directly be given a string as |
67 |
input, without the need of a lexer. What are the |
|
68 |
advantages to first lex a string and then feed a |
|
69 |
sequence of tokens as input to the parser? |
|
70 |
||
898 | 71 |
% Reason 1 you can filter out whitespaces and comments, which makes the grammar rules simpler. If you have to make sure that a whitespace comes after a variable say, then your parser rule for variables gets more complicated. Same with comments which do not contribute anything to the parser tree. |
72 |
% Reason 2) The lexer can already classify tokens, for example as numbers, keywords or identifiers. This again makes the grammar rules more deterministic and as a result faster to parse. |
|
73 |
% The point is that parser combinators can be used to process strings, but in case of compilers where whitespaces and comments need to be filtered out, the lexing phase is actually quite useful. |
|
74 |
||
75 |
||
894
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
76 |
\item The injection function for sequence regular expressions is defined |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
77 |
by three clauses: |
726 | 78 |
|
894
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
79 |
\begin{center} |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
80 |
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l} |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
81 |
$\inj\,(r_1 \cdot r_2)\,c\,\,Seq(v_1,v_2)$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\ |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
82 |
$\inj\,(r_1 \cdot r_2)\,c\,\,\Left(Seq(v_1,v_2))$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\ |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
83 |
$\inj\,(r_1 \cdot r_2)\,c\,\,Right(v)$ & $\dn$ & $Seq(\textit{mkeps}(r_1),\inj\,r_2\,c\,v)$\\ |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
84 |
\end{tabular} |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
85 |
\end{center} |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
86 |
|
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
87 |
Explain why there are three cases in the injection function for sequence |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
88 |
regular expressions. |
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
Christian Urban <christian.urban@kcl.ac.uk>
parents:
726
diff
changeset
|
89 |
|
459 | 90 |
\item \POSTSCRIPT |
56 | 91 |
\end{enumerate} |
92 |
||
93 |
\end{document} |
|
94 |
||
95 |
%%% Local Variables: |
|
96 |
%%% mode: latex |
|
97 |
%%% TeX-master: t |
|
98 |
%%% End: |