| author | Christian Urban <christian.urban@kcl.ac.uk> | 
| Tue, 21 Oct 2025 08:41:46 +0200 | |
| changeset 1014 | 8400bbdef1b7 | 
| parent 955 | 47acfd7f9096 | 
| permissions | -rw-r--r-- | 
| 56 | 1 | \documentclass{article}
 | 
| 292 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
102diff
changeset | 2 | \usepackage{../style}
 | 
| 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
102diff
changeset | 3 | \usepackage{../graphics}
 | 
| 954 | 4 | \usepackage{../grammar}
 | 
| 56 | 5 | |
| 6 | \begin{document}
 | |
| 7 | ||
| 8 | \section*{Homework 6}
 | |
| 9 | ||
| 916 | 10 | %\HEADER | 
| 359 
db106e5b7c4d
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
292diff
changeset | 11 | |
| 56 | 12 | \begin{enumerate}
 | 
| 13 | \item (i) Give the regular expressions for lexing a language | |
| 401 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 14 | consisting of whitespaces, identifiers (some letters | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 15 |       followed by digits), numbers, operations \texttt{=},
 | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 16 |       \texttt{<} and \texttt{>}, and the keywords \texttt{if},
 | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 17 |       \texttt{then} and \texttt{else}. (ii) Decide whether the
 | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 18 | following strings can be lexed in this language? | 
| 56 | 19 | |
| 20 | \begin{enumerate}
 | |
| 21 | \item \texttt{"if y4 = 3 then 1 else 3"}
 | |
| 22 | \item \texttt{"if33 ifif then then23 else else 32"}
 | |
| 23 | \item \texttt{"if x4x < 33 then 1 else 3"}
 | |
| 24 | \end{enumerate}
 | |
| 25 | ||
| 26 | In case they can, give the corresponding token sequences. (Hint: | |
| 89 
24adcc265f2e
typo
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
56diff
changeset | 27 | Observe the maximal munch rule and priorities of your regular | 
| 56 | 28 | expressions that make the process of lexing unambiguous.) | 
| 29 | ||
| 955 | 30 | \solution{
 | 
| 31 | a and b are ok; c is not matched because x4x is not an identifier according to the rules. | |
| 32 | } | |
| 33 | ||
| 34 | ||
| 56 | 35 | \item Suppose the grammar | 
| 36 | ||
| 954 | 37 | \begin{plstx}[margin=1cm]    
 | 
| 38 | :\meta{E} ::= \meta{F} \;|\; \meta{F} \cdot \texttt{*} \cdot \meta{F} \;|\; \meta{F} \cdot \backslash \cdot \meta{F}\\
 | |
| 39 | :\meta{F} ::= \meta{T} \;|\; \meta{T} \cdot \texttt{+} \cdot \meta{T} \;|\; \meta{T} \cdot \texttt{-} \cdot \meta{T}\\
 | |
| 40 | :\meta{T} ::= $num$ \;|\; \texttt{(} \cdot E \cdot \texttt{)}\\
 | |
| 41 | \end{plstx}
 | |
| 56 | 42 | |
| 954 | 43 | where \meta{E}, \meta{F} and \meta{T} are non-terminals, \meta{E} is
 | 
| 44 | the starting symbol of the grammar, and $num$ stands for a number | |
| 45 | token. Give a parse tree for the string \texttt{(3+3)+(2*3)}. Note that
 | |
| 46 | \meta{F} and \meta{T} are ``exchanged'' in this grammar in comparison with
 | |
| 47 | the usual grammar for arithmetic expressions. What does this mean in terms | |
| 48 | of precedences of the operators? | |
| 49 | ||
| 50 | \solution{
 | |
| 955 | 51 |   \begin{tikzpicture}[level distance=10mm, black,sibling distance=10mm]
 | 
| 52 |   \node {\meta{E}}
 | |
| 53 |   child[sibling distance=20mm] {node {\meta{F}} 
 | |
| 54 |     child {node {\meta{T}}
 | |
| 55 |       child[sibling distance=5mm] {node {(}}
 | |
| 56 |       child[sibling distance=5mm]  {node {\meta{E}}
 | |
| 57 |         child {node {\meta{F}}
 | |
| 58 |           child {node {\meta{T}} child {node {3}}}
 | |
| 59 |           child {node {+}}
 | |
| 60 |           child {node {\meta{T}} child {node {3}}}
 | |
| 61 | } | |
| 62 | } | |
| 63 |       child[sibling distance=5mm]  {node {)}}
 | |
| 64 | } | |
| 65 |     child {node {+}}
 | |
| 66 |     child {node {\meta{T}}
 | |
| 67 |       child[sibling distance=5mm] {node {(}}
 | |
| 68 |       child[sibling distance=5mm] {node {\meta{E}}
 | |
| 69 |         child {node {\meta{F}}
 | |
| 70 |           child {node {\meta{T}} child {node {2}}}
 | |
| 71 | } | |
| 72 |         child {node {*}}
 | |
| 73 |         child {node {\meta{F}}
 | |
| 74 |           child {node {\meta{T}} child {node {3}}}
 | |
| 75 | } | |
| 76 | } | |
| 77 |       child[sibling distance=5mm] {node {)}}
 | |
| 78 | } | |
| 79 | }; | |
| 80 | \end{tikzpicture}
 | |
| 81 | ||
| 82 | ||
| 954 | 83 | For the second part "1+2*3" will be parsed as (1+2)*3, meaning + and - bind | 
| 84 | tighter than * and $\backslash$ | |
| 85 | } | |
| 56 | 86 | |
| 87 | \item Define what it means for a grammar to be ambiguous. Give an example of | |
| 88 | an ambiguous grammar. | |
| 89 | ||
| 954 | 90 | \solution{
 | 
| 91 | Already the grammar | |
| 92 |   \begin{plstx}[margin=1cm]
 | |
| 93 |     : \meta{E} ::= \meta{E}\cdot +\cdot\meta{E} | num\\
 | |
| 94 |   \end{plstx}
 | |
| 95 | ||
| 96 | is ambiguous because a string like "1+2+3" can be parsed | |
| 97 | as (1+2)+3 or 1+(2+3). | |
| 98 | } | |
| 99 | ||
| 100 | ||
| 56 | 101 | \item Suppose boolean expressions are built up from | 
| 102 | ||
| 103 | \begin{center}
 | |
| 104 | \begin{tabular}{ll}
 | |
| 105 | 1.) & tokens for \texttt{true} and \texttt{false},\\
 | |
| 106 | 2.) & the infix operations \texttt{$\wedge$} and \texttt{$\vee$},\\
 | |
| 107 | 3.) & the prefix operation $\neg$, and\\ | |
| 108 | 4.) & can be enclosed in parentheses. | |
| 109 | \end{tabular}
 | |
| 110 | \end{center}
 | |
| 111 | ||
| 112 | (i) Give a grammar that can recognise such boolean expressions | |
| 113 | and (ii) give a sample string involving all rules given in 1.-4.~that | |
| 114 | can be parsed by this grammar. | |
| 115 | ||
| 955 | 116 | \solution{
 | 
| 117 | The simplest grammar is | |
| 118 | ||
| 119 | \begin{plstx}[margin=1cm]    
 | |
| 120 |   :\meta{B} ::= true \;|\; false \;|\;\meta{B} \cdot $\wedge$ \cdot \meta{B} \;|\; \meta{B} \cdot $\vee$ \cdot \meta{B}
 | |
| 121 |   \;|\; $\neg$\meta{B} \;|\; (\cdot\meta{B}\cdot)\\
 | |
| 122 | \end{plstx}
 | |
| 459 | 123 | |
| 955 | 124 | This is unfortunately a left-recursive grammar. Giving a not left-recursive one is a bit more involved. | 
| 125 | } | |
| 721 
e3c64f22dd31
added slides from Rochester
 Christian Urban <christian.urban@kcl.ac.uk> parents: 
647diff
changeset | 126 | |
| 647 | 127 | \item Parsing combinators consist of atomic parsers, alternative | 
| 128 | parsers, sequence parsers and semantic actions. What is the purpose | |
| 954 | 129 | of (1) atomic parsers and of (2) map-parsers (also called semantic actions)? | 
| 647 | 130 | |
| 954 | 131 |   \solution{
 | 
| 132 | Atomic parsers look at a concrete prefix of the input, like num-tokens or identifiers. | |
| 133 | Map-parsers apply a function to the output of a parser. In this way you can transform | |
| 134 | the output from, for example, a string to an integer. | |
| 135 | } | |
| 136 | ||
| 137 | ||
| 726 | 138 | \item Parser combinators can directly be given a string as | 
| 139 | input, without the need of a lexer. What are the | |
| 140 | advantages to first lex a string and then feed a | |
| 141 | sequence of tokens as input to the parser? | |
| 142 | ||
| 954 | 143 |       \solution{ Reason 1 you can filter out whitespaces and comments,
 | 
| 144 | which makes the grammar rules simpler. If you have to make | |
| 145 | sure that a whitespace comes after a variable say, then your | |
| 146 | parser rule for variables gets more complicated. Same with | |
| 147 | comments which do not contribute anything to the parse tree. | |
| 148 | ||
| 149 | Reason 2) The lexer can already classify tokens, for example | |
| 150 | as numbers, keywords or identifiers. This again makes the grammar | |
| 151 | rules more deterministic and as a result faster to parse. | |
| 898 | 152 | |
| 954 | 153 | The point is that parser combinators can be used to process | 
| 154 | strings, but in case of compilers where whitespaces and | |
| 155 | comments need to be filtered out, the lexing phase is actually | |
| 156 | quite useful. } | |
| 898 | 157 | |
| 894 
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
 Christian Urban <christian.urban@kcl.ac.uk> parents: 
726diff
changeset | 158 | \item The injection function for sequence regular expressions is defined | 
| 
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
 Christian Urban <christian.urban@kcl.ac.uk> parents: 
726diff
changeset | 159 | by three clauses: | 
| 726 | 160 | |
| 894 
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
 Christian Urban <christian.urban@kcl.ac.uk> parents: 
726diff
changeset | 161 | \begin{center}
 | 
| 
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
 Christian Urban <christian.urban@kcl.ac.uk> parents: 
726diff
changeset | 162 | \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
 | 
| 
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
 Christian Urban <christian.urban@kcl.ac.uk> parents: 
726diff
changeset | 163 | $\inj\,(r_1 \cdot r_2)\,c\,\,Seq(v_1,v_2)$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\ | 
| 
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
 Christian Urban <christian.urban@kcl.ac.uk> parents: 
726diff
changeset | 164 | $\inj\,(r_1 \cdot r_2)\,c\,\,\Left(Seq(v_1,v_2))$ & $\dn$ & $Seq(\inj\,r_1\,c\,v_1,v_2)$\\ | 
| 
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
 Christian Urban <christian.urban@kcl.ac.uk> parents: 
726diff
changeset | 165 |   $\inj\,(r_1 \cdot r_2)\,c\,\,Right(v)$ & $\dn$  & $Seq(\textit{mkeps}(r_1),\inj\,r_2\,c\,v)$\\
 | 
| 
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
 Christian Urban <christian.urban@kcl.ac.uk> parents: 
726diff
changeset | 166 | \end{tabular}
 | 
| 
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
 Christian Urban <christian.urban@kcl.ac.uk> parents: 
726diff
changeset | 167 | \end{center}
 | 
| 
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
 Christian Urban <christian.urban@kcl.ac.uk> parents: 
726diff
changeset | 168 | |
| 954 | 169 | Explain why there are three cases in the injection function for sequence | 
| 170 | regular expressions. | |
| 171 | ||
| 172 |      \solution{
 | |
| 173 | This is because the derivative of sequences can be of the form | |
| 174 | ||
| 175 |        \begin{itemize}
 | |
| 176 | \item $(der\,c\,r_1)\cdot r_2$ | |
| 177 | \item $(der\,c\,r_1)\cdot r_2 \;+\; der\,c\,r_2$ | |
| 178 |        \end{itemize}
 | |
| 179 | ||
| 180 | In the first case the value needs to be of the form $Seq$, in the second case $\Left$ or $Right$. | |
| 181 | Therefore 3 cases. | |
| 182 | } | |
| 894 
02ef5c3abc51
updatedHG: added solutions/cw5/fun_tokens.sc
 Christian Urban <christian.urban@kcl.ac.uk> parents: 
726diff
changeset | 183 | |
| 459 | 184 | \item \POSTSCRIPT | 
| 56 | 185 | \end{enumerate}
 | 
| 186 | ||
| 187 | \end{document}
 | |
| 188 | ||
| 189 | %%% Local Variables: | |
| 190 | %%% mode: latex | |
| 191 | %%% TeX-master: t | |
| 192 | %%% End: |