% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}

\begin{document}

\section*{Coursework 2}

\noindent This coursework is worth 8\% and is due on \cwTWO{} at
18:00. You are asked to implement the Sulzmann \& Lu lexer for the
WHILE language. You can do the implementation in any programming
language you like, but you need to submit the source code with which
you answered the questions; otherwise a mark of 0\% will be
awarded. You can submit your answers in a txt-file or as a pdf, but
please submit code as code. Please package everything in a zip-file that creates a
directory with the name \texttt{YournameYourfamilyname} on my end. Thanks!

\subsection*{Disclaimer\alert}

It should be understood that the work you submit represents
your own effort. You have not copied from anyone else. An
exception is the Scala code from KEATS and the code I showed
during the lectures, both of which you can freely use. You can
also use your own code from CW~1.

\subsection*{Question 1}

To implement a lexer for the WHILE language, you first
need to design the appropriate regular expressions for the
following eleven syntactic entities:

\begin{enumerate}
\item keywords are

\begin{center}
\texttt{while},
\texttt{if},
\texttt{then},
\texttt{else},
\texttt{do},
\texttt{for},
\texttt{to},
\texttt{true},
\texttt{false},
\texttt{read},
\texttt{write},
\texttt{skip}
\end{center}

\item operators are:
\texttt{+},
\texttt{-},
\texttt{*},
\texttt{\%},
\texttt{/},
\texttt{==},
\texttt{!=},
\texttt{>},
\texttt{<},
\texttt{<=},
\texttt{>=},
\texttt{:=},
\texttt{\&\&},
\texttt{||}

\item letters are uppercase and lowercase

\item symbols are letters plus the characters
\texttt{.},
\texttt{\_},
\texttt{>},
\texttt{<},
\texttt{=},
\texttt{;},
\texttt{,} and
\texttt{:}

\textcolor{red}{Please also add \texttt{$\backslash$} for the Collatz program.}

\item strings are enclosed by \texttt{"\ldots"} and consist of
symbols, whitespaces and digits
\item parentheses are \texttt{(}, \texttt{\{}, \texttt{)} and \texttt{\}}
\item there are semicolons \texttt{;}
\item whitespaces are either \texttt{" "} (one or more) or \texttt{$\backslash$n} or
\texttt{$\backslash$t}
\item identifiers are letters followed by underscores \texttt{\_\!\_}, letters
or digits
\item numbers are \pcode{0}, \pcode{1}, \ldots and so on; give
a regular expression that can recognise \pcode{0}, but not numbers
with leading zeroes, such as \pcode{001}
\item comments start with \texttt{//} and contain symbols, spaces and digits until the end of the line
\end{enumerate}

\noindent
You can use the basic regular expressions

\[
\ZERO,\; \ONE,\; c,\; r_1 + r_2,\; r_1 \cdot r_2,\; r^*
\]

\noindent
but also the following extended regular expressions

\begin{center}
\begin{tabular}{ll}
$[c_1,c_2,\ldots,c_n]$ & a set of characters\\
$r^+$ & one or more times $r$\\
$r^?$ & optional $r$\\
$r^{\{n\}}$ & exactly $n$-times $r$\\
\end{tabular}
\end{center}
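
\noindent As a reminder of what the extended regular expressions
mean, they can be related to the basic ones as shown below. This is
only a sketch at the level of languages, writing $r \equiv r'$ for
$L(r) = L(r')$ (a notation assumed here); it does not prescribe how
the constructors should be implemented in the lexer.

\[
\begin{array}{lcl}
{[c_1,c_2,\ldots,c_n]} & \equiv & c_1 + c_2 + \ldots + c_n\\
r^+ & \equiv & r \cdot r^*\\
r^? & \equiv & r + \ONE\\
r^{\{n\}} & \equiv & \underbrace{r \cdot \ldots \cdot r}_{n}
\qquad\mbox{with}\quad r^{\{0\}} \equiv \ONE
\end{array}
\]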

\noindent
Later on you will also need the record regular expression:

\begin{center}
\begin{tabular}{ll}
$REC(x:r)$ & record regular expression\\
\end{tabular}
\end{center}

\noindent Try to design your regular expressions to be as
small as possible. For example, you should use character sets
for identifiers and numbers. Feel free to use the general
character constructor \textit{CFUN} introduced in CW~1.

\subsection*{Question 2}

Implement the Sulzmann \& Lu lexer from the lectures. For
this you need to implement the functions $nullable$ and $der$
(you can use your code from CW~1), as well as $mkeps$ and
$inj$. These functions need to be appropriately extended for
the extended regular expressions from Q1. Write down the
clauses for

\begin{center}
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
$mkeps([c_1,c_2,\ldots,c_n])$ & $\dn$ & $?$\\
$mkeps(r^+)$ & $\dn$ & $?$\\
$mkeps(r^?)$ & $\dn$ & $?$\\
$mkeps(r^{\{n\}})$ & $\dn$ & $?$\medskip\\
$inj\, ([c_1,c_2,\ldots,c_n])\,c\,\ldots$ & $\dn$ & $?$\\
$inj\, (r^+)\,c\,\ldots$ & $\dn$ & $?$\\
$inj\, (r^?)\,c\,\ldots$ & $\dn$ & $?$\\
$inj\, (r^{\{n\}})\,c\,\ldots$ & $\dn$ & $?$\\
\end{tabular}
\end{center}

\noindent where $inj$ takes three arguments: a regular
expression, a character and a value. Test your lexer code
with at least the two small examples below:

\begin{center}
\begin{tabular}{ll}
regex: & string:\smallskip\\
$a^{\{3\}}$ & $aaa$\\
$(a + \ONE)^{\{3\}}$ & $aa$
\end{tabular}
\end{center}


\noindent Both strings should be successfully lexed by the
respective regular expression, that is, the lexer should return
a value in both examples.
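
\noindent To see why the second example should succeed even though
the string has only two characters, note that one of the three copies
of $a + \ONE$ can match the empty string. Writing $[]$ for the empty
string and $@$ for string concatenation (notation assumed here), one
possible split is

\[
aa \;=\; a \,@\, a \,@\, []
\qquad\mbox{with}\quad a \in L(a + \ONE)
\;\;\mbox{and}\;\; [] \in L(a + \ONE)
\]

\noindent and therefore $aa \in L((a + \ONE)^{\{3\}})$. In contrast,
$a^{\{3\}}$ matches only $aaa$, because $L(a)$ does not contain the
empty string.
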


Also add the record regular expression from the
lectures to your lexer and implement a function, say
\pcode{env}, that returns all assignments from a value (such
that you can easily extract the tokens from a value).\medskip

\noindent
Finally, give the tokens for your regular expressions from Q1 and the
string

\begin{center}
\code{"read n;"}
\end{center}

\noindent
and use your \pcode{env} function to give the token sequence.


\subsection*{Question 3}

Extend your lexer from Q2 to also simplify regular expressions after
each derivation step and rectify the computed values after each
injection. Use this lexer to tokenise the programs in
Figures~\ref{fib}--\ref{collatz}. You can also find the programs on
KEATS. Give the tokens of these programs where whitespaces are
filtered out. Make sure you can tokenise \textbf{exactly} these
programs.\bigskip
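
\noindent As a reminder, and assuming the simplification rules are
the ones from the lectures, the simplifications meant here are
typically the following, each of which preserves the language of the
regular expression:

\[
\begin{array}{lcl}
r + \ZERO & \mapsto & r\\
\ZERO + r & \mapsto & r\\
r \cdot \ONE & \mapsto & r\\
\ONE \cdot r & \mapsto & r\\
r \cdot \ZERO & \mapsto & \ZERO\\
\ZERO \cdot r & \mapsto & \ZERO\\
r + r & \mapsto & r
\end{array}
\]

\noindent Since such a step changes the shape of the regular
expression, the value computed for the simplified regular expression
has to be rectified into a value for the unsimplified one before it
is given to $inj$.
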


\begin{figure}[h]
\mbox{\lstinputlisting[language=While,xleftmargin=10mm]{../progs/while-tests/fib.while}}
\caption{Fibonacci program in the WHILE language.\label{fib}}
\end{figure}

\begin{figure}[h]
\mbox{\lstinputlisting[language=While,xleftmargin=10mm]{../progs/while-tests/loops.while}}
\caption{The three-nested-loops program in the WHILE language.
(Usually used for timing measurements.)\label{loop}}
\end{figure}

\begin{figure}[h]
\mbox{\lstinputlisting[language=While,xleftmargin=10mm]{../progs/while-tests/factors.while}}
\caption{A program that calculates factors for numbers in the WHILE
language.\label{factors}}
\end{figure}

\begin{figure}[h]
\mbox{\lstinputlisting[language=While,xleftmargin=10mm]{../progs/while-tests/collatz2.while}}
\caption{A program that calculates the Collatz series for numbers
between 1 and 100.\label{collatz}}
\end{figure}

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: