% !TEX program = xelatex+ −
\documentclass{article}+ −
\usepackage{../style}+ −
\usepackage{../langs}+ −
+ −
\begin{document}+ −
+ −
\section*{Coursework 2 (Strand 1)}+ −
+ −
\noindent This coursework is worth 5\% and is due on 4+ −
November at 18:00. You are asked to implement the Sulzmann \&+ −
Lu lexer for the WHILE language. You can do the+ −
implementation in any programming language you like, but you+ −
need to submit the source code with which you answered the+ −
questions, otherwise a mark of 0\% will be awarded. You can+ −
submit your answers in a txt-file or as pdf. Code submit as + −
code.+ −
+ −
\subsection*{Disclaimer}+ −
+ −
It should be understood that the work you submit represents+ −
your own effort. You have not copied from anyone else. An+ −
exception is the Scala code from KEATS and the code I showed+ −
during the lectures, which you can both freely use. You can+ −
also use your own code from the CW~1.+ −
+ −
\subsection*{Question 1}+ −
+ −
To implement a lexer for the WHILE language, you first+ −
need to design the appropriate regular expressions for the+ −
following eight syntactic entities:+ −
+ −
\begin{enumerate}+ −
\item keywords are+ −
+ −
\begin{quote}+ −
\texttt{while}, + −
\texttt{if}, + −
\texttt{then}, + −
\texttt{else}, + −
\texttt{do}, + −
\texttt{for}, + −
\texttt{to}, + −
\texttt{true}, + −
\texttt{false}, + −
\texttt{read}, + −
\texttt{write},+ −
\texttt{skip}+ −
\end{quote} + −
+ −
\item operators are+ −
+ −
\begin{quote}+ −
\texttt{+}, + −
\texttt{-}, + −
\texttt{*}, + −
\texttt{\%},+ −
\texttt{/},+ −
\texttt{==}, + −
\texttt{!=}, + −
\texttt{>}, + −
\texttt{<}, + −
\texttt{:=},+ −
\texttt{\&\&},+ −
\texttt{||}+ −
\end{quote} + −
+ −
\item strings are enclosed by \texttt{"\ldots"} + −
\item parentheses are \texttt{(}, \texttt{\{}, \texttt{)} and \texttt{\}}+ −
\item there are semicolons \texttt{;}+ −
\item whitespaces are either \texttt{" "} (one or more) or \texttt{$\backslash$n} or+ −
\texttt{$\backslash$t}+ −
\item identifiers are letters followed by underscores \texttt{\_\!\_}, letters+ −
or digits+ −
\item numbers are \pcode{0}, \pcode{1}, \ldots and so on; give + −
a regular expression that can recognise \pcode{0}, but not numbers + −
with leading zeroes, such as \pcode{001}+ −
\end{enumerate}+ −
+ −
\noindent+ −
You can use the basic regular expressions + −
+ −
\[+ −
\ZERO,\; \ONE,\; c,\; r_1 + r_2,\; r_1 \cdot r_2,\; r^*+ −
\]+ −
+ −
\noindent+ −
but also the following extended regular expressions+ −
+ −
\begin{center}+ −
\begin{tabular}{ll}+ −
$[c_1,c_2,\ldots,c_n]$ & a set of characters\\+ −
$r^+$ & one or more times $r$\\+ −
$r^?$ & optional $r$\\+ −
$r^{\{n\}}$ & n-times $r$\\+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent+ −
Later on you will also need the record regular expression:+ −
+ −
\begin{center}+ −
\begin{tabular}{ll}+ −
$REC(x:r)$ & record regular expression\\+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent Try to design your regular expressions to be as+ −
small as possible. For example you should use character sets+ −
for identifiers and numbers. Feel free to use the general+ −
character constructor \textit{CFUN} introduced in CW 1.+ −
+ −
\subsection*{Question 2}+ −
+ −
Implement the Sulzmann \& Lu lexer from the lectures. For+ −
this you need to implement the functions $nullable$ and $der$+ −
(you can use your code from CW~1), as well as $mkeps$ and+ −
$inj$. These functions need to be appropriately extended for+ −
the extended regular expressions from Q1. Write down the + −
clauses for+ −
+ −
\begin{center}+ −
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}+ −
$mkeps([c_1,c_2,\ldots,c_n])$ & $\dn$ & $?$\\+ −
$mkeps(r^+)$ & $\dn$ & $?$\\+ −
$mkeps(r^?)$ & $\dn$ & $?$\\+ −
$mkeps(r^{\{n\}})$ & $\dn$ & $?$\medskip\\+ −
$inj\, ([c_1,c_2,\ldots,c_n])\,c\,\ldots$ & $\dn$ & $?$\\+ −
$inj\, (r^+)\,c\,\ldots$ & $\dn$ & $?$\\+ −
$inj\, (r^?)\,c\,\ldots$ & $\dn$ & $?$\\+ −
$inj\, (r^{\{n\}})\,c\,\ldots$ & $\dn$ & $?$\\+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent where $inj$ takes three arguments: a regular+ −
expression, a character and a value. Test your lexer code+ −
with at least the two small examples below:+ −
+ −
\begin{center}+ −
\begin{tabular}{ll}+ −
regex: & string:\smallskip\\+ −
$a^{\{3\}}$ & $aaa$\\+ −
$(a + \ONE)^{\{3\}}$ & $aa$+ −
\end{tabular}+ −
\end{center}+ −
+ −
+ −
\noindent Both strings should be successfully lexed by the+ −
respective regular expression, that means the lexer returns + −
in both examples a value.+ −
+ −
+ −
Also add the record regular expression from the+ −
lectures to your lexer and implement a function, say+ −
\pcode{env}, that returns all assignments from a value (such+ −
that you can extract easily the tokens from a value).\medskip + −
+ −
\noindent+ −
Finally give the tokens for your regular expressions from Q1 and the+ −
string+ −
+ −
\begin{center}+ −
\code{"read n;"}+ −
\end{center} + −
+ −
\noindent+ −
and use your \pcode{env} function to give the token sequence.+ −
+ −
+ −
\subsection*{Question 3}+ −
+ −
Extend your lexer from Q2 to also simplify regular expressions+ −
after each derivation step and rectify the computed values after each+ −
injection. Use this lexer to tokenize the programs in+ −
Figure~\ref{fib}, \ref{loop} and \ref{factors}. Give the tokens of these+ −
programs where whitespaces are filtered out. Make sure you can+ −
tokenise \textbf{exactly} these programs.\bigskip+ −
+ −
+ −
\begin{figure}[h]+ −
\mbox{\lstinputlisting[language=While,xleftmargin=10mm]{../progs/fib.while}}+ −
\caption{Fibonacci program in the WHILE language.\label{fib}}+ −
\end{figure}+ −
+ −
\begin{figure}[h]+ −
\mbox{\lstinputlisting[language=While,xleftmargin=10mm]{../progs/loops.while}}+ −
\caption{The three-nested-loops program in the WHILE language. + −
(Usually used for timing measurements.)\label{loop}}+ −
\end{figure}+ −
+ −
\begin{figure}[h]+ −
\mbox{\lstinputlisting[language=While,xleftmargin=10mm]{../progs/factors.while}}+ −
\caption{A program that calculates factors for numbers in the WHILE+ −
language.\label{factors}}+ −
\end{figure}+ −
+ −
\end{document}+ −
+ −
%%% Local Variables: + −
%%% mode: latex+ −
%%% TeX-master: t+ −
%%% End: + −