\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\begin{document}
\section*{Coursework 2 (Strand 1)}
\noindent
This coursework is worth 5\% and is due on 3 November at 16:00. You
are asked to implement the Sulzmann tokeniser for the WHILE language.
You need to submit a document containing the answers for the questions
below. You can do the implementation in any programming language you
like, but you need to submit the source code with which you answered
the questions. However, the coursework will \emph{only} be judged
according to the answers. You can submit your answers in a txt-file or
as pdf.
\subsection*{Disclaimer}
It should be understood that the work you submit represents your own
effort. You have not copied from anyone else. An exception is the
Scala code I showed during the lectures, which you can use.
You can also use your own code from the CW~1.
\subsection*{Question 1 (marked with 1\%)}
To implement a tokeniser for the WHILE language, you first need to design
the appropriate regular expressions for the following eight syntactic entities:
\begin{enumerate}
\item keywords are
\begin{quote}
\texttt{while},
\texttt{if},
\texttt{then},
\texttt{else},
\texttt{do},
\texttt{for},
\texttt{to},
\texttt{true},
\texttt{false},
\texttt{read},
\texttt{write},
\texttt{skip}
\end{quote}
\item operators are
\begin{quote}
\texttt{+},
\texttt{-},
\texttt{*},
\texttt{\%},
\texttt{/},
\texttt{==},
\texttt{!=},
\texttt{>},
\texttt{<},
\texttt{:=},
\texttt{\&\&},
\texttt{||}
\end{quote}
\item strings are enclosed by \texttt{"\ldots"}
\item parentheses are \texttt{(}, \texttt{\{}, \texttt{)} and \texttt{\}}
\item there are semicolons \texttt{;}
\item whitespaces are either \texttt{" "} (one or more) or \texttt{$\backslash$n}
\item identifiers are letters followed by underscores \texttt{\_\!\_}, letters
or digits
\item numbers are \texttt{0}, \text{1}, \ldots
\end{enumerate}
\noindent
You can use the basic regular expressions
\[
\varnothing, \epsilon, c, r_1 + r_2, r_1 \cdot r_2, r^*
\]
\noindent
but also the following extended regular expressions
\begin{center}
\begin{tabular}{ll}
$[c_1 c_2 \ldots c_n]$ & a range of characters\\
$r^+$ & one or more times $r$\\
$r^?$ & optional $r$\\
$r^{\{n\}}$ & n-times $r$\\
\end{tabular}
\end{center}
\noindent
Once you have designed all regular expressions for 1 - 8, then
give the token sequence for the Fibonacci program shown below in Fig.~\ref{fib}.
\subsection*{Question 2 (marked with 3\%)}
Implement the Sulzmann tokeniser from the lectures. For this you need
to implement the functions $nullable$ and $der$ (you can use your code
from CW 1), as well as $mkeps$ and $inj$. These functions need to be
appropriately extended for the extended regular expressions from
Q1. Also add the record regular expression from the lectures and
implement a function, say \pcode{env}, that returns all assignments
from a value (such that you can extract easily the tokens from a
value).
The functions $mkeps$ and $inj$ return values. Calculate
the value for your regular expressions from Q1 and the string
\begin{center}
\code{"read n;"}
\end{center}
\noindent
and use your \pcode{env} function to give the token sequence.
\subsection*{Question 3 (marked with 1\%)}
Extend your tokenizer from Q2 to also simplify regular expressions
after each derivation step and rectify the computed values after each
injection. Use this tokenizer to tokenize the programs in
Figure~\ref{fib} and \ref{loop}.
\begin{figure}[p]
\begin{center}
\mbox{\lstinputlisting[language=while]{../progs/fib.while}}
\end{center}
\caption{Fibonacci program in the WHILE language.\label{fib}}
\end{figure}
\begin{figure}[p]
\begin{center}
\mbox{\lstinputlisting[language=while]{../progs/loops.while}}
\end{center}
\caption{The three-nested-loops program in the WHILE language.
Usually used for timing measurements.\label{loop}}
\end{figure}
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: