\documentclass{article}
\usepackage{../style}
\usepackage{../langs}

\begin{document}

\section*{Coursework 2 (Strand 1)}

\noindent This coursework is worth 5\% and is due on 6
November at 16:00. You are asked to implement the Sulzmann \&
Lu tokeniser for the WHILE language. You can do the
implementation in any programming language you like, but you
need to submit the source code with which you answered the
questions, otherwise a mark of 0\% will be awarded. You can
submit your answers in a txt-file or as a pdf.

\subsection*{Disclaimer}

It should be understood that the work you submit represents
your own effort. You have not copied from anyone else. An
exception is the Scala code from KEATS and the code I showed
during the lectures, which you can both use. You can also use
your own code from CW~1.

\subsection*{Question 1 (marked with 1\%)}

To implement a tokeniser for the WHILE language, you first
need to design the appropriate regular expressions for the
following eight syntactic entities:

\begin{enumerate}
\item keywords are

\begin{quote}
\texttt{while}, \texttt{if}, \texttt{then}, \texttt{else},
\texttt{do}, \texttt{for}, \texttt{to}, \texttt{true},
\texttt{false}, \texttt{read}, \texttt{write}, \texttt{skip}
\end{quote}

\item operators are

\begin{quote}
\texttt{+}, \texttt{-}, \texttt{*}, \texttt{\%}, \texttt{/},
\texttt{==}, \texttt{!=}, \texttt{>}, \texttt{<}, \texttt{:=},
\texttt{\&\&}, \texttt{||}
\end{quote}

\item strings are enclosed by \texttt{"\ldots"}

\item parentheses are \texttt{(}, \texttt{\{}, \texttt{)} and \texttt{\}}

\item there are semicolons \texttt{;}

\item whitespaces are either \texttt{" "} (one or more) or \texttt{$\backslash$n}

\item identifiers are letters followed by underscores
  \texttt{\_\!\_}, letters or digits

\item numbers are \pcode{0}, \pcode{1}, \ldots and so on; give
  a regular expression that can recognise \pcode{0}, but not
  numbers with leading zeroes, such as \pcode{001}
\end{enumerate}

\noindent
You can use the basic regular expressions

\[\varnothing, \epsilon, c, r_1 + r_2, r_1 \cdot r_2, r^*\]

\noindent
but also the following extended regular expressions:

\begin{center}
\begin{tabular}{ll}
$[c_1 c_2 \ldots c_n]$ & a range of characters\\
$r^+$ & one or more times $r$\\
$r^?$ & optional $r$\\
$r^{\{n\}}$ & $n$-times $r$\\
\end{tabular}
\end{center}

\noindent Try to design your regular expressions to be as
small as possible. For example, you should use character ranges
for identifiers and numbers.

\subsection*{Question 2 (marked with 3\%)}

Implement the Sulzmann \& Lu tokeniser from the lectures. For
this you need to implement the functions $nullable$ and $der$
(you can use your code from CW~1), as well as $mkeps$ and
$inj$. These functions need to be appropriately extended for
the extended regular expressions from Q1. Write down the
clauses for

\begin{center}
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
$mkeps([c_1 c_2 \ldots c_n])$ & $\dn$ & $?$\\
$mkeps(r^+)$ & $\dn$ & $?$\\
$mkeps(r^?)$ & $\dn$ & $?$\\
$mkeps(r^{\{n\}})$ & $\dn$ & $?$\medskip\\
$inj\, ([c_1 c_2 \ldots c_n])\,c\,\ldots$ & $\dn$ & $?$\\
$inj\, (r^+)\,c\,\ldots$ & $\dn$ & $?$\\
$inj\, (r^?)\,c\,\ldots$ & $\dn$ & $?$\\
$inj\, (r^{\{n\}})\,c\,\ldots$ & $\dn$ & $?$\\
\end{tabular}
\end{center}

\noindent where $inj$ takes three arguments: a regular
expression, a character and a value.
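\noindent For orientation, a possible Scala representation of
the basic and extended regular expressions is sketched below.
The constructor names (\pcode{RANGE}, \pcode{PLUS},
\pcode{OPTIONAL}, \pcode{NTIMES} and so on) are only
suggestions, not prescribed by the coursework; any
representation on which $nullable$, $der$, $mkeps$ and $inj$
can pattern-match is fine.

\begin{lstlisting}[language=Scala]
// basic regular expressions
abstract class Rexp
case object ZERO extends Rexp                    // matches nothing
case object ONE extends Rexp                     // matches the empty string
case class CHAR(c: Char) extends Rexp            // matches the character c
case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // r1 + r2
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp  // r1 . r2
case class STAR(r: Rexp) extends Rexp            // r*

// extended regular expressions from Q1
case class RANGE(cs: Set[Char]) extends Rexp     // [c1 c2 ... cn]
case class PLUS(r: Rexp) extends Rexp            // r+
case class OPTIONAL(r: Rexp) extends Rexp        // r?
case class NTIMES(r: Rexp, n: Int) extends Rexp  // r{n}
\end{lstlisting}

\noindent Representing a character range as a \pcode{Set[Char]}
keeps the corresponding clauses short, but a list of characters
works equally well.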
Test your lexer code with at least the two small examples
below:

\begin{center}
\begin{tabular}{ll}
regex: & string:\smallskip\\
$a^{\{3\}}$ & $aaa$\\
$(a + \epsilon)^{\{3\}}$ & $aa$
\end{tabular}
\end{center}

\noindent Both strings should be successfully lexed by the
respective regular expression, that is, the lexer returns a
value in both examples.

Also add the record regular expression from the lectures to
your tokeniser and implement a function, say \pcode{env}, that
returns all assignments from a value (such that you can easily
extract the tokens from a value).\medskip

\noindent
Finally, give the tokens for your regular expressions from Q1
and the string

\begin{center}
\code{"read n;"}
\end{center}

\noindent
and use your \pcode{env} function to give the token sequence.

\subsection*{Question 3 (marked with 1\%)}

Extend your tokeniser from Q2 to also simplify regular
expressions after each derivation step and rectify the
computed values after each injection. Use this tokeniser to
tokenise the programs in Figures~\ref{fib} and \ref{loop}.
Give the tokens of these programs where whitespaces are
filtered out.

\begin{figure}[p]
\begin{center}
\mbox{\lstinputlisting[language=while]{../progs/fib.while}}
\end{center}
\caption{Fibonacci program in the WHILE language.\label{fib}}
\end{figure}

\begin{figure}[p]
\begin{center}
\mbox{\lstinputlisting[language=while]{../progs/loops.while}}
\end{center}
\caption{The three-nested-loops program in the WHILE language.
  Usually used for timing measurements.\label{loop}}
\end{figure}

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: