% !TEX program = xelatex\documentclass{article}\usepackage{../style}\usepackage{../langs}\begin{document}\section*{Coursework 2 (Strand 1)}\noindent This coursework is worth 5\% and is due on 4November at 18:00. You are asked to implement the Sulzmann \&Lu lexer for the WHILE language. You can do theimplementation in any programming language you like, but youneed to submit the source code with which you answered thequestions, otherwise a mark of 0\% will be awarded. You cansubmit your answers in a txt-file or as pdf. Code submit as code.\subsection*{Disclaimer}It should be understood that the work you submit representsyour own effort. You have not copied from anyone else. Anexception is the Scala code from KEATS and the code I showedduring the lectures, which you can both freely use. You canalso use your own code from the CW~1.\subsection*{Question 1}To implement a lexer for the WHILE language, you firstneed to design the appropriate regular expressions for thefollowing eight syntactic entities:\begin{enumerate}\item keywords are\begin{quote}\texttt{while}, \texttt{if}, \texttt{then}, \texttt{else}, \texttt{do}, \texttt{for}, \texttt{to}, \texttt{true}, \texttt{false}, \texttt{read}, \texttt{write},\texttt{skip}\end{quote} \item operators are\begin{quote}\texttt{+}, \texttt{-}, \texttt{*}, \texttt{\%},\texttt{/},\texttt{==}, \texttt{!=}, \texttt{>}, \texttt{<}, \texttt{:=},\texttt{\&\&},\texttt{||}\end{quote} \item strings are enclosed by \texttt{"\ldots"} \item parentheses are \texttt{(}, \texttt{\{}, \texttt{)} and \texttt{\}}\item there are semicolons \texttt{;}\item whitespaces are either \texttt{" "} (one or more) or \texttt{$\backslash$n} or \texttt{$\backslash$t}\item identifiers are letters followed by underscores \texttt{\_\!\_}, lettersor digits\item numbers are \pcode{0}, \pcode{1}, \ldots and so on; give a regular expression that can recognise \pcode{0}, but not numbers with leading zeroes, such as \pcode{001}\end{enumerate}\noindentYou can use the basic regular expressions \[\ZERO,\; \ONE,\; c,\; r_1 + r_2,\; r_1 \cdot r_2,\; r^*\]\noindentbut also the following extended regular expressions\begin{center}\begin{tabular}{ll}$[c_1,c_2,\ldots,c_n]$ & a set of characters\\$r^+$ & one or more times $r$\\$r^?$ & optional $r$\\$r^{\{n\}}$ & n-times $r$\\\end{tabular}\end{center}\noindentLater on you will also need the record regular expression:\begin{center}\begin{tabular}{ll}$REC(x:r)$ & record regular expression\\\end{tabular}\end{center}\noindent Try to design your regular expressions to be assmall as possible. For example you should use character setsfor identifiers and numbers. Feel free to use the generalcharacter constructor \textit{CFUN} introduced in CW 1.\subsection*{Question 2}Implement the Sulzmann \& Lu lexer from the lectures. Forthis you need to implement the functions $nullable$ and $der$(you can use your code from CW~1), as well as $mkeps$ and$inj$. These functions need to be appropriately extended forthe extended regular expressions from Q1. Write down the clauses for\begin{center}\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}$mkeps([c_1,c_2,\ldots,c_n])$ & $\dn$ & $?$\\$mkeps(r^+)$ & $\dn$ & $?$\\$mkeps(r^?)$ & $\dn$ & $?$\\$mkeps(r^{\{n\}})$ & $\dn$ & $?$\medskip\\$inj\, ([c_1,c_2,\ldots,c_n])\,c\,\ldots$ & $\dn$ & $?$\\$inj\, (r^+)\,c\,\ldots$ & $\dn$ & $?$\\$inj\, (r^?)\,c\,\ldots$ & $\dn$ & $?$\\$inj\, (r^{\{n\}})\,c\,\ldots$ & $\dn$ & $?$\\\end{tabular}\end{center}\noindent where $inj$ takes three arguments: a regularexpression, a character and a value. Test your lexer codewith at least the two small examples below:\begin{center}\begin{tabular}{ll}regex: & string:\smallskip\\$a^{\{3\}}$ & $aaa$\\$(a + \ONE)^{\{3\}}$ & $aa$\end{tabular}\end{center}\noindent Both strings should be successfully lexed by therespective regular expression, that means the lexer returns in both examples a value.Also add the record regular expression from thelectures to your lexer and implement a function, say\pcode{env}, that returns all assignments from a value (suchthat you can extract easily the tokens from a value).\medskip \noindentFinally give the tokens for your regular expressions from Q1 and thestring\begin{center}\code{"read n;"}\end{center} \noindentand use your \pcode{env} function to give the token sequence.\subsection*{Question 3}Extend your lexer from Q2 to also simplify regular expressionsafter each derivation step and rectify the computed values after eachinjection. Use this lexer to tokenize the programs inFigure~\ref{fib} and \ref{loop}. Give the tokens of theseprograms where whitespaces are filtered out.\bigskip\begin{figure}[h]\mbox{\lstinputlisting[language=While,xleftmargin=10mm]{../progs/fib.while}}\caption{Fibonacci program in the WHILE language.\label{fib}}\end{figure}\begin{figure}[h]\mbox{\lstinputlisting[language=While,xleftmargin=10mm]{../progs/loops.while}}\caption{The three-nested-loops program in the WHILE language. (Usually used for timing measurements.)\label{loop}}\end{figure}\end{document}%%% Local Variables: %%% mode: latex%%% TeX-master: t%%% End: