\documentclass{article}
\usepackage{../style}
\usepackage{../langs}

\begin{document}

\section*{Coursework 2 (Strand 1)}

\noindent This coursework is worth 5\% and is due on 6
November at 16:00. You are asked to implement the Sulzmann \&
Lu tokeniser for the WHILE language. You can do the
implementation in any programming language you like, but you
need to submit the source code with which you answered the
questions, otherwise a mark of 0\% will be awarded. You can
submit your answers in a txt-file or as a pdf.

\subsection*{Disclaimer}

It should be understood that the work you submit represents
your own effort. You have not copied from anyone else. An
exception is the Scala code from KEATS and the code I showed
during the lectures, which you can both use. You can also use
your own code from CW~1.

\subsection*{Question 1 (marked with 1\%)}

To implement a tokeniser for the WHILE language, you first
need to design the appropriate regular expressions for the
following eight syntactic entities:

\begin{enumerate}
\item keywords are

\begin{quote}
\texttt{while}, \texttt{if}, \texttt{then}, \texttt{else},
\texttt{do}, \texttt{for}, \texttt{to}, \texttt{true},
\texttt{false}, \texttt{read}, \texttt{write}, \texttt{skip}
\end{quote}

\item operators are

\begin{quote}
\texttt{+}, \texttt{-}, \texttt{*}, \texttt{\%}, \texttt{/},
\texttt{==}, \texttt{!=}, \texttt{>}, \texttt{<}, \texttt{:=},
\texttt{\&\&}, \texttt{||}
\end{quote}

\item strings are enclosed by \texttt{"\ldots"}

\item parentheses are \texttt{(}, \texttt{\{}, \texttt{)} and \texttt{\}}

\item there are semicolons \texttt{;}

\item whitespaces are either \texttt{" "} (one or more) or \texttt{$\backslash$n}

\item identifiers are letters followed by underscores
  \texttt{\_\!\_}, letters or digits

\item numbers are \pcode{0}, \pcode{1}, \ldots and so on; give
  a regular expression that can recognise \pcode{0}, but not
  numbers with leading zeroes, such as \pcode{001}
\end{enumerate}

\noindent
You can use the basic regular expressions

\[\varnothing, \epsilon, c, r_1 + r_2, r_1 \cdot r_2, r^*\]

\noindent
but also the following extended regular expressions:

\begin{center}
\begin{tabular}{ll}
$[c_1 c_2 \ldots c_n]$ & a range of characters\\
$r^+$ & one or more times $r$\\
$r^?$ & optional $r$\\
$r^{\{n\}}$ & $n$-times $r$\\
\end{tabular}
\end{center}

\noindent Try to design your regular expressions to be as
small as possible. For example, you should use character ranges
for identifiers and numbers.

\subsection*{Question 2 (marked with 3\%)}

Implement the Sulzmann \& Lu tokeniser from the lectures. For
this you need to implement the functions $nullable$ and $der$
(you can use your code from CW~1), as well as $mkeps$ and
$inj$. These functions need to be appropriately extended for
the extended regular expressions from Q1. Write down the
clauses for

\begin{center}
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
$mkeps([c_1 c_2 \ldots c_n])$ & $\dn$ & $?$\\
$mkeps(r^+)$ & $\dn$ & $?$\\
$mkeps(r^?)$ & $\dn$ & $?$\\
$mkeps(r^{\{n\}})$ & $\dn$ & $?$\medskip\\
$inj\, ([c_1 c_2 \ldots c_n])\,c\,\ldots$ & $\dn$ & $?$\\
$inj\, (r^+)\,c\,\ldots$ & $\dn$ & $?$\\
$inj\, (r^?)\,c\,\ldots$ & $\dn$ & $?$\\
$inj\, (r^{\{n\}})\,c\,\ldots$ & $\dn$ & $?$\\
\end{tabular}
\end{center}

\noindent where $inj$ takes three arguments: a regular
expression, a character and a value.
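\noindent For orientation, a possible Scala representation of
the basic and extended regular expressions is sketched below.
The constructor names (\pcode{RANGE}, \pcode{PLUS},
\pcode{OPTIONAL}, \pcode{NTIMES} and so on) are only
suggestions, not prescribed by the coursework; any
representation on which $nullable$, $der$, $mkeps$ and $inj$
can pattern-match is fine.

\begin{lstlisting}[language=Scala]
// basic regular expressions
abstract class Rexp
case object ZERO extends Rexp                    // matches nothing
case object ONE extends Rexp                     // matches the empty string
case class CHAR(c: Char) extends Rexp            // matches the character c
case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // r1 + r2
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp  // r1 . r2
case class STAR(r: Rexp) extends Rexp            // r*

// extended regular expressions from Q1
case class RANGE(cs: Set[Char]) extends Rexp     // [c1 c2 ... cn]
case class PLUS(r: Rexp) extends Rexp            // r+
case class OPTIONAL(r: Rexp) extends Rexp        // r?
case class NTIMES(r: Rexp, n: Int) extends Rexp  // r{n}
\end{lstlisting}

\noindent Representing a character range as a \pcode{Set[Char]}
keeps the corresponding clauses short, but a list of characters
works equally well.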
Test your lexer code with at least the two small examples
below:

\begin{center}
\begin{tabular}{ll}
regex: & string:\smallskip\\
$a^{\{3\}}$ & $aaa$\\
$(a + \epsilon)^{\{3\}}$ & $aa$
\end{tabular}
\end{center}

\noindent Both strings should be successfully lexed by the
respective regular expression, that is, the lexer returns a
value in both examples.

Also add the record regular expression from the lectures to
your tokeniser and implement a function, say \pcode{env}, that
returns all assignments from a value (such that you can easily
extract the tokens from a value).\medskip

\noindent
Finally, give the tokens for your regular expressions from Q1
and the string

\begin{center}
\code{"read n;"}
\end{center}

\noindent
and use your \pcode{env} function to give the token sequence.

\subsection*{Question 3 (marked with 1\%)}

Extend your tokeniser from Q2 to also simplify regular
expressions after each derivation step and rectify the
computed values after each injection. Use this tokeniser to
tokenise the programs in Figures~\ref{fib} and \ref{loop}.
Give the tokens of these programs where whitespaces are
filtered out.

\begin{figure}[p]
\begin{center}
\mbox{\lstinputlisting[language=while]{../progs/fib.while}}
\end{center}
\caption{Fibonacci program in the WHILE language.\label{fib}}
\end{figure}

\begin{figure}[p]
\begin{center}
\mbox{\lstinputlisting[language=while]{../progs/loops.while}}
\end{center}
\caption{The three-nested-loops program in the WHILE language.
  Usually used for timing measurements.\label{loop}}
\end{figure}

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: