--- a/handouts/ho04.tex Sat Oct 17 11:24:41 2015 +0100
+++ b/handouts/ho04.tex Mon Oct 19 14:45:50 2015 +0100
@@ -29,10 +29,10 @@
If the string does not match the string, an error will be
raised. Since the first phase of the algorithm by Sulzmann \&
Lu is identical to the derivative based matcher from the first
-coursework, the function $nullable$ will be used to decide
-whether as string is matched by a regular expression. If
-$nullable$ says yes, then values are constructed that reflect
-how the regular expression matched the string.
+coursework, the function \textit{nullable} will be used to
+decide whether as string is matched by a regular expression.
+If \textit{nullable} says yes, then values are constructed
+that reflect how the regular expression matched the string.
The definitions for values is given below. They are shown
together with the regular expressions $r$ to which
@@ -83,11 +83,11 @@
regular expressions are written entirely with upper-case
letters, while values just start with a single upper-case
character and the rest are lower-case letters. My definition
-of regular expressions and values in Scala is shown below. I use
-this in the REPL of Scala; when I use the Scala compiler I
-need to rename some constructors, because Scala on Macs does
-not like classes that are called \pcode{EMPTY} and
-\pcode{Empty}.
+of regular expressions and values in Scala is shown below. I
+use this in the REPL of Scala; when I use the Scala compiler I
+unfortunately need to rename some constructors, because Scala
+on Macs does not like classes that are called \pcode{EMPTY}
+and \pcode{Empty}.
{\small\lstinputlisting[language=Scala,numbers=none]
{../progs/app01.scala}}
@@ -223,7 +223,15 @@
$Left(\ldots)$. The point is that from this value we can
directly read off which part of $r_4$ matched the empty
string: take the right-alternative first, and then the
-right-alternative again.
+right-alternative again. Remember $r_4$ is of the form
+
+\begin{center}
+$r_4$:\;$(\varnothing \cdot (b \cdot c)) +
+ ((\varnothing \cdot c) + \underline{\epsilon})$\\
+\end{center}
+
+\noindent the value tells us that the underlined $\epsilon$
+is responsible for matching the empty string.
Next we have to ``inject'' the last character, that is $c$ in
the running example, into this value $v_4$ in order to
@@ -341,19 +349,20 @@
manner in order to calculate this simplified regular
expression.
-The rectification we can implement by letting simp return
-not just a (simplified) regular expression, but also a
+The rectification we can implement by letting simp return not
+just a (simplified) regular expression, but also a
rectification function. Let us consider the alternative case,
$r_1 + r_2$, first. By going depth-first, we first simplify
the component regular expressions $r_1$ and $r_2.$ This will
-return simplified versions (if they can be simplified), say
-$r_{1s}$ and $r_{2s}$, but also two rectification functions
-$f_{1s}$ and $f_{2s}$. We need to assemble them in order to
-obtain a rectified value for $r_1 + r_2$. In case $r_{1s}$
-simplified to $\varnothing$, we continue the derivative
-calculation with $r_{2s}$. The Sulzmann \& Lu algorithm would
-return a corresponding value, say $v_{2s}$. But now this value
-needs to be ``rectified'' to the value
+return simplified versions, say $r_{1s}$ and $r_{2s}$, of the
+component regular expressions (if they can be simplified) but
+also two rectification functions $f_{1s}$ and $f_{2s}$. We
+need to assemble them in order to obtain a rectified value for
+$r_1 + r_2$. In case $r_{1s}$ simplified to $\varnothing$, we
+continue the derivative calculation with $r_{2s}$. The
+Sulzmann \& Lu algorithm would return a corresponding value,
+say $v_{2s}$. But now this value needs to be ``rectified'' to
+the value
\begin{center}
$Right(v_{2s})$
@@ -720,8 +729,29 @@
(top\_level:\texttt{ac.uk})]$
\end{center}
-\noindent As you will see in the next lecture, this is now all
-we need to tokenise an input string and classify each token.
+Recall that we want to lex a little programming language,
+called the \emph{While}-language. The main keywords in this
+language are \pcode{while}, \pcode{if}, \pcode{then} and
+\pcode{else}. As usual we have identifiers, operators, numbers
+and so on. For this we would need to design the corresponding
+regular expressions to recognise these syntactic categories. I
+let you do this design task. Having these regular expressions
+at our disposal we can form the regular expression
+
+\begin{center}
+\begin{tabular}{rcl}
+\textit{WhileRegs} & $\dn$ & (($k$, KEYWORD) +\\
+ & & ($i$, ID) +\\
+ & & ($o$, OP) + \\
+ & & ($n$, NUM) + \\
+ & & ($s$, SEMI) + \\
+ & & ($p$, (LPAREN + RPAREN)) +\\
+ & & ($b$, (BEGIN + END)) + \\
+ & & ($w$, WHITESPACE))$^*$
+\end{tabular}
+\end{center}
+
+
\end{document}
%%% Local Variables: