handouts/ho04.tex
changeset 356 d9c784c71305
parent 352 1e1b0fe66107
child 357 603e171a7b48
--- a/handouts/ho04.tex	Sat Oct 17 11:24:41 2015 +0100
+++ b/handouts/ho04.tex	Mon Oct 19 14:45:50 2015 +0100
@@ -29,10 +29,10 @@
 If the string does not match the string, an error will be
 raised. Since the first phase of the algorithm by Sulzmann \&
 Lu is identical to the derivative based matcher from the first
-coursework, the function $nullable$ will be used to decide
-whether as string is matched by a regular expression. If
-$nullable$ says yes, then values are constructed that reflect
-how the regular expression matched the string. 
+coursework, the function \textit{nullable} will be used to
+decide whether as string is matched by a regular expression.
+If \textit{nullable} says yes, then values are constructed
+that reflect how the regular expression matched the string. 
 
 The definitions for values is given below. They are shown 
 together with the regular expressions $r$ to which
@@ -83,11 +83,11 @@
 regular expressions are written entirely with upper-case
 letters, while values just start with a single upper-case
 character and the rest are lower-case letters. My definition
-of regular expressions and values in Scala is shown below. I use
-this in the REPL of Scala; when I use the Scala compiler I
-need to rename some constructors, because Scala on Macs does
-not like classes that are called \pcode{EMPTY} and
-\pcode{Empty}.
+of regular expressions and values in Scala is shown below. I
+use this in the REPL of Scala; when I use the Scala compiler I
+unfortunately need to rename some constructors, because Scala
+on Macs does not like classes that are called \pcode{EMPTY}
+and \pcode{Empty}.
  
  {\small\lstinputlisting[language=Scala,numbers=none]
 {../progs/app01.scala}}
@@ -223,7 +223,15 @@
 $Left(\ldots)$. The point is that from this value we can
 directly read off which part of $r_4$ matched the empty
 string: take the right-alternative first, and then the
-right-alternative again. 
+right-alternative again. Remember $r_4$ is of the form
+
+\begin{center}
+$r_4$:\;$(\varnothing \cdot (b \cdot c)) + 
+         ((\varnothing \cdot c) + \underline{\epsilon})$\\
+\end{center}
+
+\noindent the value tells us that the underlined $\epsilon$
+is responsible for matching the empty string.
 
 Next we have to ``inject'' the last character, that is $c$ in
 the running example, into this value $v_4$ in order to
@@ -341,19 +349,20 @@
 manner in order to calculate this simplified regular
 expression.
 
-The rectification we can implement by letting simp return
-not just a (simplified) regular expression, but also a
+The rectification we can implement by letting simp return not
+just a (simplified) regular expression, but also a
 rectification function. Let us consider the alternative case,
 $r_1 + r_2$, first. By going depth-first, we first simplify
 the component regular expressions $r_1$ and $r_2.$ This will
-return simplified versions (if they can be simplified), say
-$r_{1s}$ and $r_{2s}$, but also two rectification functions
-$f_{1s}$ and $f_{2s}$. We need to assemble them in order to
-obtain a rectified value for $r_1 + r_2$. In case $r_{1s}$
-simplified to $\varnothing$, we continue the derivative
-calculation with $r_{2s}$. The Sulzmann \& Lu algorithm would
-return a corresponding value, say $v_{2s}$. But now this value
-needs to be ``rectified'' to the value 
+return simplified versions, say $r_{1s}$ and $r_{2s}$, of the
+component regular expressions (if they can be simplified) but
+also two rectification functions $f_{1s}$ and $f_{2s}$. We
+need to assemble them in order to obtain a rectified value for
+$r_1 + r_2$. In case $r_{1s}$ simplified to $\varnothing$, we
+continue the derivative calculation with $r_{2s}$. The
+Sulzmann \& Lu algorithm would return a corresponding value,
+say $v_{2s}$. But now this value needs to be ``rectified'' to
+the value 
 
 \begin{center}
 $Right(v_{2s})$
@@ -720,8 +729,29 @@
   (top\_level:\texttt{ac.uk})]$
 \end{center}
 
-\noindent As you will see in the next lecture, this is now all
-we need to tokenise an input string and classify each token.
+Recall that we want to lex a little programming language,
+called the \emph{While}-language. The main keywords in this
+language are \pcode{while}, \pcode{if}, \pcode{then} and
+\pcode{else}. As usual we have identifiers, operators, numbers
+and so on. For this we would need to design the corresponding
+regular expressions to recognise these syntactic categories. I
+let you do this design task. Having these regular expressions
+at our disposal we can form the regular expression
+
+\begin{center}
+\begin{tabular}{rcl}
+\textit{WhileRegs} & $\dn$ & (($k$, KEYWORD) +\\
+                   &     & ($i$, ID) +\\
+                   &     & ($o$, OP) + \\
+                   &     & ($n$, NUM) + \\
+                   &     & ($s$, SEMI) + \\
+                   &     & ($p$, (LPAREN + RPAREN)) +\\ 
+                   &     & ($b$, (BEGIN + END)) + \\
+                   &     & ($w$, WHITESPACE))$^*$
+\end{tabular}
+\end{center}
+
+
 \end{document}
 
 %%% Local Variables: