afl-material: comparison handouts/ho04.tex

-:a259eec25156
+:d9c784c71305
 Lu introduce \emph{values}. A value will be the output of the
 algorithm whenever the regular expression matches the string.
 If the regular expression does not match the string, an error will be
 raised. Since the first phase of the algorithm by Sulzmann \&
 Lu is identical to the derivative based matcher from the first
-coursework, the function $nullable$ will be used to decide
+coursework, the function \textit{nullable} will be used to
-whether as string is matched by a regular expression. If
+decide whether a string is matched by a regular expression.
-$nullable$ says yes, then values are constructed that reflect
+If \textit{nullable} says yes, then values are constructed
-how the regular expression matched the string.
+that reflect how the regular expression matched the string.
 The definition of values is given below. They are shown
 together with the regular expressions $r$ to which
 they correspond:
 values, I have in my implementation the convention that
 regular expressions and values have the same name, except that
 regular expressions are written entirely with upper-case
 letters, while values just start with a single upper-case
 character and the rest are lower-case letters. My definition
-of regular expressions and values in Scala is shown below. I use
+of regular expressions and values in Scala is shown below. I
-this in the REPL of Scala; when I use the Scala compiler I
+use this in the REPL of Scala; when I use the Scala compiler I
-need to rename some constructors, because Scala on Macs does
+unfortunately need to rename some constructors, because Scala
-not like classes that are called \pcode{EMPTY} and
+on Macs does not like classes that are called \pcode{EMPTY}
-\pcode{Empty}.
+and \pcode{Empty}.
 {\small\lstinputlisting[language=Scala,numbers=none]
 {../progs/app01.scala}}
 \noindent If there had been an $\epsilon$ on the left, then
 $mkeps$ would have returned something of the form
 $Left(\ldots)$. The point is that from this value we can
 directly read off which part of $r_4$ matched the empty
 string: take the right-alternative first, and then the
-right-alternative again.
+right-alternative again. Remember that $r_4$ is of the form
+\begin{center}
+$r_4$:\;$(\varnothing \cdot (b \cdot c)) +
+((\varnothing \cdot c) + \underline{\epsilon})$
+\end{center}
+\noindent The value tells us that the underlined $\epsilon$
+is responsible for matching the empty string.
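 
 \noindent To make this step concrete, here is a minimal sketch of
 $mkeps$ applied to $r_4$. It is not taken from
 \texttt{app01.scala}: the constructor names follow the naming
 convention described earlier, but the name \pcode{Sequ} and the
 exact shape of the two functions are my assumptions.
 
 {\small\begin{lstlisting}[language=Scala,numbers=none]
 // Sketch for the REPL. As noted in the text, EMPTY/Empty may have
 // to be renamed when the code is compiled on a Mac.
 abstract class Rexp
 case object NULL extends Rexp                    // matches nothing
 case object EMPTY extends Rexp                   // matches ""
 case class CHAR(c: Char) extends Rexp
 case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // r1 + r2
 case class SEQ(r1: Rexp, r2: Rexp) extends Rexp  // r1 . r2
 case class STAR(r: Rexp) extends Rexp
 
 abstract class Val
 case object Empty extends Val
 case class Chr(c: Char) extends Val
 case class Sequ(v1: Val, v2: Val) extends Val    // assumed name
 case class Left(v: Val) extends Val
 case class Right(v: Val) extends Val
 case class Stars(vs: List[Val]) extends Val
 
 // Can r match the empty string?
 def nullable(r: Rexp): Boolean = r match {
   case NULL => false
   case EMPTY => true
   case CHAR(_) => false
   case ALT(r1, r2) => nullable(r1) || nullable(r2)
   case SEQ(r1, r2) => nullable(r1) && nullable(r2)
   case STAR(_) => true
 }
 
 // Build a value showing how r matches the empty string;
 // in an alternative the left branch is preferred.
 def mkeps(r: Rexp): Val = r match {
   case EMPTY => Empty
   case ALT(r1, r2) =>
     if (nullable(r1)) Left(mkeps(r1)) else Right(mkeps(r2))
   case SEQ(r1, r2) => Sequ(mkeps(r1), mkeps(r2))
   case STAR(_) => Stars(Nil)
   case _ => throw new IllegalArgumentException("not nullable")
 }
 
 // r4 from the running example
 val r4 = ALT(SEQ(NULL, SEQ(CHAR('b'), CHAR('c'))),
              ALT(SEQ(NULL, CHAR('c')), EMPTY))
 val v4 = mkeps(r4)
 \end{lstlisting}
 }
 
 \noindent With these definitions \pcode{v4} evaluates to
 \pcode{Right(Right(Empty))}, which is the value read off above:
 the right-alternative first, and then the right-alternative again.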
 Next we have to ``inject'' the last character, that is $c$ in
 the running example, into this value $v_4$ in order to
 calculate how $r_3$ could have matched the string $c$.
 According to the definition of $inj$ we obtain
 simplifications in order to end up with just $\epsilon$. However,
 it is possible to apply them in a depth-first, or inside-out,
 manner in order to calculate this simplified regular
 expression.
-The rectification we can implement by letting simp return
+We can implement the rectification by letting $simp$ return not
-not just a (simplified) regular expression, but also a
+just a (simplified) regular expression, but also a
 rectification function. Let us consider the alternative case,
 $r_1 + r_2$, first. By going depth-first, we first simplify
 the component regular expressions $r_1$ and $r_2.$ This will
-return simplified versions (if they can be simplified), say
+return simplified versions, say $r_{1s}$ and $r_{2s}$, of the
-$r_{1s}$ and $r_{2s}$, but also two rectification functions
+component regular expressions (if they can be simplified) but
-$f_{1s}$ and $f_{2s}$. We need to assemble them in order to
+also two rectification functions $f_{1s}$ and $f_{2s}$. We
-obtain a rectified value for $r_1 + r_2$. In case $r_{1s}$
+need to assemble them in order to obtain a rectified value for
-simplified to $\varnothing$, we continue the derivative
+$r_1 + r_2$. In case $r_{1s}$ simplified to $\varnothing$, we
-calculation with $r_{2s}$. The Sulzmann \& Lu algorithm would
+continue the derivative calculation with $r_{2s}$. The
-return a corresponding value, say $v_{2s}$. But now this value
+Sulzmann \& Lu algorithm would return a corresponding value,
-needs to be ``rectified'' to the value
+say $v_{2s}$. But now this value needs to be ``rectified'' to
+the value
 \begin{center}
 $Right(v_{2s})$
 \end{center}
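 
 \noindent The alternative case just described can be sketched in
 Scala as follows. This is only an illustration under assumptions:
 the helper names \pcode{fId}, \pcode{fLeft}, \pcode{fRight} and
 \pcode{fAlt} are hypothetical, the symmetric case where $r_{2s}$
 is $\varnothing$ is my addition, and all regular expressions other
 than alternatives are returned unsimplified.
 
 {\small\begin{lstlisting}[language=Scala,numbers=none]
 // Only the constructors needed for the alternative case.
 abstract class Rexp
 case object NULL extends Rexp
 case object EMPTY extends Rexp
 case class CHAR(c: Char) extends Rexp
 case class ALT(r1: Rexp, r2: Rexp) extends Rexp
 
 abstract class Val
 case object Empty extends Val
 case class Chr(c: Char) extends Val
 case class Left(v: Val) extends Val
 case class Right(v: Val) extends Val
 
 // Rectification functions (hypothetical names)
 val fId: Val => Val = v => v
 def fLeft(f: Val => Val): Val => Val = v => Left(f(v))
 def fRight(f: Val => Val): Val => Val = v => Right(f(v))
 def fAlt(f1: Val => Val, f2: Val => Val): Val => Val = {
   case Left(v1) => Left(f1(v1))
   case Right(v2) => Right(f2(v2))
   case v => throw new IllegalArgumentException("no alternative: " + v)
 }
 
 // simp returns the simplified regular expression together with a
 // function that turns a value for the simplified regular
 // expression back into a value for the original one.
 def simp(r: Rexp): (Rexp, Val => Val) = r match {
   case ALT(r1, r2) => {
     val (r1s, f1s) = simp(r1)   // depth-first: components first
     val (r2s, f2s) = simp(r2)
     (r1s, r2s) match {
       case (NULL, _) => (r2s, fRight(f2s))  // only the right part is left
       case (_, NULL) => (r1s, fLeft(f1s))   // assumed symmetric case
       case _ => (ALT(r1s, r2s), fAlt(f1s, f2s))
     }
   }
   case _ => (r, fId)            // other cases omitted in this sketch
 }
 
 // NULL + c simplifies to c; a value for c is rectified to Right(...)
 val (rs1, rect1) = simp(ALT(NULL, CHAR('c')))
 val v1 = rect1(Chr('c'))
 \end{lstlisting}
 }
 
 \noindent Here \pcode{rs1} is \pcode{CHAR('c')} and \pcode{v1} is
 \pcode{Right(Chr('c'))}: the value for the simplified regular
 expression has been turned back into a value for the original
 alternative.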
 $[(name:\texttt{christian.urban}),
 (domain:\texttt{kcl}),
 (top\_level:\texttt{ac.uk})]$
 \end{center}
-\noindent As you will see in the next lecture, this is now all
+Recall that we want to lex a little programming language,
-we need to tokenise an input string and classify each token.
+called the \emph{While}-language. The main keywords in this
+language are \pcode{while}, \pcode{if}, \pcode{then} and
+\pcode{else}. As usual we have identifiers, operators, numbers
+and so on. For this we would need to design the corresponding
+regular expressions to recognise these syntactic categories. I
+leave this design task to you. Having these regular expressions
+at our disposal we can form the regular expression
+\begin{center}
+\begin{tabular}{rcl}
+\textit{WhileRegs} & $\dn$ & (($k$, KEYWORD) +\\
+&     & ($i$, ID) +\\
+&     & ($o$, OP) + \\
+&     & ($n$, NUM) + \\
+&     & ($s$, SEMI) + \\
+&     & ($p$, (LPAREN + RPAREN)) +\\
+&     & ($b$, (BEGIN + END)) + \\
+&     & ($w$, WHITESPACE))$^*$
+\end{tabular}
+\end{center}
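 
 \noindent As a rough sketch, \textit{WhileRegs} could be written
 down in Scala as shown next. Everything apart from the overall
 shape is an assumption: the constructor \pcode{RECD} for records,
 the helpers \pcode{str}, \pcode{alts} and \pcode{plus}, and in
 particular the deliberately tiny component regular expressions,
 which are placeholders and not a solution to the design task. The
 plain derivative matcher at the end is not the Sulzmann \& Lu
 lexer; it only checks that the regular expression accepts some
 input.
 
 {\small\begin{lstlisting}[language=Scala,numbers=none]
 abstract class Rexp
 case object NULL extends Rexp
 case object EMPTY extends Rexp
 case class CHAR(c: Char) extends Rexp
 case class ALT(r1: Rexp, r2: Rexp) extends Rexp
 case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
 case class STAR(r: Rexp) extends Rexp
 case class RECD(x: String, r: Rexp) extends Rexp  // record (x, r)
 
 // Helpers (hypothetical names)
 def str(s: String): Rexp = s.toList match {
   case Nil => EMPTY
   case c :: Nil => CHAR(c)
   case c :: cs => SEQ(CHAR(c), str(cs.mkString))
 }
 def alts(rs: List[Rexp]): Rexp = rs match {
   case Nil => NULL
   case r :: Nil => r
   case r :: rest => ALT(r, alts(rest))
 }
 def plus(r: Rexp): Rexp = SEQ(r, STAR(r))
 
 // Placeholder components; the real ones are the design task
 val LETTER = alts("abcdefghijklmnopqrstuvwxyz".toList.map(c => CHAR(c)))
 val DIGIT = alts("0123456789".toList.map(c => CHAR(c)))
 val KEYWORD = alts(List("while", "if", "then", "else").map(str))
 val ID = plus(LETTER)
 val OP = alts(List(":=", "<", "+", "-").map(str))
 val NUM = plus(DIGIT)
 val SEMI = CHAR(';')
 val LPAREN = CHAR('(')
 val RPAREN = CHAR(')')
 val BEGIN = CHAR('{')
 val END = CHAR('}')
 val WHITESPACE = plus(alts(List(' ', '\n', '\t').map(c => CHAR(c))))
 
 val WhileRegs: Rexp = STAR(alts(List(
   RECD("k", KEYWORD), RECD("i", ID), RECD("o", OP), RECD("n", NUM),
   RECD("s", SEMI), RECD("p", ALT(LPAREN, RPAREN)),
   RECD("b", ALT(BEGIN, END)), RECD("w", WHITESPACE))))
 
 // Derivative matcher; records are transparent for matching
 def nullable(r: Rexp): Boolean = r match {
   case NULL => false
   case EMPTY => true
   case CHAR(_) => false
   case ALT(r1, r2) => nullable(r1) || nullable(r2)
   case SEQ(r1, r2) => nullable(r1) && nullable(r2)
   case STAR(_) => true
   case RECD(_, r1) => nullable(r1)
 }
 def der(c: Char, r: Rexp): Rexp = r match {
   case NULL => NULL
   case EMPTY => NULL
   case CHAR(d) => if (c == d) EMPTY else NULL
   case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
   case SEQ(r1, r2) =>
     if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
     else SEQ(der(c, r1), r2)
   case STAR(r1) => SEQ(der(c, r1), STAR(r1))
   case RECD(_, r1) => der(c, r1)
 }
 def matches(r: Rexp, s: String): Boolean =
   nullable(s.foldLeft(r)((r1, c) => der(c, r1)))
 \end{lstlisting}
 }
 
 \noindent For example, \pcode{matches(WhileRegs, "if x;")} holds,
 while a string containing a character that none of the placeholder
 components knows about, such as \pcode{"if #"}, is rejected. Since
 no simplification is applied, this sketch is only suitable for
 short inputs.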
 \end{document}
 %%% Local Variables:
 %%% mode: latex
 %%% TeX-master: t

changeset 356	d9c784c71305
parent 352	1e1b0fe66107
child 357	603e171a7b48