--- a/handouts/ho04.tex Fri May 01 19:19:31 2015 +0100
+++ b/handouts/ho04.tex Fri May 01 21:30:13 2015 +0100
@@ -207,11 +207,14 @@
\end{center}
\noindent The point is that from this value we can directly
-read off which part of $r_4$ matched the empty string. Next we
-have to ``inject'' the last character, that is $c$ in the
-running example, into this value $v_4$ in order to calculate
-how $r_3$ could have matched the string $c$. According to the
-definition of $inj$ we obtain
+read off which part of $r_4$ matched the empty string: take
+the right-alternative first, and then the right-alternative
+again.
+
+Next we have to ``inject'' the last character, that is $c$ in
+the running example, into this value $v_4$ in order to
+calculate how $r_3$ could have matched the string $c$.
+According to the definition of $inj$ we obtain
\begin{center}
$v_3:\;Right(Seq(Empty, Char(c)))$
@@ -279,8 +282,8 @@
each derivative step. But this is a bit more involved in the
algorithm of Sulzmann \& Lu. So what follows might require you
to read several times before it makes sense and also might
-require that you do some example calculations. As a first
-example consider the last derivation step in our earlier
+require that you do some example calculations yourself. As a
+first example consider the last derivation step in our earlier
example:
\begin{center}
@@ -293,7 +296,7 @@
input, however, would then provide us with $Empty$ instead of
$Right(Right(Empty))$ that was obtained without the
simplification. The problem is we need to recreate this more
-complicated value, rather than just $Empty$.
+complicated value, rather than just return $Empty$.
This will require what I call \emph{rectification functions}.
They need to be calculated whenever a regular expression gets
@@ -319,7 +322,7 @@
manner in order to calculate this simplified regular
expression.
-The rectification we can implement this by letting simp return
+The rectification we can implement by letting simp return
not just a (simplified) regular expression, but also a
rectification function. Let us consider the alternative case,
$r_1 + r_2$, first. By going depth-first, we first simplify
@@ -483,7 +486,7 @@
string and therefore the second phase of the algorithm would
never been called. The simplification function still expects
us to give a function. So in my own implementation I just
-returned a function which raises an error. In the case
+returned a function that raises an error. In the case
where $r_{1s} = \epsilon$ (similarly $r_{2s}$) we have
to create a sequence where the first component is a rectified
version of $Empty$. Therefore we call $f_{1s}$ with $Empty$.
@@ -553,72 +556,71 @@
\end{tabular}
\end{center}
-\noindent This corresponds to the $matches$ function we
-have seen in earlier lectures. In the first clause we are
-given an empty string, $[]$, and need to test wether the
-regular expression is $nullable$. If yes we can proceed
-normally and just return the value calculated by $mkeps$.
-The second clause is for strings where the first character is
-$c$ and the rest of the string is $s$. We first build the
-derivative of $r$ with respect to $c$; simplify the resulting
-regulare expression. We continue lexing with the simplified
-regular expression and the string $s$. Whatever will be
-returned as value, we sill rectify using the $f_{rect}$
-from the simplification and finally inject $c$ back into
-the (rectified) value.
+\noindent This corresponds to the $matches$ function we have
+seen in earlier lectures. In the first clause we are given an
+empty string, $[]$, and need to test wether the regular
+expression is $nullable$. If yes, we can proceed normally and
+just return the value calculated by $mkeps$. The second clause
+is for strings where the first character is $c$, say, and the
+rest of the string is $s$. We first build the derivative of
+$r$ with respect to $c$; simplify the resulting regular
+expression. We continue lexing with the simplified regular
+expression and the string $s$. Whatever will be returned as
+value, we sill need to rectify using the $f_{rect}$ from the
+simplification and finally inject $c$ back into the
+(rectified) value.
\subsubsection*{Records}
-Remember we want to tokenize input strings, that means
-splitting strings into their ``word'' components. But
-furthermore we want to classify each token as being a keyword
-or identifier and so on. For this one more feature will be
-required, which I call \emph{record}. While values record
-precisely how a regular expression matches a string,
-records can be used to focus on some particular
-parts of the regular expression and forget about others.
-Let us look at an example.
+Remember we wanted to tokenize input strings, that means
+splitting strings into their ``word'' components. Furthermore
+we want to classify each token as being a keyword or
+identifier and so on. For this one more feature will be
+required, which I call a \emph{record} regular expression.
+While values encode how a regular expression matches a string,
+records can be used to ``focus'' on some particular parts of
+the regular expression and ``forget'' about others.
-Suppose you have the regular expression $ab + ac$. Clearly
-this regular expression can only recognise two strings. But
-suppose you are not interested whether it can recognise $ab$
-or $ac$, but rather if it matched, then what was the last
-character of the matched string\ldots either $b$ or $c$.
-You can do this by annotating the regular expression with
-a record, written $(x:r)$, where $x$ is just an identifier
-(in my implementation a plain string) and $r$ is a regular
-expression. A record will be regarded as a regular expression.
-The extended definition in Scala looks as follows:
+Let us look at an example. Suppose you have the regular
+expression $a\cdot b + a\cdot c$. Clearly this regular expression can only
+recognise two strings. But suppose you are not interested
+whether it can recognise $ab$ or $ac$, but rather if it
+matched, then what was the last character of the matched
+string\ldots either $b$ or $c$. You can do this by annotating
+the regular expression with a record, written in general
+$(x:r)$, where $x$ is just an identifier (in my implementation
+a plain string) and $r$ is a regular expression. A record will
+be regarded as a regular expression. The extended definition
+in Scala therefore looks as follows:
{\small\lstinputlisting[language=Scala]
{../progs/app03.scala}}
-\noindent Since we regard records as regular expressions
-we need to extend the functions $nullable$ and $der$.
-Similarly $mkeps$ and $inj$ need to be extended and they
-sometimes can return a particular value for records.
-This means we also need to extend the definition of values.
-The extended definition in Scala looks as follows:
+\noindent Since we regard records as regular expressions we
+need to extend the functions $nullable$ and $der$. Similarly
+$mkeps$ and $inj$ need to be extended. This means we also need
+to extend the definition of values, which in Scala looks as
+follows:
{\small\lstinputlisting[language=Scala]
{../progs/app04.scala}}
\noindent Let us now look at the purpose of records more
-closely and lets return to our question whether the string
+closely and let us return to our question whether the string
terminated in a $b$ or $c$. We can do this as follows: we
annotate the regular expression $ab + ac$ with a record
as follows
\begin{center}
-$a(x:b) + a(x:c)$
+$a\cdot (x:b) + a\cdot (x:c)$
\end{center}
\noindent This regular expression can still only recognise
the strings $ab$ and $ac$, but we can now use a function
that takes a value and returns all records. I call this
function \emph{env} for environment\ldots it builds a list
-of identifiers associated with their string. This function
+of identifiers associated with a string. This function
can be defined as follows:
\begin{center}
@@ -634,21 +636,21 @@
\end{tabular}
\end{center}
-\noindent where in the last clause we use the flatten function
-defined earlier. The function $env$ ``picks'' out all
-underlying strings where a record is given. Since there can be
-more than one, the environment will potentially contain
-many ``recordings''. If we now postprocess the value
-calculated by $lex$ extracting all recordings using $env$,
-we can answer the question whether the last element in the
-string was an $b$ or a $c$. Lets see this in action: if
-we use $ab + ac$ and $ac$ the calculated value will be
+\noindent where in the last clause we use the flatten function
+defined earlier. As can be seen, the function $env$ ``picks''
+out all underlying strings where a record is given. Since
+there can be more than one, the environment will potentially
+contain many ``records''. If we now postprocess the value
+calculated by $lex$ extracting all records using $env$, we can
+answer the question whether the last element in the string was
+an $b$ or a $c$. Lets see this in action: if we use $a\cdot b
++ a\cdot c$ and $ac$ the calculated value will be
\begin{center}
$Right(Seq(Char(a), Char(c)))$
\end{center}
-\noindent If we use instead $a(x:b) + a(x:c)$ and
+\noindent If we use instead $a\cdot (x:b) + a\cdot (x:c)$ and
use the $env$ function to extract the recording for
$x$ we obtain
@@ -661,7 +663,7 @@
iterate this. Consider the regular expression
\begin{center}
-$(a(x:b) + a(y:c))^*$
+$(a\cdot (x:b) + a\cdot (y:c))^*$
\end{center}
\noindent and the string $ababacabacab$. This string is