handouts/ho04.tex
changeset 326 94700593a2d5
parent 319 e7b110f93697
child 350 c4e7caa06c74
--- a/handouts/ho04.tex	Fri May 01 19:19:31 2015 +0100
+++ b/handouts/ho04.tex	Fri May 01 21:30:13 2015 +0100
@@ -207,11 +207,14 @@
 \end{center}
 
 \noindent The point is that from this value we can directly
-read off which part of $r_4$ matched the empty string. Next we
-have to ``inject'' the last character, that is $c$ in the
-running example, into this value $v_4$ in order to calculate
-how $r_3$ could have matched the string $c$. According to the
-definition of $inj$ we obtain
+read off which part of $r_4$ matched the empty string: take
+the right-alternative first, and then the right-alternative
+again. 
+
+Next we have to ``inject'' the last character, that is $c$ in
+the running example, into this value $v_4$ in order to
+calculate how $r_3$ could have matched the string $c$.
+According to the definition of $inj$ we obtain
 
 \begin{center}
 $v_3:\;Right(Seq(Empty, Char(c)))$
@@ -279,8 +282,8 @@
 each derivative step. But this is a bit more involved in the
 algorithm of Sulzmann \& Lu. So what follows might require you
 to read several times before it makes sense and also might
-require that you do some example calculations. As a first
-example consider the last derivation step in our earlier
+require that you do some example calculations yourself. As a
+first example consider the last derivation step in our earlier
 example:
 
 \begin{center}
@@ -293,7 +296,7 @@
 input, however, would then provide us with $Empty$ instead of
 $Right(Right(Empty))$ that was obtained without the
 simplification. The problem is we need to recreate this more
-complicated value, rather than just $Empty$.
+complicated value, rather than just return $Empty$.
 
 This will require what I call \emph{rectification functions}.
 They need to be calculated whenever a regular expression gets
@@ -319,7 +322,7 @@
 manner in order to calculate this simplified regular
 expression.
 
-The rectification we can implement this by letting simp return
+The rectification we can implement by letting simp return
 not just a (simplified) regular expression, but also a
 rectification function. Let us consider the alternative case,
 $r_1 + r_2$, first. By going depth-first, we first simplify
@@ -483,7 +486,7 @@
 string and therefore the second phase of the algorithm would
 never been called. The simplification function still expects
 us to give a function. So in my own implementation I just
-returned a function which raises an error. In the case
+returned a function that raises an error. In the case
 where $r_{1s} = \epsilon$ (similarly $r_{2s}$) we have
 to create a sequence where the first component is a rectified
 version of $Empty$. Therefore we call $f_{1s}$ with $Empty$.
@@ -553,72 +556,71 @@
 \end{tabular}
 \end{center}
 
-\noindent This corresponds to the $matches$ function we
-have seen in earlier lectures. In the first clause we are
-given an empty string, $[]$, and need to test wether the
-regular expression is $nullable$. If yes we can proceed
-normally and just return the value calculated by $mkeps$.
-The second clause is for strings where the first character is
-$c$ and the rest of the string is $s$. We first build the
-derivative of $r$ with respect to $c$; simplify the resulting 
-regulare expression. We continue lexing with the simplified
-regular expression and the string $s$. Whatever will be
-returned as value, we sill rectify using the $f_{rect}$
-from the simplification and finally inject $c$ back into
-the (rectified) value.
+\noindent This corresponds to the $matches$ function we have
+seen in earlier lectures. In the first clause we are given an
+empty string, $[]$, and need to test wether the regular
+expression is $nullable$. If yes, we can proceed normally and
+just return the value calculated by $mkeps$. The second clause
+is for strings where the first character is $c$, say, and the
+rest of the string is $s$. We first build the derivative of
+$r$ with respect to $c$; simplify the resulting regular
+expression. We continue lexing with the simplified regular
+expression and the string $s$. Whatever will be returned as
+value, we sill need to rectify using the $f_{rect}$ from the
+simplification and finally inject $c$ back into the
+(rectified) value.
 
 
 \subsubsection*{Records}
 
-Remember we want to tokenize input strings, that means
-splitting strings into their ``word'' components. But
-furthermore we want to classify each token as being a keyword
-or identifier and so on. For this one more feature will be
-required, which I call \emph{record}. While values record
-precisely how a regular expression matches a string, 
-records can be used to focus on some particular
-parts of the regular expression and forget about others.
-Let us look at an example. 
+Remember we wanted to tokenize input strings, that means
+splitting strings into their ``word'' components. Furthermore
+we want to classify each token as being a keyword or
+identifier and so on. For this one more feature will be
+required, which I call a \emph{record} regular expression.
+While values encode how a regular expression matches a string,
+records can be used to ``focus'' on some particular parts of
+the regular expression and ``forget'' about others. 
 
-Suppose you have the regular expression $ab + ac$. Clearly
-this regular expression can only recognise two strings. But
-suppose you are not interested whether it can recognise $ab$
-or $ac$, but rather if it matched, then what was the last
-character of the matched string\ldots either $b$ or $c$.
-You can do this by annotating the regular expression with
-a record, written $(x:r)$, where $x$ is just an identifier
-(in my implementation a plain string) and $r$ is a regular
-expression. A record will be regarded as a regular expression.
-The extended definition in Scala looks as follows:
+Let us look at an example. Suppose you have the regular
+expression $a\cdot b + a\cdot c$. Clearly this regular expression can only
+recognise two strings. But suppose you are not interested
+whether it can recognise $ab$ or $ac$, but rather if it
+matched, then what was the last character of the matched
+string\ldots either $b$ or $c$. You can do this by annotating
+the regular expression with a record, written in general
+$(x:r)$, where $x$ is just an identifier (in my implementation
+a plain string) and $r$ is a regular expression. A record will
+be regarded as a regular expression. The extended definition
+in Scala therefore looks as follows:
 
 {\small\lstinputlisting[language=Scala]
 {../progs/app03.scala}}
 
-\noindent Since we regard records as regular expressions
-we need to extend the functions $nullable$ and $der$. 
-Similarly $mkeps$ and $inj$ need to be extended and they 
-sometimes can return a particular value for records. 
-This means we also need to extend the definition of values.
-The extended definition in Scala looks as follows:
+\noindent Since we regard records as regular expressions we
+need to extend the functions $nullable$ and $der$. Similarly
+$mkeps$ and $inj$ need to be extended. This means we also need
+to extend the definition of values, which in Scala looks as
+follows:
 
 {\small\lstinputlisting[language=Scala]
 {../progs/app04.scala}}
 
 \noindent Let us now look at the purpose of records more
-closely and lets return to our question whether the string
+closely and let us return to our question whether the string
 terminated in a $b$ or $c$. We can do this as follows: we
 annotate the regular expression $ab + ac$ with a record
 as follows
 
 \begin{center}
-$a(x:b) + a(x:c)$
+$a\cdot (x:b) + a\cdot (x:c)$
 \end{center}
 
 \noindent This regular expression can still only recognise
 the strings $ab$ and $ac$, but we can now use a function
 that takes a value and returns all records. I call this
 function \emph{env} for environment\ldots it builds a list
-of identifiers associated with their string. This function
+of identifiers associated with a string. This function
 can be defined as follows:
 
 \begin{center}
@@ -634,21 +636,21 @@
 \end{tabular}
 \end{center}
 
-\noindent where in the last clause we use the flatten function 
-defined earlier. The function $env$ ``picks'' out all 
-underlying strings where a record is given. Since there can be 
-more than one, the environment will potentially contain
-many ``recordings''. If we now postprocess the value 
-calculated by $lex$ extracting all recordings using $env$, 
-we can answer the question whether the last element in the
-string was an $b$ or a $c$. Lets see this in action: if
-we use $ab + ac$ and $ac$ the calculated value will be
+\noindent where in the last clause we use the flatten function
+defined earlier. As can be seen, the function $env$ ``picks''
+out all underlying strings where a record is given. Since
+there can be more than one, the environment will potentially
+contain many ``records''. If we now postprocess the value
+calculated by $lex$ extracting all records using $env$, we can
+answer the question whether the last element in the string was
+an $b$ or a $c$. Lets see this in action: if we use $a\cdot b
++ a\cdot c$ and $ac$ the calculated value will be
 
 \begin{center}
 $Right(Seq(Char(a), Char(c)))$
 \end{center}
 
-\noindent If we use instead $a(x:b) + a(x:c)$ and
+\noindent If we use instead $a\cdot (x:b) + a\cdot (x:c)$ and
 use the $env$ function to extract the recording for 
 $x$ we obtain
 
@@ -661,7 +663,7 @@
 iterate this. Consider the regular expression 
 
 \begin{center}
-$(a(x:b) + a(y:c))^*$
+$(a\cdot (x:b) + a\cdot (y:c))^*$
 \end{center}
 
 \noindent and the string $ababacabacab$. This string is