added
authorChristian Urban <christian dot urban at kcl dot ac dot uk>
Sat, 05 Oct 2013 00:36:53 +0100
changeset 125 39c75cf4e079
parent 124 dd8b5a3dac0a
child 126 7c7185cb4f2b
added
handouts/ho02.pdf
handouts/ho02.tex
Binary file handouts/ho02.pdf has changed
--- a/handouts/ho02.tex	Fri Oct 04 15:55:42 2013 +0100
+++ b/handouts/ho02.tex	Sat Oct 05 00:36:53 2013 +0100
@@ -90,7 +90,96 @@
 which will take a regular expression, say $r$, and a character, say $c$, as argument and return 
 a new regular expression. Beware that the intuition behind this function is not so easy to grasp on first
 reading. Essentially this function solves the following problem: if $r$ can match a string of the form
-$c\!::\!s$, what does the regular expression look like that can match just $s$.
+$c\!::\!s$, what does the regular expression look like that can match just $s$. The definition of this
+function is as follows:
+
+\begin{center}
+\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
+  $der\, c\, (\varnothing)$      & $\dn$ & $\varnothing$ & \\
+  $der\, c\, (\epsilon)$           & $\dn$ & $\varnothing$ & \\
+  $der\, c\, (d)$                     & $\dn$ & if $c = d$ then $\epsilon$ else $\varnothing$ & \\
+  $der\, c\, (r_1 + r_2)$        & $\dn$ & $der\, c\, r_1 + der\, c\, r_2$ & \\
+  $der\, c\, (r_1 \cdot r_2)$  & $\dn$  & if $nullable (r_1)$\\
+  & & then $(der\,c\,r_1) \cdot r_2 + der\, c\, r_2$\\ 
+  & & else $(der\, c\, r_1) \cdot r_2$\\
+  $der\, c\, (r^*)$          & $\dn$ & $(der\,c\,r) \cdot (r^*)$ &
+  \end{tabular}
+\end{center}
+
+\noindent
+The first two clauses can be rationalised as follows: recall that $der$ should calculate a regular
+expression, if the ``input'' regular expression can match a string of the form $c\!::\!s$. Since neither
+$\varnothing$ nor $\epsilon$ can match such a string we return $\varnothing$. In the third case
+we have to make a case-distinction: In case the regular expression is $c$, then clearly it can recognise
+a string of the form $c\!::\!s$, just that $s$ is the empty string. Therefore we return the $\epsilon$-regular 
+expression. In the other case we again return $\varnothing$ since no string of the $c\!::\!s$ can be matched.
+The $+$-case is relatively straightforward: all strings of the form $c\!::\!s$ are either matched by the
+regular expression $r_1$ or $r_2$. So we just have to recursively call $der$ with these two regular
+expressions and compose the results again with $+$. The $\cdot$-case is more complicated:
+if $r_1\cdot r_2$ matches a string of the form $c\!::\!s$, then the first part must be matched by $r_1$.
+Consequently, it makes sense to construct the regular expression for $s$ by calling $der$ with $r_1$ and
+``appending'' $r_2$. There is however one exception to this simple rule: if $r_1$ can match the empty
+string, then all of $c\!::\!s$ is matched by $r_2$. So in case $r_1$ is nullable (that is can match the
+empty string) we have to allow the choice $der\,c\,r_2$ for calculating the regular expression that can match 
+$s$. The $*$-case is again simple: if $r^*$ matches a string of the form $c\!::\!s$, then the first part must be
+``matched'' by a single copy of $r$. Therefore we call recursively $der\,c\,r$ and ``append'' $r^*$ in order to 
+match the rest of $s$.
+
+Another way to rationalise the definition of $der$ is to consider the following operation on sets:
+
+\[
+Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
+\]
+
+\noindent
+which essentially transforms a set of strings $A$ by filtering out all strings that do not start with $c$ and then
+strip off the $c$ from all the remaining strings. For example suppose $A = \{"f\!oo", "bar", "f\!rak"\}$ then
+\[
+Der\,f\,A = \{"oo", "rak"\}\quad,\quad
+Der\,b\,A = \{"ar"\}  \quad \text{and} \quad
+Der\,a\,A = \varnothing
+\]
+ 
+\noindent
+Note that in the last case $Der$ is empty, because no string in $A$ starts with $a$. With this operation we can
+state the following property about $der$:
+
+\[
+L(der\,c\,r) = Der\,c\,(L(r))
+\]
+
+\noindent
+This property clarifies what regular expression $der$ calculates, namely take the set of strings
+that $r$ can match ($L(r)$), filter out all strings not starting with $c$ and strip off the $c$ from the
+remaining strings---this is exactly the language that $der\,c\,r$ can match.
+
+For our matching algorithm we need to lift the notion of derivatives from characters to strings. This can be
+done using the following function, taking a string and regular expression as input and a regular expression 
+as output.
+
+\begin{center}
+\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
+  $der\!s\, []\, r$     & $\dn$ & $r$ & \\
+  $der\!s\, (c\!::\!s)\, r$ & $\dn$ & $der\!s\,s\,(der\,c\,r)$ & \\
+  \end{tabular}
+\end{center}
+
+\noindent
+Having $ders$ in place, we can finally define our matching algorithm:
+
+\[
+match\,s\,r = nullable(ders\,s\,r)
+\]
+
+\noindent
+We claim that
+
+\[
+match\,s\,r\quad\text{if and only if}\quad s\in L(r)
+\]
+
+\noindent
+holds, which means our algorithm satisfies the specification.
 
 \end{document}