# HG changeset patch # User Christian Urban # Date 1413549890 -3600 # Node ID 8a222559278f867e7432e4def608c3dbee8243c7 # Parent 0afe43616b6ad64a62149038d94c0d98775bbb6c updated diff -r 0afe43616b6a -r 8a222559278f handouts/ho04.pdf Binary file handouts/ho04.pdf has changed diff -r 0afe43616b6a -r 8a222559278f handouts/ho04.tex --- a/handouts/ho04.tex Fri Oct 17 11:56:19 2014 +0100 +++ b/handouts/ho04.tex Fri Oct 17 13:44:50 2014 +0100 @@ -268,15 +268,15 @@ step in our running example \begin{center} -$r_4 = der\,c\,r_3$ +$r_4 = der\,c\,r_3 = (\varnothing \cdot (b \cdot c)) + ((\varnothing \cdot c) + \epsilon)$ \end{center} -\noindent -Simplifying the result would just give use $\epsilon$. -Running $mkeps$ on this regular expression would provide -us with $Empty$ instead of $Right(Right(Empty))$ that -was obtained without the simplification. The problem is -we need to recreate this value, rather than just $Empty$. +\noindent Simplifying the result would just give us +$\epsilon$. Running $mkeps$ on this regular expression would +then provide us with $Empty$ instead of $Right(Right(Empty))$ +that was obtained without the simplification. The problem is +we need to recreate this more complicated value, rather than +just $Empty$. This requires what I call \emph{rectification functions}. They need to be calculated whenever a regular expression gets @@ -297,22 +297,115 @@ \end{center} \noindent Applying them to $r_4$ will require several nested -simplifications in order end up with just $\epsilon$. We can -implement this by letting simp return not just a (simplified) -regular expression, but also a rectification function. Let us -consider the alternative case, say $r_1 + r_2$, first. We -would first simplify the component regular expressions $r_1$ -and $r_2.$ This will return simplified versions (if they can -be simplified), say $r_{1s}$ and $r_{2s}$, but also two -rectification functions $f_{1s}$ and $f_{2s}$. We need to -assemble them in order to obtain a rectified value for -$r_1 + r_2$. +simplifications in order end up with just $\epsilon$. + +We can implement this by letting simp return not just a +(simplified) regular expression, but also a rectification +function. Let us consider the alternative case, say $r_1 + +r_2$, first. We would first simplify the component regular +expressions $r_1$ and $r_2.$ This will return simplified +versions (if they can be simplified), say $r_{1s}$ and +$r_{2s}$, but also two rectification functions $f_{1s}$ and +$f_{2s}$. We need to assemble them in order to obtain a +rectified value for $r_1 + r_2$. In case $r_{1s}$ simplified +to $\varnothing$, we would continue the derivative calculation +with $r_{2s}$. The Sulzmann \& Lu algorithm would return a +corresponding value, say $v_{2s}$. But now this needs to +be ``rectified'' to the value + +\begin{center} +$Right(v_{2s})$ +\end{center} + +\noindent Unfortunately, this is not enough because there +might be some simplifications that happened inside $r_2$ +and for which the simplification function retuned also +a rectification function $f_{2s}$. So in fact we need to +apply this one too which gives + +\begin{center} +$Right(f_{2s}(v_{2s}))$ +\end{center} + +\noindent So if we want to return this as function, +we would need to return + +\begin{center} +$\lambda v.\,Right(f_{2s}(v))$ +\end{center} + +\noindent which is the lambda-calculus notation for +a function that expects a value $v$ and returns everything +after the dot where $v$ is replaced by whatever value is +given. + +Let us package these ideas into a single function (still only +considering the alternative case): -In case $r_{1s}$ simplified to $\varnothing$, we would -continue the derivative calculation with $r_{2s}$. The -Sulzmann \& Lu algorithm would return a corresponding value, -say $v_{2s}$. +\begin{center} +\begin{tabular}{l} +$simp(r)$:\\ +\quad case $r = r_1 + r_2$\\ +\qquad let $(r_{1s}, f_{1s}) = simp(r_1)$\\ +\qquad \phantom{let} $(r_{2s}, f_{2s}) = simp(r_2)$\smallskip\\ +\qquad case $r_{1s} = \varnothing$: + return $(r_{2s}, \lambda v. \,Right(f_{2s}(v)))$\\ +\qquad case $r_{2s} = \varnothing$: + return $(r_{1s}, \lambda v. \,Left(f_{1s}(v)))$\\ +\qquad otherwise: + return $(r_{1s} + r_{2s}, f_{alt}(f_{1s}, f_{2s}))$\\ +\end{tabular} +\end{center} + +\noindent We first recursively call the simlification with $r_1$ +and $r_2$. This gives simplified regular expressions, $r_{1s}$ +and $r_{2s}$, as well as two rectification functions $f_{1s}$ +and $f_{2s}$. We next need to test whether the simplified +regular expressions are $\varnothing$ so as to make further +simplifications. In case $r_{1s}$ is $\varnothing$ then +we can return $r_{2s}$ (the other alternative). However +we need to now build a rectification function, which as +said above is + +\begin{center} +$\lambda v.\,Right(f_{2s}(v))$ +\end{center} + +\noindent The case where $r_{2s} = \varnothing$ is similar. +We return $r_{1s}$ but now have to rectify such that we return +\begin{center} +$\lambda v.\,Left(f_{1s}(v))$ +\end{center} + +\noindent Note that in this case we have to apply $f_{1s}$, +not $f_{2s}$, which is responsible to rectify the inner parts +of $v$. The otherwise-case is slightly interesting. In this +case neither $r_{1s}$ nor $r_{2s}$ are $\varnothing$ and no +further simplification can be applied. Accordingly, we return +$r_{1s} + r_{2s}$ as the simplified regular expression. In +principle we also do not have to do any rectification, because +no simplification was done in this case. But this is actually +not true: There might have been simplifications inside +$r_{1s}$ and $r_2s$. We therefore need to take into account +the calculated rectification functions $f_{1s}$ and $f_{2s}$. +We can do this by defining a rectification function $f_{alt}$ +which takes two rectification functions as arguments + +\begin{center} +\begin{tabular}{l@{\hspace{1mm}}l} +$f_{alt}(f_1, f_2) \dn$\\ +\qquad $\lambda v.\,$ case $v = Left(v')$: + & return $Left(f_1(v'))$\\ +\qquad \phantom{$\lambda v.\,$} case $v = Right(v')$: + & return $Right(f_2(v'))$\\ +\end{tabular} +\end{center} + +\noindent In essence we need to apply in this case +the appropriate rectification function to the inner part +of the value $v$, wherevy $v$ can only be of the form +$Right(\_)$ or $Left(\_)$. \subsubsection*{Records and Tokenisation}