handouts/ho04.tex
changeset 285 8a222559278f
parent 284 0afe43616b6a
child 286 19020b75d75e
equal deleted inserted replaced
284:0afe43616b6a 285:8a222559278f
   266 each derivatives step. But this is a bit more involved in 
   266 each derivatives step. But this is a bit more involved in 
   267 algorithm of Sulzmann \& Lu. Consider the last derivation
   267 algorithm of Sulzmann \& Lu. Consider the last derivation
   268 step in our running example
   268 step in our running example
   269 
   269 
   270 \begin{center}
   270 \begin{center}
   271 $r_4 = der\,c\,r_3$
   271 $r_4 = der\,c\,r_3 = (\varnothing \cdot (b \cdot c)) + ((\varnothing \cdot c) + \epsilon)$
   272 \end{center}
   272 \end{center}
   273 
   273 
   274 \noindent 
   274 \noindent Simplifying the result would just give us
   275 Simplifying the result would just give use $\epsilon$.
   275 $\epsilon$. Running $mkeps$ on this regular expression would
   276 Running $mkeps$ on this regular expression would provide 
   276 then provide us with $Empty$ instead of $Right(Right(Empty))$
   277 us with $Empty$ instead of $Right(Right(Empty))$ that
   277 that was obtained without the simplification. The problem is
   278 was obtained without the simplification. The problem is
   278 we need to recreate this more complicated value, rather than 
   279 we need to recreate this value, rather than just $Empty$.
   279 just $Empty$.
   280 
   280 
   281 This requires what I call \emph{rectification functions}. They
   281 This requires what I call \emph{rectification functions}. They
   282 need to be calculated whenever a regular expression gets
   282 need to be calculated whenever a regular expression gets
   283 simplified. Rectification functions take a value as argument
   283 simplified. Rectification functions take a value as argument
   284 and return a (rectified) value. Our simplification rules so
   284 and return a (rectified) value. Our simplification rules so
   295 $r + r$ & $\mapsto$ & $r$\\ 
   295 $r + r$ & $\mapsto$ & $r$\\ 
   296 \end{tabular}
   296 \end{tabular}
   297 \end{center}
   297 \end{center}
   298 
   298 
   299 \noindent Applying them to $r_4$ will require several nested
   299 \noindent Applying them to $r_4$ will require several nested
   300 simplifications in order end up with just $\epsilon$. We can
   300 simplifications in order end up with just $\epsilon$. 
   301 implement this by letting simp return not just a (simplified)
   301 
   302 regular expression, but also a rectification function. Let us
   302 We can implement this by letting simp return not just a
   303 consider the alternative case, say $r_1 + r_2$, first. We
   303 (simplified) regular expression, but also a rectification
   304 would first simplify the component regular expressions $r_1$
   304 function. Let us consider the alternative case, say $r_1 +
   305 and $r_2.$ This will return simplified versions (if they can
   305 r_2$, first. We would first simplify the component regular
   306 be simplified), say $r_{1s}$ and $r_{2s}$, but also two
   306 expressions $r_1$ and $r_2.$ This will return simplified
   307 rectification functions $f_{1s}$ and $f_{2s}$. We need to
   307 versions (if they can be simplified), say $r_{1s}$ and
   308 assemble them in order to obtain a rectified value for 
   308 $r_{2s}$, but also two rectification functions $f_{1s}$ and
   309 $r_1 + r_2$.
   309 $f_{2s}$. We need to assemble them in order to obtain a
   310 
   310 rectified value for $r_1 + r_2$. In case $r_{1s}$ simplified
   311 In case $r_{1s}$ simplified to $\varnothing$, we would
   311 to $\varnothing$, we would continue the derivative calculation
   312 continue the derivative calculation with $r_{2s}$. The
   312 with $r_{2s}$. The Sulzmann \& Lu algorithm would return a
   313 Sulzmann \& Lu algorithm would return a corresponding value,
   313 corresponding value, say $v_{2s}$. But now this needs to
   314 say $v_{2s}$.
   314 be ``rectified'' to the value 
   315 
   315 
       
   316 \begin{center}
       
   317 $Right(v_{2s})$
       
   318 \end{center}
       
   319 
       
   320 \noindent Unfortunately, this is not enough because there
       
   321 might be some simplifications that happened inside $r_2$ 
       
   322 and for which the simplification function retuned also
       
   323 a rectification function $f_{2s}$. So in fact we need to
       
   324 apply this one too which gives
       
   325 
       
   326 \begin{center}
       
   327 $Right(f_{2s}(v_{2s}))$
       
   328 \end{center}
       
   329 
       
   330 \noindent So if we want to return this as function,
       
   331 we would need to return
       
   332 
       
   333 \begin{center}
       
   334 $\lambda v.\,Right(f_{2s}(v))$
       
   335 \end{center}
       
   336 
       
   337 \noindent which is the lambda-calculus notation for
       
   338 a function that expects a value $v$ and returns everything
       
   339 after the dot where $v$ is replaced by whatever value is 
       
   340 given.
       
   341 
       
   342 Let us package these ideas into a single function (still only
       
   343 considering the alternative case):
       
   344 
       
   345 \begin{center}
       
   346 \begin{tabular}{l}
       
   347 $simp(r)$:\\
       
   348 \quad case $r = r_1 + r_2$\\
       
   349 \qquad let $(r_{1s}, f_{1s}) = simp(r_1)$\\
       
   350 \qquad \phantom{let} $(r_{2s}, f_{2s}) = simp(r_2)$\smallskip\\
       
   351 \qquad case $r_{1s} = \varnothing$: 
       
   352        return $(r_{2s}, \lambda v. \,Right(f_{2s}(v)))$\\
       
   353 \qquad case $r_{2s} = \varnothing$: 
       
   354        return $(r_{1s}, \lambda v. \,Left(f_{1s}(v)))$\\
       
   355 \qquad otherwise: 
       
   356        return $(r_{1s} + r_{2s}, f_{alt}(f_{1s}, f_{2s}))$\\
       
   357 \end{tabular}
       
   358 \end{center}
       
   359 
       
   360 \noindent We first recursively call the simlification with $r_1$
       
   361 and $r_2$. This gives simplified regular expressions, $r_{1s}$
       
   362 and $r_{2s}$, as well as two rectification functions $f_{1s}$
       
   363 and $f_{2s}$. We next need to test whether the simplified 
       
   364 regular expressions are $\varnothing$ so as to make further 
       
   365 simplifications. In case $r_{1s}$ is $\varnothing$ then 
       
   366 we can return $r_{2s}$ (the other alternative). However
       
   367 we need to now build a rectification function, which as
       
   368 said above is
       
   369 
       
   370 \begin{center}
       
   371 $\lambda v.\,Right(f_{2s}(v))$
       
   372 \end{center}
       
   373 
       
   374 \noindent The case where $r_{2s} = \varnothing$ is similar.
       
   375 We return $r_{1s}$ but now have to rectify such that we return
       
   376 
       
   377 \begin{center}
       
   378 $\lambda v.\,Left(f_{1s}(v))$
       
   379 \end{center}
       
   380 
       
   381 \noindent Note that in this case we have to apply $f_{1s}$,
       
   382 not $f_{2s}$, which is responsible to rectify the inner parts
       
   383 of $v$. The otherwise-case is slightly interesting. In this
       
   384 case neither $r_{1s}$ nor $r_{2s}$ are $\varnothing$ and no
       
   385 further simplification can be applied. Accordingly, we return
       
   386 $r_{1s} + r_{2s}$ as the simplified regular expression. In
       
   387 principle we also do not have to do any rectification, because
       
   388 no simplification was done in this case. But this is actually
       
   389 not true: There might have been simplifications inside
       
   390 $r_{1s}$ and $r_2s$. We therefore need to take into account
       
   391 the calculated rectification functions $f_{1s}$ and $f_{2s}$.
       
   392 We can do this by defining a rectification function $f_{alt}$
       
   393 which takes two rectification functions as arguments
       
   394 
       
   395 \begin{center}
       
   396 \begin{tabular}{l@{\hspace{1mm}}l}
       
   397 $f_{alt}(f_1, f_2) \dn$\\
       
   398 \qquad $\lambda v.\,$ case $v = Left(v')$: 
       
   399       & return $Left(f_1(v'))$\\
       
   400 \qquad \phantom{$\lambda v.\,$} case $v = Right(v')$: 
       
   401       & return $Right(f_2(v'))$\\      
       
   402 \end{tabular}
       
   403 \end{center}
       
   404 
       
   405 \noindent In essence we need to apply in this case 
       
   406 the appropriate rectification function to the inner part
       
   407 of the value $v$, wherevy $v$ can only be of the form 
       
   408 $Right(\_)$ or $Left(\_)$.
   316 
   409 
   317 \subsubsection*{Records and Tokenisation}
   410 \subsubsection*{Records and Tokenisation}
   318 
   411 
   319 \newpage
   412 \newpage
   320 Algorithm by Sulzmann, Lexing
   413 Algorithm by Sulzmann, Lexing