266 each derivatives step. But this is a bit more involved in |
266 each derivatives step. But this is a bit more involved in |
267 algorithm of Sulzmann \& Lu. Consider the last derivation |
267 algorithm of Sulzmann \& Lu. Consider the last derivation |
268 step in our running example |
268 step in our running example |
269 |
269 |
270 \begin{center} |
270 \begin{center} |
271 $r_4 = der\,c\,r_3$ |
271 $r_4 = der\,c\,r_3 = (\varnothing \cdot (b \cdot c)) + ((\varnothing \cdot c) + \epsilon)$ |
272 \end{center} |
272 \end{center} |
273 |
273 |
274 \noindent |
274 \noindent Simplifying the result would just give us |
275 Simplifying the result would just give use $\epsilon$. |
275 $\epsilon$. Running $mkeps$ on this regular expression would |
276 Running $mkeps$ on this regular expression would provide |
276 then provide us with $Empty$ instead of $Right(Right(Empty))$ |
277 us with $Empty$ instead of $Right(Right(Empty))$ that |
277 that was obtained without the simplification. The problem is |
278 was obtained without the simplification. The problem is |
278 we need to recreate this more complicated value, rather than |
279 we need to recreate this value, rather than just $Empty$. |
279 just $Empty$. |
280 |
280 |
281 This requires what I call \emph{rectification functions}. They |
281 This requires what I call \emph{rectification functions}. They |
282 need to be calculated whenever a regular expression gets |
282 need to be calculated whenever a regular expression gets |
283 simplified. Rectification functions take a value as argument |
283 simplified. Rectification functions take a value as argument |
284 and return a (rectified) value. Our simplification rules so |
284 and return a (rectified) value. Our simplification rules so |
295 $r + r$ & $\mapsto$ & $r$\\ |
295 $r + r$ & $\mapsto$ & $r$\\ |
296 \end{tabular} |
296 \end{tabular} |
297 \end{center} |
297 \end{center} |
298 |
298 |
299 \noindent Applying them to $r_4$ will require several nested |
299 \noindent Applying them to $r_4$ will require several nested |
300 simplifications in order end up with just $\epsilon$. We can |
300 simplifications in order end up with just $\epsilon$. |
301 implement this by letting simp return not just a (simplified) |
301 |
302 regular expression, but also a rectification function. Let us |
302 We can implement this by letting simp return not just a |
303 consider the alternative case, say $r_1 + r_2$, first. We |
303 (simplified) regular expression, but also a rectification |
304 would first simplify the component regular expressions $r_1$ |
304 function. Let us consider the alternative case, say $r_1 + |
305 and $r_2.$ This will return simplified versions (if they can |
305 r_2$, first. We would first simplify the component regular |
306 be simplified), say $r_{1s}$ and $r_{2s}$, but also two |
306 expressions $r_1$ and $r_2.$ This will return simplified |
307 rectification functions $f_{1s}$ and $f_{2s}$. We need to |
307 versions (if they can be simplified), say $r_{1s}$ and |
308 assemble them in order to obtain a rectified value for |
308 $r_{2s}$, but also two rectification functions $f_{1s}$ and |
309 $r_1 + r_2$. |
309 $f_{2s}$. We need to assemble them in order to obtain a |
310 |
310 rectified value for $r_1 + r_2$. In case $r_{1s}$ simplified |
311 In case $r_{1s}$ simplified to $\varnothing$, we would |
311 to $\varnothing$, we would continue the derivative calculation |
312 continue the derivative calculation with $r_{2s}$. The |
312 with $r_{2s}$. The Sulzmann \& Lu algorithm would return a |
313 Sulzmann \& Lu algorithm would return a corresponding value, |
313 corresponding value, say $v_{2s}$. But now this needs to |
314 say $v_{2s}$. |
314 be ``rectified'' to the value |
315 |
315 |
|
316 \begin{center} |
|
317 $Right(v_{2s})$ |
|
318 \end{center} |
|
319 |
|
320 \noindent Unfortunately, this is not enough because there |
|
321 might be some simplifications that happened inside $r_2$ |
|
322 and for which the simplification function retuned also |
|
323 a rectification function $f_{2s}$. So in fact we need to |
|
324 apply this one too which gives |
|
325 |
|
326 \begin{center} |
|
327 $Right(f_{2s}(v_{2s}))$ |
|
328 \end{center} |
|
329 |
|
330 \noindent So if we want to return this as function, |
|
331 we would need to return |
|
332 |
|
333 \begin{center} |
|
334 $\lambda v.\,Right(f_{2s}(v))$ |
|
335 \end{center} |
|
336 |
|
337 \noindent which is the lambda-calculus notation for |
|
338 a function that expects a value $v$ and returns everything |
|
339 after the dot where $v$ is replaced by whatever value is |
|
340 given. |
|
341 |
|
342 Let us package these ideas into a single function (still only |
|
343 considering the alternative case): |
|
344 |
|
345 \begin{center} |
|
346 \begin{tabular}{l} |
|
347 $simp(r)$:\\ |
|
348 \quad case $r = r_1 + r_2$\\ |
|
349 \qquad let $(r_{1s}, f_{1s}) = simp(r_1)$\\ |
|
350 \qquad \phantom{let} $(r_{2s}, f_{2s}) = simp(r_2)$\smallskip\\ |
|
351 \qquad case $r_{1s} = \varnothing$: |
|
352 return $(r_{2s}, \lambda v. \,Right(f_{2s}(v)))$\\ |
|
353 \qquad case $r_{2s} = \varnothing$: |
|
354 return $(r_{1s}, \lambda v. \,Left(f_{1s}(v)))$\\ |
|
355 \qquad otherwise: |
|
356 return $(r_{1s} + r_{2s}, f_{alt}(f_{1s}, f_{2s}))$\\ |
|
357 \end{tabular} |
|
358 \end{center} |
|
359 |
|
360 \noindent We first recursively call the simlification with $r_1$ |
|
361 and $r_2$. This gives simplified regular expressions, $r_{1s}$ |
|
362 and $r_{2s}$, as well as two rectification functions $f_{1s}$ |
|
363 and $f_{2s}$. We next need to test whether the simplified |
|
364 regular expressions are $\varnothing$ so as to make further |
|
365 simplifications. In case $r_{1s}$ is $\varnothing$ then |
|
366 we can return $r_{2s}$ (the other alternative). However |
|
367 we need to now build a rectification function, which as |
|
368 said above is |
|
369 |
|
370 \begin{center} |
|
371 $\lambda v.\,Right(f_{2s}(v))$ |
|
372 \end{center} |
|
373 |
|
374 \noindent The case where $r_{2s} = \varnothing$ is similar. |
|
375 We return $r_{1s}$ but now have to rectify such that we return |
|
376 |
|
377 \begin{center} |
|
378 $\lambda v.\,Left(f_{1s}(v))$ |
|
379 \end{center} |
|
380 |
|
381 \noindent Note that in this case we have to apply $f_{1s}$, |
|
382 not $f_{2s}$, which is responsible to rectify the inner parts |
|
383 of $v$. The otherwise-case is slightly interesting. In this |
|
384 case neither $r_{1s}$ nor $r_{2s}$ are $\varnothing$ and no |
|
385 further simplification can be applied. Accordingly, we return |
|
386 $r_{1s} + r_{2s}$ as the simplified regular expression. In |
|
387 principle we also do not have to do any rectification, because |
|
388 no simplification was done in this case. But this is actually |
|
389 not true: There might have been simplifications inside |
|
390 $r_{1s}$ and $r_2s$. We therefore need to take into account |
|
391 the calculated rectification functions $f_{1s}$ and $f_{2s}$. |
|
392 We can do this by defining a rectification function $f_{alt}$ |
|
393 which takes two rectification functions as arguments |
|
394 |
|
395 \begin{center} |
|
396 \begin{tabular}{l@{\hspace{1mm}}l} |
|
397 $f_{alt}(f_1, f_2) \dn$\\ |
|
398 \qquad $\lambda v.\,$ case $v = Left(v')$: |
|
399 & return $Left(f_1(v'))$\\ |
|
400 \qquad \phantom{$\lambda v.\,$} case $v = Right(v')$: |
|
401 & return $Right(f_2(v'))$\\ |
|
402 \end{tabular} |
|
403 \end{center} |
|
404 |
|
405 \noindent In essence we need to apply in this case |
|
406 the appropriate rectification function to the inner part |
|
407 of the value $v$, wherevy $v$ can only be of the form |
|
408 $Right(\_)$ or $Left(\_)$. |
316 |
409 |
317 \subsubsection*{Records and Tokenisation} |
410 \subsubsection*{Records and Tokenisation} |
318 |
411 |
319 \newpage |
412 \newpage |
320 Algorithm by Sulzmann, Lexing |
413 Algorithm by Sulzmann, Lexing |