341 regular expression(just like "erasing" the bits). Its definition is omitted. |
341 regular expression(just like "erasing" the bits). Its definition is omitted. |
342 $\rup\backslash_{simp} \, s$ and $\simp(\rup\backslash s)$ |
342 $\rup\backslash_{simp} \, s$ and $\simp(\rup\backslash s)$ |
343 are very closely related, but not identical. |
343 are very closely related, but not identical. |
344 For example, let $r$ be the regular expression |
344 For example, let $r$ be the regular expression |
345 $(a+b)(a+a*)$ and $s$ be the string $aa$, then |
345 $(a+b)(a+a*)$ and $s$ be the string $aa$, then |
|
346 both $\erase (\rup\backslash_{simp} \, s)$ and $\erase (\simp (\rup\backslash s))$ |
|
347 are $\ONE + a^*$. However, without $\erase$ |
346 \begin{center} |
348 \begin{center} |
347 $\rup\backslash_{simp} \, s$ is equal to $_0(_0\ONE +_{11}a^*)$ |
349 $\rup\backslash_{simp} \, s$ is equal to $_0(_0\ONE +_{11}a^*)$ |
348 \end{center} |
350 \end{center} |
349 \noindent |
351 \noindent |
350 whereas |
352 whereas |
351 \begin{center} |
353 \begin{center} |
352 $\rup\backslash \, s$ is equal to $(_{00}\ONE +_{011}a^*)$ |
354 $\simp(\rup\backslash s)$ is equal to $(_{00}\ONE +_{011}a^*)$ |
353 \end{center} |
355 \end{center} |
354 \noindent |
356 \noindent |
|
357 For the sake of visual simplicity, we use numbers to denote the bits |
|
358 in bitcodes as we have previously defined for annotated |
|
359 regular expressions. $\S$ is replaced by |
|
360 subscript $_1$ and $\Z$ by $_0$. |
|
361 |
355 Two "rules" might be inferred from the above example. |
362 Two "rules" might be inferred from the above example. |
|
363 |
356 First, after erasing the bits the two regular expressions |
364 First, after erasing the bits the two regular expressions |
357 are exactly the same: both become $1+a^*$. Here the |
365 are exactly the same: both become $1+a^*$. Here the |
358 function $\simp$ exhibits the "once equals many times" |
366 function $\simp$ exhibits the "one in the end equals many times |
|
367 at the front" |
359 property: one simplification in the end causes the |
368 property: one simplification in the end causes the |
360 same regular expression structure as |
369 same regular expression structure as |
361 successive simplifications done alongside derivatives. |
370 successive simplifications done alongside derivatives. |
|
371 $\rup\backslash_{simp} \, s$ unfolds to |
|
372 $\simp((\simp(r\backslash a))\backslash a)$ |
|
373 and $\simp(\rup\backslash s)$ unfolds to |
|
374 $\simp((r\backslash a)\backslash a)$. The one simplification |
|
375 in the latter causes the resulting regular expression to |
|
376 become $1+a^*$, exactly the same as the former with |
|
377 two simplifications. |
|
378 |
362 Second, the bit-codes are different, but they are essentially |
379 Second, the bit-codes are different, but they are essentially |
363 the same: if we push the outmost bits ${\bf_0}(_0\ONE +_{11}a^*)$ of $\rup\backslash_{simp} \, s$ |
380 the same: if we push the outmost bits ${\bf_0}(_0\ONE +_{11}a^*)$ of $\rup\backslash_{simp} \, s$ |
364 inside then we get $(_{00}\ONE +_{011}a^*)$, exactly the |
381 inside then we get $(_{00}\ONE +_{011}a^*)$, exactly the |
365 same as that of $\rup\backslash \, s$. And this difference |
382 same as that of $\rup\backslash \, s$. And this difference |
366 does not matter when we try to apply $\bmkeps$ or $\retrieve$ |
383 does not matter when we try to apply $\bmkeps$ or $\retrieve$ |
367 to it.\\ |
384 to it. This seems a good news if we want to use $\retrieve$ |
368 If we look into the difference above, we could see why: |
385 to prove things. |
369 during the first derivative operation, |
386 |
370 $\rup\backslash a=(_0\ONE + \ZERO)(_0a + _1a^*)$ gets into a sequence |
387 If we look into the difference above, we could see that the |
371 with the first part being nullable, but not the second part. |
388 difference is not fundamental: the bits are just being moved |
|
389 around in a way that does not hurt the correctness. |
|
390 During the first derivative operation, |
|
391 $\rup\backslash a=(_0\ONE + \ZERO)(_0a + _1a^*)$ is |
|
392 in the form of a sequence regular expression with the first |
|
393 part being nullable. |
|
394 Recall the simplification function definition: |
|
395 \begin{center} |
|
396 \begin{tabular}{@{}lcl@{}} |
|
397 |
|
398 $\textit{simp} \; (\textit{SEQ}\;bs\,a_1\,a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp} \; a_2) \; \textit{match} $ \\ |
|
399 &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow \ZERO$ \\ |
|
400 &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow \ZERO$ \\ |
|
401 &&$\quad\textit{case} \; (\ONE, a_2') \Rightarrow \textit{fuse} \; bs \; a_2'$ \\ |
|
402 &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow \textit{fuse} \; bs \; a_1'$ \\ |
|
403 &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow \textit{SEQ} \; bs \; a_1' \; a_2'$ \\ |
|
404 |
|
405 $\textit{simp} \; (\textit{ALTS}\;bs\,as)$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $ \\ |
|
406 &&$\quad\textit{case} \; [] \Rightarrow \ZERO$ \\ |
|
407 &&$\quad\textit{case} \; a :: [] \Rightarrow \textit{fuse bs a}$ \\ |
|
408 &&$\quad\textit{case} \; as' \Rightarrow \textit{ALTS}\;bs\;as'$\\ |
|
409 |
|
410 $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$ |
|
411 \end{tabular} |
|
412 \end{center} |
|
413 |
|
414 \noindent |
|
415 If we call $\simp$ on $\rup$, just as $\backslash_{simp}$ |
|
416 requires, then we would go throught the third clause of |
|
417 the sequence case:$\quad\textit{case} \; (\ONE, a_2') \Rightarrow \textit{fuse} \; bs \; a_2'$. |
|
418 The $\ZERO$ of $(_0\ONE + \ZERO)$ is simplified away and |
|
419 $_0\ONE$ merged into $_0a + _1a^*$ by simply |
|
420 putting its bits($_0$) to the front of the second component: |
|
421 ${\bf_0}(_0a + _1a^*)$. |
|
422 After a second derivative operation, |
|
423 namely, $(_0(_0a + _1a^*))\backslash a$, we get |
|
424 $ |
|
425 _0(_0 \ONE + _1(_1\ONE \cdot a^*)) |
|
426 $, and this simplifies to $_0(_0 \ONE + _{11} a^*)$ |
|
427 by the third clause of the alternative case: |
|
428 $\quad\textit{case} \; as' \Rightarrow \textit{ALTS}\;bs\;as'$. |
|
429 The outmost bit $_0$ remains unchanged and stays with |
|
430 the outmost regular expression. However, things are a bit |
|
431 different when it comes to $\simp(\rup\backslash \, s)$, because |
|
432 without simplification, first term of the sequence |
|
433 $\rup\backslash a=(_0\ONE + \ZERO)(_0a + _1a^*)$ |
|
434 is not merged into the second component |
|
435 and is nullable. |
|
436 Therefore $((_0\ONE + \ZERO)(_0a + _1a^*))\backslash a$ splits into |
|
437 $([(\ZERO + \ZERO)\cdot(_0a + _1a^*)] + _0( _0\ONE + _1[_1\ONE \cdot a^*]))$. |
|
438 After these two successive derivatives without simplification, |
|
439 we apply $\simp$ to this regular expression, which goes through |
|
440 the alternative clause, and each component of $([(\ZERO + \ZERO)\cdot(_0a + _1a^*)] + _0( _0\ONE + _1[_1\ONE \cdot a^*]))$ will be simplified, giving us the list:$[\ZERO, _0(_0\ONE + _{11}a^*]$,this |
|
441 list is then flattened--for |
|
442 $([(\ZERO + \ZERO)\cdot(_0a + _1a^*)]$ it will be simplified into $\ZERO$ |
|
443 and then thrown away by $\textit{flatten}$; $ _0( _0\ONE + _1[_1\ONE \cdot a^*]))$ |
|
444 becomes $ _{00}\ONE + _{011}a^*]))$ because flatten opens up the alternative |
|
445 $\ONE + a^*$ and fuses the front bit(s) $_0$ to the front of $_0\ONE $ and $_{11}a^*$ |
|
446 and get $_{00}$ and $_{011}$. |
|
447 %CONSTRUCTION SITE |
372 and we can use this to construct a set of examples based |
448 and we can use this to construct a set of examples based |
373 on this type of behaviour of two operations. |
449 on this type of behaviour of two operations: |
|
450 $r_1$ |
374 that is to say, despite the bits are being moved around on the regular expression |
451 that is to say, despite the bits are being moved around on the regular expression |
375 (difference in bits), the structure of the (unannotated)regular expression |
452 (difference in bits), the structure of the (unannotated)regular expression |
376 after one simplification is exactly the same after the |
453 after one simplification is exactly the same after the |
377 same sequence of derivative operations |
454 same sequence of derivative operations |
378 regardless of whether we did simplification |
455 regardless of whether we did simplification |