516 To understands this definition in a more vivid way, the reader |
516 To understands this definition in a more vivid way, the reader |
517 might imagine this: when we do the derivative on regular expression |
517 might imagine this: when we do the derivative on regular expression |
518 $r_{i-1}$, we chop off a character from $r_{i-1}$ to form $r_i$. |
518 $r_{i-1}$, we chop off a character from $r_{i-1}$ to form $r_i$. |
519 This leaves a "hole" on $r_i$. In order to calculate a value for |
519 This leaves a "hole" on $r_i$. In order to calculate a value for |
520 $r_{i-1}$, we need to locate where that hole is. We can find this location |
520 $r_{i-1}$, we need to locate where that hole is. We can find this location |
521 informtaion by comparing $r_{i-1}$ and $v_i$. |
521 informtation by comparing $r_{i-1}$ and $v_i$. |
522 For instance, if $r_{i-1}$ is of shape $r_a \cdot r_b$, and $v_i$ |
522 For instance, if $r_{i-1}$ is of shape $r_a \cdot r_b$, and $v_i$ |
523 is of shape $\textit{Left}(\textit{Seq}(v_a, v_b))$, we know immediately that |
523 is of shape $\textit{Left}(\textit{Seq}(v_a, v_b))$, we know immediately that |
524 \[ (r_a \cdot r_b)\backslash c = (r_a\backslash c) \cdot r_b \,+\, r_b\backslash c\], |
524 \[ (r_a \cdot r_b)\backslash c = (r_a\backslash c) \cdot r_b \,+\, r_b\backslash c\], |
525 otherwise if $r_a$ is not nullable, |
525 otherwise if $r_a$ is not nullable, |
526 \[ (r_a \cdot r_b)\backslash c = (r_a\backslash c) \cdot r_b\], |
526 \[ (r_a \cdot r_b)\backslash c = (r_a\backslash c) \cdot r_b\], |
566 The outer 2 $Left$ tells us the first nullable part hides in the term |
566 The outer 2 $Left$ tells us the first nullable part hides in the term |
567 \[((0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* \] |
567 \[((0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* \] |
568 in \[r_3 = ( \underline{(0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^*} + (0+0+0 + 1 + 0) |
568 in \[r_3 = ( \underline{(0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^*} + (0+0+0 + 1 + 0) |
569 \cdot r^*) +((0+1+0 + 0 + 0) \cdot r^*+(0+0+0 + 1 + 0) \cdot r^* ) |
569 \cdot r^*) +((0+1+0 + 0 + 0) \cdot r^*+(0+0+0 + 1 + 0) \cdot r^* ) |
570 \](underlined). Note that the leftmost location of term \[((0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* \] |
570 \](underlined). Note that the leftmost location of term \[((0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* \] |
571 (which corresponds to the intial sub-match $abc$) |
571 (which corresponds to the initial sub-match $abc$) |
572 allows $mkeps$ to choose it as the first candidate that meets the requirement of being $nullable$ |
572 allows $mkeps$ to choose it as the first candidate that meets the requirement of being $nullable$ |
573 because $mkeps$ is defined to always choose the left one when nullable. |
573 because $mkeps$ is defined to always choose the left one when nullable. |
574 In the case of this example, $abc$ is preferred over $a$ or $ab$. |
574 In the case of this example, $abc$ is preferred over $a$ or $ab$. |
575 This location is naturally generated by the splitting clause\[ |
575 This location is naturally generated by the splitting clause\[ |
576 (r_1 \cdot r_2)\backslash c (when \; r_1 \; nullable)) \, = (r_1\backslash c) \cdot r_2 \,+\, r_2\backslash c\]. |
576 (r_1 \cdot r_2)\backslash c (when \; r_1 \; nullable)) \, = (r_1\backslash c) \cdot r_2 \,+\, r_2\backslash c\]. |
957 It is relatively straightforward to establish that after 1 simplification step, the part of derivative that corresponds to a POSIX value remains intact and can still be collected, in other words, |
957 It is relatively straightforward to establish that after 1 simplification step, the part of derivative that corresponds to a POSIX value remains intact and can still be collected, in other words, |
958 bmkeps r = bmkeps simp r |
958 bmkeps r = bmkeps simp r |
959 as this basically comes down to proving actions like removing the additional $r$ in $r+r$ does not delete important POSIX information in a regular expression. |
959 as this basically comes down to proving actions like removing the additional $r$ in $r+r$ does not delete important POSIX information in a regular expression. |
960 The hardcore of this problem is to prove that |
960 The hardcore of this problem is to prove that |
961 bmkeps bders r = bmkeps bders simp r |
961 bmkeps bders r = bmkeps bders simp r |
962 That is, if we do derivative on regular expression r and the simplified version for, they can still prove the same POSIX value if there is one . This is not as straghtforward as the previous proposition, as the two regular expression r and simp r might become very different regular expressions after repeated application ofd simp and derivative. |
962 That is, if we do derivative on regular expression r and the simplified version for, they can still prove the same POSIX value if there is one . This is not as straightforward as the previous proposition, as the two regular expression r and simp r might become very different regular expressions after repeated application ofd simp and derivative. |
963 The crucial point is to find the "gene" of a regular expression and how it is kept intact during simplification. |
963 The crucial point is to find the "gene" of a regular expression and how it is kept intact during simplification. |
964 To aid this, we are utilizing the helping function retrieve described by Sulzmann and Lu: |
964 To aid this, we are utilizing the helping function retrieve described by Sulzmann and Lu: |
965 \\definition of retrieve\\ |
965 \\definition of retrieve\\ |
966 This function assembled the bitcode that corresponds to a parse tree for how the current derivative matches the suffix of the string(the characters that have not yet appeared, but is stored in the value). |
966 This function assembled the bitcode that corresponds to a parse tree for how the current derivative matches the suffix of the string(the characters that have not yet appeared, but is stored in the value). |
967 Sulzmann and Lu used this to connect the bit-coded algorithm to the older algorithm by the following equation:\\ |
967 Sulzmann and Lu used this to connect the bit-coded algorithm to the older algorithm by the following equation:\\ |