793 \blexers \; r \; s = \blexer \;r\;s |
793 \blexers \; r \; s = \blexer \;r\;s |
794 \end{equation} |
794 \end{equation} |
795 |
795 |
796 \noindent |
796 \noindent |
797 whereby $\blexers$ simplifies (makes derivatives smaller) in each |
797 whereby $\blexers$ simplifies (makes derivatives smaller) in each |
798 step, whereas with $\blexer$ the size can grow exponentially. This |
798 step. This |
799 would be an important milestone for my thesis, because we already |
799 would be an important milestone for my thesis, because we already |
800 have a very good idea how to establish that our set our simplification |
800 have a very good idea how to establish that our set our simplification |
801 rules keeps the size of derivativs below a relatively tight bound. |
801 rules keep the size of derivatives below a relatively tight bound. |
802 |
802 |
803 In order to prove the main theorem in \eqref{mainthm}, we need to prove that the |
803 In order to prove the main theorem in \eqref{mainthm}, we need to prove that the |
804 two functions produce the same output. The definition of these two functions |
804 two functions produce the same output. The definition of these two functions |
805 is shown below. |
805 is shown below. |
806 |
806 |
1015 $ \textit{retrieve} \; |
1015 $ \textit{retrieve} \; |
1016 a \; v \;=\; \textit{retrieve} \; (\textit{simp}\,a) \; v'.$ |
1016 a \; v \;=\; \textit{retrieve} \; (\textit{simp}\,a) \; v'.$ |
1017 \end{center} |
1017 \end{center} |
1018 The idea is that using $v'$, a simplified version of $v$ that had gone |
1018 The idea is that using $v'$, a simplified version of $v$ that had gone |
1019 through the same simplification step as $\textit{simp}(a)$, we are able |
1019 through the same simplification step as $\textit{simp}(a)$, we are able |
1020 to extract the bitcode that gives the same parsing information as the |
1020 to extract the bitcode that gives the same lexing information as the |
1021 unsimplified one. |
1021 unsimplified one. |
1022 If we want to use a similar technique as |
1022 If we want to use a similar technique as |
1023 that of the existing proof, |
1023 that of the existing proof, |
1024 we face the problem that in the above |
1024 we face the problem that in the above |
1025 equalities, |
1025 equalities, |
1026 $\retrieve \; a \; v$ is not always defined. |
1026 $\retrieve \; a \; v$ is not always defined. |
1027 for example, |
1027 For example, |
1028 $\retrieve \; _0(_1a+_0a) \; \Left(\Empty)$ |
1028 $\retrieve \; _0(_1a+_0a) \; \Left(\Empty)$ |
1029 is defined, but not $\retrieve \; (_{01}a) \;\Left(\Empty)$, |
1029 is defined, but not $\retrieve \; (_{01}a) \;\Left(\Empty)$, |
1030 though we can extract the same POSIX |
1030 though we can extract the same POSIX |
1031 bits from the two annotated regular expressions. |
1031 bits from the two annotated regular expressions. |
1032 The latter might occur when we try to retrieve from |
1032 The latter might occur when we try to retrieve from |
1033 a simplified regular expression using the same value |
1033 a simplified regular expression using the same value |
1034 as the unsimplified one. |
1034 underlying the unsimplified one. |
1035 This is because $\Left(\Empty)$ corresponds to |
1035 This is because $\Left(\Empty)$ corresponds to |
1036 the regular expression structure $\ONE+r_2$ instead of |
1036 the regular expression structure $\ONE+r_2$ instead of |
1037 $\ONE$. |
1037 $\ONE$. |
1038 That means, if we |
1038 That means, if we |
1039 want to prove that |
1039 want to prove that |
1040 \begin{center} |
1040 \begin{center} |
1041 $\textit{decode} \; \bmkeps \; \rup\backslash s \; r = \textit{decode} \; \bmkeps \; \rup\backslash_{simp} s \; r$ |
1041 $\bmkeps \; \rup\backslash s = \bmkeps \; \rup\backslash_{simp} s$ |
1042 \end{center} |
1042 \end{center} |
1043 \noindent |
1043 \noindent |
1044 holds by using $\retrieve$, |
1044 holds by using $\retrieve$, |
1045 we probably need to prove an equality like below: |
1045 we probably need to prove an equality like below: |
1046 \begin{center} |
1046 \begin{center} |
1175 $\quad\textit{case} \; as' \Rightarrow _{bs}\sum{as'}$. |
1170 $\quad\textit{case} \; as' \Rightarrow _{bs}\sum{as'}$. |
1176 \end{center} |
1171 \end{center} |
1177 |
1172 |
1178 \noindent |
1173 \noindent |
1179 The outmost bit $_0$ stays with |
1174 The outmost bit $_0$ stays with |
1180 the outmost regular expression, rather than being fused to |
1175 the outermost regular expression, rather than being fused to |
1181 its child regular expressions, as what we will later see happens |
1176 its child regular expressions, as what we will later see happens |
1182 to $\simp(\rup\backslash \, s)$. |
1177 to $\simp(\rup\backslash \, s)$. |
1183 If we choose to not simplify in the midst of derivative operations, |
1178 If we choose to not simplify in the midst of derivative operations, |
1184 but only do it at the end after the string has been exhausted, |
1179 but only do it at the end after the string has been exhausted, |
1185 namely, $\simp(\rup\backslash \, s)=\simp((\rup\backslash a)\backslash a)$, |
1180 namely, $\simp(\rup\backslash \, s)=\simp((\rup\backslash a)\backslash a)$, |
1192 $(when \; \textit{bnullable}\,a_1)$\\ |
1187 $(when \; \textit{bnullable}\,a_1)$\\ |
1193 & &$_{bs}\sum\,\;[_{[]}((a_1\,\backslash c) \cdot \,a_2),$\\ |
1188 & &$_{bs}\sum\,\;[_{[]}((a_1\,\backslash c) \cdot \,a_2),$\\ |
1194 & &$(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\ |
1189 & &$(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\ |
1195 \end{tabular} |
1190 \end{tabular} |
1196 \end{center} |
1191 \end{center} |
1197 |
1192 \noindent |
1198 because |
1193 because |
1199 $\rup\backslash a = (_0\ONE + \ZERO)(_0a + _1a^*)$ |
1194 $\rup\backslash a = (_0\ONE + \ZERO)(_0a + _1a^*)$ |
1200 is a sequence |
1195 is a sequence |
1201 with the first component being nullable |
1196 with the first component being nullable |
1202 (unsimplified, unlike the first round of running$\backslash_{simp}$). |
1197 (unsimplified, unlike the first round of running$\backslash_{simp}$). |
1229 $\exists \textit{rs}_1. \; \simp(r_2 \backslash c_2) = _{bs}{\sum \textit{rs}_1}$ & $and \;\simp(r_1 \backslash [c_1c_2]) = \ZERO\}$\\ |
1224 $\exists \textit{rs}_1. \; \simp(r_2 \backslash c_2) = _{bs}{\sum \textit{rs}_1}$ & $and \;\simp(r_1 \backslash [c_1c_2]) = \ZERO\}$\\ |
1230 \end{tabular} |
1225 \end{tabular} |
1231 \end{center} |
1226 \end{center} |
1232 We take a pair $(r, \;s)$ from the set $D$. |
1227 We take a pair $(r, \;s)$ from the set $D$. |
1233 |
1228 |
1234 Now we compute ${\bf \rup \backslash_{simp} s}$, we get: |
1229 Now we compute $ \rup \backslash_{simp} s$, we get: |
1235 \begin{center} |
1230 \begin{center} |
1236 \begin{tabular}{lcl} |
1231 \begin{tabular}{lcl} |
1237 $(r_1\cdot r_2)\backslash_{simp} \, [c_1c_2]$ & $= \simp\left[ \big(\simp\left[ \left( r_1\cdot r_2 \right) \backslash c_1\right] \big)\backslash c_2\right]$\\ |
1232 $(r_1\cdot r_2)\backslash_{simp} \, [c_1c_2]$ & $= \simp\left[ \big(\simp\left[ \left( r_1\cdot r_2 \right) \backslash c_1\right] \big)\backslash c_2\right]$\\ |
1238 & $= \simp\left[ \big(\simp \left[ \left(r_1 \backslash c_1\right) \cdot r_2 \right] \big) \backslash c_2 \right]$\\ |
1233 & $= \simp\left[ \big(\simp \left[ \left(r_1 \backslash c_1\right) \cdot r_2 \right] \big) \backslash c_2 \right]$\\ |
1239 & $= \simp \left[ (\fuse \; \bmkeps(r_1\backslash c_1) \; \simp(r_2) ) \backslash c_2 \right]$,\\ |
1234 & $= \simp \left[ (\fuse \; \bmkeps(r_1\backslash c_1) \; \simp(r_2) ) \backslash c_2 \right]$,\\ |
1435 |
1430 |
1436 |
1431 |
1437 |
1432 |
1438 |
1433 |
1439 |
1434 |
1440 We have changed the algorithm to suppress the old |
1435 We have changed the algorithm to avoid the old |
1441 counterexample, but this gives rise to new counterexamples. |
1436 counterexample, but this gives rise to new counterexamples. |
1442 This dilemma causes this amendment not a successful |
1437 This dilemma causes this amendment not a successful |
1443 attempt to make $\rup\backslash_{simp} \, s = \simp(\rup\backslash s)$ |
1438 attempt to make $\rup\backslash_{simp} \, s = \simp(\rup\backslash s)$ |
1444 under every possible regular expression and string. |
1439 under every possible regular expression and string. |
1445 \subsection{Properties of the Function $\simp$} |
1440 \subsection{Properties of the Function $\simp$} |
1470 \begin{center} |
1465 \begin{center} |
1471 $\simp(\simp(r)) = \simp(r)$ |
1466 $\simp(\simp(r)) = \simp(r)$ |
1472 \end{center} |
1467 \end{center} |
1473 \item |
1468 \item |
1474 \begin{center} |
1469 \begin{center} |
1475 $\textit{if} r = \simp(r') \textit{then} \; \textit{good}(r) $ |
1470 $\textit{if}\; r = \simp(r') \textit{then} \; \textit{good}(r) $ |
1476 \end{center} |
1471 \end{center} |
1477 \end{itemize} |
1472 \end{itemize} |
1478 \subsection{the Contains relation} |
1473 \subsection{The Contains relation} |
1479 $\retrieve$ is a too strong relation in that |
1474 $\retrieve$ is a too strong relation in that |
1480 it only extracts one bitcode instead of a set of them. |
1475 it only extracts one bitcode instead of a set of them. |
1481 Therefore we try to define another relation(predicate) |
1476 Therefore we try to define another relation(predicate) |
1482 to capture the fact the regular expression has bits |
1477 to capture the fact the regular expression has bits |
1483 being moved around but still has all the bits needed. |
1478 being moved around but still has all the bits needed. |
1635 \end{center} |
1630 \end{center} |
1636 Here $\gg$ is almost like an $\textit{NFA}$ in the sense that |
1631 Here $\gg$ is almost like an $\textit{NFA}$ in the sense that |
1637 it simulates the lexing process with respect to different strings. |
1632 it simulates the lexing process with respect to different strings. |
1638 |
1633 |
1639 Our hope is that using $\gg$ we can prove the bits |
1634 Our hope is that using $\gg$ we can prove the bits |
1640 information are not lost when we simplify a regular expression, |
1635 information are not lost when we simplify a regular expression. |
1641 so we need to relate $\gg$ with simplifcation, for example, |
1636 So we need to relate $\gg$ with simplifcation, for example, |
1642 one of the lemmas we have proved about $\gg$ is that |
1637 one of the lemmas we have proved about $\gg$ is that |
1643 \item |
1638 \item |
1644 \begin{center} |
1639 \begin{center} |
1645 $\simp \; a \gg \textit{bs} \iff a \gg \textit{bs}$ |
1640 $\simp \; a \gg \textit{bs} \iff a \gg \textit{bs}$ |
1646 \end{center} |
1641 \end{center} |
1674 What we do after we work out |
1669 What we do after we work out |
1675 the proof of the above lemma |
1670 the proof of the above lemma |
1676 is still not clear. It is one of the next steps we need to |
1671 is still not clear. It is one of the next steps we need to |
1677 work on. |
1672 work on. |
1678 |
1673 |
1679 \subsection{the $\textit{ders}_2$ Function} |
1674 \subsection{The $\textit{ders}_2$ Function} |
1680 If we want to prove the result |
1675 If we want to prove the result |
1681 \begin{center} |
1676 \begin{center} |
1682 $ \textit{blexer}\_{simp}(r, \; s) = \textit{blexer}(r, \; s)$ |
1677 $ \textit{blexer}\_{simp}(r, \; s) = \textit{blexer}(r, \; s)$ |
1683 \end{center} |
1678 \end{center} |
1684 inductively |
1679 inductively |
1693 structure of the regular expression, and we are mainly |
1688 structure of the regular expression, and we are mainly |
1694 focusing on structure here. |
1689 focusing on structure here. |
1695 It is based on the observation that the derivative of $r_1 \cdot r_2$ |
1690 It is based on the observation that the derivative of $r_1 \cdot r_2$ |
1696 with respect to a string $s$ can actually be written in an "explicit form" |
1691 with respect to a string $s$ can actually be written in an "explicit form" |
1697 composed of $r_1$ and $r_2$'s derivatives. |
1692 composed of $r_1$ and $r_2$'s derivatives. |
1698 For example, we can look at how $r1\cdot r2$ expands |
1693 For example, we can look at how $r_1\cdot r_2$ expands |
1699 when being derived with a two-character string: |
1694 when being derived with a two-character string: |
1700 \begin{center} |
1695 \begin{center} |
1701 \begin{tabular}{lcl} |
1696 \begin{tabular}{lcl} |
1702 $ (r_1 \cdot r_2) \backslash [c_1c_2]$ & $=$ & $ (\textit{if} \; \nullable(r_1)\; \textit{then} \; ((r_1 \backslash c_1) \cdot r_2 + r_2 \backslash c_1) \; \textit{else} \; (r_1\backslash c_1) \cdot r_2) \backslash c_2$\\ |
1697 $ (r_1 \cdot r_2) \backslash [c_1c_2]$ & $=$ & $ (\textit{if} \; \nullable(r_1)\; \textit{then} \; ((r_1 \backslash c_1) \cdot r_2 + r_2 \backslash c_1) \; \textit{else} \; (r_1\backslash c_1) \cdot r_2) \backslash c_2$\\ |
1703 & $=$ & $\textit{if} \; \textit{nullable}(r_1) \;\textit{and} \; \nullable(r_1\backslash c_1) \; \textit{then} \; |
1698 & $=$ & $\textit{if} \; \textit{nullable}(r_1) \;\textit{and} \; \nullable(r_1\backslash c_1) \; \textit{then} \; |
1725 \begin{tabular}{lcl} |
1720 \begin{tabular}{lcl} |
1726 $(r_1 \cdot r_2) \backslash s $ & $=$ & $(r_1\backslash s) \cdot r_2 + \sum\limits_{s_i }{r_2 \backslash s_j} \; \text{where} \; s_i \; \text{is} \; \text{true prefix}\; \text{of} \; s \;\text{and} \; s_i @s_j = s \; \text{and} \;\nullable(r_1\backslash s_i)$ |
1721 $(r_1 \cdot r_2) \backslash s $ & $=$ & $(r_1\backslash s) \cdot r_2 + \sum\limits_{s_i }{r_2 \backslash s_j} \; \text{where} \; s_i \; \text{is} \; \text{true prefix}\; \text{of} \; s \;\text{and} \; s_i @s_j = s \; \text{and} \;\nullable(r_1\backslash s_i)$ |
1727 \end{tabular} |
1722 \end{tabular} |
1728 \end{center} |
1723 \end{center} |
1729 We have formalized and proved the correctness of this |
1724 We have formalized and proved the correctness of this |
1730 alternative definition of derivative and call it $\textit{ders2}$ to |
1725 alternative definition of derivative and call it $\textit{ders}_2$ to |
1731 make a distinction of it with the $\textit{ders}$-function. |
1726 make a distinction of it with the $\textit{ders}$-function. |
1732 Note this differentiates from the lexing algorithm in the sense that |
1727 Note this differentiates from the lexing algorithm in the sense that |
1733 it calculates the results $r_1\backslash s_i , r_2 \backslash s_j$ first |
1728 it calculates the results $r_1\backslash s_i , r_2 \backslash s_j$ first |
1734 and then glue them together |
1729 and then glue them together |
1735 into nested alternatives whereas the $r_1 \cdot r_2 \backslash s$ procedure, |
1730 into nested alternatives. |
1736 used by algorithm $\lexer$, can only produce each element of the list |
1731 $\lexer$, on the other hand, can only produce each element of the list |
1737 in the resulting alternatives regular expression |
1732 in the resulting alternatives regular expression |
1738 altogether rather than |
1733 altogether in the last derivative step. |
1739 generating each of the children nodes |
1734 $\lexer$ does lexing in a "breadth first" manner, |
1740 in a single recursive call that is only for generating that |
1735 generating all the children nodes simultaneously |
1741 very expression itself. |
1736 whereas |
1742 $\lexer$ does lexing in a "breadth first" manner whereas |
1737 $\textit{ders}_2$ does it in a "depth first" manner. |
1743 $\textit{ders2}$ does it in a "depth first" manner. |
|
1744 Using this intuition we can also define the annotated regular expression version of |
1738 Using this intuition we can also define the annotated regular expression version of |
1745 derivative and call it $\textit{bders2}$ and prove the equivalence with $\textit{bders}$. |
1739 derivative and call it $\textit{bders}_2$ and prove the equivalence with $\textit{bders}$. |
1746 Our hope is to use this alternative definition as a guide |
1740 Our hope is to use this alternative definition as a guide |
1747 for our induction. |
1741 for our induction. |
1748 Using $\textit{bders2}$ we have a clearer idea |
1742 Using $\textit{bders}_2$ we have a clearer idea |
1749 of what $r\backslash s$ and $\simp(r\backslash s)$ looks like. |
1743 of what $r\backslash s$ and $\simp(r\backslash s)$ looks like. |
1750 \section{Conclusion} |
1744 \section{Conclusion} |
1751 Under the exhaustive tests we believe the main |
1745 Under the exhaustive tests we believe the main |
1752 result holds, yet a proof still seems elusive. |
1746 result holds, yet a proof still seems elusive. |
1753 We have tried out different approaches, and |
1747 We have tried out different approaches, and |
1758 are the subtle differences between a |
1752 are the subtle differences between a |
1759 nested simplified regular expression and a |
1753 nested simplified regular expression and a |
1760 regular expression that is simplified at the final moment. |
1754 regular expression that is simplified at the final moment. |
1761 We are almost there, but a last step is needed to make the proof work. |
1755 We are almost there, but a last step is needed to make the proof work. |
1762 Hopefully in the next few weeks we will be able to find one. |
1756 Hopefully in the next few weeks we will be able to find one. |
1763 |
1757 This would be an important milestone for my dissertation. |
1764 |
1758 |
1765 \bibliographystyle{plain} |
1759 \bibliographystyle{plain} |
1766 \bibliography{root} |
1760 \bibliography{root} |
1767 |
1761 |
1768 |
1762 |