# HG changeset patch # User Chengsong # Date 1583310338 0 # Node ID a7c063981fa5304d49db8b88e28e8d8c009228cf # Parent bc340e8f4165fa7f6858a6a571f5372a015c391c changes diff -r bc340e8f4165 -r a7c063981fa5 etnms/etnms.tex --- a/etnms/etnms.tex Tue Mar 03 21:56:09 2020 +0000 +++ b/etnms/etnms.tex Wed Mar 04 08:25:38 2020 +0000 @@ -795,10 +795,10 @@ \noindent whereby $\blexers$ simplifies (makes derivatives smaller) in each -step, whereas with $\blexer$ the size can grow exponentially. This +step. This would be an important milestone for my thesis, because we already have a very good idea how to establish that our set our simplification -rules keeps the size of derivativs below a relatively tight bound. +rules keep the size of derivatives below a relatively tight bound. In order to prove the main theorem in \eqref{mainthm}, we need to prove that the two functions produce the same output. The definition of these two functions @@ -889,7 +889,7 @@ \noindent which established that the bit-sequence algorithm produces the same result as the original algorithm, which does not use -bit-sequence. +bit-sequences. The proof uses two ``tricks''. One is that it uses a \flex-function \begin{center} @@ -1017,28 +1017,28 @@ \end{center} The idea is that using $v'$, a simplified version of $v$ that had gone through the same simplification step as $\textit{simp}(a)$, we are able -to extract the bitcode that gives the same parsing information as the +to extract the bitcode that gives the same lexing information as the unsimplified one. If we want to use a similar technique as that of the existing proof, we face the problem that in the above equalities, $\retrieve \; a \; v$ is not always defined. -for example, +For example, $\retrieve \; _0(_1a+_0a) \; \Left(\Empty)$ is defined, but not $\retrieve \; (_{01}a) \;\Left(\Empty)$, though we can extract the same POSIX bits from the two annotated regular expressions. The latter might occur when we try to retrieve from a simplified regular expression using the same value -as the unsimplified one. +underlying the unsimplified one. This is because $\Left(\Empty)$ corresponds to the regular expression structure $\ONE+r_2$ instead of $\ONE$. That means, if we want to prove that \begin{center} -$\textit{decode} \; \bmkeps \; \rup\backslash s \; r = \textit{decode} \; \bmkeps \; \rup\backslash_{simp} s \; r$ +$\bmkeps \; \rup\backslash s = \bmkeps \; \rup\backslash_{simp} s$ \end{center} \noindent holds by using $\retrieve$, @@ -1051,7 +1051,7 @@ $f$ rectifies $r\backslash s$ so the value $\mkeps(f(r\backslash s))$ becomes something simpler to make the retrieve function defined.\\ -\subsubsection{Ways to Rectify Value} +\subsubsection{Ways to Rectify Values} One way to do this is to prove the following: \begin{center} $\retrieve \; \rup\backslash_{simp} s \; \mkeps(\simp(r\backslash s))=\textit{retrieve} \; \rup\backslash s \; \mkeps(r\backslash s)$ @@ -1083,11 +1083,6 @@ \begin{center} $\simp(\rup\backslash s)$ is equal to $(_{00}\ONE +_{011}a^*)$ \end{center} -\noindent -(For the sake of visual simplicity, we use numbers to denote the bits -in bitcodes as we have previously defined for annotated -regular expressions. $\S$ is replaced by -subscript $_1$ and $\Z$ by $_0$.) What makes the difference? @@ -1177,7 +1172,7 @@ \noindent The outmost bit $_0$ stays with -the outmost regular expression, rather than being fused to +the outermost regular expression, rather than being fused to its child regular expressions, as what we will later see happens to $\simp(\rup\backslash \, s)$. If we choose to not simplify in the midst of derivative operations, @@ -1194,7 +1189,7 @@ & &$(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\ \end{tabular} \end{center} - +\noindent because $\rup\backslash a = (_0\ONE + \ZERO)(_0a + _1a^*)$ is a sequence @@ -1231,7 +1226,7 @@ \end{center} We take a pair $(r, \;s)$ from the set $D$. -Now we compute ${\bf \rup \backslash_{simp} s}$, we get: +Now we compute $ \rup \backslash_{simp} s$, we get: \begin{center} \begin{tabular}{lcl} $(r_1\cdot r_2)\backslash_{simp} \, [c_1c_2]$ & $= \simp\left[ \big(\simp\left[ \left( r_1\cdot r_2 \right) \backslash c_1\right] \big)\backslash c_2\right]$\\ @@ -1437,7 +1432,7 @@ -We have changed the algorithm to suppress the old +We have changed the algorithm to avoid the old counterexample, but this gives rise to new counterexamples. This dilemma causes this amendment not a successful attempt to make $\rup\backslash_{simp} \, s = \simp(\rup\backslash s)$ @@ -1472,10 +1467,10 @@ \end{center} \item \begin{center} -$\textit{if} r = \simp(r') \textit{then} \; \textit{good}(r) $ +$\textit{if}\; r = \simp(r') \textit{then} \; \textit{good}(r) $ \end{center} \end{itemize} -\subsection{the Contains relation} +\subsection{The Contains relation} $\retrieve$ is a too strong relation in that it only extracts one bitcode instead of a set of them. Therefore we try to define another relation(predicate) @@ -1637,8 +1632,8 @@ it simulates the lexing process with respect to different strings. Our hope is that using $\gg$ we can prove the bits -information are not lost when we simplify a regular expression, -so we need to relate $\gg$ with simplifcation, for example, +information are not lost when we simplify a regular expression. +So we need to relate $\gg$ with simplifcation, for example, one of the lemmas we have proved about $\gg$ is that \item \begin{center} @@ -1676,7 +1671,7 @@ is still not clear. It is one of the next steps we need to work on. -\subsection{the $\textit{ders}_2$ Function} +\subsection{The $\textit{ders}_2$ Function} If we want to prove the result \begin{center} $ \textit{blexer}\_{simp}(r, \; s) = \textit{blexer}(r, \; s)$ @@ -1695,7 +1690,7 @@ It is based on the observation that the derivative of $r_1 \cdot r_2$ with respect to a string $s$ can actually be written in an "explicit form" composed of $r_1$ and $r_2$'s derivatives. -For example, we can look at how $r1\cdot r2$ expands +For example, we can look at how $r_1\cdot r_2$ expands when being derived with a two-character string: \begin{center} \begin{tabular}{lcl} @@ -1727,25 +1722,24 @@ \end{tabular} \end{center} We have formalized and proved the correctness of this -alternative definition of derivative and call it $\textit{ders2}$ to +alternative definition of derivative and call it $\textit{ders}_2$ to make a distinction of it with the $\textit{ders}$-function. Note this differentiates from the lexing algorithm in the sense that it calculates the results $r_1\backslash s_i , r_2 \backslash s_j$ first and then glue them together -into nested alternatives whereas the $r_1 \cdot r_2 \backslash s$ procedure, -used by algorithm $\lexer$, can only produce each element of the list +into nested alternatives. +$\lexer$, on the other hand, can only produce each element of the list in the resulting alternatives regular expression -altogether rather than -generating each of the children nodes -in a single recursive call that is only for generating that -very expression itself. -$\lexer$ does lexing in a "breadth first" manner whereas -$\textit{ders2}$ does it in a "depth first" manner. +altogether in the last derivative step. +$\lexer$ does lexing in a "breadth first" manner, +generating all the children nodes simultaneously +whereas +$\textit{ders}_2$ does it in a "depth first" manner. Using this intuition we can also define the annotated regular expression version of -derivative and call it $\textit{bders2}$ and prove the equivalence with $\textit{bders}$. +derivative and call it $\textit{bders}_2$ and prove the equivalence with $\textit{bders}$. Our hope is to use this alternative definition as a guide for our induction. -Using $\textit{bders2}$ we have a clearer idea +Using $\textit{bders}_2$ we have a clearer idea of what $r\backslash s$ and $\simp(r\backslash s)$ looks like. \section{Conclusion} Under the exhaustive tests we believe the main @@ -1760,7 +1754,7 @@ regular expression that is simplified at the final moment. We are almost there, but a last step is needed to make the proof work. Hopefully in the next few weeks we will be able to find one. - +This would be an important milestone for my dissertation. \bibliographystyle{plain} \bibliography{root}