cst_tests: comparison etnms/etnms.tex


-:bc340e8f4165
+:a7c063981fa5
 \blexers \; r \; s = \blexer \;r\;s
 \end{equation}
 \noindent
 whereby $\blexers$ simplifies (makes derivatives smaller) in each
-step, whereas with $\blexer$ the size can grow exponentially. This
+step. This
 would be an important milestone for my thesis, because we already
 have a very good idea of how to establish that our set of simplification
-rules keeps the size of derivativs below a relatively tight bound.
+rules keep the size of derivatives below a relatively tight bound.
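To make the difference between $\blexer$ and $\blexers$ concrete, the following Python sketch works on plain (unannotated) regular expressions and tracks only the size of the derivatives, not bitcodes. The tuple encoding, the names \texttt{ders} and \texttt{ders\_simp}, and the small rule set in \texttt{simp} (remove $\ZERO$, remove $\ONE$ in sequences, remove an alternative equal to its sibling) are assumptions made for illustration; they are not the simplification rules used by $\blexers$.

```python
# Plain regexes as tagged tuples (an assumed encoding, no bitcodes):
# ("ZERO",), ("ONE",), ("CHAR", c), ("ALT", r1, r2), ("SEQ", r1, r2), ("STAR", r)
ZERO, ONE = ("ZERO",), ("ONE",)

def nullable(r):
    tag = r[0]
    if tag in ("ONE", "STAR"):
        return True
    if tag == "ALT":
        return nullable(r[1]) or nullable(r[2])
    if tag == "SEQ":
        return nullable(r[1]) and nullable(r[2])
    return False  # ZERO and CHAR

def der(c, r):
    tag = r[0]
    if tag == "CHAR":
        return ONE if r[1] == c else ZERO
    if tag == "ALT":
        return ("ALT", der(c, r[1]), der(c, r[2]))
    if tag == "SEQ":
        left = ("SEQ", der(c, r[1]), r[2])
        return ("ALT", left, der(c, r[2])) if nullable(r[1]) else left
    if tag == "STAR":
        return ("SEQ", der(c, r[1]), r)
    return ZERO  # ZERO and ONE

def simp(r):
    # Hypothetical rule set: drop ZERO, drop ONE in sequences and remove an
    # alternative equal to its sibling; stars are not entered.
    tag = r[0]
    if tag == "ALT":
        s1, s2 = simp(r[1]), simp(r[2])
        if s1 == ZERO:
            return s2
        if s2 == ZERO or s1 == s2:
            return s1
        return ("ALT", s1, s2)
    if tag == "SEQ":
        s1, s2 = simp(r[1]), simp(r[2])
        if ZERO in (s1, s2):
            return ZERO
        if s1 == ONE:
            return s2
        if s2 == ONE:
            return s1
        return ("SEQ", s1, s2)
    return r

def size(r):
    # number of constructor nodes (the character inside CHAR is not counted)
    return 1 + sum(size(x) for x in r[1:] if isinstance(x, tuple))

def ders(s, r):
    # derivatives without simplification (in the spirit of blexer)
    for c in s:
        r = der(c, r)
    return r

def ders_simp(s, r):
    # simplify after every derivative step (in the spirit of blexers)
    for c in s:
        r = simp(der(c, r))
    return r

r = ("STAR", ("STAR", ("CHAR", "a")))   # (a*)*
for n in range(1, 6):
    print(n, size(ders("a" * n, r)), size(ders_simp("a" * n, r)))
```

For $(a^*)^*$ and strings of $a$'s, the simplified derivative stays at size 6 after every step, while the unsimplified one already has size 8 after the first step and grows from there; both agree on whether the string is matched.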
 In order to prove the main theorem in \eqref{mainthm}, we need to prove that the
 two functions produce the same output. The definitions of these two functions
 are shown below.
 %\end{center}
 \noindent
 which established that the bit-sequence algorithm produces the same
 result as the original algorithm, which does not use
-bit-sequence.
+bit-sequences.
 The proof uses two ``tricks''. One is that it uses a \flex-function
 \begin{center}
 \begin{tabular}{lcl}
 $\textit{flex} \;r\; f\; (c\!::\!s) $ & $\dn$ & $\textit{flex} \;  (r\backslash c) \;(\lambda v. f (inj \; r \; c \; v)) \;s$ \\
 $ \textit{retrieve} \;
 a \; v \;=\; \textit{retrieve}  \; (\textit{simp}\,a) \; v'.$
 \end{center}
 The idea is that using $v'$, a simplified version of $v$ that has gone
 through the same simplification step as $\textit{simp}(a)$, we are able
-to extract the bitcode that gives the same parsing information as the
+to extract the bitcode that gives the same lexing information as the
 unsimplified one.
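The $\textit{flex}$-function mentioned above can be sketched in Python for plain regular expressions. The sketch assumes tagged tuples for regular expressions and for the values $\Empty$, $\textit{Chr}$, $\Left$, $\textit{Right}$, $\textit{Seq}$ and $\textit{Stars}$; $\mkeps$ and $\textit{inj}$ follow the usual Sulzmann and Lu definitions restricted to these constructors. It is only meant to show how $\textit{flex}$ composes the injection functions while consuming the string, not to reproduce the formalised definitions.

```python
# Plain regexes and values as tagged tuples (an assumed encoding).
ZERO, ONE = ("ZERO",), ("ONE",)

def nullable(r):
    tag = r[0]
    if tag in ("ONE", "STAR"):
        return True
    if tag == "ALT":
        return nullable(r[1]) or nullable(r[2])
    if tag == "SEQ":
        return nullable(r[1]) and nullable(r[2])
    return False

def der(c, r):
    tag = r[0]
    if tag == "CHAR":
        return ONE if r[1] == c else ZERO
    if tag == "ALT":
        return ("ALT", der(c, r[1]), der(c, r[2]))
    if tag == "SEQ":
        left = ("SEQ", der(c, r[1]), r[2])
        return ("ALT", left, der(c, r[2])) if nullable(r[1]) else left
    if tag == "STAR":
        return ("SEQ", der(c, r[1]), r)
    return ZERO

def mkeps(r):
    # POSIX value of a nullable r for the empty string
    tag = r[0]
    if tag == "ONE":
        return ("Empty",)
    if tag == "ALT":
        if nullable(r[1]):
            return ("Left", mkeps(r[1]))
        return ("Right", mkeps(r[2]))
    if tag == "SEQ":
        return ("Seq", mkeps(r[1]), mkeps(r[2]))
    if tag == "STAR":
        return ("Stars", ())
    raise ValueError("mkeps: not nullable")

def inj(r, c, v):
    # turn a value v for der(c, r) into a value for r
    tag, vtag = r[0], v[0]
    if tag == "CHAR":                     # v is Empty
        return ("Chr", c)
    if tag == "ALT":                      # v is Left(v1) or Right(v2)
        branch = r[1] if vtag == "Left" else r[2]
        return (vtag, inj(branch, c, v[1]))
    if tag == "SEQ":
        if vtag == "Seq":                 # Seq(v1, v2)
            return ("Seq", inj(r[1], c, v[1]), v[2])
        if vtag == "Left":                # Left(Seq(v1, v2))
            return ("Seq", inj(r[1], c, v[1][1]), v[1][2])
        return ("Seq", mkeps(r[1]), inj(r[2], c, v[1]))   # Right(v2)
    if tag == "STAR":                     # Seq(v1, Stars(vs))
        return ("Stars", (inj(r[1], c, v[1]),) + v[2][1])
    raise ValueError("inj: no value for this regex")

def flex(r, f, s):
    # flex r f [] = f;  flex r f (c::s) = flex (r\c) (lambda v. f (inj r c v)) s
    if not s:
        return f
    c = s[0]
    return flex(der(c, r), lambda v: f(inj(r, c, v)), s[1:])

def lexer(r, s):
    rs = r
    for c in s:
        rs = der(c, rs)
    if not nullable(rs):
        return None
    return flex(r, lambda v: v, s)(mkeps(rs))
```

Here \texttt{lexer} applies the accumulated function to $\mkeps$ of the final derivative, mirroring $\textit{flex}\;r\;\textit{id}\;s\;(\mkeps(r\backslash s))$; for $a\cdot a^*$ and the string $aa$ it returns the tuple encoding of $\textit{Seq}(\textit{Chr}\,a, \textit{Stars}[\textit{Chr}\,a])$.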
 If we want to use a technique similar to
 that of the existing proof,
 we face the problem that in the above
 equalities,
 $\retrieve \; a \; v$ is not always defined.
-for example,
+For example,
 $\retrieve \; _0(_1a+_0a) \; \Left(\Empty)$
 is defined, but not $\retrieve \; (_{01}a) \;\Left(\Empty)$,
 though we can extract the same POSIX
 bits from the two annotated regular expressions.
 The latter might occur when we try to retrieve from
 a simplified regular expression using the same value
-as the unsimplified one.
+underlying the unsimplified one.
 This is because $\Left(\Empty)$ corresponds to
 the regular expression structure $\ONE+r_2$ instead of
 $\ONE$.
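The partiality of $\retrieve$ can be reproduced with a small Python sketch. Annotated regular expressions and values are encoded as tagged tuples, bits are written $0$ and $1$ as in the subscripts above, and $a$ is taken to be $\ONE$ so that the example accepts $\Left(\Empty)$; the encoding and this choice of $a$ are assumptions for illustration. Only the equations of $\retrieve$ for $\ONE$, characters, alternatives and sequences are included (stars are omitted), and the function raises an error wherever no equation applies.

```python
# Annotated regexes and values as tagged tuples (an assumed encoding);
# bits are 0 and 1 as in the subscripts of the text.
def AONE(bs):
    return ("AONE", tuple(bs))

def ACHAR(bs, c):
    return ("ACHAR", tuple(bs), c)

def AALTS(bs, rs):
    return ("AALTS", tuple(bs), tuple(rs))

def ASEQ(bs, r1, r2):
    return ("ASEQ", tuple(bs), r1, r2)

def retrieve(a, v):
    # Collect the bits of a along the path given by the value v; raise
    # ValueError where no equation of retrieve applies.
    tag, bs, vtag = a[0], a[1], v[0]
    if tag == "AONE" and vtag == "Empty":
        return bs
    if tag == "ACHAR" and vtag == "Chr":
        return bs
    if tag == "AALTS" and len(a[2]) == 1:
        return bs + retrieve(a[2][0], v)
    if tag == "AALTS" and len(a[2]) > 1 and vtag == "Left":
        return bs + retrieve(a[2][0], v[1])
    if tag == "AALTS" and len(a[2]) > 1 and vtag == "Right":
        return bs + retrieve(AALTS((), a[2][1:]), v[1])
    if tag == "ASEQ" and vtag == "Seq":
        return bs + retrieve(a[2], v[1]) + retrieve(a[3], v[2])
    raise ValueError(f"retrieve is undefined for {tag} and {vtag}")

unsimplified = AALTS([0], [AONE([1]), AONE([0])])  # _0(_1 ONE + _0 ONE)
simplified = AONE([0, 1])                          # _{01} ONE
left_empty = ("Left", ("Empty",))
print(retrieve(unsimplified, left_empty))          # prints (0, 1)
print(retrieve(simplified, ("Empty",)))            # prints (0, 1)
```

The unsimplified $_0(_1\ONE+_0\ONE)$ yields the bits $01$ from $\Left(\Empty)$, whereas the simplified $_{01}\ONE$ yields the same bits only from the rectified value $\Empty$; calling it with $\Left(\Empty)$ raises an error.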
 That means, if we
 want to prove that
 \begin{center}
-$\textit{decode} \; \bmkeps \; \rup\backslash s \; r = \textit{decode} \; \bmkeps \; \rup\backslash_{simp} s \; r$
+$\bmkeps \; \rup\backslash s = \bmkeps \; \rup\backslash_{simp} s$
 \end{center}
 \noindent
 holds by using $\retrieve$,
 we probably need to prove an equality like the one below:
 \begin{center}
 \end{center}
 \noindent
 $f$ rectifies $r\backslash s$ so that the value $\mkeps(f(r\backslash s))$ becomes
 something simpler
 for which the retrieve function is defined.\\
-\subsubsection{Ways to Rectify Value}
+\subsubsection{Ways to Rectify Values}
 One way to do this is to prove the following:
 \begin{center}
 $\retrieve \; \rup\backslash_{simp} s \; \mkeps(\simp(r\backslash s))=\textit{retrieve} \; \rup\backslash s \; \mkeps(r\backslash s)$
 \end{center}
 \noindent
 \noindent
 whereas
 \begin{center}
 $\simp(\rup\backslash  s)$ is equal to $(_{00}\ONE +_{011}a^*)$
 \end{center}
-\noindent
-(For the sake of visual simplicity, we use numbers to denote the bits
-in bitcodes as we have previously defined for annotated
-regular expressions. $\S$ is replaced by
-subscript $_1$ and $\Z$ by $_0$.)
 What makes the difference?
 %Two "rules" might be inferred from the above example.
 $\quad\textit{case} \;  as' \Rightarrow  _{bs}\sum{as'}$.
 \end{center}
 \noindent
 The outermost bit $_0$ stays with
-the outmost regular expression, rather than being fused to
+the outermost regular expression, rather than being fused to
 its child regular expressions, which, as we will see later, is what
 happens to $\simp(\rup\backslash \, s)$.
 If we choose not to simplify in the midst of derivative operations,
 but only do it at the end after the string has been exhausted,
 namely, $\simp(\rup\backslash \, s)=\simp((\rup\backslash a)\backslash a)$,
 $(when \; \textit{bnullable}\,a_1)$\\
 					       & &$_{bs}\sum\,\;[_{[]}((a_1\,\backslash c) \cdot \,a_2),$\\
 					       & &$(\textit{fuse}\,(\textit{bmkeps}\,a_1)\,(a_2\,\backslash c))]$\\
 \end{tabular}
 \end{center}
+\noindent
 because
 $\rup\backslash a = (_0\ONE  + \ZERO)(_0a  +  _1a^*)$
 is a sequence
 with the first component being nullable
 (unsimplified, unlike the first round of running $\backslash_{simp}$).
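The sequence case of the bitcoded derivative shown above can be sketched in Python together with $\textit{fuse}$, $\textit{bnullable}$ and $\textit{bmkeps}$. The sketch assumes a tuple encoding, writes $0$ for $Z$ and $1$ for $S$, and follows the standard bitcoded definitions with binary alternatives in the internalisation; it omits simplification and $\textit{decode}$, so it returns a bitcode rather than a value. The names \texttt{internalise} and \texttt{blexer\_bits} are hypothetical.

```python
# Plain regexes: ("ZERO",), ("ONE",), ("CHAR", c), ("ALT", r1, r2), ("SEQ", r1, r2), ("STAR", r)
# Annotated regexes carry a tuple of bits in position 1; 0 stands for Z, 1 for S.
AZERO = ("AZERO",)

def fuse(bs, a):
    # prepend bs to the top-level bits of a
    if a[0] == "AZERO":
        return a
    return (a[0], tuple(bs) + a[1]) + a[2:]

def internalise(r):
    tag = r[0]
    if tag == "ZERO":
        return AZERO
    if tag == "ONE":
        return ("AONE", ())
    if tag == "CHAR":
        return ("ACHAR", (), r[1])
    if tag == "ALT":
        return ("AALTS", (), (fuse([0], internalise(r[1])),
                              fuse([1], internalise(r[2]))))
    if tag == "SEQ":
        return ("ASEQ", (), internalise(r[1]), internalise(r[2]))
    return ("ASTAR", (), internalise(r[1]))

def bnullable(a):
    tag = a[0]
    if tag in ("AONE", "ASTAR"):
        return True
    if tag == "AALTS":
        return any(bnullable(x) for x in a[2])
    if tag == "ASEQ":
        return bnullable(a[2]) and bnullable(a[3])
    return False

def bmkeps(a):
    # bits of the POSIX value for the empty string (a must be bnullable)
    tag = a[0]
    if tag == "AONE":
        return a[1]
    if tag == "AALTS":
        return a[1] + bmkeps(next(x for x in a[2] if bnullable(x)))
    if tag == "ASEQ":
        return a[1] + bmkeps(a[2]) + bmkeps(a[3])
    if tag == "ASTAR":
        return a[1] + (1,)
    raise ValueError("bmkeps: not nullable")

def bder(c, a):
    tag = a[0]
    if tag == "ACHAR":
        return ("AONE", a[1]) if a[2] == c else AZERO
    if tag == "AALTS":
        return ("AALTS", a[1], tuple(bder(c, x) for x in a[2]))
    if tag == "ASEQ":
        bs, a1, a2 = a[1], a[2], a[3]
        if bnullable(a1):
            # _bs sum [ _[] ((a1\c) . a2), fuse (bmkeps a1) (a2\c) ]
            return ("AALTS", bs, (("ASEQ", (), bder(c, a1), a2),
                                  fuse(bmkeps(a1), bder(c, a2))))
        return ("ASEQ", bs, bder(c, a1), a2)
    if tag == "ASTAR":
        return ("ASEQ", a[1], fuse([0], bder(c, a[2])), ("ASTAR", (), a[2]))
    return AZERO  # AZERO and AONE

def blexer_bits(r, s):
    # bitcode of the POSIX value, or None if r does not match s
    a = internalise(r)
    for c in s:
        a = bder(c, a)
    return bmkeps(a) if bnullable(a) else None
```

For $a\cdot a^*$ and the string $aa$ the sketch returns the bits $01$: $0$ for the single iteration of the star and $1$ for its end.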
 $\exists \textit{rs}_1. \; \simp(r_2 \backslash c_2) = _{bs}{\sum \textit{rs}_1}$ &  $and \;\simp(r_1 \backslash [c_1c_2]) = \ZERO\}$\\
 \end{tabular}
 \end{center}
 We take a pair $(r, \;s)$ from the set $D$.
-Now we compute ${\bf \rup \backslash_{simp} s}$, we get:
+Now we compute $\rup \backslash_{simp} s$ and get:
 \begin{center}
 \begin{tabular}{lcl}
 $(r_1\cdot r_2)\backslash_{simp} \, [c_1c_2]$ & $= \simp\left[ \big(\simp\left[ \left( r_1\cdot r_2 \right) \backslash c_1\right] \big)\backslash c_2\right]$\\
 								      & $= \simp\left[ \big(\simp \left[  \left(r_1 \backslash c_1\right) \cdot r_2 \right] \big) \backslash c_2 \right]$\\
 								      & $= \simp \left[  (\fuse \; \bmkeps(r_1\backslash c_1) \; \simp(r_2) ) \backslash c_2 \right]$,\\
-We have changed the algorithm to suppress the old
+We have changed the algorithm to avoid the old
 counterexample, but this gives rise to new counterexamples.
 This dilemma means that the amendment does not succeed
 in making $\rup\backslash_{simp} \, s = \simp(\rup\backslash s)$
 hold for every possible regular expression and string.
 \subsection{Properties of the Function $\simp$}
 \begin{center}
 $\simp(\simp(r)) = \simp(r)$
 \end{center}
 \item
 \begin{center}
-$\textit{if} r = \simp(r') \textit{then} \; \textit{good}(r) $
+$\textit{if}\; r = \simp(r')\; \textit{then} \; \textit{good}(r) $
 \end{center}
 \end{itemize}
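The first property above, $\simp(\simp(r)) = \simp(r)$, lends itself to randomised testing. The following Python sketch checks it for an assumed, small $\simp$ on plain regular expressions (removing $\ZERO$, removing $\ONE$ in sequences and removing an alternative equal to its sibling); it does not model $\textit{good}$ or the simplification of annotated regular expressions, and the generator \texttt{random\_regex} is hypothetical.

```python
import random

# Plain regexes as tagged tuples (an assumed encoding).
ZERO, ONE = ("ZERO",), ("ONE",)

def simp(r):
    # Hypothetical rule set: drop ZERO, drop ONE in sequences and remove an
    # alternative equal to its sibling; stars are left untouched.
    tag = r[0]
    if tag == "ALT":
        s1, s2 = simp(r[1]), simp(r[2])
        if s1 == ZERO:
            return s2
        if s2 == ZERO or s1 == s2:
            return s1
        return ("ALT", s1, s2)
    if tag == "SEQ":
        s1, s2 = simp(r[1]), simp(r[2])
        if ZERO in (s1, s2):
            return ZERO
        if s1 == ONE:
            return s2
        if s2 == ONE:
            return s1
        return ("SEQ", s1, s2)
    return r

def random_regex(depth, rng):
    # hypothetical generator of regexes over the characters a and b
    if depth == 0:
        return rng.choice([ZERO, ONE, ("CHAR", "a"), ("CHAR", "b")])
    k = rng.randrange(4)
    if k == 0:
        return ("ALT", random_regex(depth - 1, rng), random_regex(depth - 1, rng))
    if k == 1:
        return ("SEQ", random_regex(depth - 1, rng), random_regex(depth - 1, rng))
    if k == 2:
        return ("STAR", random_regex(depth - 1, rng))
    return random_regex(0, rng)

def find_non_idempotent(trials=2000, depth=5, seed=0):
    # return a counterexample to simp(simp(r)) == simp(r), or None
    rng = random.Random(seed)
    for _ in range(trials):
        r = random_regex(depth, rng)
        if simp(simp(r)) != simp(r):
            return r
    return None

print(find_non_idempotent())   # prints None
```

Such a random search can only find counterexamples, not prove the property; for this particular rule set idempotence also follows by a simple induction, since every result of \texttt{simp} is already a fixed point.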
-\subsection{the Contains relation}
+\subsection{The Contains Relation}
 $\retrieve$ is too strong a relation, in that
 it extracts only one bitcode instead of a set of them.
 Therefore we try to define another relation (predicate)
 to capture the fact that the regular expression has had its bits
 moved around but still has all the bits needed.
 \end{center}
 Here $\gg$ is almost like an $\textit{NFA}$ in the sense that
 it simulates the lexing process with respect to different strings.
 Our hope is that, using $\gg$, we can prove that the bit
-information are not lost when we simplify a regular expression,
-so we need to relate $\gg$ with simplifcation, for example,
+information is not lost when we simplify a regular expression.
+So we need to relate $\gg$ to simplification. For example,
 one of the lemmas we have proved about $\gg$ is that
 \item
 \begin{center}
 $\simp \; a \gg \textit{bs} \iff  a \gg \textit{bs}$
 \end{center}
 What to do after we have worked out
 the proof of the above lemma
 is still not clear. It is one of the next steps we need to
 work on.
-\subsection{the $\textit{ders}_2$ Function}
+\subsection{The $\textit{ders}_2$ Function}
 If we want to prove the result
 \begin{center}
 	$ \textit{blexer}\_{simp}(r, \; s) =  \textit{blexer}(r, \; s)$
 \end{center}
 inductively
 structure of the regular expression, and we are mainly
 focusing on structure here.
 It is based on the observation that the derivative of $r_1 \cdot r_2$
 with respect to a string $s$ can actually be written in an ``explicit form''
 composed of the derivatives of $r_1$ and $r_2$.
-For example, we can look at how $r1\cdot r2$ expands
+For example, we can look at how $r_1\cdot r_2$ expands
 when we take its derivative with respect to a two-character string:
 \begin{center}
 \begin{tabular}{lcl}
 	$ (r_1 \cdot r_2) \backslash [c_1c_2]$ & $=$ & $ (\textit{if} \; \nullable(r_1)\;  \textit{then} \; ((r_1 \backslash c_1) \cdot r_2 + r_2 \backslash c_1) \; \textit{else} \; (r_1\backslash c_1) \cdot r_2) \backslash c_2$\\
 	& $=$ & $\textit{if} \; \nullable(r_1) \;\textit{and} \; \nullable(r_1\backslash c_1) \; \textit{then} \;
 \begin{tabular}{lcl}
 	$(r_1 \cdot r_2) \backslash s $ & $=$ & $(r_1\backslash s) \cdot r_2 + \sum\limits_{s_i }{r_2 \backslash s_j} \; \text{where} \; s_i \; \text{is} \; \text{a proper prefix}\;  \text{of} \; s \;\text{and} \; s_i @s_j = s \; \text{and} \;\nullable(r_1\backslash s_i)$
 \end{tabular}
 \end{center}
 We have formalized and proved the correctness of this
-alternative definition of derivative and call it $\textit{ders2}$ to
+alternative definition of derivative and call it $\textit{ders}_2$ to
 to distinguish it from the $\textit{ders}$-function.
 Note that this differs from the lexing algorithm in the sense that
 it computes the results $r_1\backslash s_i$ and $r_2 \backslash s_j$ first
 and then glues them together
-into nested alternatives whereas the $r_1 \cdot r_2 \backslash s$ procedure,
-used by algorithm $\lexer$, can only produce each element of the list
+into nested alternatives.
+$\lexer$, on the other hand, can only produce the elements of the list
 in the resulting alternative regular expression
-altogether rather than
-generating each of the children nodes
-in a single recursive call that is only for generating that
-very expression itself.
-$\lexer$ does lexing in a "breadth first" manner whereas
-$\textit{ders2}$ does it in a "depth first" manner.
+all together, in the last derivative step.
+$\lexer$ does lexing in a ``breadth-first'' manner,
+generating all the child nodes simultaneously,
+whereas
+$\textit{ders}_2$ does it in a ``depth-first'' manner.
 Using this intuition we can also define the annotated regular expression version of
-derivative and call it $\textit{bders2}$ and prove the equivalence with $\textit{bders}$.
+the derivative, call it $\textit{bders}_2$, and prove its equivalence with $\textit{bders}$.
 Our hope is to use this alternative definition as a guide
 for our induction.
-Using $\textit{bders2}$ we have a clearer idea
+Using $\textit{bders}_2$ we have a clearer idea
 of what $r\backslash s$ and $\simp(r\backslash s)$ look like.
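The explicit form of $(r_1\cdot r_2)\backslash s$ underlying $\textit{ders}_2$ can be checked against the ordinary derivative in Python. The sketch below builds $(r_1\backslash s)\cdot r_2$ plus one alternative $r_2\backslash s_j$ for every split $s = s_i @ s_j$ with $s_i$ a proper prefix of $s$ and $r_1\backslash s_i$ nullable, and compares the two results on all strings up to a fixed length. The tuple encoding, the order of the alternatives and the helper names are assumptions; the check compares languages only, not POSIX values.

```python
from itertools import product

# Plain regexes as tagged tuples (an assumed encoding):
# ("ZERO",), ("ONE",), ("CHAR", c), ("ALT", r1, r2), ("SEQ", r1, r2), ("STAR", r)
ZERO, ONE = ("ZERO",), ("ONE",)

def nullable(r):
    tag = r[0]
    if tag in ("ONE", "STAR"):
        return True
    if tag == "ALT":
        return nullable(r[1]) or nullable(r[2])
    if tag == "SEQ":
        return nullable(r[1]) and nullable(r[2])
    return False

def der(c, r):
    tag = r[0]
    if tag == "CHAR":
        return ONE if r[1] == c else ZERO
    if tag == "ALT":
        return ("ALT", der(c, r[1]), der(c, r[2]))
    if tag == "SEQ":
        left = ("SEQ", der(c, r[1]), r[2])
        return ("ALT", left, der(c, r[2])) if nullable(r[1]) else left
    if tag == "STAR":
        return ("SEQ", der(c, r[1]), r)
    return ZERO

def ders(s, r):
    for c in s:
        r = der(c, r)
    return r

def seq_ders_explicit(r1, r2, s):
    # (r1\s).r2, plus r2\s[i:] whenever r1\s[:i] is nullable,
    # where s[:i] ranges over the proper prefixes of s (i < len(s)).
    result = ("SEQ", ders(s, r1), r2)
    for i in range(len(s)):
        if nullable(ders(s[:i], r1)):
            result = ("ALT", result, ders(s[i:], r2))
    return result

def matches(r, s):
    return nullable(ders(s, r))

def agree(r1, r2, s, alphabet="ab", max_len=3):
    # compare (r1.r2)\s with the explicit form on every string up to max_len
    lhs = ders(s, ("SEQ", r1, r2))
    rhs = seq_ders_explicit(r1, r2, s)
    return all(
        matches(lhs, "".join(t)) == matches(rhs, "".join(t))
        for n in range(max_len + 1)
        for t in product(alphabet, repeat=n)
    )
```

The loop adds the alternatives in order of increasing length of $s_i$; since the check is only up to language equality, it says nothing about which POSIX value either form would produce.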
 \section{Conclusion}
 Based on exhaustive tests, we believe the main
 result holds, yet a proof still seems elusive.
 We have tried out different approaches, and
 are the subtle differences between a
 nested simplified regular expression and a
 regular expression that is simplified at the final moment.
 We are almost there, but a last step is needed to make the proof work.
 Hopefully in the next few weeks we will be able to find one.
+This would be an important milestone for my dissertation.
 \bibliographystyle{plain}
 \bibliography{root}

changeset 145	a7c063981fa5
parent 144	bc340e8f4165
child 148	c8ef391dd6f7