etnms/etnms.tex
negation and back-references.
\end{abstract}

\section{Introduction}

While we believe derivatives of regular expressions, written
$r\backslash s$, are a beautiful concept (in terms of ease of
implementing them in functional programming languages and in terms of
reasoning about them formally), they have one major drawback: every
derivative step can make regular expressions grow drastically in
size. This in turn has a negative effect on the runtime of the
corresponding lexing algorithms. Consider for example the regular
expression $(a+aa)^*$ and the short string $aaaaaaaaaaaa$. The
corresponding derivative already contains 8668 nodes, assuming the
derivative is represented as a tree. The reason for the poor runtime
of the derivative-based lexing algorithms is that they need to
traverse such trees over and over again. The solution is to find a
complete set of simplification rules that keep the sizes of
derivatives uniformly small.

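\noindent
To see this growth concretely, here is a small worked example (added
for illustration, assuming the standard derivative rules and writing
$\mathbf{0}$ and $\mathbf{1}$ for the regular expressions matching
nothing and the empty string, respectively). The first two derivative
steps for $(a+aa)^*$ yield:

\begin{center}
\begin{tabular}{lcl}
$(a+aa)^*\backslash a$ & $=$ & $(\mathbf{1}+\mathbf{1}\cdot a)\cdot(a+aa)^*$\\
$(a+aa)^*\backslash aa$ & $=$ & $(\mathbf{0}+(\mathbf{0}\cdot a+\mathbf{1}))\cdot(a+aa)^* \,+\, (\mathbf{1}+\mathbf{1}\cdot a)\cdot(a+aa)^*$\\
\end{tabular}
\end{center}

\noindent
Each step roughly doubles the size of the derivative, because the
sequence and star rules duplicate subexpressions; without
simplification this doubling continues with every further character.
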
For reasons beyond this report, it turns out that a complete set of
simplification rules depends on values being encoded as
bitsequences.\footnote{Values are the results the lexing algorithms
  generate; they encode how a regular expression matched a string.} We
already know that the lexing algorithm using bitsequences but
\emph{without} simplification is correct, albeit horribly
slow. Therefore, in the past six months, I have been trying to prove
that the algorithm using bitsequences plus our simplification rules is
also correct. Formally, this amounts to showing that

\begin{equation}\label{mainthm}
\blexers \; r \; s = \blexer \;r\;s
\end{equation}

\noindent
whereby $\blexers$ simplifies (makes derivatives smaller) in each
step, whereas with $\blexer$ the size can grow exponentially. This
would be an important milestone for my thesis, because we already
have a very good idea of how to establish that our set of
simplification rules keeps the size of derivatives below a relatively
tight bound.

In order to prove the main theorem in \eqref{mainthm}, we need to
prove that the two functions produce the same output. The definition
of these two functions is shown below.

\begin{center}
\begin{tabular}{lcl}
  $\textit{blexer}\;r\,s$ & $\dn$ &
\end{tabular}
\end{center}
\noindent
In these definitions $(r^\uparrow)$ is a kind of coding function that
is the same in each case; similarly, the \textit{decode} and the
\textit{bmkeps} functions are the same in each case. Our main
theorem~\eqref{mainthm} therefore boils down to proving the following
two propositions (depending on which branch the if-else clause
takes). They establish that the derivatives \emph{with} simplification
do not change the computed result:

\begin{itemize}
\item{(a)} If a string $s$ is in the language $L(r)$, then \\
$\textit{bmkeps}\,((r^\uparrow)\backslash_{simp}\,s) = \textit{bmkeps}\,((r^\uparrow)\backslash s)$,\\
\item{(b)} If a string $s$ is \emph{not} in the language $L(r)$, then 
$\rup \backslash_{simp} \,s$ is not nullable.
\end{itemize}

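\noindent
Assuming the if-else structure of the two definitions above, these two
propositions give the main theorem by a case distinction (a sketch):
for $s \in L(r)$ both derivatives are nullable (simplification does
not change the language) and proposition (a) gives

\begin{center}
$\blexers \; r \; s \;=\; \textit{decode}\,(\textit{bmkeps}\,(\rup\backslash_{simp}\,s))\;r \;=\; \textit{decode}\,(\textit{bmkeps}\,(\rup\backslash s))\;r \;=\; \blexer \; r \; s$,
\end{center}

\noindent
whereas for $s \notin L(r)$ proposition (b) ensures that both
algorithms take the else-branch and return the same result.
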
\noindent
We have already proved the second part in Isabelle. This is actually
not too difficult because we can show that simplification does not
change the language of a regular expression.

If we can prove the first part, that is the bitsequence algorithm with
simplification produces the same result as the one without
%up it totally, that is why i believe the old 
%proof is fundamental in understanding
%what we are doing in the past 6 months.
%\bigskip\noindent\rule[1.5ex]{\linewidth}{5pt}

\subsubsection*{Existing Proof}

For this we started by looking at the original proof that
established that the bitsequence algorithm produces the same result
as the algorithm not using bitsequences. Formally this proof
established

%\begin{center}
%$\blexer \; r^\uparrow \;s = \blexers \; r^\uparrow \;s$
%\end{center}

\noindent
The proof uses two ``tricks''. One is that it defines a \flex-function

\begin{center}
\begin{tabular}{lcl}
$\textit{flex} \;r\; f\; (c\!::\!s) $ & $\dn$ & $\textit{flex} \;  (r\backslash c) \;(\lambda v. f (inj \; r \; c \; v)) \;s$ \\
$\textit{flex} \;r\; f\;  [\,] $ & $\dn$ & $f$
\end{tabular}
\end{center}

\noindent
and then proves for the right-hand side in \eqref{lexer} that

\begin{center}
$\lexer \;r\; s = \flex \;\textit{id} \; r\;s \;(\mkeps \; (r\backslash s))$.
\end{center}

\noindent
The $\flex$-function essentially does lexing by stacking up injection
functions while doing derivatives, explicitly showing the order in
which the characters are injected back in each step.

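\noindent
To see the stacking concretely, here is a small worked example (added
for illustration): unfolding the definition of $\flex$ for a
two-character string $[c_1,c_2]$ gives

\begin{center}
$\flex \; r \; id \; [c_1,c_2] \;=\; \lambda v.\, \inj \; r \; c_1\; (\inj \; (r\backslash c_1) \; c_2 \; v)$
\end{center}

\noindent
and therefore

\begin{center}
$\lexer \; r \; [c_1,c_2] \;=\; \inj \; r \; c_1\; (\inj \; (r\backslash c_1) \; c_2 \; (\mkeps\; ((r\backslash c_1)\backslash c_2)))$,
\end{center}

\noindent
where the innermost injection corresponds to the last character of
the string.
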
%\noindent
%$\flex$ focuses on the injections instead of the derivatives,
%compared to the original definition of $\lexer$, which puts equal
%amount of emphasis on injection and derivative with respect to each
%character:

%\begin{center}
%\begin{tabular}{lcl}
%$\textit{lexer} \; r\; (c\!::\!s) $ & $\dn$ & $\textit{case} \; \lexer \; (r\backslash c) \;s \; \textit{of}$ \\
% & & $\textit{None} \; \Longrightarrow \; \textit{None}$\\
%  & & $\textbar \; v \; \Longrightarrow \; \inj \; r\;c\;v$\\
%$\textit{lexer} \; r\;  [\,] $ & $\dn$ & $\textit{if} \; \nullable (r) \; \textit{then} \; \mkeps (r) \; \textit{else} \;None$
%\end{tabular}
%\end{center}
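
\noindent
Since the report stresses that these functions are easy to implement
in a functional language, the following minimal, self-contained Scala
sketch may help to make the discussion concrete. It is our own
illustration: all names (\texttt{Rexp}, \texttt{Val}, \texttt{der},
\texttt{inj}, \texttt{flex}, \texttt{lexer}) are ours and merely
follow the functions discussed above; it implements the unbitcoded
algorithm only.

\begin{verbatim}
abstract class Rexp
case object ZERO extends Rexp                      // matches nothing
case object ONE extends Rexp                       // matches the empty string
case class CHR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp

abstract class Val                            // how a regex matched a string
case object Empty extends Val
case class Chr(c: Char) extends Val
case class Left(v: Val) extends Val
case class Right(v: Val) extends Val
case class Sequ(v1: Val, v2: Val) extends Val
case class Stars(vs: List[Val]) extends Val

def nullable(r: Rexp): Boolean = r match {
  case ZERO => false
  case ONE => true
  case CHR(_) => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_) => true
}

def der(c: Char, r: Rexp): Rexp = r match {        // one derivative step
  case ZERO => ZERO
  case ONE => ZERO
  case CHR(d) => if (c == d) ONE else ZERO
  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
  case SEQ(r1, r2) =>
    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
    else SEQ(der(c, r1), r2)
  case STAR(r1) => SEQ(der(c, r1), STAR(r1))
}

def mkeps(r: Rexp): Val = r match {                // value for the empty match
  case ONE => Empty
  case ALT(r1, r2) =>
    if (nullable(r1)) Left(mkeps(r1)) else Right(mkeps(r2))
  case SEQ(r1, r2) => Sequ(mkeps(r1), mkeps(r2))
  case STAR(_) => Stars(Nil)
}

def inj(r: Rexp, c: Char, v: Val): Val = (r, v) match { // undo one derivative
  case (CHR(_), Empty) => Chr(c)
  case (ALT(r1, _), Left(v1)) => Left(inj(r1, c, v1))
  case (ALT(_, r2), Right(v2)) => Right(inj(r2, c, v2))
  case (SEQ(r1, _), Sequ(v1, v2)) => Sequ(inj(r1, c, v1), v2)
  case (SEQ(r1, _), Left(Sequ(v1, v2))) => Sequ(inj(r1, c, v1), v2)
  case (SEQ(r1, r2), Right(v2)) => Sequ(mkeps(r1), inj(r2, c, v2))
  case (STAR(r1), Sequ(v1, Stars(vs))) => Stars(inj(r1, c, v1) :: vs)
}

// flex: stack up one injection function per derivative step
def flex(r: Rexp, f: Val => Val, s: List[Char]): Val => Val = s match {
  case Nil => f
  case c :: cs => flex(der(c, r), (v: Val) => f(inj(r, c, v)), cs)
}

// lexer expressed via flex, as in the equation above
def lexer(r: Rexp, s: List[Char]): Option[Val] = {
  val rs = s.foldLeft(r)((r1, c) => der(c, r1))    // r\s
  if (nullable(rs)) Some(flex(r, (v: Val) => v, s)(mkeps(rs))) else None
}
\end{verbatim}

\noindent
Pasted into the Scala REPL, for instance
\texttt{lexer(STAR(ALT(CHR('a'), SEQ(CHR('a'), CHR('a')))), "aa".toList)}
runs the $(a+aa)^*$ example from the introduction, and iterating
\texttt{der} makes the size blow-up directly observable.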

\noindent
The crux of the existing proof is how $\flex$ relates to injection, namely

\begin{center}
$\flex \; r \; id \; (s@[c]) \; v = \flex \;  r \; id \; s \; (inj \; (r\backslash s) \; c\; v)$.
\end{center}

\noindent
This property allows one to use the induction hypothesis to obtain

\begin{center}
$ \flex \; r\; id\; (s@[c])\; v = \textit{decode} \;( \textit{retrieve}\; (\rup \backslash s) \; (\inj \; (r\backslash s) \;c\;v)\;) r$
\end{center}
       

\noindent\rule[1.5ex]{\linewidth}{4pt}
There is no mention of retrieve yet\ldots{} this is the second trick in the
existing proof. I am not sure whether you need to explain annotated regular
expressions much earlier -- maybe before the ``existing proof'' section, or
even earlier.

\noindent\rule[1.5ex]{\linewidth}{4pt}

\noindent
By using a property of $\textit{retrieve}$, the $\textit{RHS}$ of the
above equality is
$\textit{decode}\,(\textit{retrieve}\,(r^\uparrow \backslash(s @ [c]))\;v)\;r$,
and this gives the main lemma result:

\begin{center}
$ \flex \;r\;  id \; (s@[c]) \; v =\textit{decode}\,(\textit{retrieve}\, (\rup \backslash (s@[c])) \;v)\; r$
\end{center}
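
\noindent
The property of $\textit{retrieve}$ used in this step is presumably
the following one (we state it here explicitly for completeness; it
assumes that $v$ is a well-formed value for $r\backslash (s@[c])$):

\begin{center}
$\textit{retrieve}\; (\rup \backslash s) \; (\inj \; (r\backslash s) \;c\;v) \;=\; \textit{retrieve}\; (\rup \backslash (s@[c])) \;v$.
\end{center}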
       

\noindent
To use this lemma result for our correctness proof, simply replace the
$v$ in the $\textit{RHS}$ of the above equality with
$\mkeps\;(r\backslash (s@[c]))$, and apply the lemma that
   314 $\mkeps\;(r\backslash (s@[c]))$, and apply the lemma that