--- a/etnms/etnms.tex	Fri Jan 10 17:02:16 2020 +0000
+++ b/etnms/etnms.tex	Fri Jan 10 22:35:08 2020 +0000
@@ -87,27 +87,54 @@
   answers with \emph{how} it matches the string.  This is important for
   applications such as lexing (tokenising a string). The problem is to
   make the algorithm by Sulzmann and Lu fast on all inputs without
-  breaking its correctness. We have already developed some
-  simplification rules for this, but have not yet proved that they
-  preserve the correctness of the algorithm. We also have not yet
-  looked at extended regular expressions, such as bounded repetitions,
+  breaking its correctness. Being fast depends on a complete set of 
+  simplification rules, some of which 
+  have been put forward by Sulzmann and Lu. We have extended their
+  rules in order to obtain a tight bound on size of regular expressions.
+  We have tested the correctness of these extended rules, but have not 
+  formally established their correctness. We also have not yet looked 
+  at extended regular expressions, such as bounded repetitions,
   negation and back-references.
 \end{abstract}
 
 \section{Introduction}
 
-
+While we believe derivatives of regular expressions is a beautiful 
+concept (interms of ease to implementing them in functional programming
+language and ease to reason about them formally), they have one major 
+drawback: every derivative step can make regular expressions grow 
+drastically in size. This in turn has negative effects on the runtime of 
+the corresponding lexing algorithms. Consider for example the regular 
+expression $(a+aa)^*$ and the short string $aaaaaaaaaaaa$. The size of
+the corresponding derivative is already 8668 node assuming the derivatives
+is seen as a tree. The reason for the poor runtime of the lexing algorithms is
+that they need to traverse such trees over and over again. The solution is to
+find a complete set of simplification rules that keep the sizes of derivatives
+uniformly small.
 
-\noindent\rule[0.5ex]{\linewidth}{1pt}
-Between the 2 bars are the new materials.\\
-In the past 6 months I was trying to prove that the bit-coded algorithm is correct.
-\begin{center}
-$\blexers \;r \; s = \blexer \; r \; s$
-\end{center}
+For reasons beyond this report, it turns out that a complete set of 
+simplification rules depend on values being encoded as bitsequences.
+(Vlue are the results of the lexing algorithms generate; they encode how
+a regular expression matched a string.) We already know that the lexing 
+algorithm \emph{without} simplification is correct. Therefore in the
+past 6 months we were trying to prove  that the algorithm using bitsequences plus
+our simplification rules is correct. Formally this amounts to show that
+
+\begin{equation}\label{mainthm}
+\blexers \; r \; s = \blexer \;r\;s
+\end{equation}
+
 \noindent
-To prove this, we need to prove these two functions produce the same output 
-whether or not $r \in L(r)$.
-Given the definition of $\blexer$ and $\blexers$:
+whereby $\blexers$ simplifies (makes derivatives smaller) in each step,
+whereas with $\blexer$ the size can grow exponentially. This would be an
+important milestone, because we already have a very good idea how to 
+establish that our set our simplification rules keeps the size below a
+relatively tight bound.
+
+In order to prove the main theorem \eqref{mainthm}, we need to prove the
+two functions produce the same output. The definition of these functions 
+is shown below.
+
 \begin{center}
 \begin{tabular}{lcl}
   $\textit{blexer}\;r\,s$ & $\dn$ &
@@ -118,41 +145,63 @@
 \end{tabular}
 \end{center}
 
- \begin{center}
+\begin{center}
 \begin{tabular}{lcl}
-  $\textit{blexer\_simp}\;r\,s$ & $\dn$ &
-      $\textit{let}\;a = (r^\uparrow)\backslash_{simp}\, s\;\textit{in}$\\                
-  & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
-  & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
-  & & $\;\;\textit{else}\;\textit{None}$
+  $\blexers \; r \, s$ &$\dn$ &
+    $\textit{let} \; a = (r^\uparrow)\backslash_{simp}\, s\; \textit{in}$\\
+  & & $\; \; \textit{if} \; \textit{bnullable}(a)$\\
+  & & $\; \; \textit{then} \; \textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
+  & & $\;\;   \textit{else}\;\textit{None}$
 \end{tabular}
 \end{center}
 \noindent
-it boils down to proving the following two propositions(depending on which
-branch in the if-else clause is taken):
+In these definitions $(r^\uparrow)$ is a kind of coding function that is the
+same in each case, similarly the decode and the \textit{bmkeps}
+functions. Our main theorem \eqref{mainthm} therefore boils down to
+proving the following two  propositions (depending on which branch the 
+if-else clause takes). It establishes how the derivatives  \emph{with}
+simplification do not change the computed result:
 
 \begin{itemize}
-
-\item{}
-When s is a string in the language L(r), \\
-$\textit{bmkeps} (r^\uparrow)\backslash_{simp}\, s  = \textit{bmkeps} (r^\uparrow)\backslash s$, \\
-\item{}
-when s is not a string of the language L(ar)
-ders\_simp(ar, s) is not nullable
+\item{} If a string $s$ is in the language of $L(r)$, then \\
+$\textit{bmkeps} (r^\uparrow)\backslash_{simp}\,s = \textit{bmkeps} (r^\uparrow)\backslash s$,\\
+\item{} If a string $s$ is in the language $L(r)$, then 
+$\rup \backslash_{simp} \,s$ is not nullable.
 \end{itemize}
-The second one is relatively straightforward using isabelle to prove.
-The first part requires more effort. 
-It builds on the result that the bit-coded algorithm without simplification
-produces the correct result:
-\begin{center}
-$\blexer \;r^\uparrow  s = \lexer \; r\; s$
-\end{center}
+
 \noindent
-the definition of lexer and its correctness is 
-omitted(see \cite{AusafDyckhoffUrban2016}).
-if we can prove that the bit-coded algorithm with simplification produces 
-the same result as the original bit-coded algorithm, 
-then we are done.
+We have already proved in Isabelle the second part. This is actually not 
+too difficult because we can show that simplification does not change
+the language of regular expressions. If we can prove the first case,
+that is the bitsequence algorithm with simplification produces the same 
+result as the one without simplification, then we are done.
+Unfortunately that part requires more effort, because simplification does not 
+only.need to \emph{not} change the language, but also not change
+the value (computed result).
+
+\bigskip\noindent\rule[1.5ex]{\linewidth}{5pt}
+Do you want to keep this? You essentially want to say that the old
+method used retrieve, which unfortunately cannot be adopted to 
+the simplification rules. You could just say that and give an example.
+However you have to think about how you give the example....nobody knows
+about AZERO etc yet. Maybe it might be better to use normal regexes
+like $a + aa$, but annotate bitsequences as subscript like $_1(_0a + _1aa)$.
+
+\bigskip\noindent\rule[1.5ex]{\linewidth}{5pt}
+REPLY:\\
+Yes, I am essentially saying that the old method
+cannot be adopted without adjustments.
+But this does not mean we should skip
+the proof of the bit-coded algorithm
+as it is still the main direction we are looking into
+to prove things. We are trying to modify
+the old proof to suit our needs, but not give 
+up it totally, that is why i believe the old 
+proof is fundamental in understanding
+what we are doing in the past 6 months.
+
+\bigskip\noindent\rule[1.5ex]{\linewidth}{5pt}
+
 The correctness proof of
 \begin{center}
 $\blexer \; r^\uparrow  s = \lexer \;r \;s$
@@ -247,8 +296,8 @@
 equalities,
 $\retrieve \; a \; v$ is not always defined.
 for example,
-$\retrieve \; \AALTS(Z, \AONE(S), \AONE(S)) \; \Left(\Empty)$
-is defined, but not $\retrieve \; \AONE(\Z\S) \; \Left(\Empty)$,
+$\retrieve \; _0(_1a+_0a) \; \Left(\Empty)$
+is defined, but not $\retrieve \; _{01}a \;\Left(\Empty)$,
 though we can extract the same POSIX
 bits from the two annotated regular expressions.
 The latter might occur when we try to retrieve from
changeset 100	397b31867ea6
parent 95	c969a973fcae
child 101	4a327e70d538