ninems/ninems.tex
changeset 80 d9d61a648292
parent 79 481c8000de6d
child 81 a0df84875788
--- a/ninems/ninems.tex	Mon Jul 22 23:01:00 2019 +0100
+++ b/ninems/ninems.tex	Tue Jul 23 09:26:22 2019 +0100
@@ -586,7 +586,7 @@
 for conciseness). 
 Before $\textit{inj}$ comes into play, 
 our lexer first builds derivative using string $abc$ (we simplified some regular expressions like
-$0 \cdot b$ to $0$ for conciseness; we also omit parentheses if
+$\ZERO \cdot b$ to $\ZERO$ for conciseness; we also omit parentheses if
 they are clear from the context):
 %Similarly, we allow
 %$\textit{ALT}$ to take a list of regular expressions as an argument
@@ -746,9 +746,9 @@
 
 \begin{center}
 \begin{tabular}{lcl}
- $\textit{pder} \; c \; 0$ & $\dn$ & $\emptyset$\\
- $\textit{pder} \; c \; 1$ & $\dn$ & $\emptyset$ \\
- $\textit{pder} \; c \; d$ & $\dn$ & $\textit{if} \; c \,=\, d \; \{  1   \}  \; \textit{else} \; \emptyset$ \\ 
+ $\textit{pder} \; c \; \ZERO$ & $\dn$ & $\emptyset$\\
+ $\textit{pder} \; c \; \ONE$ & $\dn$ & $\emptyset$ \\
+ $\textit{pder} \; c \; d$ & $\dn$ & $\textit{if} \; c \,=\, d \; \{  \ONE   \}  \; \textit{else} \; \emptyset$ \\ 
   $\textit{pder} \; c \; r_1+r_2$ & $\dn$ & $pder \; c \; r_1 \cup pder \; c \;  r_2$ \\
    $\textit{pder} \; c \; r_1 \cdot r_2$ & $\dn$ & $\textit{if} \; nullable \; r_1 $\\
      & & $\textit{then} \; \{  r \cdot r_2 \mid r \in pder \; c \; r_1   \}  \cup pder \; c \; r_2 \;$\\
@@ -929,7 +929,8 @@
 annotated regular expressions. This derivative operation is the same as
 what we previously have for the simple regular expressions, except that
 we beed to take special care of the bitcodes:\comment{You need to be consitent with  ALTS and ALT; ALT is just 
-an abbreviation; derivations and so on are defined for ALTS}
+an abbreviation; derivations and so on are defined for ALTS}\comment{no this is not the case,
+ALT for 2 regexes, ALTS for a list}
 
  %\begin{definition}{bder}
 \begin{center}
@@ -1039,16 +1040,16 @@
   \begin{tabular}{@{}lcl@{}}
    
   $\textit{simp} \; (\textit{SEQ}\;bs\,a_1\,a_2)$ & $\dn$ & $ (\textit{simp} \; a_1, \textit{simp}  \; a_2) \; \textit{match} $ \\
-   &&$\quad\textit{case} \; (0, \_) \Rightarrow  0$ \\
-   &&$\quad\textit{case} \; (\_, 0) \Rightarrow  0$ \\
-   &&$\quad\textit{case} \;  (1, a_2') \Rightarrow  \textit{fuse} \; bs \;  a_2'$ \\
-   &&$\quad\textit{case} \; (a_1', 1) \Rightarrow  \textit{fuse} \; bs \;  a_1'$ \\
+   &&$\quad\textit{case} \; (\ZERO, \_) \Rightarrow  \ZERO$ \\
+   &&$\quad\textit{case} \; (\_, \ZERO) \Rightarrow  \ZERO$ \\
+   &&$\quad\textit{case} \;  (\ONE, a_2') \Rightarrow  \textit{fuse} \; bs \;  a_2'$ \\
+   &&$\quad\textit{case} \; (a_1', \ONE) \Rightarrow  \textit{fuse} \; bs \;  a_1'$ \\
    &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow  \textit{SEQ} \; bs \; a_1' \;  a_2'$ \\
 
   $\textit{simp} \; (\textit{ALTS}\;bs\,as)$ & $\dn$ & $\textit{distinct}( \textit{flatten} ( \textit{map simp as})) \; \textit{match} $ \\
-  &&$\quad\textit{case} \; [] \Rightarrow  0$ \\
+  &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
    &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
-   &&$\quad\textit{case} \;  as' \Rightarrow  \textit{ALT}\;bs\;as'$\\ 
+   &&$\quad\textit{case} \;  as' \Rightarrow  \textit{ALTS}\;bs\;as'$\\ 
 
    $\textit{simp} \; a$ & $\dn$ & $\textit{a} \qquad \textit{otherwise}$   
 \end{tabular}    
@@ -1064,12 +1065,12 @@
 auxiliary functions flatten and distinct to open up nested
 $\textit{ALTS}$ and reduce as many duplicates as possible. Function
 distinct  keeps the first occurring copy only and remove all later ones
-when detected duplicates. Function flatten opens up nested \textit{ALT}.
+when detected duplicates. Function flatten opens up nested \textit{ALTS}.
 Its recursive definition is given below:
 
  \begin{center}
   \begin{tabular}{@{}lcl@{}}
-  $\textit{flatten} \; (\textit{ALT}\;bs\,as) :: as'$ & $\dn$ & $(\textit{map} \;
+  $\textit{flatten} \; (\textit{ALTS}\;bs\,as) :: as'$ & $\dn$ & $(\textit{map} \;
      (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flatten} \; as' $ \\
   $\textit{flatten} \; \textit{ZERO} :: as'$ & $\dn$ & $ \textit{flatten} \;  as' $ \\
     $\textit{flatten} \; a :: as'$ & $\dn$ & $a :: \textit{flatten} \; as'$ \quad(otherwise) 
@@ -1090,7 +1091,7 @@
 \begin{center}
 \begin{tabular}{lcl}
 $r \backslash_{simp} (c\!::\!s) $ & $\dn$ & $(r \backslash_{simp}\, c) \backslash_{simp}\, s$ \\
-$r \backslash [\,] $ & $\dn$ & $r$
+$r \backslash_{simp} [\,] $ & $\dn$ & $r$
 \end{tabular}
 \end{center}
 
@@ -1142,10 +1143,15 @@
 straightforward to establish that after one simplification step, the part of a
 nullable derivative that corresponds to a POSIX value remains intact and can
 still be collected, in other words, we can show that\comment{Double-check....I
-think this  is not the case}
+think this  is not the case}\comment{If i remember correctly, you have proved this lemma.
+I feel this is indeed not true because you might place arbitrary 
+bits on the regex r, however if this is the case, did i remember wrongly that
+you proved something like simplification does not affect $\textit{bmkeps}$ results?
+Anyway, i have amended this a little bit so it does not allow arbitrary bits attached
+to a regex. Maybe it works now.}
 
 \begin{center}
-$\textit{bmkeps} \; r = \textit{bmkeps} \; \textit{simp} \; r\;( r\; \textit{nullable})$
+$\textit{bmkeps} \; a = \textit{bmkeps} \; \textit{bsimp} \; a\;( a\; \textit{bnullable} and \textit{decode}(r, \textit{bmkeps}(a)) is a \textit{POSIX} value)$
 \end{center}
 
 \noindent
@@ -1159,47 +1165,63 @@
 
 \noindent That is, if we do derivative on regular expression $r$ and then
 simplify it, and repeat this process until we exhaust the string, we get a
-regular expression $r''$ that provides the POSIX matching as the result $r'$ of
-the normal derivative algorithm which only does derivative operation repeatedly
-and has no simplification at all.  This might seem at first glance very
-unintuitive, as the $r'$ is exponentially larger than $r''$. But this can be
-explained in the following way: we are pruning away the possible matches that
-are not POSIX. Since there are exponentially non-POSIX matchings and only 1
-POSIX matching, it is understandable that our $r''$ can be a lot smaller.  we
-can still provide the same POSIX value if there is one.  This is not as
-straightforward as the previous proposition, as the two regular expressions $r$
-and $\textit{simp}\; r$ might become very different regular expressions after
-repeated application of $\textit{simp}$ and derivative.  The crucial point is
-to find the indispensable information of a regular expression and how it is
-kept intact during simplification so that it performs as good as a regular
-expression that has not been simplified in the subsequent derivative
-operations.  To aid this, we use the helping function retrieve described by
-Sulzmann and Lu: \\definition of retrieve\\
- 
-This function assembled the bitcode that corresponds to a parse tree for
-how the current derivative matches the suffix of the string(the
-characters that have not yet appeared, but is stored in the value).
-Sulzmann and Lu used this to connect the bitcoded algorithm to the older
-algorithm by the following equation:
+regular expression $r''$($\textit{LHS}$)  that provides the POSIX matching
+information, which is exactly the same as the result $r'$($\textit{RHS}$ of the
+normal derivative algorithm that only does derivative repeatedly and has no
+simplification at all.  This might seem at first glance very unintuitive, as
+the $r'$ is exponentially larger than $r''$, but can be explained in the
+following way: we are pruning away the possible matches that are not POSIX.
+Since there are exponentially non-POSIX matchings and only 1 POSIX matching, it
+is understandable that our $r''$ can be a lot smaller.  we can still provide
+the same POSIX value if there is one.  This is not as straightforward as the
+previous proposition, as the two regular expressions $r'$ and $r''$ might have
+become very different regular expressions.  The crucial point is to find the
+$\textit{POSIX}$  information of a regular expression and how it is modified,
+augmented and propagated 
+during simplification in parallel with the regularr expression that
+has not been simplified in the subsequent derivative operations.  To aid this,
+we use the helping function retrieve described by Sulzmann and Lu: \\definition
+of retrieve TODO\\
+This function assembles the bitcode that corresponds to a parse tree for how
+the current derivative matches the suffix of the string(the characters that
+have not yet appeared, but will appear as the successive derivatives go on,
+how do we get this "future" information? By the value $v$, which is
+computed by a pass of the algorithm that uses
+$inj$ as described in the previous section).  Sulzmann and Lu used this
+to connect the bitcoded algorithm to the older algorithm by the following
+equation:
  
  \begin{center} $inj \;a\; c \; v = \textit{decode} \; (\textit{retrieve}\;
-	 ((\textit{internalise}\; r)\backslash_{simp} c) v)$ \end{center} A
-	 little fact that needs to be stated to help comprehension:
-	 \begin{center} $r^\uparrow = a$($a$ stands for $\textit{annotated}).$
-	 \end{center} Ausaf and Urban also used this fact to prove  the
-	 correctness of bitcoded algorithm without simplification.  Our purpose
-	 of using this, however, is try to establish \\ $ \textit{retrieve} \;
-	 a \; v \;=\; \textit{retrieve}  \; \textit{simp}(a) \; v'.$\\ The idea
-	 is that using $v'$, a simplified version of $v$ that possibly had gone
-	 through the same simplification step as $\textit{simp}(a)$ we are
-	 still  able to extract the bit-sequence that gives the same parsing
-	 information as the unsimplified one.  After establishing this, we
-	 might be able to finally bridge the gap of proving\\
-	 $\textit{retrieve} \; r   \backslash  s \; v = \;\textit{retrieve} \;
-	 \textit{simp}(r)  \backslash  s \; v'$\\ and subsequently\\
-	 $\textit{retrieve} \; r \backslash  s \; v\; = \; \textit{retrieve} \;
-	 r  \backslash_{simp}   s \; v'$.\\ This proves that our simplified
-	 version of regular expression still contains all the bitcodes needed.
+	 ((\textit{internalise}\; r)\backslash_{simp} c) v)$ 
+ \end{center} 
+ A little fact that needs to be stated to help comprehension:
+\begin{center} 
+	$r^\uparrow = a$($a$ stands for $\textit{annotated}).$
+\end{center} 
+Ausaf and Urban also used this fact to prove  the
+correctness of bitcoded algorithm without simplification.  Our purpose
+of using this, however, is to establish 
+\begin{center}
+$ \textit{retrieve} \;
+a \; v \;=\; \textit{retrieve}  \; \textit{simp}(a) \; v'.$
+\end{center}
+The idea
+is that using $v'$, a simplified version of $v$ that possibly had gone
+through the same simplification step as $\textit{simp}(a)$ we are
+able to extract the bit-sequence that gives the same parsing
+information as the unsimplified one.  After establishing this, we
+might be able to finally bridge the gap of proving
+\begin{center}
+$\textit{retrieve} \; r   \backslash  s \; v = \;\textit{retrieve} \;
+\textit{simp}(r)  \backslash  s \; v'$
+\end{center}
+and subsequently
+\begin{center}
+$\textit{retrieve} \; r \backslash  s \; v\; = \; \textit{retrieve} \;
+r  \backslash_{simp}   s \; v'$.
+\end{center}
+This proves that our simplified
+version of regular expression still contains all the bitcodes needed.
 
 
 The second task is to speed up the more aggressive simplification.