cst_tests: changeset 49:d256aabe88f3

--- a/ninems/ninems.tex	Thu Jul 04 22:45:02 2019 +0100
+++ b/ninems/ninems.tex	Thu Jul 04 23:20:00 2019 +0100
@@ -733,6 +733,12 @@
 unnecessary ``copies'' of regular expressions (very similar to
 simplifying $r + r$ to just $r$, but in a more general
 setting). 
+Another modification is that we use simplification rules
+inspired by Antimirov's work on partial derivatives. They maintain the
+idea that only the first ``copy'' of a regular expression in an
+alternative contributes to the calculation of a POSIX value. All
+subsequent copies can be pruned from the regular expression.
+
 A psuedocode version of our algorithm is given below:\\
 
 simp r \defn r if r = ONE bs or CHAR bs c or STAR bs r
@@ -768,17 +774,17 @@
 This algorithm effectively keeps the regular expression size small, for example,
 with this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6 and stay constant, however long the input string is.
 
-Another modification is that we use simplification rules
-inspired by Antimirov's work on partial derivatives. They maintain the
-idea that only the first ``copy'' of a regular expression in an
-alternative contributes to the calculation of a POSIX value. All
-subsequent copies can be pruned from the regular expression.
 
 
-We are currently engaged with proving that our simplification rules
+We are currently engaged in 2 tasks related to this algorithm. 
+The first one is proving that our simplification rules
 actually do not affect the POSIX value that should be generated by the
-algorithm according to the specification of a POSIX value and
-furthermore that our derivatives stay small for all derivatives. For
+algorithm according to the specification of a POSIX value
+ and furthermore obtain a much
+tighter bound on the sizes of derivatives. The result is that our
+algorithm should be correct and faster on all inputs.  The original
+blow-up, as observed in JavaScript, Python and Java, would be excluded
+from happening in our algorithm.For
 this proof we use the theorem prover Isabelle. Once completed, this
 result will advance the state-of-the-art: Sulzmann and Lu wrote in
 their paper \cite{Sulzmann2014} about the bitcoded ``incremental
@@ -794,11 +800,31 @@
 \end{quote}  
 
 \noindent
-We would settle the correctness claim and furthermore obtain a much
-tighter bound on the sizes of derivatives. The result is that our
-algorithm should be correct and faster on all inputs.  The original
-blow-up, as observed in JavaScript, Python and Java, would be excluded
-from happening in our algorithm.
+We would settle the correctness claim.
+It is relatively straightforward to establish that after 1 simplification step, the part of derivative that corresponds to a POSIX value remains intact and can still be collected, in other words,
+bmkeps r = bmkeps simp r
+as this basically comes down to proving actions like removing the additional $r$ in $r+r$  does not delete imporatnt POSIX information in a regular expression.
+The hardcore of this problem is to prove that
+bmkeps bders r = bmkeps bders simp r
+That is, if we do derivative on regular expression r and the simplificed version fo r , they can still provie the same POSIZ calue if there is one . This is not as strightforward as the previous proposition, as the two regular expression r and simp r  might become very different regular epxressions after repeated application ofd simp and derivative.
+The crucial point is to find the "gene" of a regular expression and how it is kept intact during simplification.
+To aid this, we are utilizing the helping function retrieve described by Sulzmann and Lu:
+\\definition of retrieve\\
+ This function assembled the bitcode that corresponds to a parse tree for how the current derivative mathces the suffix of the string(the characters that have not yet appeared, but is stored in the value).
+ Sulzmann and Lu used this to connect the bit-coded algorithm to the older algorithm by the following equation:\\
+ inj r c v = decode retrieve (bder c (internalise r)) v
+ Fahad and Christian also used this fact to prove  the correctness of bit-coded algorithm without simplificaiton.
+ Our purpose of using this, however, is try to establish 
+ retrieve r v = retrieve simp(r) v'.
+ Although the r and v are different now,  we can still extract the bitsequence that gies the same parsing information.
+ After establishing this, we might be able to finally bridge the gap of proving
+ retrieve r \backslash s v = retrieve simp(r) \backslash s v'
+ and subsequently
+ retrieve r \backslash s v = retrieve r \backslash{with simplification} s v'.
+ This proves that our simplified version of regular expression still contains all the bitcodes neeeded.
+
+The second task is to speed up the more aggressive simplification. Currently it is slower than a naive simplifiction(the naive version as implemented in ADU of course can explode in some cases).
+So it needs to be explored how to make it faster. Our possibility would be to explore again the connection to DFAs. This is very much work in progress.
 
 \section{Conclusion}
author	Chengsong
	Thu, 04 Jul 2019 23:20:00 +0100
changeset 49	d256aabe88f3
parent 48	bbefcf7351f2
child 50	866eda9ba66a