diff -r bbefcf7351f2 -r d256aabe88f3 ninems/ninems.tex --- a/ninems/ninems.tex Thu Jul 04 22:45:02 2019 +0100 +++ b/ninems/ninems.tex Thu Jul 04 23:20:00 2019 +0100 @@ -733,6 +733,12 @@ unnecessary ``copies'' of regular expressions (very similar to simplifying $r + r$ to just $r$, but in a more general setting). +Another modification is that we use simplification rules +inspired by Antimirov's work on partial derivatives. They maintain the +idea that only the first ``copy'' of a regular expression in an +alternative contributes to the calculation of a POSIX value. All +subsequent copies can be pruned from the regular expression. + A psuedocode version of our algorithm is given below:\\ simp r \defn r if r = ONE bs or CHAR bs c or STAR bs r @@ -768,17 +774,17 @@ This algorithm effectively keeps the regular expression size small, for example, with this simplification our previous $(a + aa)^*$ example's 8000 nodes will be reduced to only 6 and stay constant, however long the input string is. -Another modification is that we use simplification rules -inspired by Antimirov's work on partial derivatives. They maintain the -idea that only the first ``copy'' of a regular expression in an -alternative contributes to the calculation of a POSIX value. All -subsequent copies can be pruned from the regular expression. -We are currently engaged with proving that our simplification rules +We are currently engaged in 2 tasks related to this algorithm. +The first one is proving that our simplification rules actually do not affect the POSIX value that should be generated by the -algorithm according to the specification of a POSIX value and -furthermore that our derivatives stay small for all derivatives. For +algorithm according to the specification of a POSIX value + and furthermore obtain a much +tighter bound on the sizes of derivatives. The result is that our +algorithm should be correct and faster on all inputs. The original +blow-up, as observed in JavaScript, Python and Java, would be excluded +from happening in our algorithm.For this proof we use the theorem prover Isabelle. Once completed, this result will advance the state-of-the-art: Sulzmann and Lu wrote in their paper \cite{Sulzmann2014} about the bitcoded ``incremental @@ -794,11 +800,31 @@ \end{quote} \noindent -We would settle the correctness claim and furthermore obtain a much -tighter bound on the sizes of derivatives. The result is that our -algorithm should be correct and faster on all inputs. The original -blow-up, as observed in JavaScript, Python and Java, would be excluded -from happening in our algorithm. +We would settle the correctness claim. +It is relatively straightforward to establish that after 1 simplification step, the part of derivative that corresponds to a POSIX value remains intact and can still be collected, in other words, +bmkeps r = bmkeps simp r +as this basically comes down to proving actions like removing the additional $r$ in $r+r$ does not delete imporatnt POSIX information in a regular expression. +The hardcore of this problem is to prove that +bmkeps bders r = bmkeps bders simp r +That is, if we do derivative on regular expression r and the simplificed version fo r , they can still provie the same POSIZ calue if there is one . This is not as strightforward as the previous proposition, as the two regular expression r and simp r might become very different regular epxressions after repeated application ofd simp and derivative. +The crucial point is to find the "gene" of a regular expression and how it is kept intact during simplification. +To aid this, we are utilizing the helping function retrieve described by Sulzmann and Lu: +\\definition of retrieve\\ + This function assembled the bitcode that corresponds to a parse tree for how the current derivative mathces the suffix of the string(the characters that have not yet appeared, but is stored in the value). + Sulzmann and Lu used this to connect the bit-coded algorithm to the older algorithm by the following equation:\\ + inj r c v = decode retrieve (bder c (internalise r)) v + Fahad and Christian also used this fact to prove the correctness of bit-coded algorithm without simplificaiton. + Our purpose of using this, however, is try to establish + retrieve r v = retrieve simp(r) v'. + Although the r and v are different now, we can still extract the bitsequence that gies the same parsing information. + After establishing this, we might be able to finally bridge the gap of proving + retrieve r \backslash s v = retrieve simp(r) \backslash s v' + and subsequently + retrieve r \backslash s v = retrieve r \backslash{with simplification} s v'. + This proves that our simplified version of regular expression still contains all the bitcodes neeeded. + +The second task is to speed up the more aggressive simplification. Currently it is slower than a naive simplifiction(the naive version as implemented in ADU of course can explode in some cases). +So it needs to be explored how to make it faster. Our possibility would be to explore again the connection to DFAs. This is very much work in progress. \section{Conclusion}