diff -r c8af86f7558e -r f2aa71c76cba etnms/etnms.tex --- a/etnms/etnms.tex Mon Mar 02 21:45:24 2020 +0000 +++ b/etnms/etnms.tex Mon Mar 02 22:13:37 2020 +0000 @@ -1736,22 +1736,36 @@ && $\textit{else if} \; \nullable(r_1) \textit{and} \;\textit{not} \; \nullable(r_1 \backslash c_1)\; \textit{then} \; ((r_1\backslash c_1c_2)\cdot r_2 + r_2 \backslash c_1c_2)$\\ && $\textit{else} \;(r_1\backslash c_1c_2) \cdot r_2$\\ - & $=$ & $\sum\limits{s_i }{r_1 \backslash s_i \cdot r_2 \backslash s_j} \; \text{where} \; s_i \; \text{is} \; \text{true prefix}\; \text{of} \; s \;\text{and} \; s_i @s_j = s \; \text{and} \;\nullable(r_1\backslash s_i)$ + & $=$ & $(r_1\backslash s) \cdot r_2 + \sum\limits_{s_j=c_2, c_1c_2 }{r_2 \backslash s_j} \; \text{where} \; \nullable(r_1\backslash s_i) \; \text{and} \;s_i @s_j = [c_1c_2]$ +\end{tabular} +\end{center} +In a more general form, +\begin{center} +\begin{tabular}{lcl} + $(r_1 \cdot r_2) \backslash s $ & $=$ & $(r_1\backslash s) \cdot r_2 + \sum\limits_{s_i }{r_2 \backslash s_j} \; \text{where} \; s_i \; \text{is} \; \text{true prefix}\; \text{of} \; s \;\text{and} \; s_i @s_j = s \; \text{and} \;\nullable(r_1\backslash s_i)$ \end{tabular} \end{center} -In a more serious manner, we should write -\begin{center} - $ (r_1 \cdot r_2) \backslash [c_1c_2] = \sum{r_1 \backslash s_i \cdot r_2 \backslash s_j}$ -\end{center} -Note this differentiates from the previous form in the sense that -it calculates the results $r_1\backslash s_i , r_2 \backslash s_j$ first and then glue them together -through nested alternatives whereas the $r_1 \cdot r_2 \backslash s$ procedure, -used by algorithm $\lexer$, can only produce each component of the -resulting alternatives regular expression altogether rather than -generating each of the children nodes one by one -n a single recursive call that is only for generating that +We have formalized and proved the correctness of this +alternative definition of derivative and call it $\textit{ders2}$ to +make a distinction of it with the $\textit{ders}$-function. +Note this differentiates from the lexing algorithm in the sense that +it calculates the results $r_1\backslash s_i , r_2 \backslash s_j$ first +and then glue them together +into nested alternatives whereas the $r_1 \cdot r_2 \backslash s$ procedure, +used by algorithm $\lexer$, can only produce each element of the list +in the resulting alternatives regular expression +altogether rather than +generating each of the children nodes +in a single recursive call that is only for generating that very expression itself. -We have this +$\lexer$ does lexing in a "breadth first" manner whereas +$\textit{ders2}$ does it in a "depth first" manner. +Using this intuition we can also define the annotated regular expression version of +derivative and call it $\textit{bders2}$ and prove the equivalence with $\textit{bders}$. +Our hope is to use this alternative definition as a guide +for our induction. +Using $\textit{bders2}$ we have a clearer idea +of what $r\backslash s$ and $\simp(r\backslash s)$ looks like. \section{Conclusion} Under the exhaustive tests we believe the main result holds, yet a proof still seems elusive. @@ -1765,181 +1779,6 @@ regular expression that is simplified at the final moment. We are almost there, but a last step is needed to make the proof work. Hopefully in the next few weeks we will be able to find one. -%CONSTRUCTION SITE HERE -that is to say, despite the bits are being moved around on the regular expression -(difference in bits), the structure of the (unannotated)regular expression -after one simplification is exactly the same after the -same sequence of derivative operations -regardless of whether we did simplification -along the way. - One way would be to give a function that calls - -fuse is the culprit: it causes the order in which alternatives -are opened up to be remembered and finally the difference -appear in $\simp(\rup \backslash s)$ and $\rup \backslash{simp} \,s$. -but we have to use them as they are essential in the simplification: -flatten needs them. - - - -However, without erase the above equality does not hold: -for the regular expression -$(a+b)(a+a*)$, -if we do derivative with respect to string $aa$, -we get - -\subsection{Another Proof Strategy} -sdddddr does not equal sdsdsdsr sometimes.\\ -For example, - -This equicalence class method might still have the potential of proving this, -but not yet -i parallelly tried another method of using retrieve\\ - - -The vsimp function, defined as follows -tries to simplify the value in lockstep with -regular expression:\\ - - -The problem here is that - -we used retrieve for the key induction: -$decode (retrieve (r\backslash (s @ [c])) v) r $ -$decode (retrieve (r\backslash s) (inj (r\backslash s) c v)) r$ -Here, decode recovers a value that corresponds to a match(possibly partial) -from bits, and the bits are extracted by retrieve, -and the key value $v$ that guides retrieve is -$mkeps r\backslash s$, $inj r c (mkeps r\backslash s)$, $inj (inj (v))$, ...... -if we can -the problem is that -need vsiimp to make a value that is suitable for decoding -$Some(flex rid(s@[c])v) = Some(flex rids(inj (r\backslash s)cv))$ -another way that christian came up with that might circumvent the -prblem of finding suitable value is by not stating the visimp -function but include all possible value in a set that a regex is able to produce, -and proving that both r and sr are able to produce the bits that correspond the POSIX value - -produced by feeding the same initial regular expression $r$ and string $s$ to the - two functions $ders$ and $ders\_simp$. -The reason why -Namely, if $bmkeps( r_1) = bmkeps(r_2)$, then we - - -If we define the equivalence relation $\sim_{m\epsilon}$ between two regular expressions -$r_1$ and $r_2$as follows: -$r_1 \sim_{m\epsilon} r_2 \iff bmkeps(r_1)= bmkeps(r_2)$ -(in other words, they $r1$ and $r2$ produce the same output under the function $bmkeps$.) -Then the first goal -might be restated as -$(r^\uparrow)\backslash_{simp}\, s \sim_{m\epsilon} (r^\uparrow)\backslash s$. -I tried to establish an equivalence relation between the regular experssions -like dddr dddsr,..... -but right now i am only able to establish dsr and dr, using structural induction on r. -Those involve multiple derivative operations are harder to prove. -Two attempts have been made: -(1)induction on the number of der operations(or in other words, the length of the string s), -the inductive hypothesis was initially specified as -"For an arbitrary regular expression r, -For all string s in the language of r whose length do not exceed -the number n, ders s r me derssimp s r" -and the proof goal may be stated as -"For an arbitrary regular expression r, -For all string s in the language of r whose length do not exceed -the number n+1, ders s r me derssimp s r" -the problem here is that although we can easily break down -a string s of length n+1 into s1@list(c), it is not that easy -to use the i.h. as a stepping stone to prove anything because s1 may well be not -in the language L(r). This inhibits us from obtaining the fact that -ders s1 r me derssimps s1 r. -Further exploration is needed to amend this hypothesis so it includes the -situation when s1 is not nullable. -For example, what information(bits? -values?) can be extracted -from the regular expression ders(s1,r) so that we can compute or predict the possible -result of bmkeps after another derivative operation. What function f can used to -carry out the task? The possible way of exploration can be -more directly perceived throught the graph below: -find a function -f -such that -f(bders s1 r) -= re1 -f(bderss s1 r) -= re2 -bmkeps(bders s r) = g(re1,c) -bmkeps(bderssimp s r) = g(re2,c) -and g(re1,c) = g(re2,c) -The inductive hypothesis would be -"For all strings s1 of length <= n, -f(bders s1 r) -= re1 -f(bderss s1 r) -= re2" -proving this would be a lemma for the main proof: -the main proof would be -" -bmkeps(bders s r) = g(re1,c) -bmkeps(bderssimp s r) = g(re2,c) -for s = s1@c -" -and f need to be a recursive property for the lemma to be proved: -it needs to store not only the "after one char nullable info", -but also the "after two char nullable info", -and so on so that it is able to predict what f will compute after a derivative operation, -in other words, it needs to be "infinitely recursive"\\ -To prove the lemma, in other words, to get -"For all strings s1 of length <= n+1, -f(bders s1 r) -= re3 -f(bderss s1 r) -= re4"\\ -from\\ -"For all strings s1 of length <= n, -f(bders s1 r) -= re1 -f(bderss s1 r) -= re2"\\ -it might be best to construct an auxiliary function h such that\\ -h(re1, c) = re3\\ -h(re2, c) = re4\\ -and re3 = f(bder c (bders s1 r))\\ -re4 = f(simp(bder c (bderss s1 r))) -The key point here is that we are not satisfied with what bders s r will produce under -bmkeps, but also how it will perform after a derivative operation and then bmkeps, and two -derivative operations and so on. In essence, we are preserving the regular expression -itself under the function f, in a less compact way than the regluar expression: we are -not just recording but also interpreting what the regular expression matches. -In other words, we need to prove the properties of bderss s r beyond the bmkeps result, -i.e., not just the nullable ones, but also those containing remaining characters.\\ -(2)we observed the fact that -erase sdddddr= erase sdsdsdsr -that is to say, despite the bits are being moved around on the regular expression -(difference in bits), the structure of the (unannotated)regular expression -after one simplification is exactly the same after the -same sequence of derivative operations -regardless of whether we did simplification -along the way. -However, without erase the above equality does not hold: -for the regular expression -$(a+b)(a+a*)$, -if we do derivative with respect to string $aa$, -we get -%TODO -sdddddr does not equal sdsdsdsr sometimes.\\ -For example, - -This equicalence class method might still have the potential of proving this, -but not yet -i parallelly tried another method of using retrieve\\ - - - -\noindent\rule[0.5ex]{\linewidth}{1pt} - - - - \bibliographystyle{plain}