ninems/ninems.tex
changeset 55 488d35c20949
parent 54 4bfd28722884
child 56 747c8cf666ca
--- a/ninems/ninems.tex	Fri Jul 05 18:01:26 2019 +0100
+++ b/ninems/ninems.tex	Fri Jul 05 21:00:40 2019 +0100
@@ -478,13 +478,14 @@
 
  After this, we inject back the characters one by one in order to build
 the parse tree $v_i$ for how the regex $r_i$ matches the string
-$s_i$ ($s_i$ means the string s with the first $i$ characters being
-chopped off) from the previous parse tree. After $n$ transformations, we
-get the parse tree for how $r_0$ matches $s$, exactly as we wanted. An
-inductive proof can be routinely established.
+$s_i$ ($s_i = c_i \ldots c_n$ ) from the previous parse tree $v_{i+1}$. After injecting back $n$ characters, we
+get the parse tree for how $r_0$ matches $s$, exactly as we wanted.
+ A correctness proof using induction can be routinely established.
 
-It is instructive to see how it works by a little example. Suppose we have a regular expression $(a+b+ab+c+abc)*$ and we want to match it against the string $abc$. By POSIX rules the lexer should go for the longest matching, i.e. it should match the string $abc$ in one star iteration, using the longest string $abc$ in the sub-expression $a+b+ab+c+abc$(we use $r$ to denote this sub-expression for conciseness). Here is how the lexer achieves a parse tree for this matching.
-First, we build successive derivatives until we exhaust the string, as illustrated here( we omitted some parenthesis for better readability):
+It is instructive to see how this algorithm works by a little example. Suppose we have a regular expression $(a+b+ab+c+abc)^*$ and we want to match it against the string $abc$(when $abc$ is written
+as a regular expression, the most standard way of expressing it should be $a \cdot (b \cdot c)$. We omit the parenthesis and dots here for readability). 
+By POSIX rules the lexer should go for the longest matching, i.e. it should match the string $abc$ in one star iteration, using the longest string $abc$ in the sub-expression $a+b+ab+c+abc$(we use $r$ to denote this sub-expression for conciseness). Here is how the lexer achieves a parse tree for this matching.
+First, we build successive derivatives until we exhaust the string, as illustrated here( we simplified some regular expressions like $0 \cdot b$ to $0$ for conciseness):
 
 \[ r^* \xrightarrow{\backslash a} r_1 = (1+0+1 \cdot b + 0 + 1 \cdot b \cdot c) \cdot r* \xrightarrow{\backslash b}\]
 \[r_2 = (0+0+1 \cdot 1 + 0 + 1 \cdot 1 \cdot c) \cdot r^* +(0+1+0  + 0 + 0) \cdot r* \xrightarrow{\backslash c}\]