# HG changeset patch # User Chengsong # Date 1562356840 -3600 # Node ID 488d35c20949017926d663ddd52206c3edd4885b # Parent 4bfd2872288430d086a844335c0ef1d35d3e90c1 some minor changes diff -r 4bfd28722884 -r 488d35c20949 ninems/ninems.tex --- a/ninems/ninems.tex Fri Jul 05 18:01:26 2019 +0100 +++ b/ninems/ninems.tex Fri Jul 05 21:00:40 2019 +0100 @@ -478,13 +478,14 @@ After this, we inject back the characters one by one in order to build the parse tree $v_i$ for how the regex $r_i$ matches the string -$s_i$ ($s_i$ means the string s with the first $i$ characters being -chopped off) from the previous parse tree. After $n$ transformations, we -get the parse tree for how $r_0$ matches $s$, exactly as we wanted. An -inductive proof can be routinely established. +$s_i$ ($s_i = c_i \ldots c_n$ ) from the previous parse tree $v_{i+1}$. After injecting back $n$ characters, we +get the parse tree for how $r_0$ matches $s$, exactly as we wanted. + A correctness proof using induction can be routinely established. -It is instructive to see how it works by a little example. Suppose we have a regular expression $(a+b+ab+c+abc)*$ and we want to match it against the string $abc$. By POSIX rules the lexer should go for the longest matching, i.e. it should match the string $abc$ in one star iteration, using the longest string $abc$ in the sub-expression $a+b+ab+c+abc$(we use $r$ to denote this sub-expression for conciseness). Here is how the lexer achieves a parse tree for this matching. -First, we build successive derivatives until we exhaust the string, as illustrated here( we omitted some parenthesis for better readability): +It is instructive to see how this algorithm works by a little example. Suppose we have a regular expression $(a+b+ab+c+abc)^*$ and we want to match it against the string $abc$(when $abc$ is written +as a regular expression, the most standard way of expressing it should be $a \cdot (b \cdot c)$. We omit the parenthesis and dots here for readability). +By POSIX rules the lexer should go for the longest matching, i.e. it should match the string $abc$ in one star iteration, using the longest string $abc$ in the sub-expression $a+b+ab+c+abc$(we use $r$ to denote this sub-expression for conciseness). Here is how the lexer achieves a parse tree for this matching. +First, we build successive derivatives until we exhaust the string, as illustrated here( we simplified some regular expressions like $0 \cdot b$ to $0$ for conciseness): \[ r^* \xrightarrow{\backslash a} r_1 = (1+0+1 \cdot b + 0 + 1 \cdot b \cdot c) \cdot r* \xrightarrow{\backslash b}\] \[r_2 = (0+0+1 \cdot 1 + 0 + 1 \cdot 1 \cdot c) \cdot r^* +(0+1+0 + 0 + 0) \cdot r* \xrightarrow{\backslash c}\]