--- a/ninems/ninems.tex Sat Jul 06 19:48:20 2019 +0100
+++ b/ninems/ninems.tex Sat Jul 06 20:16:20 2019 +0100
@@ -489,8 +489,8 @@
reverses the "chopping off" for values which we did in the
first phase for derivatives. The corresponding function is
called $\textit{inj}$ (short for injection). This function takes three
-arguments: the first one is a regular expression $\textit{r_{i-1}}$
-before the character is chopped off, the second is $\textit{c_{i-1}}$,
+arguments: the first one is a regular expression ${r_{i-1}}$,
+before the character is chopped off, the second is ${c_{i-1}}$,
the character
we
want to inject and the third argument is the value $textit{v_i}$, where we
@@ -520,7 +520,7 @@
$r_{i-1}$, we need to locate where that hole is. We can find this location
informtaion by comparing $r_{i-1}$ and $v_i$.
For instance, if $r_{i-1}$ is of shape $r_a \cdot r_b$, and $v_i$
-is of shape $\textit{Left(Seq(v_a, v_b))}$, we know immediately that
+is of shape $\textit{Left}(\textit{Seq}(v_a, v_b))$, we know immediately that
\[ (r_a \cdot r_b)\backslash c = (r_a\backslash c) \cdot r_b \,+\, r_b\backslash c\],
otherwise if $r_a$ is not nullable,
\[ (r_a \cdot r_b)\backslash c = (r_a\backslash c) \cdot r_b\],
@@ -530,44 +530,56 @@
branch is taken instead of the right one. We have therefore found out
that the hole will be on $r_a$. So we recursively call $\textit{inj}
r_a c v_a$ to fill that hole in $v_a$. After injection, the value
-$v_i$ for $r_i = r_a \cdot \rb$ should be $\textit{Seq}(\textit{inj}
+$v_i$ for $r_i = r_a \cdot r_b$ should be $\textit{Seq}(\textit{inj}
r_a c v_a, v_b)$.
+Other clauses can be understood in a similar way.
To understand what is going on, it might be best to do some
example calculations and compare them with Figure~\eqref{graph:2}.
-
-
-
- A correctness proof using induction can be routinely established.
-
-It is instructive to see how this algorithm works by a little example.
-Suppose we have a regular expression $(a+b+ab+c+abc)^*$ and we want to
+% A correctness proof using induction can be routinely established.
+Suppose we have a regular expression $((((a+b)+ab)+c)+abc)^*$ and want to
match it against the string $abc$(when $abc$ is written as a regular
expression, the most standard way of expressing it should be $a \cdot (b
-\cdot c)$. We omit the parenthesis and dots here for readability). By
+\cdot c)$. We omit the parentheses and dots here for readability). By
POSIX rules the lexer should go for the longest matching, i.e. it should
match the string $abc$ in one star iteration, using the longest string
$abc$ in the sub-expression $a+b+ab+c+abc$(we use $r$ to denote this
sub-expression for conciseness). Here is how the lexer achieves a parse
-tree for this matching. First, we build successive derivatives until we
-exhaust the string, as illustrated here( we simplified some regular
-expressions like $0 \cdot b$ to $0$ for conciseness. Similarly, we allow
-$\textit{ALT}$ to take a list of regular expressions as an argument
-instead of just 2 operands to reduce the nested depth of
-$\textit{ALT}$):
-
-\[ r^* \xrightarrow{\backslash a} r_1 = (1+0+1 \cdot b + 0 + 1 \cdot b \cdot c) \cdot r* \xrightarrow{\backslash b}\]
-\[r_2 = (0+0+1 \cdot 1 + 0 + 1 \cdot 1 \cdot c) \cdot r^* +(0+1+0 + 0 + 0) \cdot r* \xrightarrow{\backslash c}\]
-\[r_3 = ((0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* + (0+0+0 + 1 + 0) \cdot r*) +((0+1+0 + 0 + 0) \cdot r*+(0+0+0 + 1 + 0) \cdot r* )
+tree for this matching. First, build successive derivatives until we
+exhaust the string, as illustrated here(we simplified some regular
+expressions like $0 \cdot b$ to $0$ for conciseness. We also omit
+some parentheses if they are clear from the context.):
+%Similarly, we allow
+%$\textit{ALT}$ to take a list of regular expressions as an argument
+%instead of just 2 operands to reduce the nested depth of
+%$\textit{ALT}$
+\begin{center}
+\[ r^* \xrightarrow{\backslash a} r_1 = (1+0+1 \cdot b + 0 + 1 \cdot b \cdot c) \cdot r^* \xrightarrow{\backslash b}\]
+\[r_2 = (0+0+1 \cdot 1 + 0 + 1 \cdot 1 \cdot c) \cdot r^* +(0+1+0 + 0 + 0) \cdot r^* \xrightarrow{\backslash c}\]
+\[r_3 = (( 0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* + (0+0+0 + 1 + 0) \cdot r^*) +((0+1+0 + 0 + 0) \cdot r^*+(0+0+0 + 1 + 0) \cdot r^* )
\]
-
-Now instead of using $nullable$ to give a $yes$, we call $mkeps$ to construct a parse tree for how $r_3$ matched the string $abc$. $mkeps$ gives the following value $v_3$: \\$Left(Left(Seq(Right(Right(Right(Seq(Empty, Seq(Empty, Empty)))))), Stars []))$\\
-This corresponds to the leftmost term
-$((0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* $\\
- in $r_3$. Note that its leftmost location allows $mkeps$ to choose it as the first candidate that meets the requirement of being $nullable$. This location is naturally generated by the splitting clause\\ $(r_1 \cdot r_2)\backslash c (when \; r_1 \; nullable)) \, = (r_1\backslash c) \cdot r_2 \,+\, r_2\backslash c. \\$ By this clause, we put
-$r_1 \backslash c \cdot r_2 $ at the $\textit{front}$ and $r_2 \backslash c$ at the $\textit{back}$. This allows $mkeps$ to always pick up among two matches the one with a longer prefix. The value \\
-$Left(Left(Seq(Right(Right(Right(Seq(Empty, Seq(Empty, Empty)))))), Stars []))$\\
-tells us how about the empty string matches the final regular expression after doing all the derivatives: among the regular expressions in \\$(0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* + (0+0+0 + 1 + 0) \cdot r*) +((0+1+0 + 0 + 0) \cdot r*+(0+0+0 + 1 + 0) \cdot r* )$, \\
+\end{center}
+Now instead when $nullable$ gives a $yes$ on $r_3$, we call $mkeps$
+to construct a parse tree for how $r_3$ matched the string $abc$.
+$mkeps$ gives the following value $v_3$:
+\[$Left(Left(Seq(Right(Seq(Empty, Seq(Empty, Empty))), Stars [])))$\]
+The outer 2 $Left$ tells us the first nullable part hides in the term
+\[((0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* \]
+ in \[r_3 = ( \underline{(0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^*} + (0+0+0 + 1 + 0)
+ \cdot r^*) +((0+1+0 + 0 + 0) \cdot r^*+(0+0+0 + 1 + 0) \cdot r^* )
+ \](underlined). Note that the leftmost location of term \[((0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* \]
+ (which corresponds to the intial sub-match $abc$)
+ allows $mkeps$ to choose it as the first candidate that meets the requirement of being $nullable$
+ because $mkeps$ is defined to always choose the left one when nullable.
+ In the case of this example, $abc$ is preferred over $a$ or $ab$.
+ This location is naturally generated by the splitting clause\[
+ (r_1 \cdot r_2)\backslash c (when \; r_1 \; nullable)) \, = (r_1\backslash c) \cdot r_2 \,+\, r_2\backslash c\].
+ By this clause, we put
+$r_1 \backslash c \cdot r_2 $ at the $\textit{front}$ and $r_2 \backslash c$ at the $\textit{back}$.
+This allows $mkeps$ to always pick up among two matches the one with a longer initial sub-match.
+ The inside subvalue
+ \[$Seq(Right(Seq(Empty, Seq(Empty, Empty))), Stars [])$\]
+tells us how about the empty string matches : among the regular expressions in \\$(0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* + (0+0+0 + 1 + 0) \cdot r*) +((0+1+0 + 0 + 0) \cdot r*+(0+0+0 + 1 + 0) \cdot r* )$, \\
we choose the left most nullable one, which is composed of a sequence of an alternative and a star. In that alternative $0+0+0 + 0 + 1 \cdot 1 \cdot 1$ we take the rightmost choice.
Using the value $v_3$, the character c, and the regular expression $r_2$, we can recover how $r_2$ matched the string $[c]$ : we inject $c$ back to $v_3$, and get \\ $v_2 = Left(Seq(Right(Right(Right(Seq(Empty, Seq(Empty, c)))))), Stars [])$, \\