# HG changeset patch
# User Chengsong
# Date 1657807052 -3600
# Node ID 28cb8089ec368fb90d6aa6db827db04b3314157b
# Parent  94604a5fd2717eb99cbe71c0320b7f8954fd3a94
more updaates

diff -r 94604a5fd271 -r 28cb8089ec36 ChengsongTanPhdThesis/Chapters/Inj.tex
--- a/ChengsongTanPhdThesis/Chapters/Inj.tex	Wed Jul 13 08:35:45 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Inj.tex	Thu Jul 14 14:57:32 2022 +0100
@@ -621,100 +621,138 @@
 \begin{proof}
 By induction on $s$, $r$ and $v_1$. The inductive cases
 are all the POSIX rules. 
-Each case is proven by a combination of
-the induction rules for $\POSIX$ 
-values and the inductive hypothesis.
 Probably the most cumbersome cases are 
 the sequence and star with non-empty iterations.
+We shall give the details for proving the sequence case here.
 
-We give the reasoning about the sequence case as follows:
-When we have $(s_1, r_1) \rightarrow v_1$ and $(s_2, r_2) \rightarrow v_2$, 
-we know that there could not be a longer string $r_1'$ such that $(s_1', r_1) \rightarrow v_1'$
-and $(s_2', r_2) \rightarrow v2'$ and $s_1' @s_2' = s$ all hold.
-For possible values of $s_1'$ and $s_2'$ where $s_1'$ is shorter, they cannot
-possibly form a $\POSIX$ for $s$.
-If we have some other values $v_1'$ and $v_2'$ such that 
-$(s_1, r_1) \rightarrow v_1'$ and $(s_2, r_2) \rightarrow v_2'$,
-Then by induction hypothesis $v_1' = v_1$ and $v_2'= v_2$, 
-which means this "different" $\POSIX$ value $\Seq(v_1', v_2')$
+When we have 
+\[
+	(s_1, r_1) \rightarrow v_1 \;\, and \;\, 
+	(s_2, r_2) \rightarrow v_2  \;\, and \;\,\\ 
+	\nexists s_3 \; s_4. s_3 \neq [] \land s_3 @ s_4 = s_2 \land 
+		s_1@ s_3 \in L \; r_1 \land s_4 \in L \; r_2
+\]
+we know that the last condition 
+excludes the possibility of a 
+string $s_1'$ longer than $s_1$ such that 
+\[
+(s_1', r_1) \rightarrow v_1'   \;\; 
+and\;\; (s_2', r_2) \rightarrow v_2'\;\; and \;\;s_1' @s_2' = s 
+\]
+hold.
+A shorter string $s_1''$ with $s_2''$ satisfying
+\[
+(s_1'', r_1) \rightarrow v_1''
+\;\;and\;\; (s_2'', r_2) \rightarrow v_2'' \;\;and \;\;s_1'' @s_2'' = s 
+\]
+cannot possibly form a $\POSIX$ value either, because
+by definition there is a candidate
+with longer initial string
+$s_1$. Therefore, we know that the POSIX
+value $\Seq \; a \; b$ for $r_1 \cdot r_2$ matching
+$s$ must have the 
+property that 
+\[
+	|a| = s_1 \;\; and \;\; |b| = s_2.
+\]
+The goal is to prove that $a = v_1 $ and $b = v_2$.
+If we have some other POSIX values $v_{10}$ and $v_{20}$ such that 
+$(s_1, r_1) \rightarrow v_{10}$ and $(s_2, r_2) \rightarrow v_{20}$ hold,
+then by induction hypothesis $v_{10} = v_1$ and $v_{20}= v_2$, 
+which means this "other" $\POSIX$ value $\Seq(v_{10}, v_{20})$
 is the same as $\Seq(v_1, v_2)$. 
 \end{proof}
-Now we know what a $\POSIX$ value is; the problem is how do we achieve 
-such a value in a lexing algorithm, using derivatives?
+\noindent
+Now we know what a $\POSIX$ value is and why it is unique;
+the problem is generating 
+such a value in a lexing algorithm using derivatives.
 
 \subsection{Sulzmann and Lu's Injection-based Lexing Algorithm}
 
-The contribution of Sulzmann and Lu is an extension of Brzozowski's
-algorithm by a second phase (the first phase being building successive
-derivatives---see \ref{graph:successive_ders}). This second phase generates a POSIX value if the regular expression matches the string.
+Sulzmann and Lu extended Brzozowski's 
+derivative-based matching
+to a lexing algorithm by a second pass
+after the initial phase of successive derivatives.
+This second phase generates a POSIX value 
+if the regular expression matches the string.
 Two functions are involved: $\inj$ and $\mkeps$.
-The function $\mkeps$ constructs a value from the last
-one of all the successive derivatives:
+The first one used is $\mkeps$, which constructs a POSIX value from the last
+derivative $r_n$:
 \begin{ceqn}
 \begin{equation}\label{graph:mkeps}
 \begin{tikzcd}
-r_0 \arrow[r, "\backslash c_0"]  & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed] & r_n \arrow[d, "mkeps" description] \\
+r_0 \arrow[r, "\backslash c_0"]  & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed, "\ldots"] & r_n \arrow[d, "mkeps" description] \\
 	        & 	              & 	            & v_n       
 \end{tikzcd}
 \end{equation}
 \end{ceqn}
-
-It tells us how can an empty string be matched by a 
-regular expression, in a $\POSIX$ way:
-
+\noindent
+In the above diagram, again we assume that
+the input string $s$ is made of $n$ characters
+$c_0c_1 \ldots c_{n-1}$, and the input regular expression $r$
+is given label $0$ and after each character $c_i$ is taken off
+by the derivative operation the resulting derivative regular 
+expressioin is $r_{i+1}$.The last derivative operation 
+$\backslash c_{n-1}$ gives back $r_n$, which is transformed into 
+a value $v_n$ by $\mkeps$.
+$v_n$ tells us how an empty string is matched by the (nullable)
+regular expression $r_n$, in a $\POSIX$ way.
+The definition of $\mkeps$ is
 	\begin{center}
 		\begin{tabular}{lcl}
 			$\mkeps \; \ONE$ 		& $\dn$ & $\Empty$ \\
-			$\mkeps \; r_{1}+r_{2}$	& $\dn$ 
-						& \textit{if} ($\nullable \; r_{1})$\\ 
-			& & \textit{then} $\Left (\mkeps \; r_{1})$\\ 
-			& & \textit{else} $\Right(\mkeps \; r_{2})$\\
-			$\mkeps \; r_1\cdot r_2$ 	& $\dn$ & $\Seq\;(\mkeps\;r_1)\;(\mkeps \; r_2)$\\
+			$\mkeps \; (r_{1}+r_{2})$	& $\dn$ 
+						& $\textit{if}\; (\nullable \; r_{1}) \;\,
+							\textit{then}\;\, \Left \; (\mkeps \; r_{1})$\\ 
+						& & $\phantom{if}\; \textit{else}\;\, \Right \;(\mkeps \; r_{2})$\\
+			$\mkeps \; (r_1 \cdot r_2)$ 	& $\dn$ & $\Seq\;(\mkeps\;r_1)\;(\mkeps \; r_2)$\\
 			$\mkeps \; r^* $	        & $\dn$ & $\Stars\;[]$
 		\end{tabular}
 	\end{center}
 
 
 \noindent 
-We favour the left to match an empty string if there is a choice.
+We favour the left child $r_1$ of $r_1 + r_2$ 
+to match an empty string if there is a choice.
 When there is a star for us to match the empty string,
 we give the $\Stars$ constructor an empty list, meaning
-no iterations are taken.
+no iteration is taken.
 The result of a call to $\mkeps$ on a $\nullable$ $r$ would
 be a $\POSIX$ value corresponding to $r$:
-\begin{lemma}
-$\nullable(r) \implies (r, []) \rightarrow (\mkeps\; v)$
-\end{lemma}\label{mePosix}
-
-
-After the $\mkeps$-call, we inject back the characters one by one in order to build
-the lexical value $v_i$ for how the regex $r_i$ matches the string $s_i$
-($s_i = c_i \ldots c_{n-1}$ ) from the previous lexical value $v_{i+1}$.
+\begin{lemma}\label{mePosix}
+$\nullable\; r \implies (r, []) \rightarrow (\mkeps\; v)$
+\end{lemma}
+\begin{proof}
+	By induction on the shape of $r$.
+\end{proof}
+\noindent
+After the $\mkeps$-call, we inject back the characters one by one
+in reverse order as they were chopped off in the derivative phase.
+The fucntion for this is called $\inj$. $\inj$ and $\backslash$
+are not exactly reverse operations of one another, as $\inj$ 
+operates on values instead of regular
+expressions.
+In the diagram below, $v_i$ stands for the (POSIX) value 
+for how the regular expression 
+$r_i$ matches the string $s_i$ consisting of the last $n-i$ characters
+of $s$ (i.e. $s_i = c_i \ldots c_{n-1}$ ) from the previous lexical value $v_{i+1}$.
 After injecting back $n$ characters, we get the lexical value for how $r_0$
 matches $s$. 
-To do this, Sulzmann and Lu defined a function that reverses
-the ``chopping off'' of characters during the derivative phase. The
-corresponding function is called \emph{injection}, written
-$\textit{inj}$; it takes three arguments: the first one is a regular
-expression ${r_{i-1}}$, before the character is chopped off, the second
-is a character ${c_{i-1}}$, the character we want to inject and the
-third argument is the value ${v_i}$, into which one wants to inject the
-character (it corresponds to the regular expression after the character
-has been chopped off). The result of this function is a new value. 
 \begin{ceqn}
 \begin{equation}\label{graph:inj}
 \begin{tikzcd}
-r_1 \arrow[r, dashed] \arrow[d]& r_i \arrow[r, "\backslash c_i"]  \arrow[d]  & r_{i+1}  \arrow[r, dashed] \arrow[d]        & r_n \arrow[d, "mkeps" description] \\
-v_1           \arrow[u]                 & v_i  \arrow[l, dashed]                              & v_{i+1} \arrow[l,"inj_{r_i} c_i"]                 & v_n \arrow[l, dashed]         
+r_0 \arrow[r, dashed] \arrow[d]& r_i \arrow[r, "\backslash c_i"]  \arrow[d]  & r_{i+1}  \arrow[r, dashed] \arrow[d]        & r_n \arrow[d, "mkeps" description] \\
+v_0           \arrow[u]                 & v_i  \arrow[l, dashed]                              & v_{i+1} \arrow[l,"inj_{r_i} c_i"]                 & v_n \arrow[l, dashed]         
 \end{tikzcd}
 \end{equation}
 \end{ceqn}
-
-
 \noindent
-The
-definition of $\textit{inj}$ is as follows: 
-
+$\textit{inj}$ takes three arguments: a regular
+expression ${r_{i}}$, before the character is chopped off, 
+a character ${c_{i}}$, the character we want to inject back and 
+the third argument $v_{i+1}$ the value we want to inject into. 
+The result of this function is a new value $v_i$.
+The definition of $\textit{inj}$ is as follows: 
 \begin{center}
 \begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
   $\textit{inj}\,(c)\,c\,Empty$            & $\dn$ & $Char\,c$\\
diff -r 94604a5fd271 -r 28cb8089ec36 ChengsongTanPhdThesis/Chapters/Introduction.tex
--- a/ChengsongTanPhdThesis/Chapters/Introduction.tex	Wed Jul 13 08:35:45 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Introduction.tex	Thu Jul 14 14:57:32 2022 +0100
@@ -85,8 +85,8 @@
 \def\flex{\textit{flex}}
 \def\inj{\mathit{inj}}
 \def\Empty{\textit{Empty}}
-\def\Left{\mathit{Left}}
-\def\Right{\mathit{Right}}
+\def\Left{\textit{Left}}
+\def\Right{\textit{Right}}
 \def\Stars{\mathit{Stars}}
 \def\Char{\mathit{Char}}
 \def\Seq{\mathit{Seq}}