ChengsongTanPhdThesis/Chapters/Inj.tex
changeset 638 dd9dde2d902b
parent 637 e3752aac8ec2
child 639 80cc6dc4c98b
--- a/ChengsongTanPhdThesis/Chapters/Inj.tex	Sat Dec 24 11:55:04 2022 +0000
+++ b/ChengsongTanPhdThesis/Chapters/Inj.tex	Fri Dec 30 01:52:32 2022 +0000
@@ -13,7 +13,7 @@
 This is essentially a description in ``English''
 the functions and datatypes of our formalisation in Isabelle/HOL.
 We also define what $\POSIX$ lexing means, 
-followed by a lexing algorithm by Sulzmanna and Lu \parencite{Sulzmann2014} 
+followed by the first lexing algorithm by Sulzmanna and Lu \parencite{Sulzmann2014} 
 that produces the output conforming
 to the $\POSIX$ standard\footnote{In what follows 
 we choose to use the Isabelle-style notation
@@ -109,7 +109,7 @@
 to test membership of a string in 
 a set of strings. 
 For example, to test whether the string
-$bar$ is contained in the set $\{foo, bar, brak\}$, one takes derivative of the set with
+$bar$ is contained in the set $\{foo, bar, brak\}$, one can take derivative of the set with
 respect to the string $bar$:
 \begin{center}
 \begin{tabular}{lll}
@@ -122,8 +122,9 @@
 \end{center}
 \noindent
 and in the end, test whether the set
-has the empty string.\footnote{We use the infix notation $A\backslash c$
-	instead of $\Der \; c \; A$ for brevity, as it is clear we are operating
+contains the empty string.\footnote{We use the infix notation $A\backslash c$
+	instead of $\Der \; c \; A$ for brevity, as it will always be
+	clear from the context that we are operating
 on languages rather than regular expressions.}
 
 In general, if we have a language $S$,
@@ -204,7 +205,7 @@
 expression that matches only the empty string.\footnote{
 Some authors
 also use $\phi$ and $\epsilon$ for $\ZERO$ and $\ONE$
-but we prefer our notation.} 
+but we prefer this notation.} 
 The sequence regular expression is written $r_1\cdot r_2$
 and sometimes we omit the dot if it is clear which
 regular expression is meant; the alternative
@@ -227,7 +228,7 @@
 %Now with language derivatives of a language and regular expressions and
 %their language interpretations in place, we are ready to define derivatives on regular expressions.
 With $L$, we are ready to introduce Brzozowski derivatives on regular expressions.
-We do so by first introducing what properties it should satisfy.
+We do so by first introducing what properties they should satisfy.
 
 \subsection{Brzozowski Derivatives and a Regular Expression Matcher}
 %Recall, the language derivative acts on a set of strings
@@ -291,14 +292,13 @@
 he calls the derivative of a regular expression $r$
 with respect to a character $c$, written
 $r \backslash c$. This infix operator
-takes an original regular expression $r$ as input
-and a character as a right operand and
-outputs a result, which is a new regular expression.
+takes regular expression $r$ as input
+and a character as a right operand.
 The derivative operation on regular expression
 is defined such that the language of the derivative result 
 coincides with the language of the original 
 regular expression being taken 
-derivative with respect to the same character:
+derivative with respect to the same characters, namely
 \begin{property}
 
 \[
@@ -335,11 +335,11 @@
 \vspace{ 3mm }
 
 The derivatives on regular expression can again be 
-generalised to a string.
+generalised to strings.
 One could compute $r_{start} \backslash s$  and test membership of $s$
 in $L \; r_{start}$ by checking
 whether the empty string is in the language of 
-$r_{end}$ ($r_{start}\backslash s$).\\
+$r_{end}$ (that is $r_{start}\backslash s$).\\
 
 \vspace{2mm}
 \Longstack{
@@ -378,18 +378,18 @@
 }
 
 
-
+We have the property that
 \begin{property}
 	$s \in L \; r_{start} \iff [] \in L \; r_{end}$
 \end{property}
 \noindent
 Next, we give the recursive definition of derivative on
 regular expressions so that it satisfies the properties above.
-The derivative function, written $r\backslash c$, 
-takes a regular expression $r$ and character $c$, and
-returns a new regular expression representing
-the original regular expression's language $L \; r$
-being taken the language derivative with respect to $c$.
+%The derivative function, written $r\backslash c$, 
+%takes a regular expression $r$ and character $c$, and
+%returns a new regular expression representing
+%the original regular expression's language $L \; r$
+%being taken the language derivative with respect to $c$.
 \begin{table}
 	\begin{center}
 \begin{tabular}{lcl}
@@ -418,7 +418,7 @@
 	\begin{tabular}{lcl}
 		$(r_1 \cdot r_2 ) \backslash c$ & $\dn$ & 
 		$\textit{if}\;\,([] \in L(r_1))\; 
-		\textit{then} \; r_1 \backslash c \cdot r_2 + r_2 \backslash c$ \\
+		\textit{then} \; (r_1 \backslash c) \cdot r_2 + r_2 \backslash c$ \\
 		& & $\textit{else} \; (r_1 \backslash c) \cdot r_2$
 	\end{tabular}
 \end{center}
@@ -483,7 +483,7 @@
 is always nullable because it naturally
 contains the empty string.  
 \noindent
-We have the following correspondence between 
+We have the following two correspondences between 
 derivatives on regular expressions and
 derivatives on a set of strings:
 \begin{lemma}\label{derDer}
@@ -499,8 +499,8 @@
 	By induction on $r$.
 \end{proof}
 \noindent
-which is the main property of derivatives
-that enables us to reason about the correctness of
+which are the main properties of derivatives
+that enables us later to reason about the correctness of
 derivative-based matching.
 We can generalise the derivative operation shown above for single characters
 to strings as follows:
@@ -570,14 +570,14 @@
 \ref{NaiveMatcher}.
 Note that both axes are in logarithmic scale.
 Around two dozen characters
-would already ``explode'' the matcher with the regular expression 
+this algorithm already ``explodes'' with the regular expression 
 $(a^*)^*b$.
 To improve this situation, we need to introduce simplification
 rules for the intermediate results,
-such as $r + r \rightarrow r$,
+such as $r + r \rightarrow r$ or $\ONE \cdot r \rightarrow r$,
 and make sure those rules do not change the 
 language of the regular expression.
-One simpled-minded simplification function
+One simple-minded simplification function
 that achieves these requirements 
 is given below (see Ausaf et al. \cite{AusafDyckhoffUrban2016}):
 \begin{center}
@@ -733,7 +733,7 @@
 one of the values to be generated. $\POSIX$ is one of the
 disambiguation strategies that is widely adopted.
 
-Ausaf et al.\parencite{AusafDyckhoffUrban2016} 
+Ausaf et al. \cite{AusafDyckhoffUrban2016} 
 formalised the property 
 as a ternary relation.
 The $\POSIX$ value $v$ for a regular expression
@@ -843,7 +843,7 @@
 then a match with a keyword (if)
 followed by
 an identifier (foo) would be incorrect.
-POSIX lexing would generate what we want.
+POSIX lexing generates what is included by lexing.
 
 \noindent
 We know that a POSIX 
@@ -915,7 +915,7 @@
 after the initial phase of successive derivatives.
 This second phase generates a POSIX value 
 if the regular expression matches the string.
-Two functions are involved: $\inj$ and $\mkeps$.
+The algorithm uses two functions called $\inj$ and $\mkeps$.
 The function $\mkeps$ constructs a POSIX value from the last
 derivative $r_n$:
 \begin{ceqn}
@@ -958,7 +958,7 @@
 The result of $\mkeps$ on a $\nullable$ $r$ 
 is a POSIX value for $r$ and the empty string:
 \begin{lemma}\label{mePosix}
-$\nullable\; r \implies (r, []) \rightarrow (\mkeps\; v)$
+$\nullable\; r \implies (r, []) \rightarrow (\mkeps\; r)$
 \end{lemma}
 \begin{proof}
 	By induction on $r$.
@@ -1322,7 +1322,7 @@
 		   \Seq \; (\Stars \; [\Char \; a])\; (\Stars \; [])
 		  ] 
 \]
-And $\derssimp \; aa \; (a^*a^*)^*$ would be
+And $\derssimp \; aa \; (a^*a^*)^*$ is
 \[
 	((a^*a^* + a^*)+a^*)\cdot(a^*a^*)^* + 
 	(a^*a^* + a^*)\cdot(a^*a^*)^*.
@@ -1352,7 +1352,7 @@
 At any moment, the  subterms in a regular expression
 that will potentially result in a POSIX value is only
 a minority among the many other terms,
-and one can remove ones that are not possible to 
+and one can remove the ones that are not possible to 
 be POSIX.
 In the above example,
 \begin{equation}\label{eqn:growth2}
@@ -1379,15 +1379,15 @@
 \[
 	\Stars \; [\Seq \; (\Stars \; [])\; (\Stars \; [\Char \; a, \Char \; a])]
 \]
-is too hopeless to contribute a POSIX lexical value,
-and is therefore thrown away.
+do not to contribute a POSIX lexical value,
+and therefore can be thrown away.
 
 Ausaf et al. \cite{AusafDyckhoffUrban2016} 
 have come up with some simplification steps, however those steps
-are not yet sufficiently strong, so they achieve the above effects.
+are not yet sufficiently strong, to achieve the above effects.
 And even with these relatively mild simplifications, the proof
 is already quite a bit more complicated than the theorem \ref{lexerCorrectness}.
-One would prove something like this: 
+One would need to prove something like this: 
 \[
 	\textit{If}\; (\textit{snd} \; (\textit{simp} \; r\backslash c), s) \rightarrow  v  \;\;
 	\textit{then}\;\; (r, c::s) \rightarrow 
@@ -1396,7 +1396,7 @@
 instead of the simple lemma \ref{injPosix}, where now $\textit{simp}$
 not only has to return a simplified regular expression,
 but also what specific simplifications 
-has been done as a function on values
+have been done as a function on values
 showing how one can transform the value
 underlying the simplified regular expression
 to the unsimplified one.