ChengsongTanPhdThesis/Chapters/Bitcoded1.tex
changeset 640 bd1354127574
parent 639 80cc6dc4c98b
child 649 ef2b8abcbc55
--- a/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex	Fri Dec 30 17:37:51 2022 +0000
+++ b/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex	Fri Dec 30 23:41:44 2022 +0000
@@ -14,7 +14,7 @@
 regular expressions. 
 We have implemented their algorithm in Scala and Isabelle, 
 and found problems 
-in their algorithm such as de-duplication not working properly and redundant
+in their algorithm, such as de-duplication not working properly and redundant
 fixpoint construction. 
 \section{The Motivation Behind Using Bitcodes}
 Let us give again the definition of $\lexer$ from Chapter \ref{Inj}:
@@ -113,7 +113,7 @@
 \caption{Second derivative taken}
 \end{figure}
 \noindent
-As the number of derivative steps increase,
+As the number of derivative steps increases,
 the stack would increase:
 
 \begin{figure}[H]
@@ -327,12 +327,12 @@
 as the lexing proceeds.
 It becomes unnecessary
 to remember all the 
-intermediate expresssions, but only the most recent one
+intermediate expressions, but only the most recent one
 with this bit-carrying regular expression.
 Annotated regular expressions 
 are defined as the following 
 Isabelle datatype:\footnote{ We use subscript notation to indicate
-	that the bitcodes are auxiliary information that do not
+	that the bitcodes are auxiliary information that does not
 interfere with the structure of the regular expressions }
 \begin{center}
 \begin{tabular}{lcl}
@@ -452,9 +452,8 @@
 \end{tikzpicture}
 \caption{A bird's eye view of $\blexer$. The $_{bsi}a_{i}$s stand
 	for the annotated regular expressions in each derivative step.
-	The characters used in each derivative is written as $c_i$.
-The size of the derivatives typically increase during each derivative step,
-therefore we draw the larger circles to indicate that.
+	The characters used in each derivative are written as $c_i$.
+The size of the derivatives typically increases during each derivative step.
 The process starts with an internalise phase and concludes with a decoding phase.}
 \end{figure}
 \noindent
@@ -540,7 +539,7 @@
 The opposite of $\textit{internalise}$ is
 $\erase$, where all bit-codes are removed,
 and the alternative operator $\sum$ for annotated
-regular expressions is transformed to the binary version 
+regular expressions is transformed into the binary version 
 in (plain) regular expressions. This can be defined as follows:
 \begin{center}\label{eraseDef}
 	\begin{tabular}{lcl}
@@ -648,13 +647,13 @@
 The reverse operation of $\decode$ is $\code$.
 This function encodes a value into a bitcode by converting
 $\Left$ into $Z$, $\Right$ into $S$, and marks the start of any non-empty
-star iteration by $S$. The border where a star iteration
-terminates is marked by $Z$. 
+star iteration by $S$. $ Z$ marks the border where a star iteration
+terminates. 
 This coding is lossy, as it throws away the information about
 characters, and does not encode the ``boundary'' between two
 sequence values. Moreover, with only the bitcode we cannot even tell
 whether the $S$s and $Z$s are for $\Left/\Right$ or $\Stars$,
-but this will not be necessary in our correctness proof.
+but this will not be necessary for our correctness proof.
 \begin{center}
 \begin{tabular}{lcl}
   $\textit{code}(\Empty)$ & $\dn$ & $[]$\\
@@ -678,8 +677,7 @@
 \begin{proof}
 By proving a more general version of the lemma, on $\decode'$:
 \[\vdash v : r \implies \decode' \; ((\code \; v) @ ds) \; r = (v, ds) \]
-Then setting $ds$ to be $[]$ and unfolding $\decode$ definition,
-we obtain the property.
+Then we obtain the property by setting $ds$ to be $[]$ and unfolding the $\decode$ definition.
 \end{proof}
 \noindent
 Now we define the central part of Sulzmann and Lu's second lexing algorithm,
@@ -856,7 +854,7 @@
 then test whether the result is (b)nullable.
 If yes, then extract the bitcodes from the nullable expression,
 and decodes the bitcodes into a lexical value.
-If there does not exists a match between $r$ and $s$ the lexer
+If there does not exist a match between $r$ and $s$, the lexer
 outputs $\None$ indicating a mismatch.
 Ausaf and Urban formally proved the correctness of the $\blexer$, namely
 \begin{property}
@@ -1068,7 +1066,7 @@
 $\flex$.
 \subsection{Specifications of Some Helper Functions}
 The functions we introduce will give a more detailed view into 
-the lexing process, which is not be possible
+the lexing process, which is not possible
 using $\lexer$ or $\blexer$ alone.
 \subsubsection{$\textit{Retrieve}$}
 The first function we shall introduce is $\retrieve$.
@@ -1232,7 +1230,7 @@
 	$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)$
 \end{lemma}
 \begin{proof}
-	By induction on $r$, generalising over $v$.
+	We do an induction on $r$, generalising over $v$.
 	The induction principle in our formalisation
 	is function $\erase$'s cases.
 \end{proof}
@@ -1262,7 +1260,7 @@
 \end{tabular}
 \end{center}
 which accumulates the characters that need to be injected back, 
-and does the injection in a stack-like manner (the last taken derivative is first injected).
+and does the injection in a stack-like manner (the last character being chopped off in the derivatives phase is first injected).
 The function
 $\flex$ can calculate what $\lexer$ calculates, given the input regular expression
 $r$, the identity function $id$, the input string $s$ and the value