--- a/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex Fri Dec 30 17:37:51 2022 +0000
+++ b/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex Fri Dec 30 23:41:44 2022 +0000
@@ -14,7 +14,7 @@
regular expressions.
We have implemented their algorithm in Scala and Isabelle,
and found problems
-in their algorithm such as de-duplication not working properly and redundant
+in their algorithm, such as de-duplication not working properly and redundant
fixpoint construction.
\section{The Motivation Behind Using Bitcodes}
Let us give again the definition of $\lexer$ from Chapter \ref{Inj}:
@@ -113,7 +113,7 @@
\caption{Second derivative taken}
\end{figure}
\noindent
-As the number of derivative steps increase,
+As the number of derivative steps increases,
the stack would increase:
\begin{figure}[H]
@@ -327,12 +327,12 @@
as the lexing proceeds.
It becomes unnecessary
to remember all the
-intermediate expresssions, but only the most recent one
+intermediate expressions, but only the most recent one
with this bit-carrying regular expression.
Annotated regular expressions
are defined as the following
Isabelle datatype:\footnote{ We use subscript notation to indicate
- that the bitcodes are auxiliary information that do not
+ that the bitcodes are auxiliary information that does not
interfere with the structure of the regular expressions }
\begin{center}
\begin{tabular}{lcl}
@@ -452,9 +452,8 @@
\end{tikzpicture}
\caption{A bird's eye view of $\blexer$. The $_{bsi}a_{i}$s stand
for the annotated regular expressions in each derivative step.
- The characters used in each derivative is written as $c_i$.
-The size of the derivatives typically increase during each derivative step,
-therefore we draw the larger circles to indicate that.
+ The characters used in each derivative are written as $c_i$.
+The size of the derivatives typically increases during each derivative step.
The process starts with an internalise phase and concludes with a decoding phase.}
\end{figure}
\noindent
@@ -540,7 +539,7 @@
The opposite of $\textit{internalise}$ is
$\erase$, where all bit-codes are removed,
and the alternative operator $\sum$ for annotated
-regular expressions is transformed to the binary version
+regular expressions is transformed into the binary version
in (plain) regular expressions. This can be defined as follows:
\begin{center}\label{eraseDef}
\begin{tabular}{lcl}
@@ -648,13 +647,13 @@
The reverse operation of $\decode$ is $\code$.
This function encodes a value into a bitcode by converting
$\Left$ into $Z$, $\Right$ into $S$, and marks the start of any non-empty
-star iteration by $S$. The border where a star iteration
-terminates is marked by $Z$.
+star iteration by $S$. $ Z$ marks the border where a star iteration
+terminates.
This coding is lossy, as it throws away the information about
characters, and does not encode the ``boundary'' between two
sequence values. Moreover, with only the bitcode we cannot even tell
whether the $S$s and $Z$s are for $\Left/\Right$ or $\Stars$,
-but this will not be necessary in our correctness proof.
+but this will not be necessary for our correctness proof.
\begin{center}
\begin{tabular}{lcl}
$\textit{code}(\Empty)$ & $\dn$ & $[]$\\
@@ -678,8 +677,7 @@
\begin{proof}
By proving a more general version of the lemma, on $\decode'$:
\[\vdash v : r \implies \decode' \; ((\code \; v) @ ds) \; r = (v, ds) \]
-Then setting $ds$ to be $[]$ and unfolding $\decode$ definition,
-we obtain the property.
+Then we obtain the property by setting $ds$ to be $[]$ and unfolding the $\decode$ definition.
\end{proof}
\noindent
Now we define the central part of Sulzmann and Lu's second lexing algorithm,
@@ -856,7 +854,7 @@
then test whether the result is (b)nullable.
If yes, then extract the bitcodes from the nullable expression,
and decodes the bitcodes into a lexical value.
-If there does not exists a match between $r$ and $s$ the lexer
+If there does not exist a match between $r$ and $s$, the lexer
outputs $\None$ indicating a mismatch.
Ausaf and Urban formally proved the correctness of the $\blexer$, namely
\begin{property}
@@ -1068,7 +1066,7 @@
$\flex$.
\subsection{Specifications of Some Helper Functions}
The functions we introduce will give a more detailed view into
-the lexing process, which is not be possible
+the lexing process, which is not possible
using $\lexer$ or $\blexer$ alone.
\subsubsection{$\textit{Retrieve}$}
The first function we shall introduce is $\retrieve$.
@@ -1232,7 +1230,7 @@
$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$
\end{lemma}
\begin{proof}
- By induction on $r$, generalising over $v$.
+ We do an induction on $r$, generalising over $v$.
The induction principle in our formalisation
is function $\erase$'s cases.
\end{proof}
@@ -1262,7 +1260,7 @@
\end{tabular}
\end{center}
which accumulates the characters that need to be injected back,
-and does the injection in a stack-like manner (the last taken derivative is first injected).
+and does the injection in a stack-like manner (the last character being chopped off in the derivatives phase is first injected).
The function
$\flex$ can calculate what $\lexer$ calculates, given the input regular expression
$r$, the identity function $id$, the input string $s$ and the value