diff -r 80cc6dc4c98b -r bd1354127574 ChengsongTanPhdThesis/Chapters/Bitcoded1.tex --- a/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex Fri Dec 30 17:37:51 2022 +0000 +++ b/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex Fri Dec 30 23:41:44 2022 +0000 @@ -14,7 +14,7 @@ regular expressions. We have implemented their algorithm in Scala and Isabelle, and found problems -in their algorithm such as de-duplication not working properly and redundant +in their algorithm, such as de-duplication not working properly and redundant fixpoint construction. \section{The Motivation Behind Using Bitcodes} Let us give again the definition of $\lexer$ from Chapter \ref{Inj}: @@ -113,7 +113,7 @@ \caption{Second derivative taken} \end{figure} \noindent -As the number of derivative steps increase, +As the number of derivative steps increases, the stack would increase: \begin{figure}[H] @@ -327,12 +327,12 @@ as the lexing proceeds. It becomes unnecessary to remember all the -intermediate expresssions, but only the most recent one +intermediate expressions, but only the most recent one with this bit-carrying regular expression. Annotated regular expressions are defined as the following Isabelle datatype:\footnote{ We use subscript notation to indicate - that the bitcodes are auxiliary information that do not + that the bitcodes are auxiliary information that does not interfere with the structure of the regular expressions } \begin{center} \begin{tabular}{lcl} @@ -452,9 +452,8 @@ \end{tikzpicture} \caption{A bird's eye view of $\blexer$. The $_{bsi}a_{i}$s stand for the annotated regular expressions in each derivative step. - The characters used in each derivative is written as $c_i$. -The size of the derivatives typically increase during each derivative step, -therefore we draw the larger circles to indicate that. + The characters used in each derivative are written as $c_i$. +The size of the derivatives typically increases during each derivative step. The process starts with an internalise phase and concludes with a decoding phase.} \end{figure} \noindent @@ -540,7 +539,7 @@ The opposite of $\textit{internalise}$ is $\erase$, where all bit-codes are removed, and the alternative operator $\sum$ for annotated -regular expressions is transformed to the binary version +regular expressions is transformed into the binary version in (plain) regular expressions. This can be defined as follows: \begin{center}\label{eraseDef} \begin{tabular}{lcl} @@ -648,13 +647,13 @@ The reverse operation of $\decode$ is $\code$. This function encodes a value into a bitcode by converting $\Left$ into $Z$, $\Right$ into $S$, and marks the start of any non-empty -star iteration by $S$. The border where a star iteration -terminates is marked by $Z$. +star iteration by $S$. $ Z$ marks the border where a star iteration +terminates. This coding is lossy, as it throws away the information about characters, and does not encode the ``boundary'' between two sequence values. Moreover, with only the bitcode we cannot even tell whether the $S$s and $Z$s are for $\Left/\Right$ or $\Stars$, -but this will not be necessary in our correctness proof. +but this will not be necessary for our correctness proof. \begin{center} \begin{tabular}{lcl} $\textit{code}(\Empty)$ & $\dn$ & $[]$\\ @@ -678,8 +677,7 @@ \begin{proof} By proving a more general version of the lemma, on $\decode'$: \[\vdash v : r \implies \decode' \; ((\code \; v) @ ds) \; r = (v, ds) \] -Then setting $ds$ to be $[]$ and unfolding $\decode$ definition, -we obtain the property. +Then we obtain the property by setting $ds$ to be $[]$ and unfolding the $\decode$ definition. \end{proof} \noindent Now we define the central part of Sulzmann and Lu's second lexing algorithm, @@ -856,7 +854,7 @@ then test whether the result is (b)nullable. If yes, then extract the bitcodes from the nullable expression, and decodes the bitcodes into a lexical value. -If there does not exists a match between $r$ and $s$ the lexer +If there does not exist a match between $r$ and $s$, the lexer outputs $\None$ indicating a mismatch. Ausaf and Urban formally proved the correctness of the $\blexer$, namely \begin{property} @@ -1068,7 +1066,7 @@ $\flex$. \subsection{Specifications of Some Helper Functions} The functions we introduce will give a more detailed view into -the lexing process, which is not be possible +the lexing process, which is not possible using $\lexer$ or $\blexer$ alone. \subsubsection{$\textit{Retrieve}$} The first function we shall introduce is $\retrieve$. @@ -1232,7 +1230,7 @@ $\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c) \; v= \retrieve \; r \; (\inj \; r\; c\; v)$ \end{lemma} \begin{proof} - By induction on $r$, generalising over $v$. + We do an induction on $r$, generalising over $v$. The induction principle in our formalisation is function $\erase$'s cases. \end{proof} @@ -1262,7 +1260,7 @@ \end{tabular} \end{center} which accumulates the characters that need to be injected back, -and does the injection in a stack-like manner (the last taken derivative is first injected). +and does the injection in a stack-like manner (the last character being chopped off in the derivatives phase is first injected). The function $\flex$ can calculate what $\lexer$ calculates, given the input regular expression $r$, the identity function $id$, the input string $s$ and the value