ChengsongTanPhdThesis/Chapters/Bitcoded1.tex
changeset 624 8ffa28fce271
parent 622 4b1149fb5aec
child 638 dd9dde2d902b
--- a/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex	Sat Nov 12 00:37:23 2022 +0000
+++ b/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex	Sat Nov 12 21:34:40 2022 +0000
@@ -10,14 +10,14 @@
 %Chapter 3\ref{Chapter3}. 
 In this chapter, we are going to describe the bit-coded algorithm
 introduced by Sulzmann and Lu \parencite{Sulzmann2014} to address the growth problem of 
+derivatives of
 regular expressions. 
-We have implemented their algorithm in Scala, and found out inefficiencies
+We have implemented their algorithm in Scala and Isabelle, 
+and found problems 
 in their algorithm such as de-duplication not working properly and redundant
-fixpoint construction. Their algorithm is improved and verified with the help of
-formal proofs.
+fixpoint construction. 
 \section{The Motivation Behind Using Bitcodes}
-We first do a recap of what was going on 
-in the lexer algorithm in Chapter \ref{Inj},
+Let us give again the definition of $\lexer$ from Chapter \ref{Inj}:
 \begin{center}
 \begin{tabular}{lcl}
 	$\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\;  \Some(\mkeps \; r) \; \textit{else} \; \None$\\
@@ -27,7 +27,22 @@
 \end{tabular}
 \end{center}
 \noindent
-The algorithm recursively calls $\lexer$ on 
+The first problem with this algorithm is that
+for the function $\inj$ to work properly
+we cannot destroy the structure of a regular expression,
+and therefore canno simplify aggressively.
+For example,
+\[
+	r + (r + a) \rightarrow r + a
+\]
+cannot be done because that would require
+breaking up the inner alternative.
+The $\lexer$ function therefore only enables
+same-level de-duplications like
+\[
+	r + r \rightarrow r.
+\]
+Secondly, the algorithm recursively calls $\lexer$ on 
 each new character input,
 and before starting a child call
 it stores information of previous lexing steps
@@ -296,13 +311,13 @@
 \noindent
 Storing all $r_i, c_{i+1}$ pairs recursively
 allows the algorithm to work in an elegant way, at the expense of 
-storing quite a bit of verbose information.
-The stack seems to grow at least quadratically fast with respect
+storing quite a bit of verbose information on the stack.
+The stack seems to grow at least quadratically with respect
 to the input (not taking into account the size bloat of $r_i$),
-which can be inefficient and prone to stack overflow.
+which can be inefficient and prone to stack overflows.
 \section{Bitcoded Algorithm}
 To address this,
-Sulzmann and Lu chose to  define a new datatype 
+Sulzmann and Lu defined a new datatype 
 called \emph{annotated regular expression},
 which condenses all the partial lexing information
 (that was originally stored in $r_i, c_{i+1}$ pairs)
@@ -333,9 +348,9 @@
 where $bs$ stands for bit-codes, $a$  for $\mathbf{a}$nnotated regular
 expressions and $as$ for lists of annotated regular expressions.
 The alternative constructor, written, $\sum$, has been generalised to 
-accept a list of annotated regular expressions rather than just two.
-Why is it generalised? This is because when we open up nested 
-alternatives, there could be more than two elements at the same level
+take a list of annotated regular expressions rather than just two.
+Why is it generalised? This is because when we analyse nested 
+alternatives, there can be more than two elements at the same level
 after de-duplication, which can no longer be stored in a binary
 constructor.
 Bits and bitcodes (lists of bits) are defined as:
@@ -347,21 +362,20 @@
 \noindent
 We use $S$ and $Z$ rather than $1$ and $0$ is to avoid
 confusion with the regular expressions $\ZERO$ and $\ONE$.
-The idea is to use the attached bitcodes
+The idea is to use the bitcodes
 to indicate which choice was made at a certain point
 during lexing.
 For example,
 $(_{Z}a+_{S}b) \backslash a$ gives us
-$_{Z}\ONE + \ZERO$, this $Z$ bitcode will
+$_{Z}\ONE + \ZERO$, where the $Z$ bitcode will
 later be used to decode that a left branch was
-selected at the time when the part $a+b$ is being taken
-derivative of.
+selected at the time when the part $a+b$ was anallysed by
+derivative.
 \subsection{A Bird's Eye View of the Bit-coded Lexer}
-Before we give out the rest of the functions and definitions 
+Before we give the details of the functions and definitions 
 related to our
 $\blexer$ (\emph{b}-itcoded lexer), we first provide a high-level
-view of the algorithm, so the reader does not get lost in
-the details.
+view of the algorithm.
 \begin{figure}[H]
 \begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] 
 %\draw (-6,-6) grid (6,6);
@@ -436,14 +450,17 @@
 
 
 \end{tikzpicture}
-\caption{Bird's Eye View of $\blexer$}
+\caption{A bird's eye view of $\blexer$. The $_{bsi}a_{i}$s stand
+	for the annotated regular expressions in each derivative step.
+	The characters used in each derivative is written as $c_i$.
+The relative size increase reflect the size increase in each derivative step.}
 \end{figure}
 \noindent
 The plain regular expressions
 are first ``lifted'' to an annotated regular expression,
-with the function called \emph{internalise}.
+with the function called \emph{internalise} ($r \rightarrow _{bs}a$).
 Then the annotated regular expression $_{bs}a$ will
-go through successive derivatives with respect 
+be transformed by successive derivatives with respect 
 to the input stream of characters $c_1, c_2$ etc.
 Each time a derivative is taken, the bitcodes
 are moved around, discarded or augmented together 
@@ -451,9 +468,9 @@
 After all input has been consumed, the 
 bitcodes are collected by $\bmkeps$,
 which traverses the nullable part of the regular expression
-and collect the bitcodes along the way.
+and collects the bitcodes along the way.
 The collected bitcodes are then $\decode$d with the guidance
-of the input regular expression $r$.
+of the input regular expression $r$ ( $_{bs}a \rightarrow r$).
 The most notable improvements of $\blexer$
 over $\lexer$ are 
 \begin{itemize}
@@ -461,43 +478,43 @@
 
 		An absence of the second injection phase.
 	\item
-		One need not store each pair of the 
+		One does not need to store each pair of the 
 		intermediate regular expressions $_{bs_i}a_i$ and  $c_{i+1}$. 
 		The previous annotated regular expression $_{bs_i}a_i$'s information is passed
 		on without loss to its successor $_{bs_{i+1}}a_{i+1}$,
-		and $c_{i+1}$'s already contained in the strings in $L\;r$ \footnote{
-		which can be easily recovered if we decode in the correct order}.
+		and $c_{i+1}$'s information is stored in $L\;r$.\footnote{
+		which will be used during the decode phase, where we use $r$ as
+	a source of information.}
 	\item
-		The simplification works much better--one can maintain correctness
-		while applying quite strong simplifications, as we shall wee.
+		Finally, simplification works much better--one can maintain correctness
+		while applying quite aggressive simplifications.
 \end{itemize}
-\noindent
-In the next section we will 
-give the operations needed in $\blexer$,
-with some remarks on the idea behind their definitions.
+%In the next section we will 
+%give the operations needed in $\blexer$,
+%with some remarks on the idea behind their definitions.
 \subsection{Operations in $\textit{Blexer}$}
 The first operation we define related to bit-coded regular expressions
 is how we move bits to the inside of regular expressions.
-Called $\fuse$, this operation attaches bit-codes 
-to the front of an annotated regular expression:
+This operation is called $\fuse$:
 \begin{center}
 \begin{tabular}{lcl}
-  $\textit{fuse}\;bs \; \ZERO$ & $\dn$ & $\ZERO$\\
-  $\textit{fuse}\;bs\; _{bs'}\ONE$ & $\dn$ &
+	$\textit{fuse}\;bs \; (\ZERO)$ & $\dn$ & $\ZERO$\\
+	$\textit{fuse}\;bs\; (_{bs'})\ONE$ & $\dn$ &
      $_{bs @ bs'}\ONE$\\
-  $\textit{fuse}\;bs\;_{bs'}{\bf c}$ & $\dn$ &
+	$\textit{fuse}\;bs\;(_{bs'}{\bf c})$ & $\dn$ &
      $_{bs@bs'}{\bf c}$\\
-  $\textit{fuse}\;bs\,_{bs'}\sum\textit{as}$ & $\dn$ &
+	$\textit{fuse}\;bs\;(_{bs'}\sum\textit{as})$ & $\dn$ &
      $_{bs@bs'}\sum\textit{as}$\\
-  $\textit{fuse}\;bs\; _{bs'}a_1\cdot a_2$ & $\dn$ &
+	$\textit{fuse}\;bs\; (_{bs'}a_1\cdot a_2)$ & $\dn$ &
      $_{bs@bs'}a_1 \cdot a_2$\\
-  $\textit{fuse}\;bs\,_{bs'}a^*$ & $\dn$ &
+  $\textit{fuse}\;bs\;(_{bs'}a^*)$ & $\dn$ &
      $_{bs @ bs'}a^*$
 \end{tabular}    
 \end{center} 
 
 \noindent
-With \emph{fuse} we are able to define the $\internalise$ function
+With \emph{fuse} we are able to define the $\internalise$ function, 
+written $(\_)^\uparrow$,
 that translates a ``standard'' regular expression into an
 annotated regular expression.
 This function will be applied before we start
@@ -518,15 +535,12 @@
 \end{tabular}    
 \end{center}    
 \noindent
-We use an up arrow with postfix notation
-to denote this operation
-for convenience. 
 The opposite of $\textit{internalise}$ is
-$\erase$, where all the bit-codes are removed,
+$\erase$, where all bit-codes are removed,
 and the alternative operator $\sum$ for annotated
 regular expressions is transformed to the binary version 
-in plain regular expressions.
-\begin{center}
+in (plain) regular expressions. This can be defined as follows:
+\begin{center}\label{eraseDef}
 	\begin{tabular}{lcl}
 		$\ZERO_\downarrow$ & $\dn$ & $\ZERO$\\
 		$( _{bs}\ONE )_\downarrow$ & $\dn$ & $\ONE$\\
@@ -535,7 +549,7 @@
 		$ (a_1) _\downarrow \cdot  (a_2) _\downarrow$\\
 		$( _{bs} [])_\downarrow $ & $\dn$ & $\ZERO $\\
 		$( _{bs} [a]  )_\downarrow$ & $\dn$ & $a_\downarrow$\\
-		$_{bs} \sum [a_1, \; a_2]$ & $\dn$ & $ (a_1) _\downarrow + ( a_2 ) _\downarrow $\\
+		$(_{bs} \sum [a_1, \; a_2])_\downarrow$ & $\dn$ & $ (a_1) _\downarrow + ( a_2 ) _\downarrow $\\
 		$(_{bs} \sum (a :: as))_\downarrow$ & $\dn$ & $ a_\downarrow + \; (_{[]} \sum as)_\downarrow$\\
 		$( _{bs} a^* )_\downarrow$ & $\dn$ & $(a_\downarrow)^*$
 	\end{tabular}
@@ -544,9 +558,11 @@
 We also abbreviate the $\erase\; a$ operation
 as $a_\downarrow$, for conciseness.
 
-Testing whether an annotated regular expression
+Determining whether an annotated regular expression
 contains the empty string in its lauguage requires
 a dedicated function $\bnullable$.
+In our formalisation this function is defined by simply calling $\erase$
+before $\nullable$.
 $\bnullable$ simply calls $\erase$ before $\nullable$.
 \begin{definition}
 		$\bnullable \; a \dn  \nullable \; (a_\downarrow)$
@@ -554,10 +570,10 @@
 The function for collecting 
 bitcodes from a 
 (b)nullable regular expression is called $\bmkeps$.
-$\bmkeps$ is a variation of the $\textit{mkeps}$ function,
+This function  is a lifted version of the $\textit{mkeps}$ function,
 which follows the path $\mkeps$ takes to traverse the
 $\nullable$ branches when visiting a regular expression,
-but gives back bitcodes instead of a value.
+but collects bitcodes instead of generating a value.
 \begin{center}
 \begin{tabular}{lcl}
   $\textit{bmkeps}\,(_{bs}\ONE)$ & $\dn$ & $bs$\\
@@ -572,10 +588,9 @@
 \end{tabular}    
 \end{center}    
 \noindent
-$\bmkeps$, just like $\mkeps$, 
+This function, just like $\mkeps$, 
 visits a regular expression tree respecting
-the POSIX rules. The difference, however, is that
-it does not create values, but only bitcodes.
+the POSIX rules. 
 It traverses each child of the sequence regular expression
 from left to right and creates a bitcode by stitching
 together bitcodes obtained from the children expressions.
@@ -588,7 +603,8 @@
 %child recursive calls return.
 The only time when $\bmkeps$ creates new bitcodes
 is when it completes a star's iterations by attaching a $S$ to the end of the bitcode
-list it returns.\\
+list it returns.
+
 The bitcodes extracted by $\bmkeps$ need to be 
 $\decode$d (with the guidance of a plain regular expression):
 %\begin{definition}[Bitdecoding of Values]\mbox{}
@@ -620,23 +636,24 @@
 \end{center} 
 \noindent
 The function $\decode'$ returns a pair consisting of 
-a partially decoded value and some leftover bit list that cannot
-be decide yet.
+a partially decoded value and some leftover bit-list.
 The function $\decode'$ succeeds if the left-over 
 bit-sequence is empty.
-$\decode$ is terminating as $\decode'$ is terminating.
-$\decode'$ is terminating 
-because at least one of $\decode'$'s parameters will go down in terms
-of size.\\
+
+%$\decode$ is terminating as $\decode'$ is terminating.
+%$\decode'$ is terminating 
+%because at least one of $\decode'$'s parameters will go down in terms
+%of size.\\
 The reverse operation of $\decode$ is $\code$.
-$\textit{code}$ encodes a value into a bitcode by converting
+This function encodes a value into a bitcode by converting
 $\Left$ into $Z$, $\Right$ into $S$, and marks the start of any non-empty
 star iteration by $S$. The border where a star iteration
 terminates is marked by $Z$. 
 This coding is lossy, as it throws away the information about
 characters, and does not encode the ``boundary'' between two
 sequence values. Moreover, with only the bitcode we cannot even tell
-whether the $S$s and $Z$s are for $\Left/\Right$ or $\Stars$.  
+whether the $S$s and $Z$s are for $\Left/\Right$ or $\Stars$,
+but this swill not be necessary in our proof.
 \begin{center}
 \begin{tabular}{lcl}
   $\textit{code}(\Empty)$ & $\dn$ & $[]$\\
@@ -650,7 +667,7 @@
 \end{tabular}    
 \end{center} 
 \noindent
-Assume we have a value $v$ and regular expression $r$
+We can prove that given a value $v$ and regular expression $r$
 with $\vdash v:r$,
 then we have the property that $\decode$ and $\code$ are
 reverse operations of one another:
@@ -664,8 +681,8 @@
 we obtain the property.
 \end{proof}
 \noindent
-Now we give out the central part of this lexing algorithm,
-the $\bder$ function (stands for \emph{b}itcoded-derivative):
+Now we define the central part of Sulzmann and Lu's lexing algorithm,
+the $\bder$ function (which stands for \emph{b}itcoded-derivative):
 \begin{center}
   \begin{tabular}{@{}lcl@{}}
   $(\ZERO)\,\backslash c$ & $\dn$ & $\ZERO$\\  
@@ -686,23 +703,17 @@
 \end{tabular}    
 \end{center}    
 \noindent
-For most time we use the infix notation $(\_\backslash\_)$ 
-to mean $\bder$ for brevity when
-there is no danger of confusion with derivatives on plain regular expressions.
-For example, we write $( _{[]}r^* ) \backslash c$ instead of $\bder \;c \; _{[]}r^*$,
-as the bitcodes at the front of $r^*$ indicates that it is 
-a bit-coded regular expression, not a plain one.\\
-$\bder$ tells us how regular expressions can be recursively traversed,
+The $\bder$ function tells us how regular expressions can be recursively traversed,
 where the bitcodes are augmented and carried around 
 when a derivative is taken.
 We give the intuition behind some of the more involved cases in 
 $\bder$. \\
 For example,
 in the \emph{star} case,
-a derivative on $_{bs}a^*$ means 
-that one more star iteratoin needs to be taken.
-we need to unfold it into a sequence,
-and attach an additional bit $Z$ to the front of $r \backslash c$
+a derivative of $_{bs}a^*$ means 
+that one more star iteration needs to be taken.
+We therefore need to unfold it into a sequence,
+and attach an additional bit $Z$ to the front of $a \backslash c$
 as a record to indicate one new star iteration is unfolded.
 
 \noindent
@@ -716,9 +727,9 @@
 
 \noindent
 This information will be recovered later by the $\decode$ function.
-The intuition is that the bit $Z$ will be decoded at the right location,
-because we accumulate bits from left to right (a rigorous proof will be given
-later).
+%The intuition is that the bit $Z$ will be decoded at the right location,
+%because we accumulate bits from left to right (a rigorous proof will be given
+%later).
 
 %\begin{tikzpicture}[ > = stealth, % arrow head style
 %        shorten > = 1pt, % don't touch arrow head to node
@@ -761,7 +772,7 @@
 
 \noindent
 Another place the $\bder$ function differs
-from derivatives on plain regular expressions
+from derivatives on regular expressions
 is the sequence case:
 \begin{center}
   \begin{tabular}{@{}lcl@{}}
@@ -774,36 +785,42 @@
 \end{tabular}    
 \end{center}    
 \noindent
-The difference is that (when $a_1$ is $\bnullable$)
+%We assume that $\bmkeps$ 
+%correctly extracts the bitcode for how $a_1$
+%matches the string prior to $c$ (more on this later).
+%The bitsequence $\textit{bs}$ which was initially 
+%attached to the first element of the sequence
+%$a_1 \cdot a_2$, has now been 
+%elevated to the top level of the $\sum$ constructor. 
+%This is because this piece of information will be needed 
+%whichever way the sequence is matched,
+%regardless of whether $c$ belongs to $a_1$ or $a_2$.
+The difference mainly lies in the $\textit{if}$ branch (when $a_1$ is $\bnullable$):
 we use $\bmkeps$ to store the lexing information
 in $a_1$ before collapsing 
 it (as it has been fully matched by string prior to $c$),
 and attach the collected bit-codes to the front of $a_2$
-before throwing away $a_1$. We assume that $\bmkeps$ 
-correctly extracts the bitcode for how $a_1$
-matches the string prior to $c$ (more on this later).
-The bitsequence $\textit{bs}$ which was initially 
-attached to the first element of the sequence
-$a_1 \cdot a_2$, has now been 
-elevated to the top level of the $\sum$ constructor. 
-This is because this piece of information will be needed 
-whichever way the sequence is matched,
-regardless of whether $c$ belongs to $a_1$ or $a_2$.
-
+before throwing away $a_1$. 
 In the injection-based lexer, $r_1$ is immediately thrown away in 
-subsequent derivatives on the right branch (when $r_1$ is $\nullable$),
-depite $r_1$ potentially storing information of previous matches:
+in the $\textit{if}$ branch,
+the information $r_1$ stores is therefore lost:
 \begin{center}
-	$(r_1 \cdot r_2 )\backslash c = (r_1 \backslash c) \cdot r_2 + r_2 \backslash c$
+	\begin{tabular}{lcl}
+$(r_1 \cdot r_2)\backslash c$ & $\dn$ & $\mathit{if} \, [] \in L(r_1)$\\
+	&   & $\mathit{then}\;(r_1\backslash c) \cdot r_2 \,+\, r_2\backslash c$\\
+	&   & $\mathit{else}\; \ldots$\\
+	\end{tabular}
 \end{center}
 \noindent
-this loss of information makes it necessary
-to store on stack all the regular expressions'
-``snapshots'' before they were
-taken derivative of.
-So that the related information will be available 
-once the child recursive 
-call finishes.\\
+
+%\noindent
+%this loss of information makes it necessary
+%to store on stack all the regular expressions'
+%``snapshots'' before they were
+%taken derivative of.
+%So that the related information will be available 
+%once the child recursive 
+%call finishes.\\
 The rest of the clauses of $\bder$ is rather similar to
 $\der$. \\
 Generalising the derivative operation with bitcodes to strings, we have 
@@ -829,16 +846,15 @@
   & & $\;\;\textit{else}\;\textit{None}$
 \end{tabular}
 \end{center}
-
 \noindent
-$\blexer$ first attaches bitcodes to a
-plain regular expression, and then do successive derivatives
+This function first attaches bitcodes to a
+plain regular expression $r$, and then builds successive derivatives
 with respect to the input string $s$, and
-then test whether the result is nullable.
-If yes, then extract the bitcodes out of the nullable expression,
+then test whether the result is (b)nullable.
+If yes, then extract the bitcodes from the nullable expression,
 and decodes the bitcodes into a lexical value.
 If there does not exists a match between $r$ and $s$ the lexer
-outputs $\None$ indicating a failed lex.\\
+outputs $\None$ indicating a mismatch.\\
 Ausaf and Urban formally proved the correctness of the $\blexer$, namely
 \begin{property}
 $\blexer \;r \; s = \lexer \; r \; s$.
@@ -848,7 +864,7 @@
 We introduce the proof later, after we give all
 the needed auxiliary functions and definitions.
 \subsection{An Example $\blexer$ Run}
-Before introducing the proof we shall first walk the reader 
+Before introducing the proof we shall first walk 
 through a concrete example of our $\blexer$ calculating the right 
 lexical information through bit-coded regular expressions.\\
 Consider the regular expression $(aa)^* \cdot (b+c)$ matching the string $aab$.
@@ -919,7 +935,7 @@
 is taken, which calls $\bmkeps$ on the nuallable $_Z\ONE\cdot (aa)^*$
 before processing $_Zb+_Sc$.\\
 The annotated regular expressions
-would look too cumbersome if we explicitly indicate all the
+would look overwhelming if we explicitly indicate all the
 locations where bitcodes are attached.
 For example,
 $(aa)^*\cdot (b+c)$ would 
@@ -927,7 +943,7 @@
 after 
 internalise.
 Therefore for readability we omit bitcodes if they are empty. 
-This applies to all example annotated 
+This applies to all annotated 
 regular expressions in this thesis.\\
 %and assume we have just read the first character $a$:
 %\begin{center}
@@ -1038,7 +1054,8 @@
 		allowing the possibility of 
 		pulling out the right lexical value from an
 		annotated regular expression at 
-		any stage of the algorithm.
+		any stage, using $\bmkeps$ or $\retrieve$ (details in
+		lemmas \ref{bmkepsRetrieve} and \ref{blexer_retrieve}).
 \end{itemize}
 We will elaborate on this, with the help of
 some helper functions such as $\retrieve$ and
@@ -1051,9 +1068,9 @@
 The first function we shall introduce is $\retrieve$.
 Sulzmann and Lu gave its definition, and
 Ausaf and Urban found
-its usage in mechanised proofs.
+useful in their mechanised proofs.
 Our bit-coded lexer ``retrieve''s the bitcodes using $\bmkeps$,
-after all the derivatives has been taken:
+after all the derivatives have been taken:
 \begin{center}
 \begin{tabular}{lcl}
 	& & $\ldots$\\
@@ -1063,21 +1080,13 @@
 \end{tabular}
 \end{center}
 \noindent
-$\bmkeps$ retrieves the value $v$'s
+The function $\bmkeps$ retrieves the value $v$'s
 information in the format
 of bitcodes, by travelling along the
 path of the regular expression that corresponds to a POSIX match,
 collecting all the bitcodes.
 We know that this "retrieved" bitcode leads to the correct value after decoding,
-which is $v_0$ in the injection-based lexing diagram.
-As an observation we pointed at the beginning of this section,
-the annotated regular expressions generated in successive derivative steps
-in $\blexer$ after $\erase$ has the same structure 
-as those appeared in $\lexer$.
-We redraw the diagram below to visualise this fact.
-We pretend that all the values are 
-ready despite they are only calculated in $\lexer$.
-In general we have $\vdash v_i:(a_i)_\downarrow$.
+which is $v_0$ in the injection-based lexing diagram:
 \vspace{20mm}
 \begin{figure}[H]%\label{graph:injLexer}
 \begin{center}
@@ -1099,11 +1108,17 @@
 \node[E = (p-1-2) (p-2-2)] {};
 \node[E = (p-1-3) (p-2-3)] {};
 \end{tikzpicture}
-
+\caption{Injection-based lexing diagram, 
+with matching value and regular expression pairs
+encircled}\label{Inj2}
 \end{figure}
 \vspace{20mm}
 \noindent
-We encircle in the diagram  all the pairs $v_i, a_i$ to show that these values
+We have that $\vdash v_i:(a_i)_\downarrow$,
+where $v_i$ are the intermediate values generated step by step in $\lexer$.
+These values are not explicitly created in $\blexer$.
+
+We encircled in the diagram  all the pairs $v_i, a_i$ to show that these values
 and regular expressions share the same structure.
 These pairs all contain adequate information to 
 obtain the final lexing result.
@@ -1111,7 +1126,7 @@
 lexical information is condensed in 
 $v_0$, whereas for the rightmost pair,
 the lexing result hides is in the bitcodes of $a_n$.\\
-$\bmkeps$ is able to extract from $a_n$ the result
+The function $\bmkeps$ is able to extract from $a_n$ the result
 by looking for nullable parts of the regular expression.
 However for regular expressions $a_i$ in general,
 they might not necessarily be nullable.
@@ -1129,9 +1144,10 @@
 		$f \; a_i\;v_i = f \; a_0 \; v_0 = \code \; v_0$ %\decode \;(\code \; v_0) \; (\erase \; a_0)$
 \end{itemize}
 \noindent
-Sulzmann and Lu came up with a function satisfying 
+Sulzmann and Lu came up with the following definition for $f$ satisfying 
 the above
-requirements, named \emph{retrieve}:
+requirements, and named it \emph{retrieve}:
+\vspace{-7mm}
 \begin{center}
 \begin{tabular}{llcl}
 	$\retrieve \; \, _{bs} \ONE$ & $   \Empty$ & $\dn$ & $\textit{bs}$\\
@@ -1155,8 +1171,9 @@
 \end{tabular}
 \end{center}
 \noindent
-As promised, $\retrieve$ collects the right bit-codes from the 
-final derivative $a_n$, guided by $v_n$:
+As stated, $\retrieve$ collects the right bit-codes from the 
+final derivative $a_n$, guided by $v_n$. This can be proved
+as follows:
 \begin{lemma}\label{bmkepsRetrieve}
 	$\bnullable \; a \implies \bmkeps \; a = \retrieve \; a \; (\mkeps \; (\erase \; a))$
 \end{lemma}
@@ -1164,17 +1181,17 @@
 	By a routine induction on $a$.
 \end{proof}
 \noindent
-The design of $\retrieve$ enables us to extract bitcodes
-from both annotated regular expressions and values.
-$\retrieve$ always ``sucks up'' all the information.
+%The design of $\retrieve$ enables us to extract bitcodes
+%from both annotated regular expressions and values.
+%$\retrieve$ always ``sucks up'' all the information.
 When more information is stored in the value, we would be able to
 extract more from the value, and vice versa.
 For example in star iterations, $\retrieve$ will be able to extract from $\Stars \; vs$
 exactly the same bitcodes as $\code \; (\Stars \; vs)$:
-\begin{lemma}
+\begin{lemma}\label{retrieveEncodeSTARS}
   If $\forall v \in vs. \vdash v : r$, and  $\code \; v = \retrieve \; (\internalise\; r) \; v$\\
   then $\code \; (\Stars \; vs) = \retrieve \; _{[]}((\internalise \;  r)^*) \; (\Stars \; vs)$
-\end{lemma}\label{retrieveEncodeSTARS}
+\end{lemma}
 \begin{proof}
 	By induction on the value list $vs$.
 \end{proof}
@@ -1184,8 +1201,8 @@
 \begin{center}
   $\retrieve \; _{bs}((\internalise \;  r)^*) \; (\Stars \; [] )  = bs @ [S]$.
 \end{center}
-In general, if we have a regular expression just internalised
-and the lexing result value, we could $\retrieve$ the bitcdoes
+In general, if we have a regular expression that is just internalised
+and the final lexing result value, we can $\retrieve$ the bitcodes
 that look as if we have en$\code$-ed the value as bitcodes:
 \begin{lemma}
 	$\vdash v : r \implies \retrieve \; (r)^\uparrow \; v = \code \; v$
@@ -1195,7 +1212,7 @@
 	The star case uses lemma \ref{retrieveEncodeSTARS}.
 \end{proof}
 \noindent
-The following property is more interesting, as
+The following property of $\retrieve$ is crucial, as
 it provides a "bridge" between $a_0$ and $a_n $ in the
 lexing diagram by linking adjacent regular expressions $a_i$ and
 $a_{i+1}$.
@@ -1203,13 +1220,14 @@
 can retrieve the same bits from a derivative 
 regular expression as those from 
 before the derivative took place,
-provided that the right values are used respectively:
+provided that the corresponding values are used respectively:
 \begin{lemma}\label{retrieveStepwise}
 	$\vdash v : (r\backslash c) \implies \retrieve \; (r \backslash c)  \;  v= \retrieve \; r \; (\inj \; r\; c\; v)$
 \end{lemma}
 \begin{proof}
-	By induction on $r$, where $v$ is allowed to be arbitrary.
-	The induction principle is function $\erase$'s cases.
+	By induction on $r$, generalising over $v$.
+	The induction principle in our formalisation
+	is function $\erase$'s cases.
 \end{proof}
 \noindent
 Before we move on to the next
@@ -1222,13 +1240,13 @@
 	Using lemma \ref{bmkepsRetrieve}.
 \end{proof}
 \noindent
-$\retrieve$ allows connecting
+The function $\retrieve$ allows connecting
 between the intermediate 
 results $a_i$ and $a_{i+1}$ in $\blexer$.
 For plain regular expressions something similar is required.
 
 \subsection{$\flex$}
-Ausaf and Urban cleverly defined an auxiliary function called $\flex$ for $\lexer$,
+Ausaf and Urban defined an auxiliary function called $\flex$ for $\lexer$,
 defined as
 \begin{center}
 \begin{tabular}{lcl}
@@ -1238,6 +1256,7 @@
 \end{center}
 which accumulates the characters that need to be injected back, 
 and does the injection in a stack-like manner (last taken derivative first injected).
+The function
 $\flex$ can calculate what $\lexer$ calculates, given the input regular expression
 $r$, the identity function $id$, the input string $s$ and the value
 $v_n= \mkeps \; (r\backslash s)$:
@@ -1249,13 +1268,13 @@
 \end{proof}
 \noindent
 The main advantage of using $\flex$
-in addition to $\lexer$ is that
+in $\lexer$ is that
 $\flex$ makes the value $v$ and function $f$
 that manipulates the value  explicit parameters,
 which ``exposes'' $\lexer$'s recursive call
 arguments and allows us to ``set breakpoints'' and ``resume''
-at any point during $\lexer$'s recursive calls.\\
-The following stepwise property holds. 
+at any point during $\lexer$'s recursive calls. Therefore
+the following stepwise property holds. 
 \begin{lemma}\label{flexStepwise}
 	$\textit{flex} \;\; r \;\; f \;\; (s@[c]) \;\; v = \flex \;\; r \;\; f  \;\; s \;\; (\inj \; (r\backslash s) \; c \; v) $
 \end{lemma}
@@ -1265,13 +1284,14 @@
 \noindent
 With $\flex$ and $\retrieve$, 
 we are ready to connect $\lexer$ and $\blexer$,
-giving the correctness proof.
+giving the correctness proof for $\blexer$.
 
 %----------------------------------------------------------------------------------------
 %	SECTION  correctness proof
 %----------------------------------------------------------------------------------------
-\section{Correctness of Bit-coded Algorithm (Without Simplification)}
-$\flex$ and $\blexer$ essentially calculates the same thing.
+\section{Correctness of the  Bit-coded Algorithm (Without Simplification)}
+We can first show that 
+$\flex$ and $\blexer$ calculates the same thing.
 \begin{lemma}\label{flex_retrieve}
 	If $\vdash v: (r\backslash s)$, then\\
 $\flex \; r \; \textit{id}\; s\; v = \decode \; (\retrieve \; (r\backslash s )\; v) \; r$
@@ -1328,14 +1348,18 @@
 result of the bitcoded lexer more explicitly,
 we use the fact from the previous chapter that
 \[
-	 (r, s) \rightarrow v \;\; \textit{iff} \;\; \lexer \; r \; s = v
+	 (r, s) \rightarrow v \;\; \textit{iff} \;\; \lexer \; r \; s =\Some \;  v
+	 \quad \quad
+	 \nexists v. (r, s) \rightarrow v\;\; \textit{iff} \;\; \lexer \;r\;s = None
 \]
 to obtain this corollary:
 \begin{corollary}\label{blexerPOSIX}
-	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \; s = v$
+	The $\blexer$ function correctly implements a POSIX lexer, namely\\
+	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \; s = \Some \; v$\\
+	$\nexists v. (r, s) \rightarrow v\;\; \textit{iff} \;\; \blexer \;r\;s = None$
 \end{corollary}
-Our main reason for wanting a bit-coded algorithm over 
-the injection-based one is for its capabilities of allowing
+Our main reason for analysing the bit-coded algorithm over 
+the injection-based one is that it allows us to define
 more aggressive simplifications.
 We will elaborate on this in the next chapter.