diff -r e71a6e2aca2d -r 35df9cdd36ca ChengsongTanPhdThesis/Chapters/Bitcoded1.tex
--- a/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex	Mon Aug 15 17:26:08 2022 +0200
+++ b/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex	Wed Aug 17 01:09:13 2022 +0100
@@ -11,11 +11,299 @@
 In this chapter, we are going to describe the bit-coded algorithm
 introduced by Sulzmann and Lu \parencite{Sulzmann2014} to address the growth problem of 
 regular expressions. 
+We have implemented their algorithm in Scala, and found out inefficiencies
+in their algorithm such as de-duplication not working properly and redundant
+fixpoint construction. Their algorithm is improved and verified with the help of
+formal proofs.
 \section{Bit-coded Algorithm}
-The lexer algorithm in Chapter \ref{Inj}, as shown in \ref{InjFigure},
+We first do a recap of what was going on 
+in the lexer algorithm in Chapter \ref{Inj},
+\begin{center}
+\begin{tabular}{lcl}
+	$\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\;  \Some(\mkeps \; r) \; \textit{else} \; \None$\\
+	$\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\
+	& & $\quad \phantom{\mid}\; \None \implies \None$\\
+	& & $\quad \mid           \Some(v) \implies \Some(\inj \; r\; c\; v)$
+\end{tabular}
+\end{center}
+\noindent
+The algorithm recursively calls $\lexer$ on 
+each new character input,
+and before starting a child call
+it stores information of previous lexing steps
+on a stack, in the form of regular expressions
+and characters: $r_0$, $c_0$, $r_1$, $c_1$, etc.
+Each descent into deeper recursive calls in $\lexer$
+causes a new pair of $r_i, c_i$ to be pushed to the call stack.
+\begin{figure}
+\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] 
+%\draw (-6,-6) grid (6,6);
+\node  [ circle ] (r) at (-6, 5) {$r$};
+
+%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
+\node  [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$};
+%
+%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
+\node  [minimum size = 0.5, circle, draw] (r1) at (-2, 5) {$r_1$};
+
+\node [minimum width = 2cm, rectangle, draw] (stack) at (0, 3) {Stack};
+
+\path
+	(r)
+        edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
+
+\path   (r1)
+	edge [bend right, dashed] node {saved} (stack);
+\path   (c1)
+	edge [bend right, dashed] node {} (stack);
+
+
+\end{tikzpicture}
+\caption{First Derivative Taken}
+\end{figure}
+
+
+
+\begin{figure}
+\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] 
+%\draw (-6,-6) grid (6,6);
+\node  [ circle ] (r) at (-6, 5) {$r$};
+
+%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
+\node  [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$};
+%
+%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
+\node  [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$};
+
+
+\node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$};
+%
+%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
+\node [circle, draw] (r2) at (2, 5) {$r_2$};
+\node [minimum width = 3cm, minimum height = 1cm, rectangle, draw] (stack) at (0, 2) {\large Stack};
+
+\path
+	(r)
+        edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
+
+\path   (r2)
+	edge [bend right, dashed] node {} (stack);
+\path   (c2)
+	edge [bend right, dashed] node {} (stack);
+
+\path   (r1)
+	edge [] node {} (r2);
+
+\end{tikzpicture}
+\caption{Second Derivative Taken}
+\end{figure}
+\noindent
+As the number of derivative steps increase,
+the stack would increase:
+
+\begin{figure}
+\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] 
+%\draw (-6,-6) grid (6,6);
+\node  [ circle ] (r) at (-6, 5) {$r$};
+
+%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
+\node  [circle, minimum size = 0.1, ] (c1) at (-4, 5.4) {$c_1$};
+%
+%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
+\node  [minimum size = 0.5, circle, ] (r1) at (-2, 5) {$r_1$};
+
+
+\node [circle, minimum size = 0.1, ] (c2) at (0, 5.4) {$c_2$};
+%
+%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
+\node [circle, ] (r2) at (2, 5) {$r_2$};
+\node [minimum width = 4cm, minimum height = 2.5cm, rectangle, draw] (stack) at (0, 1) { \large Stack};
+
+\node [] (ldots) at (3.5, 5) {};
+%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
+
+\node [minimum size = 0.5, circle, ] (rn) at (6, 5) {};
+
+\node (rldots) at ($(ldots)!.4!(rn)$) {\ldots};
+
+\path
+	(r)
+        edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
+
+\path   (rldots)
+	edge [bend left, dashed] node {} (stack);
+
+\path   (r1)
+	edge [] node {} (r2);
+
+\path   (r2)
+	edge [] node {} (ldots);
+\path   (ldots)
+	edge [bend left, dashed] node {} (stack);
+\path   (5.03, 4.9)
+	edge [bend left, dashed] node {} (stack);
+\end{tikzpicture}
+\caption{More Derivatives Taken}
+\end{figure}
+\noindent
+After all derivatives have been taken, the stack grows to a maximum size
+and the pair of regular expressions and characters $r_i, c_{i+1}$ 
+are then popped out and used in the injection phase:
+
+\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] 
+%\draw (-6,-6) grid (6,6);
+\node  [radius = 0.5, circle, draw] (r) at (-6, 5) {$r$};
+
+%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
+\node  [circle, minimum size = 0.1, draw] (c1) at (-4, 5.4) {$c_1$};
+%
+%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
+\node  [minimum size = 0.5, circle, draw] (r1) at (-2, 5) {$r_1$};
+%
+%\node (0, 6)  (c2) circle [radius = 0.3] {$c_2$};
+\node [circle, minimum size = 0.1, draw] (c2) at (0, 5.4) {$c_2$};
+%
+%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
+\node [circle, draw] (r2) at (2, 5) {$r_2$};
+%
+%
+\node [] (ldots) at (4.5, 5) {};
+%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
+
+\node [minimum size = 0.5, circle, draw] (rn) at (6, 5) {$r_n$};
+
+\node at ($(ldots)!.4!(rn)$) {\ldots};
+
+\node [minimum size = 0.5, circle, draw] (vn) at (6, -5) {$v_n$};
+
+\node [] (ldots2) at (3.5, -5) {};
+
+\node  [minimum size = 0.5, circle, draw] (v2) at (2, -5) {$v_2$};
+
+\node at ($(ldots2)!.4!(v2)$) {\ldots};
+
+
+\node [circle, draw] (v1) at (-2, -5) {$v_1$};
+
+\node  [radius = 0.5, circle, draw] (v) at (-6, -5) {$v$};
+
+\node [minimum size = 6cm, rectangle, draw] (stack) at (0, 0) {\Huge Stack};
+
+\path
+	(r)
+        edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
+\path
+	(r1)
+        edge [] node {} (r2);
+\path   (r2)
+	edge [] node {} (ldots);
+\path   (rn)
+	edge [] node {$\mkeps$} (vn);
+\path   (vn) 
+	edge [] node {} (ldots2);
+\path   (v2)
+	edge [] node {$\inj \; r_1 \; c_2\;v_2$} (v1);
+
+\path   (v1)
+	edge [] node {$\inj \; r \; c_1 \; v_1$} (v);
+\path   (r)
+	edge [dashed, bend right] node {} (stack);
+
+\path   (r1)
+	edge [dashed, ] node {} (stack);
+
+\path   (c1)
+	edge [dashed, bend right] node {} (stack);
+
+\path   (c2)
+	edge [dashed] node {} (stack);
+\path   (4.5, 5)
+	edge [dashed, bend left] node {} (stack);
+\path   (4.9, 5)
+	edge [dashed, bend left] node {} (stack);
+\path   (5.3, 5)
+	edge [dashed, bend left] node {} (stack);
+\path (r2)
+	edge [dashed, ] node {} (stack);
+\path (rn)
+	edge [dashed, bend left] node {} (stack);
+\end{tikzpicture}
+%\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,thick] 
+%%\draw (-6,-6) grid (6,6);
+%\node  [radius = 0.5, circle, draw] (r) at (-6, 5) {$r$};
+%
+%%\node (-4, 6) (c1) circle [radius = 0.3] {$c_1$};
+%\node  [circle, minimum size = 0.1] (c1) at (-4, 5.4) {$c_1$};
+%%
+%%\node (-2, 5) (r1) circle [radius = 0.5] {$r_1$};
+%\node  [minimum size = 0.5, circle, draw] (r1) at (-2, 5) {$r_1$};
+%%
+%%\node (0, 6)  (c2) circle [radius = 0.3] {$c_2$};
+%\node [circle, minimum size = 0.1] (c2) at (0, 5.4) {$c_2$};
+%%
+%%\node (2, 5) (r2) circle [radius = 0.5] {$r_2$};
+%\node [circle, draw] (r2) at (2, 5) {$r_2$};
+%%
+%%
+%\node [] (ldots) at (4.5, 5) {};
+%%\node (6, 5) (rn) circle [radius = 0.5] {$r_n$};
+%
+%\node [minimum size = 0.5, circle, draw] (rn) at (6, 5) {$r_n$};
+%
+%\node at ($(ldots)!.4!(rn)$) {\ldots};
+%
+%\node [minimum size = 0.5, circle, draw] (vn) at (6, -5) {$v_n$};
+%
+%\node [] (ldots2) at (3.5, -5) {};
+%
+%\node  [minimum size = 0.5, circle, draw] (v2) at (2, -5) {$v_2$};
+%
+%\node at ($(ldots2)!.4!(v2)$) {\ldots};
+%
+%
+%\node [circle, draw] (v1) at (-2, -5) {$v_1$};
+%
+%\node  [radius = 0.5, circle, draw] (v) at (-6, -5) {$v$};
+%
+%\node [minimum size = 6cm, rectangle, draw] (stack) at (0, 0) {\Huge Stack};
+%
+%\path
+%	(r)
+%        edge [->, >=stealth',shorten >=1pt] node[left] {} (r1);
+%\path
+%	(r1)
+%        edge [] node {} (r2);
+%\path   (r2)
+%	edge [] node {} (ldots);
+%\path   (rn)
+%	edge [] node {$\mkeps$} (vn);
+%\path   (vn) 
+%	edge [] node {} (ldots2);
+%\path   (v2)
+%	edge [] node {} (v1);
+%
+%\path   (v1)
+%	edge [] node {} (v);
+%\path   (r)
+%	edge [] node {saved} (stack);
+%
+%\path   (r1)
+%	edge [] node {saved} (stack);
+%\end{tikzpicture}
+\noindent
+The information stored in characters and regular expressions
+make the algorithm work in an elegant way, at the expense of being
+storing quite a bit of verbose information.
+
+
+
+The lexer algorithm in Chapter \ref{Inj},
 stores information of previous lexing steps
 on a stack, in the form of regular expressions
 and characters: $r_0$, $c_0$, $r_1$, $c_1$, etc.
+The red part represents what we already know during the first
+derivative phase,
+and the blue part represents the unknown part of input.
 \begin{ceqn}
 \begin{equation}%\label{graph:injLexer}
 	\begin{tikzcd}[ampersand replacement=\&, execute at end picture={
@@ -39,22 +327,19 @@
 \end{equation}
 \end{ceqn}
 \noindent
-The red part represents what we already know during the first
-derivative phase,
-and the blue part represents the unknown part of input.
 The red area expands as we move towards $r_n$, 
-indicating an increasing stack size during lexing.
-Despite having some partial lexing information during
+indicating more is known about the lexing result.
+Despite knowing this partial lexing information during
 the forward derivative phase, we choose to store them
-temporarily, only to convert the information to lexical
-values at a later stage. In essence we are repeating work we
-have already done.
-This is both inefficient and prone to stack overflow.
-A natural question arises as to whether we can store lexing
+all the way until $r_n$ is reached.
+Then we reconstruct the value character by character
+values at a later stage, using information in a Last-In-First-Out
+manner. Although the algorithm is elegant and natural,
+it can be inefficient and prone to stack overflow.\\
+It turns out we can store lexing
 information on the fly, while still using regular expression 
 derivatives.
-
-If we remove the details of the individual 
+If we remove the individual 
 lexing steps, and use red and blue areas as before
 to indicate consumed (seen) input and constructed
 partial value (before recovering the rest of the stack),