--- a/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex Mon Jul 24 11:09:48 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex Tue Jul 25 17:28:29 2023 +0100
@@ -10,14 +10,30 @@
%Chapter 3\ref{Chapter3}.
In this chapter, we are going to describe the bit-coded algorithm
introduced by Sulzmann and Lu \parencite{Sulzmann2014} and their correctness proof.
+Just like in chapter \ref{Inj}, the algorithms and proofs have been included
+for self-containedness reasons,
+even though they have been originally found and described by
+Sulzmann and Lu (\cite{Sulzmann2014}) and
+Ausaf et al. (\cite{AusafDyckhoffUrban2016} and \cite{Ausaf}).
+%In addition to this, the
+The details of the proofs in this thesis
+also follow more closely the actual Isabelle formalisation.
+For example, lemma \ref{flexStepwise} and \ref{retrieveStepwise}
+are not included in the publications by Ausaf et al., despite them being
+some of the key lemmas leading to the correctness result.
+
+We will first motivate the bit-coded algorithm in section \ref{bitMotivate},
+and then introduce their formal definitions in section \ref{bitDef},
+followed by a description of the correctness proof of $\blexer$ in section \ref{blexerProof}.
+
%to address the growth problem of
%derivatives of
%regular expressions.
-We have implemented their algorithm in Scala and Isabelle,
-and found problems
-in their algorithm, such as de-duplication not working properly and redundant
-fixpoint construction.
-\section{The Motivation Behind Using Bitcodes}
+%We have implemented their algorithm in Scala and Isabelle,
+%and found problems
+%in their algorithm, such as de-duplication not working properly and redundant
+%fixpoint construction.
+\section{The Motivation Behind Using Bitcodes}\label{bitMotivate}
Let us give again the definition of $\lexer$ from Chapter \ref{Inj}:
\begin{center}
\begin{tabular}{lcl}
@@ -316,7 +332,7 @@
The stack seems to grow at least quadratically with respect
to the input (not taking into account the size bloat of $r_i$),
which can be inefficient and prone to stack overflows.
-\section{Bitcoded Algorithm}
+\section{Bitcoded Algorithm}\label{bitDef}
To address this,
Sulzmann and Lu defined a new datatype
called \emph{annotated regular expression},
@@ -1042,7 +1058,7 @@
%-----------------------------------
% SUBSECTION 1
%-----------------------------------
-\section{Correctness Proof of $\textit{Blexer}$}
+\section{Correctness Proof of $\textit{Blexer}$}\label{blexerProof}
Why is $\blexer$ correct?
In other words, why is it the case that
$\blexer$ outputs the same value as $\lexer$?
--- a/ChengsongTanPhdThesis/Chapters/Inj.tex Mon Jul 24 11:09:48 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Inj.tex Tue Jul 25 17:28:29 2023 +0100
@@ -1332,7 +1332,7 @@
%TODO: FILL in the other defs
\begin{center}
\begin{tabular}{lcl}
-$L \; \ZERO$ & $\dn$ & $\phi$\\
+$L \; \ZERO$ & $\dn$ & $\varnothing$\\
$L \; \ONE$ & $\dn$ & $\{[]\}$\\
$L \; c$ & $\dn$ & $\{[c]\}$\\
$L \; (r_1 + r_2)$ & $\dn$ & $ L \; r_1 \cup L \; r_2$\\
@@ -2072,15 +2072,7 @@
injected back to that sub-value;
$\inj$ assembles all parts
to form a new value.
-\subsection{An Example of how Injection Works}
-We will provide a few examples on why
-\begin{center}
- \begin{tabular}{lcl}
- $\inj \; (a\cdot b)\cdot c \; \Seq \; \ONE \; b$ & $=$ & $(a+e)$\\
- $A$ & $B$ & $C$
- \end{tabular}
-\end{center}
For instance, the last clause is an
injection into a sequence value $v_{i+1}$
whose second child
@@ -2111,6 +2103,101 @@
wrapped under the $\Stars$
constructor, giving us $\Stars \; ((\inj \; r \; c \; v) ::vs)$.
+Putting all together, Sulzmann and Lu obtained the following algorithm
+outlined in the injection-based lexing diagram \ref{graph:inj}:
+\begin{center}
+\begin{tabular}{lcl}
+ $\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\; \Some(\mkeps \; r) \; \textit{else} \; \None$\\
+ $\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\
+ & & $\quad \phantom{\mid}\; \None \implies \None$\\
+ & & $\quad \mid \Some(v) \implies \Some(\inj \; r\; c\; v)$
+\end{tabular}
+\end{center}
+\noindent
+
+\subsection{Examples on How Injection and Lexer Works}
+We will provide a few examples on how $\inj$ and $\lexer$
+works by showing their values in each recursive call on some
+concrete examples.
+We start with the call $\lexer \; (a+aa)^*\cdot c\; aac$
+on the lexer, note that the value's character constructor
+$\Char \;c$ is abbreviated as $c$ for readability.
+Similarly the last derivative's sub-expression
+is abbreviated as $r_{=0}$\footnote{which is equal to
+$((\ZERO + (\ZERO a+ \ZERO))\cdot(a+aa)^* + (\ZERO + \ZERO a)\cdot (a+aa)^*)+
+((\ZERO+\ZERO a)\cdot(a+aa)^* + (\ZERO+\ZERO a)\cdot(a+aa)^*),
+$.}
+whose language interpretation is equivalent to that of
+$\ZERO$ and therefore not crucial to be displayed fully expanded,
+as they will not be injected into.%our example run.
+\begin{center}
+ \begin{tabular}{lcl}
+ $(a+\textcolor{magenta}{a}\textcolor{blue}{a})^* \cdot \textcolor{red}{c}$ & $\stackrel{\backslash \textcolor{magenta}{a}}{\rightarrow}$ & $((\ONE + \textcolor{magenta}{\ONE} \textcolor{blue}{a})\cdot (a+aa)^*)\cdot \textcolor{red}{c}+\ZERO$\\
+ %& $\stackrel{\rightarrow}{\backslash a}$ & $((\ONE + \ONE a)\cdot (a+aa)^*)\cdot c + \ZERO$\\
+ & $\stackrel{\backslash \textcolor{blue}{a}}{\rightarrow}$ &
+ $(((\ZERO+(\ZERO a+ \textcolor{blue}{\ONE}))\cdot (a+aa)^* + (\ONE+\ONE a)\cdot (a+aa)^* )\cdot \textcolor{red}{\mathbf{c}} + \ZERO)+\ZERO$\\
+ & $\stackrel{\backslash \textcolor{red}{c}}{\rightarrow}$ &
+ $((r_{=0}\cdot c + \textcolor{red}{\ONE})+\ZERO)+\ZERO$\\
+ & $\stackrel{\mkeps}{\rightarrow}$ & $\Left (\Left \; (\Right \; \textcolor{red}{\Empty}))$ \\
+ & $\stackrel{\inj \;\textcolor{red}{c} }{\rightarrow}$ &
+ $\Left \; (\Left \; (\Seq \;(\Left \; (\Seq \; (\Right \; (\Right\; \textcolor{blue}{\Empty})) \; \Stars \, [])) \; \textcolor{red}{c}))$\\
+ & $\stackrel{\inj \;\textcolor{blue}{a}}{\rightarrow}$ &
+ $\Left\; (\Seq \; (\Seq\; (\Right \; (\Seq \; \textcolor{magenta}{\Empty }\; \textcolor{blue}{ \mathbf{a}}) )
+ \;\Stars \,[]) \; \textcolor{red}{c})$\\
+ & $\stackrel{\inj \;\textcolor{magenta}{a}}{\rightarrow}$ & $\Seq \; (\Stars \; [\Right \; (\Seq \; \mathbf{\textcolor{magenta}{a}} \; \textcolor{blue}{a})]) \; \textcolor{red}{ c}$
+
+ %$\inj \; r \; c\A$
+ %$\inj \; (a\cdot b)\cdot c \; \Seq \; \ONE \; b$ & $=$ & $(a+e)$\\
+ \end{tabular}
+\end{center}
+\noindent
+We have assigned different colours for each character,
+as well as their corresponding locations in values and regular expressions.
+The most recently injected character is marked with a bold font.
+%Their corresponding derivative steps have been marked with the same
+%colour. We also mark the specific location where the coloured character
+%was injected into with the same colour, before and after the injection occurred.
+%TODO: can be also used to motivate injection->blexer in Bitcoded1
+To show the details of how $\inj$ works,
+we zoom in to the second injection above,
+illustrating the recursive calls involved:
+\begin{center}
+ \begin{tabular}{lcl}
+$\inj \;\quad ((\ONE + {\ONE}
+ \textcolor{blue}{a})\cdot (a+aa)^*)\cdot
+ c + \ZERO \; \quad \textcolor{blue}{a} \; $ & &\\
+$\quad \Left \;
+(\Left \; (\Seq \;(\Left \; (\Seq \; (\Right \; (\Right\;
+ \textcolor{blue}{\Empty})) \; \Stars \, [])) \; c))$ & \\$=$\\
+ $\Left \; (\inj \; ((\ONE + \ONE
+ \textcolor{blue}{a})\cdot (a+aa)^*)\cdot
+ c \quad \textcolor{blue}{a} \quad (\Left \; (\Seq\; ( \Left \; (\Seq \; (\Right \; (\Right\;
+ \textcolor{blue}{\Empty})) \; \Stars \, []) ) \; c ) )\;\;)$ &\\ $=$\\
+ $\Left \; (\Seq \; (\inj \; (\ONE + \ONE \textcolor{blue}{a})\cdot(a+aa)^* \quad \textcolor{blue}{a} \quad
+ (\Left \; (\Seq \; (\Right \; (\Right\;\textcolor{blue}{\Empty})))) ) \; c ) $ & \\$=$\\
+ $\Left \; (\Seq \; (\Seq \; (\inj \quad (\ONE + \ONE \textcolor{blue}{a}) \quad \textcolor{blue}{a}\quad \Right \;(\Right \; \textcolor{blue}{\Empty}) ) \; \Stars \,[])\; c)$ &\\ $=$ \\
+ $\Left \; (\Seq \; (\Seq \; (\Right \; (\inj \; \ONE \textcolor{blue}{a} \quad \textcolor{blue}{a}\quad (\Right \;\textcolor{blue}{\Empty})))) \; \Stars \, [])$
+ & \\ $=$\\
+ $\Left \; (\Seq \; (\Seq \; (\Right \; (\Seq \; (\mkeps \; \ONE)\;(\inj \;\textcolor{blue}{a} \; \textcolor{blue}{a} \; \textcolor{blue}{\Empty})) ) ) \; \Stars \, [] )$
+ & \\ $=$\\
+ $\Left \; (\Seq \; (\Seq \; (\Right \; (\Seq \; \Empty \; \textcolor{blue}{a}) ) ) \; \Stars \, [] )$
+ \end{tabular}
+\end{center}
+\noindent
+We will now introduce the proofs related to $\inj$ and
+$\lexer$. These proofs have originally been found by Ausaf et al.
+in their 2016 work \cite{AusafDyckhoffUrban2016}.
+Proofs to some of these lemmas have also been discussed in more detail in
+Ausaf's thesis \cite{Ausaf}.
+Nevertheless, we still introduce these proofs, to make this thesis
+self-contained as we show inductive variants used in these proofs break
+after more simplifications are introduced.
+%Also note that lemmas like \ref{}described in this thesis is a more faithful
+%representation of the actual accompanying Isabelle formalisation, and
+%therefore
+
+
+
Recall that lemma
\ref{mePosix} tells us that
$\mkeps$ always generates the POSIX value.
@@ -2212,17 +2299,14 @@
The star case can be proven similarly.
\end{proof}
\noindent
-Putting all together, Sulzmann and Lu obtained the following algorithm
-outlined in the diagram \ref{graph:inj}:
-\begin{center}
-\begin{tabular}{lcl}
- $\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\; \Some(\mkeps \; r) \; \textit{else} \; \None$\\
- $\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\
- & & $\quad \phantom{\mid}\; \None \implies \None$\\
- & & $\quad \mid \Some(v) \implies \Some(\inj \; r\; c\; v)$
-\end{tabular}
-\end{center}
-\noindent
+
+
+
+
+
+
+
+%TODO: Cut from previous lexer def, need to make coherent
The central property of the $\lexer$ is that it gives the correct result
according to
POSIX rules.
--- a/RegexExplosionPlot/injToSimp.sc Mon Jul 24 11:09:48 2023 +0100
+++ b/RegexExplosionPlot/injToSimp.sc Tue Jul 25 17:28:29 2023 +0100
@@ -393,14 +393,14 @@
def simp_lexer(r: Rexp, s: List[Char]): Val = s match{
case Nil => {//println("mkeps!"); println(r);
val v = mkeps(r);
- //println(v);
+ println(v);
v}
case c::cs => {
val derc1 = der(c, r);
val derc = light_simp(derc1);
println("derivative size: "+ size(derc));
val v = simp_lexer(derc, cs);
- //println("Injection!");println("r: "+r); println("c: "+c); println("v:"+v);
+ println("Injection!");println("r: "+r); println("c: "+c); println("v:"+v);
inj(r, c, v ) }
}
@@ -415,7 +415,7 @@
//val derc = light_simp(derc1);
println("derivative size: "+ size(derc));
val v = print_lexer(derc, cs);
- //println("Injection!");println("r: "+r); println("c: "+c); println("v:"+v);
+ println("Injection!");println("r: "+r); println("c: "+c); println("v:"+v);
inj(r, c, v ) }
}
val r = STAR("a"|"aa") ~ ("c")
@@ -1229,9 +1229,10 @@
// println(" " + (t1 - t0)/1e9 + "")
// result
//}
-//val astarstarB = STAR(STAR("a"))~("b")
-//val data_points_x_axis = List.range(0, 60000, 3000)
+val astarstarB = STAR(STAR("a"))~("b")
+val data_points_x_axis = List.range(0, 60000, 3000)
//for (i <- data_points_x_axis) {print(i); time { blexing(astarstarB, ("a"*i)) } }
////println(data_points_x_axis.zip( List(0,1,3,4)))
//for(i <- List.range(0, 40000, 3000)) print(i + " ")
+//for (i <- data_points_x_axis) {print(i); time { simp_lexer(astarstarB, ("a"*i).toList) } }