lexing: changeset 667:660cf698eb26

--- a/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex	Mon Jul 24 11:09:48 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Bitcoded1.tex	Tue Jul 25 17:28:29 2023 +0100
@@ -10,14 +10,30 @@
 %Chapter 3\ref{Chapter3}. 
 In this chapter, we are going to describe the bit-coded algorithm
 introduced by Sulzmann and Lu \parencite{Sulzmann2014} and their correctness proof.
+Just like in chapter \ref{Inj}, the algorithms and proofs have been included
+for self-containedness reasons,
+even though they have been originally found and described by
+Sulzmann and Lu (\cite{Sulzmann2014}) and 
+Ausaf et al. (\cite{AusafDyckhoffUrban2016} and \cite{Ausaf}).
+%In addition to this, the 
+The details of the proofs in this thesis 
+also follow more closely the actual Isabelle formalisation.
+For example, lemma \ref{flexStepwise} and \ref{retrieveStepwise}
+are not included in the publications by Ausaf et al., despite them being
+some of the key lemmas leading to the correctness result.
+
+We will first motivate the bit-coded algorithm in section \ref{bitMotivate},
+and then introduce their formal definitions in section \ref{bitDef},
+followed by a description of the correctness proof of $\blexer$ in section \ref{blexerProof}.
+
 %to address the growth problem of 
 %derivatives of
 %regular expressions. 
-We have implemented their algorithm in Scala and Isabelle, 
-and found problems 
-in their algorithm, such as de-duplication not working properly and redundant
-fixpoint construction. 
-\section{The Motivation Behind Using Bitcodes}
+%We have implemented their algorithm in Scala and Isabelle, 
+%and found problems 
+%in their algorithm, such as de-duplication not working properly and redundant
+%fixpoint construction. 
+\section{The Motivation Behind Using Bitcodes}\label{bitMotivate}
 Let us give again the definition of $\lexer$ from Chapter \ref{Inj}:
 \begin{center}
 \begin{tabular}{lcl}
@@ -316,7 +332,7 @@
 The stack seems to grow at least quadratically with respect
 to the input (not taking into account the size bloat of $r_i$),
 which can be inefficient and prone to stack overflows.
-\section{Bitcoded Algorithm}
+\section{Bitcoded Algorithm}\label{bitDef}
 To address this,
 Sulzmann and Lu defined a new datatype 
 called \emph{annotated regular expression},
@@ -1042,7 +1058,7 @@
 %-----------------------------------
 %	SUBSECTION 1
 %-----------------------------------
-\section{Correctness Proof of $\textit{Blexer}$}
+\section{Correctness Proof of $\textit{Blexer}$}\label{blexerProof}
 Why is $\blexer$ correct?
 In other words, why is it the case that 
 $\blexer$ outputs the same value as $\lexer$?

--- a/ChengsongTanPhdThesis/Chapters/Inj.tex	Mon Jul 24 11:09:48 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Inj.tex	Tue Jul 25 17:28:29 2023 +0100
@@ -1332,7 +1332,7 @@
 %TODO: FILL in the other defs
 \begin{center}
 \begin{tabular}{lcl}
-$L \; \ZERO$ & $\dn$ & $\phi$\\
+$L \; \ZERO$ & $\dn$ & $\varnothing$\\
 $L \; \ONE$ & $\dn$ & $\{[]\}$\\
 $L \; c$ & $\dn$ & $\{[c]\}$\\
 $L \; (r_1 + r_2)$ & $\dn$ & $ L \; r_1 \cup L \; r_2$\\
@@ -2072,15 +2072,7 @@
 injected back to that sub-value; 
 $\inj$ assembles all parts
 to form a new value.
-\subsection{An Example of how Injection Works}
-We will provide a few examples on why
 
-\begin{center}
-	\begin{tabular}{lcl}
-		$\inj \; (a\cdot b)\cdot c \; \Seq \; \ONE \; b$ & $=$ & $(a+e)$\\
-		$A$ & $B$ & $C$
-	\end{tabular}
-\end{center}
 For instance, the last clause is an
 injection into a sequence value $v_{i+1}$
 whose second child
@@ -2111,6 +2103,101 @@
 wrapped under the $\Stars$
 constructor, giving us $\Stars \; ((\inj \; r \; c \; v) ::vs)$.
 
+Putting all together, Sulzmann and Lu obtained the following algorithm
+outlined in the injection-based lexing diagram \ref{graph:inj}:
+\begin{center}
+\begin{tabular}{lcl}
+	$\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\;  \Some(\mkeps \; r) \; \textit{else} \; \None$\\
+	$\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\
+	& & $\quad \phantom{\mid}\; \None \implies \None$\\
+	& & $\quad \mid           \Some(v) \implies \Some(\inj \; r\; c\; v)$
+\end{tabular}
+\end{center}
+\noindent
+
+\subsection{Examples on How Injection and Lexer Works}
+We will provide a few examples on how $\inj$ and $\lexer$
+works by showing their values in each recursive call on some 
+concrete examples.
+We start with the call $\lexer \; (a+aa)^*\cdot c\; aac$
+on the lexer, note that the value's character constructor
+$\Char \;c$ is abbreviated as $c$ for readability.
+Similarly the last derivative's sub-expression
+is abbreviated as $r_{=0}$\footnote{which is equal to
+$((\ZERO + (\ZERO a+ \ZERO))\cdot(a+aa)^* + (\ZERO + \ZERO a)\cdot (a+aa)^*)+
+((\ZERO+\ZERO a)\cdot(a+aa)^* + (\ZERO+\ZERO a)\cdot(a+aa)^*),
+$.}
+whose language interpretation is equivalent to that of
+$\ZERO$ and therefore not crucial to be displayed fully expanded,
+as they will not be injected into.%our example run.
+\begin{center}
+	\begin{tabular}{lcl}
+		$(a+\textcolor{magenta}{a}\textcolor{blue}{a})^* \cdot \textcolor{red}{c}$ & $\stackrel{\backslash \textcolor{magenta}{a}}{\rightarrow}$ & $((\ONE + \textcolor{magenta}{\ONE} \textcolor{blue}{a})\cdot (a+aa)^*)\cdot \textcolor{red}{c}+\ZERO$\\
+				   %& $\stackrel{\rightarrow}{\backslash a}$ & $((\ONE + \ONE a)\cdot (a+aa)^*)\cdot c + \ZERO$\\
+				    & $\stackrel{\backslash \textcolor{blue}{a}}{\rightarrow}$  &
+				    $(((\ZERO+(\ZERO a+ \textcolor{blue}{\ONE}))\cdot (a+aa)^* + (\ONE+\ONE a)\cdot (a+aa)^* )\cdot \textcolor{red}{\mathbf{c}} + \ZERO)+\ZERO$\\
+				    & $\stackrel{\backslash \textcolor{red}{c}}{\rightarrow}$  &
+				    $((r_{=0}\cdot c + \textcolor{red}{\ONE})+\ZERO)+\ZERO$\\
+				    & $\stackrel{\mkeps}{\rightarrow}$ & $\Left (\Left \; (\Right \; \textcolor{red}{\Empty}))$ \\
+				   & $\stackrel{\inj \;\textcolor{red}{c} }{\rightarrow}$ & 
+				   $\Left \; (\Left \; (\Seq \;(\Left \; (\Seq \; (\Right \; (\Right\; \textcolor{blue}{\Empty})) \; \Stars \, [])) \; \textcolor{red}{c}))$\\
+				   & $\stackrel{\inj \;\textcolor{blue}{a}}{\rightarrow}$ & 
+				   $\Left\; (\Seq \; (\Seq\; (\Right \; (\Seq \; \textcolor{magenta}{\Empty }\; \textcolor{blue}{ \mathbf{a}}) )
+				   \;\Stars \,[]) \; \textcolor{red}{c})$\\
+				   & $\stackrel{\inj \;\textcolor{magenta}{a}}{\rightarrow}$ & $\Seq \; (\Stars \; [\Right \; (\Seq \; \mathbf{\textcolor{magenta}{a}} \; \textcolor{blue}{a})]) \; \textcolor{red}{ c}$
+
+				   %$\inj \; r \; c\A$
+		%$\inj \; (a\cdot b)\cdot c \; \Seq \; \ONE \; b$ & $=$ & $(a+e)$\\
+	\end{tabular}
+\end{center}
+\noindent
+We have assigned different colours for each character,
+as well as their corresponding locations in values and regular expressions.
+The most recently injected character is marked with a bold font.
+%Their corresponding derivative steps have been marked with the same
+%colour. We also mark the specific location where the coloured character
+%was injected into with the same colour, before and after the injection occurred.
+%TODO: can be also used to motivate injection->blexer in Bitcoded1
+To show the details of how $\inj$ works,
+we zoom in to the second injection above, 
+illustrating the recursive calls involved: 
+\begin{center}
+	\begin{tabular}{lcl}
+$\inj \;\quad ((\ONE + {\ONE} 
+	\textcolor{blue}{a})\cdot (a+aa)^*)\cdot 
+	c + \ZERO \; \quad \textcolor{blue}{a} \; $ & &\\
+$\quad \Left \; 
+(\Left \; (\Seq \;(\Left \; (\Seq \; (\Right \; (\Right\; 
+	\textcolor{blue}{\Empty})) \; \Stars \, [])) \; c))$ & \\$=$\\
+	$\Left \; (\inj \; ((\ONE + \ONE 
+	\textcolor{blue}{a})\cdot (a+aa)^*)\cdot 
+	c \quad  \textcolor{blue}{a} \quad (\Left \; (\Seq\; ( \Left \; (\Seq \; (\Right \; (\Right\; 
+	\textcolor{blue}{\Empty})) \; \Stars \, []) ) \; c ) )\;\;)$ &\\ $=$\\
+	$\Left \; (\Seq \; (\inj \; (\ONE + \ONE \textcolor{blue}{a})\cdot(a+aa)^* \quad \textcolor{blue}{a} \quad
+	(\Left \; (\Seq \; (\Right \; (\Right\;\textcolor{blue}{\Empty})))) ) \; c ) $ & \\$=$\\
+	$\Left \; (\Seq \; (\Seq \; (\inj \quad (\ONE + \ONE \textcolor{blue}{a}) \quad \textcolor{blue}{a}\quad \Right \;(\Right \; \textcolor{blue}{\Empty}) ) \; \Stars \,[])\; c)$ &\\ $=$ \\
+	$\Left \; (\Seq \; (\Seq \; (\Right \; (\inj \; \ONE \textcolor{blue}{a} \quad \textcolor{blue}{a}\quad (\Right \;\textcolor{blue}{\Empty})))) \; \Stars \, [])$
+																		   & \\ $=$\\
+																		   $\Left \; (\Seq \; (\Seq \; (\Right \; (\Seq \; (\mkeps \; \ONE)\;(\inj \;\textcolor{blue}{a} \; \textcolor{blue}{a} \; \textcolor{blue}{\Empty})) ) ) \; \Stars \, [] )$
+																		   & \\ $=$\\
+																		   $\Left \; (\Seq \; (\Seq \; (\Right \; (\Seq \; \Empty \; \textcolor{blue}{a}) ) ) \; \Stars \, [] )$
+ 	\end{tabular}
+\end{center}
+\noindent
+We will now introduce the proofs related to $\inj$ and
+$\lexer$. These proofs have originally been found by Ausaf et al.
+in their 2016 work \cite{AusafDyckhoffUrban2016}.
+Proofs to some of these lemmas have also been discussed in more detail in 
+Ausaf's thesis \cite{Ausaf}.
+Nevertheless, we still introduce these proofs, to make this thesis
+self-contained as we show inductive variants used in these proofs break
+after more simplifications are introduced.
+%Also note that lemmas like \ref{}described in this thesis is a more faithful
+%representation of the actual accompanying Isabelle formalisation, and 
+%therefore 
+
+
+
 Recall that lemma 
 \ref{mePosix} tells us that
 $\mkeps$ always generates the POSIX value.
@@ -2212,17 +2299,14 @@
 	The star case can be proven similarly.
 \end{proof}
 \noindent
-Putting all together, Sulzmann and Lu obtained the following algorithm
-outlined in the diagram \ref{graph:inj}:
-\begin{center}
-\begin{tabular}{lcl}
-	$\lexer \; r \; [] $ & $=$ & $\textit{if} \; (\nullable \; r)\; \textit{then}\;  \Some(\mkeps \; r) \; \textit{else} \; \None$\\
-	$\lexer \; r \;c::s$ & $=$ & $\textit{case}\; (\lexer \; (r\backslash c) \; s) \;\textit{of}\; $\\
-	& & $\quad \phantom{\mid}\; \None \implies \None$\\
-	& & $\quad \mid           \Some(v) \implies \Some(\inj \; r\; c\; v)$
-\end{tabular}
-\end{center}
-\noindent
+
+
+
+
+
+
+
+%TODO: Cut from previous lexer def, need to make coherent
 The central property of the $\lexer$ is that it gives the correct result
 according to
 POSIX rules.

--- a/RegexExplosionPlot/injToSimp.sc	Mon Jul 24 11:09:48 2023 +0100
+++ b/RegexExplosionPlot/injToSimp.sc	Tue Jul 25 17:28:29 2023 +0100
@@ -393,14 +393,14 @@
 def simp_lexer(r: Rexp, s: List[Char]): Val = s match{
   case Nil => {//println("mkeps!"); println(r); 
     val v = mkeps(r); 
-    //println(v); 
+    println(v); 
     v}
   case c::cs => { 
     val derc1 = der(c, r);
     val derc = light_simp(derc1);
     println("derivative size: "+ size(derc));
     val v = simp_lexer(derc, cs); 
-    //println("Injection!");println("r: "+r); println("c: "+c); println("v:"+v); 
+    println("Injection!");println("r: "+r); println("c: "+c); println("v:"+v); 
     inj(r, c, v ) }
 }
 
@@ -415,7 +415,7 @@
     //val derc = light_simp(derc1);
     println("derivative size: "+ size(derc));
     val v = print_lexer(derc, cs); 
-    //println("Injection!");println("r: "+r); println("c: "+c); println("v:"+v); 
+    println("Injection!");println("r: "+r); println("c: "+c); println("v:"+v); 
     inj(r, c, v ) }
 }
 val r = STAR("a"|"aa") ~ ("c")
@@ -1229,9 +1229,10 @@
 //    println(" " + (t1 - t0)/1e9 + "")
 //    result
 //}
-//val astarstarB  = STAR(STAR("a"))~("b")
-//val data_points_x_axis = List.range(0, 60000, 3000)
+val astarstarB  = STAR(STAR("a"))~("b")
+val data_points_x_axis = List.range(0, 60000, 3000)
 //for (i <- data_points_x_axis) {print(i); time { blexing(astarstarB, ("a"*i)) } }
 ////println(data_points_x_axis.zip( List(0,1,3,4)))
 //for(i <- List.range(0, 40000, 3000)) print(i + " ")
 
+//for (i <- data_points_x_axis) {print(i); time { simp_lexer(astarstarB, ("a"*i).toList) } }

author	Chengsong
	Tue, 25 Jul 2023 17:28:29 +0100
changeset 667	660cf698eb26
parent 666	6da4516ea87d
child 668	3831621d7b14

ChengsongTanPhdThesis/Chapters/Bitcoded1.tex		file \| annotate \| diff \| comparison \| revisions
ChengsongTanPhdThesis/Chapters/Inj.tex		file \| annotate \| diff \| comparison \| revisions
RegexExplosionPlot/injToSimp.sc		file \| annotate \| diff \| comparison \| revisions