--- a/ChengsongTanPhdThesis/Chapters/Introduction.tex Mon Jul 10 19:29:22 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Introduction.tex Fri Jul 14 00:32:41 2023 +0100
@@ -566,7 +566,7 @@
We have made two main contributions in this thesis: %the
%lexer we developed based on Brzozowski derivatives and
%Sulzmanna and Lu's developments called
-proving the lexer $\blexersimp$ is both correctness and fast.
+proving that the lexer $\blexersimp$ is both i) correct and ii) fast.
It is correct w.r.t.\ a formalisation of POSIX lexing by Ausaf et al.~\ref{AusafDyckhoffUrban2016}.
It is fast compared with un-optimised implementations like Sulzmann and Lu's original
development, by our metric of internal data structures not growing unboundedly.
@@ -574,8 +574,9 @@
Our formalisation of complexity is unique among similar works in the sense that
%is about the size of internal data structures.
to our knowledge %we don't know of a
-there is not a lexing/parsing algorithm with similar formalised correctness theorems.
-Common practices involve making some empirical analysis of the complexity of the algorithm
+there are no other certified
+lexing/parsing algorithms with similar data structure size bound theorems.
+Common practices involve making an empirical analysis of the complexity of the algorithm
in question (\ref{Verbatim}, \ref{Verbatimpp}), or relying
on prior (unformalised) complexity analysis of well-known algorithms (\ref{ValiantParsing}),
making them prone to complexity bugs.
@@ -600,14 +601,60 @@
%The way to use a time credit Time credit is an integer
%is assigned to a program prior to execution, and each step in the program consumes
%one credit.
-This has been further developed to include the big-O notation
+Arma{\"e}l et al. have extended the framework to allow expressing time
+credits using big-O notation,
so one can prove both the functional correctness and asymptotic complexity
of higher-order imperative algorithms \ref{bigOImperative}.
%for example the complexity of
%the Quicksort algorithm
%is $\Theta n\ln n$
+\marginpar{more work on formalising complexity}.
+%Our next step is to leverage these frameworks
+%It is a precursor to our
+Our work focuses on the space complexity of the algorithm under our notion of the size of
+a regular expression.
+Despite not being a direct asymptotic time complexity proof,
+our result is an important stepping stone towards one.
-\marginpar{more work on formalising complexity}.
+
+Brzozowski showed that there are only finitely many dissimilar derivatives,
+where similarity is defined in terms of the ACI equations.
+This allows him to use derivatives as a basis for DFAs where each state is
+labelled with a derivative.
+However, Brzozowski did not show anything about
+the size of the derivatives.
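+Brzozowski's finiteness observation can be made concrete with a small
+executable sketch (in Python, purely for illustration --- the representation
+and helper names below are our own, not part of any formalisation here).
+Alternatives are kept in a frozenset, which normalises modulo the ACI
+equations: flattening gives associativity, the set itself gives
+commutativity and idempotence.

```python
# Illustrative sketch (our own encoding): Brzozowski derivatives with
# ACI-normalisation of alternatives, so that only finitely many
# dissimilar derivatives are ever generated.

ZERO, ONE = ("zero",), ("one",)

def chr_(c):
    return ("chr", c)

def alt(r1, r2):
    # ACI-normalise: flatten nested sums, drop ZERO, deduplicate via frozenset
    rs = set()
    for r in (r1, r2):
        if r[0] == "alt":
            rs |= r[1]
        elif r != ZERO:
            rs.add(r)
    if not rs:
        return ZERO
    if len(rs) == 1:
        return next(iter(rs))
    return ("alt", frozenset(rs))

def seq(r1, r2):
    if r1 == ZERO or r2 == ZERO:
        return ZERO
    if r1 == ONE:
        return r2
    if r2 == ONE:
        return r1
    return ("seq", r1, r2)

def nullable(r):
    t = r[0]
    if t == "one" or t == "star":
        return True
    if t == "alt":
        return any(nullable(x) for x in r[1])
    if t == "seq":
        return nullable(r[1]) and nullable(r[2])
    return False  # zero, chr

def der(c, r):
    t = r[0]
    if t == "chr":
        return ONE if r[1] == c else ZERO
    if t == "alt":
        out = ZERO
        for x in r[1]:
            out = alt(out, der(c, x))
        return out
    if t == "seq":
        d = seq(der(c, r[1]), r[2])
        return alt(d, der(c, r[2])) if nullable(r[1]) else d
    if t == "star":
        return seq(der(c, r[1]), r)
    return ZERO  # zero, one

def all_derivatives(r, alphabet):
    # derivatives w.r.t. all strings; this loop terminates precisely
    # because the set of ACI-distinct derivatives is finite
    seen, todo = {r}, [r]
    while todo:
        cur = todo.pop()
        for c in alphabet:
            d = der(c, cur)
            if d not in seen:
                seen.add(d)
                todo.append(d)
    return seen

r = seq(("star", alt(chr_("a"), chr_("b"))), chr_("a"))  # (a+b)* . a
states = all_derivatives(r, "ab")
print(len(states))  # -> 2 ACI-distinct derivatives
```

+The `states` computed above are exactly the labels of the DFA states in
+Brzozowski's construction; without the ACI-normalisation in `alt` the loop
+would not terminate.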
+Antimirov showed that there are only finitely
+many partial derivatives of a regular expression w.r.t.\ all
+strings. He showed that their number is at most the
+``alphabetic width'' of the regular expression plus 1.
+From this result one can relatively easily establish
+that the size of the partial derivatives is
+no bigger than $(\textit{size} \; r)^3$ for every string.
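+Antimirov's construction admits a matching sketch (again Python, for
+illustration only; the encoding and names are ours). The function
+\textit{pder} returns a \emph{set} of regular expressions, and the set of
+all partial derivatives w.r.t.\ all strings stays within the alphabetic
+width plus one.

```python
# Illustrative sketch (our own encoding) of Antimirov partial derivatives
# and their finiteness bound.

ZERO, ONE = ("zero",), ("one",)

def chr_(c):
    return ("chr", c)

def seq(r1, r2):
    return r2 if r1 == ONE else ("seq", r1, r2)

def nullable(r):
    t = r[0]
    if t == "one" or t == "star":
        return True
    if t == "alt":
        return nullable(r[1]) or nullable(r[2])
    if t == "seq":
        return nullable(r[1]) and nullable(r[2])
    return False  # zero, chr

def pder(c, r):
    # a partial derivative is a set of regular expressions
    t = r[0]
    if t == "chr":
        return {ONE} if r[1] == c else set()
    if t == "alt":
        return pder(c, r[1]) | pder(c, r[2])
    if t == "seq":
        ds = {seq(d, r[2]) for d in pder(c, r[1])}
        return ds | pder(c, r[2]) if nullable(r[1]) else ds
    if t == "star":
        return {seq(d, r) for d in pder(c, r[1])}
    return set()  # zero, one

def all_pders(r, alphabet):
    # all partial derivatives reachable by any string, by fixpoint iteration
    seen, todo = {r}, [r]
    while todo:
        cur = todo.pop()
        for c in alphabet:
            for d in pder(c, cur):
                if d not in seen:
                    seen.add(d)
                    todo.append(d)
    return seen

# (a+b)* . a mentions three letters, so Antimirov's bound is 3 + 1 = 4
r = seq(("star", ("alt", chr_("a"), chr_("b"))), chr_("a"))
pds = all_pders(r, "ab")
print(len(pds))  # -> 2, comfortably within the bound of 4
```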
+Unfortunately this result does not seem to carry over to our
+setting because partial derivatives have the simplification
+\begin{equation}\label{eq:headSplitRule}
+ (r_1 + r_2) \cdot r_3 \rightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)
+\end{equation}
+built in. We cannot have this because otherwise we would
+lose the POSIX property. For instance, the lexing result of
+regular expression
+\[
+ (a+ab)\cdot(bc+c)
+\]
+with respect to string $abc$ using our lexer with the simplification rule \ref{eq:headSplitRule}
+would be
+\[
+	\Left \; (\Seq \; (\Char \; a) \; (\Seq \; (\Char \; b) \; (\Char \; c)))
+\]
+instead of the correct POSIX value
+\[
+	\Seq \; (\Right \; (\Seq \; (\Char \; a) \; (\Char \; b)) ) \; (\Char \; c)
+\]
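+The discrepancy can be reproduced without a full lexer by enumerating the
+possible splits of the string (a hypothetical Python sketch, not the thesis
+lexer; all that is used is the POSIX rule that the first component of a
+sequence matches as long a prefix as possible):

```python
# Sketch: POSIX disambiguation for (a+ab)(bc+c) on "abc" versus the choice
# made once the head-split rule rewrites it to a(bc+c) + ab(bc+c) and the
# outer alternatives are tried left to right.

first = ["a", "ab"]   # alternatives of a+ab, in written order
second = ["bc", "c"]  # alternatives of bc+c

s = "abc"
splits = [(x, y) for x in first for y in second if x + y == s]

# POSIX: the first component of a sequence matches the longest possible prefix
posix = max(splits, key=lambda p: len(p[0]))
# after head-splitting, the leftmost alternative of the outer sum wins
head_split = min(splits, key=lambda p: first.index(p[0]))

print(posix)       # -> ('ab', 'c')
print(head_split)  # -> ('a', 'bc')
```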
+Our result about the finite bound also does not say anything about the number of derivatives.
+In fact there are infinitely many derivatives in general
+because in the annotated regular expression for STAR we record the
+number of iterations. What our result shows is that the size of
+the derivatives is bounded, not their number.
\marginpar{new changes up to this point.}