--- a/ChengsongTanPhdThesis/Chapters/Introduction.tex Mon Jul 10 19:29:22 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Introduction.tex Fri Jul 14 00:32:41 2023 +0100
@@ -566,7 +566,7 @@
We have made two main contributions in this thesis: %the
%lexer we developed based on Brzozowski derivatives and
%Sulzmanna and Lu's developments called
-proving the lexer $\blexersimp$ is both correctness and fast.
+proving that the lexer $\blexersimp$ is both i) correct and ii) fast.
It is correct w.r.t.\ a formalisation of POSIX lexing by Ausaf et al.~\ref{AusafDyckhoffUrban2016}.
It is fast compared with un-optimised implementations like Sulzmann and Lu's original
development, by our metric of internal data structures not growing unboundedly.
@@ -574,8 +574,9 @@
Our formalisation of complexity is unique among similar works in the sense that
%is about the size of internal data structures.
to our knowledge %we don't know of a
-there is not a lexing/parsing algorithm with similar formalised correctness theorems.
-Common practices involve making some empirical analysis of the complexity of the algorithm
+there are no other certified
+lexing/parsing algorithms with similar data structure size bound theorems.
+Common practices involve making an empirical analysis of the complexity of the algorithm
in question (\ref{Verbatim}, \ref{Verbatimpp}), or relying
on prior (unformalised) complexity analysis of well-known algorithms (\ref{ValiantParsing}),
making them prone to complexity bugs.
@@ -600,14 +601,60 @@
%The way to use a time credit Time credit is an integer
%is assigned to a program prior to execution, and each step in the program consumes
%one credit.
-This has been further developed to include the big-O notation
+Arma{\"e}l et al. have extended the framework to allow expressing time
+credits using big-O notation,
so one can prove both the functional correctness and asymptotic complexity
of higher-order imperative algorithms \ref{bigOImperative}.
%for example the complexity of
%the Quicksort algorithm
%is $\Theta n\ln n$
+\marginpar{more work on formalising complexity}.
+%Our next step is to leverage these frameworks
+%It is a precursor to our
+Our work focuses on the space complexity of the algorithm under our notion of the size of
+a regular expression.
+Despite not being a direct asymptotic time complexity proof,
+our result is an important stepping stone towards one.
-\marginpar{more work on formalising complexity}.
+
+Brzozowski showed that there are only finitely many dissimilar derivatives,
+where similarity is defined in terms of the ACI equations.
+This allows him to use derivatives as a basis for DFAs where each state is
+labelled with a derivative.
+However, Brzozowski did not show anything about
+the size of the derivatives.
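+Brzozowski's finiteness observation can be made concrete with a small
+executable sketch (in Python, purely for illustration --- the representation
+and helper names below are our own, not part of any formalisation here).
+Alternatives are kept in a frozenset, which normalises modulo the ACI
+equations: flattening gives associativity, the set itself gives
+commutativity and idempotence.

```python
# Illustrative sketch (our own encoding): Brzozowski derivatives with
# ACI-normalisation of alternatives, so that only finitely many
# dissimilar derivatives are ever generated.

ZERO, ONE = ("zero",), ("one",)

def chr_(c):
    return ("chr", c)

def alt(r1, r2):
    # ACI-normalise: flatten nested sums, drop ZERO, deduplicate via frozenset
    rs = set()
    for r in (r1, r2):
        if r[0] == "alt":
            rs |= r[1]
        elif r != ZERO:
            rs.add(r)
    if not rs:
        return ZERO
    if len(rs) == 1:
        return next(iter(rs))
    return ("alt", frozenset(rs))

def seq(r1, r2):
    if r1 == ZERO or r2 == ZERO:
        return ZERO
    if r1 == ONE:
        return r2
    if r2 == ONE:
        return r1
    return ("seq", r1, r2)

def nullable(r):
    t = r[0]
    if t == "one" or t == "star":
        return True
    if t == "alt":
        return any(nullable(x) for x in r[1])
    if t == "seq":
        return nullable(r[1]) and nullable(r[2])
    return False  # zero, chr

def der(c, r):
    t = r[0]
    if t == "chr":
        return ONE if r[1] == c else ZERO
    if t == "alt":
        out = ZERO
        for x in r[1]:
            out = alt(out, der(c, x))
        return out
    if t == "seq":
        d = seq(der(c, r[1]), r[2])
        return alt(d, der(c, r[2])) if nullable(r[1]) else d
    if t == "star":
        return seq(der(c, r[1]), r)
    return ZERO  # zero, one

def all_derivatives(r, alphabet):
    # derivatives w.r.t. all strings; this loop terminates precisely
    # because the set of ACI-distinct derivatives is finite
    seen, todo = {r}, [r]
    while todo:
        cur = todo.pop()
        for c in alphabet:
            d = der(c, cur)
            if d not in seen:
                seen.add(d)
                todo.append(d)
    return seen

r = seq(("star", alt(chr_("a"), chr_("b"))), chr_("a"))  # (a+b)* . a
states = all_derivatives(r, "ab")
print(len(states))  # -> 2 ACI-distinct derivatives
```

+The `states` computed above are exactly the labels of the DFA states in
+Brzozowski's construction; without the ACI-normalisation in `alt` the loop
+would not terminate.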
+Antimirov showed that there are only finitely
+many partial derivatives of a regular expression w.r.t.\ all
+strings. He showed that their number is at most the
+``alphabetic width'' of the regular expression plus 1.
+From this result one can relatively easily establish
+that the size of the partial derivatives is
+no bigger than $(\textit{size} \; r)^3$ for every string.
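+Antimirov's construction admits a matching sketch (again Python, for
+illustration only; the encoding and names are ours). The function
+\textit{pder} returns a \emph{set} of regular expressions, and the set of
+all partial derivatives w.r.t.\ all strings stays within the alphabetic
+width plus one.

```python
# Illustrative sketch (our own encoding) of Antimirov partial derivatives
# and their finiteness bound.

ZERO, ONE = ("zero",), ("one",)

def chr_(c):
    return ("chr", c)

def seq(r1, r2):
    return r2 if r1 == ONE else ("seq", r1, r2)

def nullable(r):
    t = r[0]
    if t == "one" or t == "star":
        return True
    if t == "alt":
        return nullable(r[1]) or nullable(r[2])
    if t == "seq":
        return nullable(r[1]) and nullable(r[2])
    return False  # zero, chr

def pder(c, r):
    # a partial derivative is a set of regular expressions
    t = r[0]
    if t == "chr":
        return {ONE} if r[1] == c else set()
    if t == "alt":
        return pder(c, r[1]) | pder(c, r[2])
    if t == "seq":
        ds = {seq(d, r[2]) for d in pder(c, r[1])}
        return ds | pder(c, r[2]) if nullable(r[1]) else ds
    if t == "star":
        return {seq(d, r) for d in pder(c, r[1])}
    return set()  # zero, one

def all_pders(r, alphabet):
    # all partial derivatives reachable by any string, by fixpoint iteration
    seen, todo = {r}, [r]
    while todo:
        cur = todo.pop()
        for c in alphabet:
            for d in pder(c, cur):
                if d not in seen:
                    seen.add(d)
                    todo.append(d)
    return seen

# (a+b)* . a mentions three letters, so Antimirov's bound is 3 + 1 = 4
r = seq(("star", ("alt", chr_("a"), chr_("b"))), chr_("a"))
pds = all_pders(r, "ab")
print(len(pds))  # -> 2, comfortably within the bound of 4
```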
+Unfortunately this result does not seem to carry over to our
+setting because partial derivatives have the simplification
+\begin{equation}\label{eq:headSplitRule}
+ (r_1 + r_2) \cdot r_3 \rightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)
+\end{equation}
+built in. We cannot have this because otherwise we would
+lose the POSIX property. For instance, the lexing result of
+regular expression
+\[
+ (a+ab)\cdot(bc+c)
+\]
+with respect to string $abc$ using our lexer with the simplification rule \ref{eq:headSplitRule}
+would be
+\[
+	\Left \; (\Seq \; (\Char \; a) \; (\Seq \; (\Char \; b) \; (\Char \; c)))
+\]
+instead of the correct POSIX value
+\[
+	\Seq \; (\Right \; (\Seq \; (\Char \; a) \; (\Char \; b)) ) \; (\Char \; c)
+\]
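+The discrepancy can be reproduced without a full lexer by enumerating the
+possible splits of the string (a hypothetical Python sketch, not the thesis
+lexer; all that is used is the POSIX rule that the first component of a
+sequence matches as long a prefix as possible):

```python
# Sketch: POSIX disambiguation for (a+ab)(bc+c) on "abc" versus the choice
# made once the head-split rule rewrites it to a(bc+c) + ab(bc+c) and the
# outer alternatives are tried left to right.

first = ["a", "ab"]   # alternatives of a+ab, in written order
second = ["bc", "c"]  # alternatives of bc+c

s = "abc"
splits = [(x, y) for x in first for y in second if x + y == s]

# POSIX: the first component of a sequence matches the longest possible prefix
posix = max(splits, key=lambda p: len(p[0]))
# after head-splitting, the leftmost alternative of the outer sum wins
head_split = min(splits, key=lambda p: first.index(p[0]))

print(posix)       # -> ('ab', 'c')
print(head_split)  # -> ('a', 'bc')
```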
+Our result about the finite bound also does not say anything about the number of derivatives.
+In fact there are infinitely many derivatives in general
+because in the annotated regular expression for STAR we record the
+number of iterations. What our result shows is that the size of
+the derivatives is bounded, not their number.
\marginpar{new changes up to this point.}