# HG changeset patch
# User Chengsong
# Date 1689291161 -3600
# Node ID 3bedbdce3a3bac67440a6fb4e8556a85d0e0a3c6
# Parent ba44144875b11fd56c3bf0b6b0da1baa1af2faea
a bit more intro, incorporating Christian chat messages

diff -r ba44144875b1 -r 3bedbdce3a3b ChengsongTanPhdThesis/Chapters/Introduction.tex
--- a/ChengsongTanPhdThesis/Chapters/Introduction.tex	Mon Jul 10 19:29:22 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Introduction.tex	Fri Jul 14 00:32:41 2023 +0100
@@ -566,7 +566,7 @@
 We have made mainly two contributions in this thesis:
 %the
 %lexer we developed based on Brzozowski derivatives and
 %Sulzmanna and Lu's developments called
-proving the lexer $\blexersimp$ is both correctness and fast.
+proving that the lexer $\blexersimp$ is both i) correct and ii) fast.
 It is correct w.r.t.\ a formalisation of POSIX lexing by Ausaf et al.\ref{AusafDyckhoffUrban2016}.
 It is fast compared with un-optimised implementations like Sulzmann and Lu's
 original development, by our metric of internal data structures not growing unbounded.
@@ -574,8 +574,9 @@
 Our formalisation of complexity is unique among similar works in the sense that
 %is about the size of internal data structures.
 to our knowledge
 %we don't know of a
-there is not a lexing/parsing algorithm with similar formalised correctness theorems.
-Common practices involve making some empirical analysis of the complexity of the algorithm
+there are no other certified
+lexing/parsing algorithms with similar data structure size bound theorems.
+Common practice involves making an empirical analysis of the complexity of the algorithm
 in question (\ref{Verbatim}, \ref{Verbatimpp}), or relying on prior (unformalised)
 complexity analysis of well-known algorithms (\ref{ValiantParsing}),
 making them prone to complexity bugs.
@@ -600,14 +601,60 @@
 %The way to use a time credit
 A time credit is an integer
 %is assigned to a program prior to execution, and each step in the program consumes
 %one credit.
-This has been further developed to include the big-O notation
+Arma{\"e}l et al.\ have extended the framework to allow expressing time
+credits using big-O notation,
 so one can prove both the functional correctness and asymptotic complexity
 of higher-order imperative algorithms \ref{bigOImperative}.
 %for example the complexity of
 %the Quicksort algorithm
 %is $\Theta n\ln n$
+\marginpar{more work on formalising complexity}.
+%Our next step is to leverage these frameworks
+%It is a precursor to our
+Our work focuses on the space complexity of the algorithm, under our notion of the size of
+a regular expression.
+Although it is not a direct asymptotic time complexity proof,
+our result is an important stepping stone towards one.
-\marginpar{more work on formalising complexity}.
+
+Brzozowski showed that there are only finitely many dissimilar derivatives,
+where similarity is defined in terms of the ACI equations.
+This allows him to use derivatives as the basis for DFAs, where each state is
+labelled with a derivative.
+However, Brzozowski did not show anything about
+the size of the derivatives.
+Antimirov showed that there can only be finitely
+many partial derivatives of a regular expression w.r.t.\ any set of
+strings. He showed that their number is at most the
+``alphabetical width'' plus 1.
+From this result one can relatively easily establish
+that the size of the partial derivatives is
+no bigger than $(\textit{size} \; r)^3$ for every string.
+Unfortunately this result does not seem to carry over to our
+setting, because partial derivatives have the simplification
+\begin{equation}\label{eq:headSplitRule}
+ (r_1 + r_2) \cdot r_3 \rightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)
+\end{equation}
+built in. We cannot have this rule, because otherwise we would
+lose the POSIX property.
For instance, the lexing result of the
regular expression
\[
 (a+ab)\cdot(bc+c)
\]
with respect to the string $abc$ using our lexer with the simplification
rule \ref{eq:headSplitRule}
would be
\[
 \Left \; (\Seq \; (\Char \; a) \; (\Seq \; (\Char \; b) \; (\Char \; c)))
\]
instead of the correct POSIX value
\[
 \Seq \; (\Right \; (\Seq \; (\Char \; a) \; (\Char \; b))) \; (\Char \; c).
\]
Our finiteness result also does not say anything about the number of derivatives.
In fact there are infinitely many derivatives in general,
because in the annotated regular expression for STAR we record the
number of iterations. What our result shows is that the size of
the derivatives is bounded, not their number.
\marginpar{new changes up to this point.}
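The classical Brzozowski construction discussed above can be illustrated with a
minimal Python sketch. The tuple encoding, the ACI-style smart constructors and
the helper names below are illustrative only, not the thesis's formal Isabelle
definitions; in particular, `alt` flattens nested alternatives, drops duplicates
and removes $\ZERO$, which is how similarity keeps the set of derivatives finite:

```python
# Regular expressions as tagged tuples:
#   ('0',)        empty language      ('1',)        empty string
#   ('c', a)      literal a           ('+', r, s)   alternative
#   ('.', r, s)   sequence            ('*', r)      star

def nullable(r):
    tag = r[0]
    if tag in ('0', 'c'): return False
    if tag in ('1', '*'): return True
    if tag == '+': return nullable(r[1]) or nullable(r[2])
    if tag == '.': return nullable(r[1]) and nullable(r[2])

def alt(r, s):
    # ACI-style smart constructor: flatten nested +, drop 0 and duplicates,
    # so similar derivatives collapse to one representative
    parts = []
    def flatten(t):
        if t[0] == '+':
            flatten(t[1]); flatten(t[2])
        elif t[0] != '0' and t not in parts:
            parts.append(t)
    flatten(r); flatten(s)
    if not parts: return ('0',)
    out = parts[0]
    for p in parts[1:]: out = ('+', out, p)
    return out

def seq(r, s):
    if r[0] == '0' or s[0] == '0': return ('0',)
    if r[0] == '1': return s
    if s[0] == '1': return r
    return ('.', r, s)

def deriv(c, r):
    tag = r[0]
    if tag in ('0', '1'): return ('0',)
    if tag == 'c': return ('1',) if r[1] == c else ('0',)
    if tag == '+': return alt(deriv(c, r[1]), deriv(c, r[2]))
    if tag == '.':
        d = seq(deriv(c, r[1]), r[2])
        return alt(d, deriv(c, r[2])) if nullable(r[1]) else d
    if tag == '*': return seq(deriv(c, r[1]), r)

def matches(r, s):
    # r matches s iff the iterated derivative is nullable
    for c in s:
        r = deriv(c, r)
    return nullable(r)

def all_derivs(r, alphabet, depth):
    # collect every iterated derivative w.r.t. words up to the given length
    seen, frontier = {r}, {r}
    for _ in range(depth):
        frontier = {deriv(c, q) for q in frontier for c in alphabet}
        seen |= frontier
    return seen
```

For the example $(a+ab)\cdot(bc+c)$ above, `matches` accepts $abc$, $ac$ and
$abbc$ and rejects $ab$; for $(a+b)^*$ the set returned by `all_derivs` stays
finite (here it even collapses to a single representative), which is the
finiteness-up-to-similarity phenomenon, though this sketch says nothing about
POSIX values or size bounds.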