--- a/ChengsongTanPhdThesis/Chapters/Cubic.tex Wed Aug 31 12:51:53 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Cubic.tex Wed Aug 31 23:57:42 2022 +0100
@@ -5,20 +5,62 @@
\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
%in Chapter 4 to a polynomial one, and demonstrate how one can extend the
%algorithm to include constructs such as bounded repetitions and negations.
-
+\lstset{style=myScalastyle}
+
+
+This chapter is a ``miscellaneous''
+chapter which records various
+extensions to our $\blexersimp$ and its formalisation.\\
+First, we present further improvements
+made to our lexer algorithm $\blexersimp$.
+We devise a stronger simplification algorithm,
+called $\bsimpStrong$, which can prune away
+similar components of two regular expressions at the same
+alternative level,
+even if these regular expressions are not exactly the same.
+We call the lexer that uses this stronger simplification function
+$\blexerStrong$.
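+To give the reader a flavour of this stronger simplification,
+it is intended to rewrite, for example,
+\begin{center}
+$(a+b)\cdot c + a \cdot c \quad\Longrightarrow\quad (a+b)\cdot c$,
+\end{center}
+where the second alternative $a\cdot c$ can be pruned away
+because its language is already covered by the first alternative,
+even though the two terms are not syntactically equal.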
+We conjecture that both
+\begin{center}
+ $\blexerStrong \;r \; s = \blexer\; r\;s$
+\end{center}
+and
+\begin{center}
+ $\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
+\end{center}
+hold, but formalising
+them is still work in progress.
+We give reasons why the correctness and cubic size bound proofs
+can be achieved
+by exploring the connection between the internal
+data structures of our $\blexerStrong$ and
+Antimirov's partial derivatives.\\
+Second, we extend our $\blexersimp$
+to support bounded repetitions ($r^{\{n\}}$).
+We update our formalisation of
+the correctness and finiteness properties to
+include this new construct. With bounded repetitions
+we are able to out-compete other verified lexers such as
+Verbatim++ on regular expressions which involve many
+repetitions. We also present a proof of the idempotency property
+of $\bsimp$, which leverages the corresponding proof for $\rsimp$.
+This reinforces our claim that the fixpoint construction
+originally required by Sulzmann and Lu can be removed in $\blexersimp$.
+\\
+Last but not least, we present the efforts we made and the challenges
+we met in further improving the algorithm with data
+structures such as zippers.
+
+
+
%----------------------------------------------------------------------------------------
% SECTION strongsimp
%----------------------------------------------------------------------------------------
\section{A Stronger Version of Simplification}
%TODO: search for isabelle proofs of algorithms that check equivalence
-In our bit-coded lexing algorithm, with or without simplification,
-two alternative (distinct) sub-matches for the (sub-)string and (sub-)regex pair
-are always expressed in the "derivative regular expression" as two
-disjoint alternative terms at the current (sub-)regex level. The fact that they
-are parallel tells us when we retrieve the information from this (sub-)regex
-there will always be a choice of which alternative term to take.
-As an example, the regular expression $aa \cdot a^*+ a \cdot a^*$ (omitting bit-codes)
-expresses two possibilities it will match future input.
+In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
+For example, the regular expression $aa \cdot a^*+ a \cdot a^*$ contains two terms,
+which express two possibilities of how it will match future input.
It will either match 2 $a$'s, then 0 or more $a$'s, in other words, at least 2 more $a$'s
\begin{figure}\label{string_3parts1}
\begin{center}
@@ -159,7 +201,8 @@
$\textit{acc}$, which acts as an element
is a stronger version of $\textit{distinctBy}$.
Here is a concise version of it (in the style of Scala):
-\begin{verbatim}
+\begin{figure}[H]
+\begin{lstlisting}
def distinctByPlus(rs: List[ARexp], acc: Set[Rexp] = Set()) :
List[ARexp] = rs match {
case Nil => Nil
@@ -170,7 +213,9 @@
      prune(r, acc) :: distinctByPlus(rs, acc + erase(prune(r, acc)))
}
-\end{verbatim}
+\end{lstlisting}
+\caption{The $\textit{distinctByPlus}$ Function}
+\end{figure}
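+\noindent
+To illustrate the intended behaviour of $\textit{prune}$
+(a sketch of a possible definition is given further below):
+if the accumulator already contains $a\cdot c$, then pruning
+$(a+b)\cdot c$ against it should return $b\cdot c$, because the
+$a$-alternative of the left component is already covered by the
+accumulator.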
But for the function $\textit{prune}$ there is a difficulty:
how can one define the $L$ function in a ``computable'' way,
so that it generates a (possibly lazy and infinite) set of strings in $L(r)$?
@@ -183,8 +228,8 @@
that have "language duplicates", but only eliminate the "exact duplicates".
For this we need to break up terms $(a+b)\cdot c$ into two
terms $a\cdot c$ and $b\cdot c$ before we add them to the accumulator:
-\begin{figure}
-\begin{verbatim}
+\begin{figure}[H]
+\begin{lstlisting}
def distinctWith(rs: List[ARexp],
pruneFunction: (ARexp, Set[Rexp]) => ARexp,
acc: Set[Rexp] = Set()) : List[ARexp] =
@@ -202,7 +247,7 @@
)
}
}
-\end{verbatim}
+\end{lstlisting}
\caption{A Stronger Version of $\textit{distinctBy}$}
\end{figure}
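+\noindent
+The following is a minimal sketch of one possible definition of
+$\textit{prune}$ along these lines. It is only meant to convey the
+idea: the helper functions $\textit{erase}$, $\textit{fuse}$ and
+$\textit{removeSeqTail}$ (which cuts a given tail off an erased
+sequence) are assumed from the surrounding development, and the
+sketch omits some corner cases.
+\begin{figure}[H]
+\begin{lstlisting}
+// Sketch: remove from r the parts whose (erased) language
+// is already covered by the terms in the accumulator acc.
+// erase, fuse and removeSeqTail are assumed helpers.
+def prune(r: ARexp, acc: Set[Rexp]): ARexp = r match {
+  // prune each alternative and drop the ones that vanish
+  case AALTS(bs, rs) =>
+    rs.map(prune(_, acc)).filter(_ != AZERO) match {
+      case Nil => AZERO
+      case r1 :: Nil => fuse(bs, r1)
+      case rs1 => AALTS(bs, rs1)
+    }
+  // for a sequence, prune the left component against the
+  // accumulator with the right-hand tail removed
+  case ASEQ(bs, r1, r2) =>
+    prune(r1, acc.map(removeSeqTail(_, erase(r2)))) match {
+      case AZERO => AZERO
+      case r1p => ASEQ(bs, r1p, r2)
+    }
+  // any other term is dropped if its erased form is in acc
+  case r1 => if (acc.contains(erase(r1))) AZERO else r1
+}
+\end{lstlisting}
+\caption{A sketch of a possible $\textit{prune}$ function}
+\end{figure}
+\noindent
+Under this sketch, pruning $(a+b)\cdot c$ against the accumulator
+$\{a\cdot c\}$ first cuts the tail $c$ off the accumulator term,
+then prunes $a+b$ against $\{a\}$ to obtain $b$, and finally
+reassembles the result $b\cdot c$, as in the example above.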
\noindent