ChengsongTanPhdThesis/Chapters/Cubic.tex
changeset 590 988e92a70704
parent 538 8016a2480704
child 591 b2d0de6aee18
--- a/ChengsongTanPhdThesis/Chapters/Cubic.tex	Wed Aug 31 12:51:53 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Cubic.tex	Wed Aug 31 23:57:42 2022 +0100
@@ -5,20 +5,62 @@
 \label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
 %in Chapter 4 to a polynomial one, and demonstrate how one can extend the
 %algorithm to include constructs such as bounded repetitions and negations.
- 
+\lstset{style=myScalastyle}
+
+
+This chapter is a ``miscellaneous''
+chapter which records various
+extensions to our formalisation of $\blexersimp$.\\
+Firstly, we present further improvements
+made to our lexer algorithm $\blexersimp$.
+We devise a stronger simplification algorithm,
+called $\bsimpStrong$, which can prune away
+similar components in two regular expressions at the same 
+alternative level,
+even if these regular expressions are not exactly the same.
+We call the lexer that uses this stronger simplification function
+$\blexerStrong$.
+We conjecture that both
+\begin{center}
+	$\blexerStrong \;r \; s = \blexer\; r\;s$
+\end{center}
+and
+\begin{center}
+	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
+\end{center}
+hold, but formalising
+them is still work in progress.
+We give reasons why the correctness and cubic size bound proofs
+should be achievable,
+by exploring the connection between the internal
+data structure of our $\blexerStrong$ and
+Antimirov's partial derivatives.\\
+Secondly, we extend our $\blexersimp$
+to support bounded repetitions ($r^{\{n\}}$).
+We update our formalisation of 
+the correctness and finiteness properties to
+include this new construct. With bounded repetitions
+we are able to out-compete other verified lexers such as
+Verbatim++ on regular expressions that involve a large
+number of repetitions. We also present the idempotency property proof
+of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
+This reinforces our claim that the fixpoint construction
+originally required by Sulzmann and Lu can be removed in $\blexersimp$.
+\\
+Last but not least, we present the efforts we made and the
+challenges we encountered in further improving the algorithm
+with data structures such as zippers.
+
+
+
 %----------------------------------------------------------------------------------------
 %	SECTION strongsimp
 %----------------------------------------------------------------------------------------
 \section{A Stronger Version of Simplification}
 %TODO: search for isabelle proofs of algorithms that check equivalence 
-In our bit-coded lexing algorithm, with or without simplification, 
-two alternative (distinct) sub-matches for the (sub-)string and (sub-)regex pair
-are always expressed in the "derivative regular expression" as two
-disjoint alternative terms at the current (sub-)regex level. The fact that they
-are parallel tells us when we retrieve the information from this (sub-)regex 
-there will always be a choice of which alternative term to take.
-As an example, the regular expression $aa \cdot a^*+ a \cdot a^*$ (omitting bit-codes)
-expresses two possibilities it will match future input.
+In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
+For example, the regular expression $aa \cdot a^*+ a \cdot a^*$ contains two terms,
+which express the two possible ways in which it can match future input.
 It will either match 2 $a$'s, then 0 or more $a$'s, in other words, at least 2 more $a$'s
 \begin{figure}\label{string_3parts1}
 \begin{center}
@@ -159,7 +201,8 @@
 $\textit{acc}$; the resulting function
 is a stronger version of $\textit{distinctBy}$.
 Here is a concise version of it (in the style of Scala):
-\begin{verbatim}
+\begin{figure}[H]
+\begin{lstlisting}
 def distinctByPlus(rs: List[ARexp], acc: Set[Rexp] = Set()) : 
 List[ARexp] =  rs match {
   case Nil => Nil
@@ -170,7 +213,9 @@
       prune(r, acc) :: distinctByPlus(rs, prune(r, acc) +: acc)
 }
 
-\end{verbatim}
+\end{lstlisting}
+\caption{The Function $\textit{distinctByPlus}$}
+\end{figure}
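+\noindent
+To see in what sense $\textit{distinctByPlus}$ is stronger, it may help
+to recall the shape of the original $\textit{distinctBy}$. The sketch
+below is specialised to bitcoded regular expressions
+($\textit{erase}$ removes the bitcodes): a term is kept only if its
+erased form has not been seen before, so exact duplicates
+(up to bitcodes) are removed, but nothing inside a term is ever pruned.
+\begin{lstlisting}
+// a sketch of the original distinctBy, specialised to
+// bitcoded regular expressions
+def distinctBy(rs: List[ARexp], acc: Set[Rexp] = Set()): List[ARexp] =
+  rs match {
+    case Nil => Nil
+    case r :: rs1 =>
+      if (acc.contains(erase(r))) distinctBy(rs1, acc)
+      else r :: distinctBy(rs1, acc + erase(r))
+  }
+\end{lstlisting}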
 But for the function $\textit{prune}$ there is a difficulty:
 how could one define the $L$ function in a ``computable'' way,
 so that it generates a (lazy infinite) set of strings in $L(r)$?
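+One way to make this idea concrete, at least as a thought
+experiment, is to enumerate $L(r)$ as a lazy stream, using a plain
+derivative-based matcher for the membership test. The sketch below is
+ours; the names $\textit{stringsOfLength}$ and $\textit{matcher}$ are
+assumptions standing in for the simple matcher of the earlier chapters:
+\begin{lstlisting}
+// all strings of a given length over the alphabet
+def stringsOfLength(alphabet: List[Char], n: Int): LazyList[String] =
+  if (n == 0) LazyList("")
+  else for {
+    c <- alphabet.to(LazyList)
+    s <- stringsOfLength(alphabet, n - 1)
+  } yield s"$c$s"
+
+// L(r) as a lazy stream, enumerated by increasing length;
+// matcher is assumed to be the derivative-based matcher.
+// Membership along the stream is computable, but comparing two
+// such infinite streams for equality never terminates -- which
+// is exactly the difficulty described above.
+def lang(r: Rexp, alphabet: List[Char]): LazyList[String] =
+  LazyList.from(0)
+    .flatMap(n => stringsOfLength(alphabet, n))
+    .filter(s => matcher(r, s))
+\end{lstlisting}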
@@ -183,8 +228,8 @@
 that have ``language duplicates'', but only eliminate the ``exact duplicates''.
 For this we need to break up terms $(a+b)\cdot c$ into two
 terms $a\cdot c$ and $b\cdot c$ before we add them to the accumulator:
-\begin{figure}
-\begin{verbatim}
+\begin{figure}[H]
+\begin{lstlisting}
 def distinctWith(rs: List[ARexp], 
                 pruneFunction: (ARexp, Set[Rexp]) => ARexp, 
                 acc: Set[Rexp] = Set()) : List[ARexp] = 
@@ -202,7 +247,7 @@
         )
       }
   }
-\end{verbatim}
+\end{lstlisting}
 \caption{A Stronger Version of $\textit{distinctBy}$}
 \end{figure}
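+\noindent
+For illustration, one possible shape of the pruning function that
+could be passed to $\textit{distinctWith}$ is sketched below. This is
+only a sketch under our assumptions, not the exact definition used
+later: sequences whose left-hand side is an alternative are
+distributed into alternatives of sequences, so that each component
+can be checked against the accumulator separately
+($\textit{fuse}$ attaches bitcodes to a term):
+\begin{lstlisting}
+def prune(r: ARexp, acc: Set[Rexp]): ARexp = r match {
+  case AALTS(bs, rs) =>
+    // prune each alternative; drop components that vanish
+    rs.map(prune(_, acc)).filter(_ != AZERO) match {
+      case Nil => AZERO
+      case r1 :: Nil => fuse(bs, r1)
+      case rs1 => AALTS(bs, rs1)
+    }
+  case ASEQ(bs, AALTS(bs1, rs), r2) =>
+    // break up (r1 + ... + rn) . r2 into r1 . r2 + ... + rn . r2
+    prune(AALTS(bs, rs.map(r1 => ASEQ(Nil, fuse(bs1, r1), r2))), acc)
+  case r1 =>
+    // anything already covered by the accumulator is erased
+    if (acc.contains(erase(r1))) AZERO else r1
+}
+\end{lstlisting}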
 \noindent