\chapter{A Better Bound and Other Extensions} % Main chapter title

\label{Cubic} %In Chapter 5\ref{Chapter5} we discuss stronger simplifications to improve the finite bound
%in Chapter 4 to a polynomial one, and demonstrate how one can extend the
%algorithm to include constructs such as bounded repetitions and negations.
\lstset{style=myScalastyle}

This chapter is a ``miscellaneous''
chapter which records various
extensions to our formalisation of $\blexersimp$.\\
Firstly, we present further improvements
made to our lexer algorithm $\blexersimp$.
We devise a stronger simplification algorithm,
called $\bsimpStrong$, which can prune away
similar components in two regular expressions at the same
alternative level,
even if these regular expressions are not exactly the same.
We call the lexer that uses this stronger simplification function
$\blexerStrong$.
We conjecture that both
\begin{center}
	$\blexerStrong \;r \; s = \blexer\; r\;s$
\end{center}
and
\begin{center}
	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
\end{center}
hold, but formalising
them is still work in progress.
We give reasons why the correctness and cubic size bound proofs
can be achieved
by exploring the connection between the internal
data structure of our $\blexerStrong$ and
Antimirov's partial derivatives.\\
Secondly, we extend our $\blexersimp$
to support bounded repetitions ($r^{\{n\}}$).
We update our formalisation of
the correctness and finiteness properties to
include this new construct. With bounded repetitions
we are able to out-compete other verified lexers such as
Verbatim++ on regular expressions which involve a lot of
repetitions. We also present the idempotency property proof
of $\bsimp$, which leverages the idempotency proof of $\rsimp$.
This reinforces our claim that the fixpoint construction
originally required by Sulzmann and Lu can be removed in $\blexersimp$.\\
Last but not least, we present our efforts, and the challenges we met,
in further improving the algorithm with data
structures such as zippers.

%----------------------------------------------------------------------------------------
%	SECTION strongsimp
%----------------------------------------------------------------------------------------
\section{A Stronger Version of Simplification}
%TODO: search for isabelle proofs of algorithms that check equivalence 
In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
For example, the regular expression $aa \cdot a^*+ a \cdot a^*$ contains two terms,
which express the two possibilities it will match future input.
It will either match 2 $a$'s, then 0 or more $a$'s, in other words, at least 2 more $a$'s
\begin{figure}\label{string_3parts1}
\begin{center}
\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
    \node [rectangle split, rectangle split horizontal, rectangle split parts=3]
To implement this idea as an algorithm, we define a ``pruning function''
$\textit{prune}$, which takes away from a term the parts that already belong to
the accumulator $\textit{acc}$. Applying $\textit{prune}$ to every element of the
list gives a stronger version of $\textit{distinctBy}$,
which we call $\textit{distinctByPlus}$.
Here is a concise version of it (in the style of Scala):
\begin{figure}[H]
\begin{lstlisting}
def distinctByPlus(rs: List[ARexp], acc: Set[Rexp] = Set()) :
List[ARexp] = rs match {
  case Nil => Nil
  case r :: rs =>
    // a term whose erased form is already in acc is dropped entirely
    if (acc.contains(erase(r)))
      distinctByPlus(rs, acc)
    else {
      // otherwise prune the term and record its erased form in acc
      val pruned_r = prune(r, acc)
      pruned_r :: distinctByPlus(rs, acc + erase(pruned_r))
    }
}
\end{lstlisting}
\caption{The $\textit{distinctByPlus}$ function}
\end{figure}
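\noindent
To make the pruning idea concrete before discussing $\textit{prune}$ further,
here is a minimal sketch of it over plain (bitcode-free) regular expressions.
This is our own illustration under simplifying assumptions: the datatype and
the equation for sequences are chosen only for exposition, whereas the real
function operates on annotated regular expressions (\texttt{ARexp}) and has to
maintain bitcodes, which the sketch ignores.
\begin{lstlisting}
sealed trait Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHR(c: Char) extends Rexp
case class ALTS(rs: List[Rexp]) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp

// Simplified pruning: remove from r every part already in acc.
def prune(r: Rexp, acc: Set[Rexp]): Rexp = r match {
  // prune each branch of an alternative, dropping covered ones
  case ALTS(rs) => rs.map(prune(_, acc)).filter(_ != ZERO) match {
    case Nil       => ZERO      // all branches were covered by acc
    case r1 :: Nil => r1        // a single branch needs no ALTS
    case rs1       => ALTS(rs1)
  }
  // distribute over a sequence: (a+b)c is pruned branch-wise
  case SEQ(ALTS(rs), r2) => prune(ALTS(rs.map(SEQ(_, r2))), acc)
  // any other term is removed entirely if acc already has it
  case r1 => if (acc.contains(r1)) ZERO else r1
}
\end{lstlisting}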
But for the function $\textit{prune}$ there is a difficulty:
how could one define the $L$ function in a ``computable'' way,
so that it generates a (lazily evaluated, possibly infinite) set of strings in $L(r)$?
We would also need a function that decides whether $L(r_1) \subseteq L(r_2)$
holds.
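One could in principle enumerate $L(r)$ lazily; the following naive sketch
(our own illustration, reusing the plain-regex datatype from the sketch above)
indicates why this does not lead anywhere: without Kleene star the languages
here are finite, with star a fair (diagonal) enumeration would be needed, and
a subset test by enumeration can refute $L(r_1) \subseteq L(r_2)$ by finding a
counterexample but can never confirm it for infinite languages.
\begin{lstlisting}
// Naive enumeration of L(r) as a LazyList (finite fragment only);
// adding STAR would require a fair, diagonal enumeration.
def language(r: Rexp): LazyList[String] = r match {
  case ZERO        => LazyList()
  case ONE         => LazyList("")
  case CHR(c)      => LazyList(c.toString)
  case ALTS(rs)    => rs.to(LazyList).flatMap(language).distinct
  case SEQ(r1, r2) =>
    for (s1 <- language(r1); s2 <- language(r2)) yield s1 + s2
}
\end{lstlisting}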
to a bound better than Antimirov's cubic bound).
Rather, we do not try to eliminate every instance of regular expressions
that have ``language duplicates'', but only eliminate the ``exact duplicates''.
For this we need to break up terms such as $(a+b)\cdot c$ into the two
terms $a\cdot c$ and $b\cdot c$ before we add them to the accumulator:
\begin{figure}[H]
\begin{lstlisting}
def distinctWith(rs: List[ARexp],
                 pruneFunction: (ARexp, Set[Rexp]) => ARexp,
                 acc: Set[Rexp] = Set()) : List[ARexp] =
  rs match {
    case Nil => Nil
    case r :: rs =>
      // a term whose erased form is already in acc is dropped
      if (acc.contains(erase(r)))
        distinctWith(rs, pruneFunction, acc)
      else {
        // prune the term, then record all of its broken-up
        // sub-terms in the accumulator
        val pruned_r = pruneFunction(r, acc)
        pruned_r :: distinctWith(
          rs,
          pruneFunction,
          acc ++ turnIntoTerms(erase(pruned_r))
        )
      }
  }
\end{lstlisting}
\caption{A Stronger Version of $\textit{distinctBy}$}
\end{figure}
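\noindent
To see the net effect, here is a small worked example in terms of the
plain-regex sketch from above (again our own illustration, not the thesis'
test code): once $a\cdot c$ has been recorded in the accumulator, the term
$(a+b)\cdot c$ loses its $a\cdot c$ branch and only $b\cdot c$ survives.
\begin{lstlisting}
val a = CHR('a'); val b = CHR('b'); val c = CHR('c')
// once SEQ(a,c) is in acc, (a+b)c prunes to SEQ(b,c)
val acc: Set[Rexp] = Set(SEQ(a, c))
assert(prune(SEQ(ALTS(List(a, b)), c), acc) == SEQ(b, c))
\end{lstlisting}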
\noindent
This breaking-up process is done by the function $\textit{turnIntoTerms}$:
\begin{figure}