--- a/ChengsongTanPhdThesis/Chapters/Finite.tex Wed Aug 31 12:51:53 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Finite.tex Wed Aug 31 23:57:42 2022 +0100
@@ -8,15 +8,14 @@
%regex's derivatives.
In this chapter we give a guarantee in terms of size:
-given an annotated regular expression $a$, for any string $s$
-our algorithm's internal annotated regular expression
-size
+given an annotated regular expression $a$, for any string $s$
+the size of our algorithm $\blexersimp$'s internal annotated regular expressions
+is bounded
+by a constant $N_a$ that depends only on $a$:
\begin{center}
-$\llbracket \bderssimp{a}{s} \rrbracket$
+$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
\end{center}
\noindent
-is finitely bounded
-by a constant $N_a$ that only depends on $a$,
where the size of an annotated regular expression is defined
in terms of the number of nodes in its tree structure:
\begin{center}
@@ -26,80 +25,110 @@
$\llbracket _{bs} r_1 \cdot r_2 \rrbracket$ & $\dn$ & $\llbracket r_1 \rrbracket + \llbracket r_2 \rrbracket + 1$\\
$\llbracket _{bs}\mathbf{c} \rrbracket $ & $\dn$ & $1$\\
$\llbracket _{bs}\sum as \rrbracket $ & $\dn$ & $\textit{sum} \; (\map \; (\llbracket \_ \rrbracket)\; as) + 1$\\
-$\llbracket _{bs} a^* \rrbracket $ & $\dn$ & $\llbracket a \rrbracket + 1$
+$\llbracket _{bs} a^* \rrbracket $ & $\dn$ & $\llbracket a \rrbracket + 1$.
\end{tabular}
\end{center}
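+\noindent
+For instance, under this definition a small annotated regular expression
+consisting of a sequence of two (annotated) characters has size
+\begin{center}
+$\llbracket \, {}_{bs}({}_{bs_1}\mathbf{a} \cdot {}_{bs_2}\mathbf{b}) \, \rrbracket \;=\; \llbracket {}_{bs_1}\mathbf{a} \rrbracket + \llbracket {}_{bs_2}\mathbf{b} \rrbracket + 1 \;=\; 3,$
+\end{center}
+\noindent
+that is, the bitcodes $bs$, $bs_1$ and $bs_2$ themselves do not contribute to the size.
+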
-
+We believe this size formalisation
+of the algorithm is important in our context, because
+\begin{itemize}
+ \item
+	It is a stepping stone towards an ``absence of catastrophic-backtracking''
+	guarantee. The next step would be to refine the bound $N_a$ so that it
+	is polynomial in $\llbracket a\rrbracket$.
+ \item
+	The size bound proof gives us higher confidence that
+	our simplification algorithm $\simp$ does not ``misbehave''
+	like $\simpsulz$ does.
+ The bound is universal, which is an advantage over work which
+ only gives empirical evidence on some test cases.
+\end{itemize}
\section{Formalising the Size Bound}
\noindent
In our lexer $\blexersimp$,
the regular expression is repeatedly taken derivative of
and then simplified.
-\begin{center}
-\begin{tikzpicture}[scale=2,node distance=1.4cm,
- every node/.style={minimum size=9mm}]
-\node (r0) {$r$};
-\node (r1) [rectangle, draw=black, thick, right=of r0]{$r_1$};
+\begin{figure}[H]
+\begin{tikzpicture}[scale=2,
+ every node/.style={minimum size=11mm},
+ ->,>=stealth',shorten >=1pt,auto,thick
+ ]
+\node (r0) [rectangle, draw=blue, thick, minimum size = 5mm] {$a$};
+\node (r1) [rectangle, draw=black, thick, right=of r0, minimum size = 7mm]{$a_1$};
\draw[->,line width=0.2mm](r0)--(r1) node[above,midway] {$\backslash c_1$};
-\node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=7mm]{$r_{1s}$};
-\draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {simp};
-\node (r2) [rectangle, draw=black, thick, right=of r1s]{$r_2$};
+
+\node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=6mm]{$a_{1s}$};
+\draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {$\simp$};
+
+\node (r2) [rectangle, draw=black, thick, right=of r1s, minimum size = 12mm]{$a_2$};
\draw[->,line width=0.2mm](r1s)--(r2) node[above,midway] {$\backslash c_2$};
-\node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=7mm]{$r_{2s}$};
-\draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$simp$};
-\node (rns) [rectangle, draw = blue, thick, right=of r2s,minimum size=7mm]{$r_{ns}$};
+
+\node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=6mm]{$a_{2s}$};
+\draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$\simp$};
+
+\node (rns) [rectangle, draw = blue, thick, right=of r2s,minimum size=6mm]{$a_{ns}$};
\draw[->,line width=0.2mm, dashed](r2s)--(rns) node[above,midway] {$\backslash \ldots$};
-\node (v) [circle, draw = blue, thick, right=of rns, minimum size=7mm, right=2.7cm]{$v$};
-\draw[->, line width=0.2mm](rns)--(v) node[above, midway] {collect+decode};
+
+\node (v) [circle, thick, draw, right=of rns, minimum size=6mm, right=1.7cm]{$v$};
+\draw[->, line width=0.2mm](rns)--(v) node[above, midway] {\bmkeps} node [below, midway] {\decode};
\end{tikzpicture}
-\end{center}
+\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{simpShrinks}
+\end{figure}
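+\noindent
+Spelled out as equations, the interleaving of derivatives and simplification
+in Figure \ref{simpShrinks} corresponds to the following recursion
+(a sketch in the notation used in this chapter):
+\begin{center}
+\begin{tabular}{lcl}
+$\bderssimp{a}{[]}$ & $\dn$ & $a$\\
+$\bderssimp{a}{c::s}$ & $\dn$ & $\bderssimp{(\simp \; (a \backslash c))}{s}$
+\end{tabular}
+\end{center}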
\noindent
-As illustrated in the picture,
-each time the regular expression
-is taken derivative of, it grows (the black nodes),
-and after simplification, it shrinks (the blue nodes).
+Each time
+a derivative is taken, the regular expression might grow,
+but the subsequent simplification ensures that
+it stays small.
This intuition is depicted by the relative size
-difference between the black and blue nodes.
-We give a mechanised proof that after simplification
-the regular expression's size (the blue ones)
-$\bderssimp{a}{s}$ will never exceed a constant $N_a$ for
-a fixed $a$.
+change between the black and blue nodes:
+after $\simp$ the nodes always shrink.
+Our proof shows that all the blue nodes
+stay below a size bound $N_a$ determined by $a$.
-There are two problems with this finiteness proof, though.
-\begin{itemize}
- \item
-First, It is not yet a direct formalisation of our lexer's complexity,
-as a complexity proof would require looking into
-the time it takes to execute {\bf all} the operations
-involved in the lexer (simp, collect, decode), not just the derivative.
-\item
-Second, the bound is not yet tight, and we seek to improve $N_a$ so that
-it is polynomial on $\llbracket a \rrbracket$.
-\end{itemize}
-Still, we believe this size formalisation
-of the algorithm is important in our context, because
-\begin{itemize}
- \item
- Derivatives are the most important phases of our lexer algorithm.
- Size properties about derivatives covers the majority of the algorithm
- and is therefore a good indication of complexity of the entire program.
- \item
- What the size bound proof does ensure is an absence of
- catastrophic backtracking, which is prevalent in regular expression engines
- in popular programming languages like Java.
- We prove catastrophic backtracking cannot happen for {\bf all} inputs,
- which is an advantage over work which
- only gives empirical evidence on some test cases.
-\end{itemize}
-For example, Sulzmann and Lu's made claimed that their algorithm
-with bitcodes and simplifications can lex in linear time with respect
-to the input string. This assumes that each
-derivative operation takes constant time.
-However, it turns out that on certain cases their lexer
+\noindent
+Sulzmann and Lu assumed something similar about their algorithm,
+though in fact the size development of their algorithm is better depicted by the following graph:
+\begin{figure}[H]
+\begin{tikzpicture}[scale=2,
+ every node/.style={minimum size=11mm},
+ ->,>=stealth',shorten >=1pt,auto,thick
+ ]
+\node (r0) [rectangle, draw=blue, thick, minimum size = 5mm] {$a$};
+\node (r1) [rectangle, draw=black, thick, right=of r0, minimum size = 7mm]{$a_1$};
+\draw[->,line width=0.2mm](r0)--(r1) node[above,midway] {$\backslash c_1$};
+
+\node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=7mm]{$a_{1s}$};
+\draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {$\simp'$};
+
+\node (r2) [rectangle, draw=black, thick, right=of r1s, minimum size = 17mm]{$a_2$};
+\draw[->,line width=0.2mm](r1s)--(r2) node[above,midway] {$\backslash c_2$};
+
+\node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=14mm]{$a_{2s}$};
+\draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$\simp'$};
+
+\node (r3) [rectangle, draw = black, thick, right= of r2s, minimum size = 22mm]{$a_3$};
+\draw[->,line width=0.2mm](r2s)--(r3) node[above,midway] {$\backslash c_3$};
+
+\node (rns) [right = of r3, draw=blue, minimum size = 20mm]{$a_{3s}$};
+\draw[->,line width=0.2mm] (r3)--(rns) node [above, midway] {$\simp'$};
+
+\node (rnn) [right = of rns, minimum size = 1mm]{};
+\draw[->, dashed] (rns)--(rnn) node [above, midway] {$\ldots$};
+
+\end{tikzpicture}
+\caption{Regular expression size change during Sulzmann and Lu's algorithm}\label{sulzShrinks}
+\end{figure}
+\noindent
+That is, in certain cases their lexer
will have an indefinite size explosion, causing the running time
-of each derivative step to grow arbitrarily large.
-Here in our proof we state that such explosions cannot happen
-with our simplification function.
+of each derivative step to grow arbitrarily large (for example
+in \ref{SulzmannLuLexerTime}).
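+To get a feeling for how such an explosion can arise, consider
+repeatedly taking plain, unannotated derivatives of $(a+b+ab)^*$ with
+respect to the character $a$ without any simplification (a sketch; any
+simplification that fails to remove the superfluous $\mathbf{0}$-summands
+and duplicates faces the same growth):
+\begin{center}
+\begin{tabular}{lcl}
+$(a+b+ab)^* \backslash a$ & $=$ & $(\mathbf{1}+\mathbf{0}+\mathbf{1}\cdot b)\cdot(a+b+ab)^*$\\
+$(a+b+ab)^* \backslash aa$ & $=$ & $(\mathbf{0}+\mathbf{0}+(\mathbf{0}\cdot b + \mathbf{0}))\cdot(a+b+ab)^*$\\
+ & & $\;+\;(\mathbf{1}+\mathbf{0}+\mathbf{1}\cdot b)\cdot(a+b+ab)^*$\\
+\end{tabular}
+\end{center}
+\noindent
+Every further $a$ duplicates another copy of
+$(\mathbf{1}+\mathbf{0}+\mathbf{1}\cdot b)\cdot(a+b+ab)^*$, so the number of
+alternatives keeps growing unless the simplification removes them.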
+The reason for this mistake was that
+they tested the run time of their
+lexer only on particular examples such as $(a+b+ab)^*$
+and generalised from these to all cases; such a mistake
+cannot happen with our mechanised proof.
+We give details of the proof in the next sections.
+
\subsection{Overview of the Proof}
Here is a bird's eye view of how the proof of finiteness works,
which involves three steps:
@@ -112,45 +141,49 @@
top color=white,bottom color=black!20}]
- \node (0) at (-5,0) [node, text width=1.8cm, text centered] {$\llbracket \bderssimp{a}{s} \rrbracket$};
- \node (A) at (0,0) [node,text width=1.6cm, text centered] {$\llbracket \rderssimp{r}{s} \rrbracket_r$};
- \node (B) at (3,0) [node,text width=3.0cm, anchor=west, minimum width = 40mm] {$\llbracket \textit{ClosedForm}(r, s)\rrbracket_r$};
+ \node (0) at (-5,0)
+ [node, text width=1.8cm, text centered]
+ {$\llbracket \bderssimp{a}{s} \rrbracket$};
+ \node (A) at (0,0)
+ [node,text width=1.6cm, text centered]
+ {$\llbracket \rderssimp{r}{s} \rrbracket_r$};
+ \node (B) at (3,0)
+ [node,text width=3.0cm, anchor=west, minimum width = 40mm]
+ {$\llbracket \textit{ClosedForm}(r, s)\rrbracket_r$};
\node (C) at (9.5,0) [node, minimum width=10mm] {$N_r$};
- \draw [->,line width=0.5mm] (0) -- node [above,pos=0.45] {=} (A) node [below, pos = 0.45] {$r = a \downarrow_r$} (A);
- \draw [->,line width=0.5mm] (A) -- node [above,pos=0.35] {$\quad =\ldots=$} (B);
- \draw [->,line width=0.5mm] (B) -- node [above,pos=0.35] {$\quad \leq \ldots \leq$} (C);
+ \draw [->,line width=0.5mm] (0) --
+ node [above,pos=0.45] {=} (A) node [below, pos = 0.45] {$(r = a \downarrow_r)$} (A);
+ \draw [->,line width=0.5mm] (A) --
+ node [above,pos=0.35] {$\quad =\ldots=$} (B);
+ \draw [->,line width=0.5mm] (B) --
+ node [above,pos=0.35] {$\quad \leq \ldots \leq$} (C);
\end{tikzpicture}
\end{center}
\noindent
We explain the steps one by one:
\begin{itemize}
-\item
-We first define a new datatype
-that is more straightforward to tweak
-into the shape we want
-compared with an annotated regular expression,
-called $\textit{rrexp}$s.
-Its inductive type definition and
-derivative and simplification operations are
-almost identical to those of the annotated regular expressions,
-except that no bitcodes are attached.
-\item
-We have a set of equalities for this new datatype that enables one to
-rewrite $\rderssimp{r_1 \cdot r_2}{s}$
-and $\rderssimp{r^*}{s}$ (the most involved
-inductive cases)
-by a combinatioin of derivatives of their
-children regular expressions ($r_1$, $r_2$, and $r$, respectively),
-which we call the \emph{Closed Forms}
-of the derivatives.
-\item
-The Closed Forms of the regular expressions
-are controlled by terms that
-are easier to deal with.
-Using inductive hypothesis, these terms
-are in turn bounded loosely
-by a large yet constant number.
+	\item
+	We first introduce the operations associated with $\rrexp$s,
+	such as derivatives, simplification and size calculation,
+	which were only briefly introduced in chapter \ref{Bitcoded2}.
+ \item
+	We have a set of equalities for this new datatype that enable one to
+	rewrite $\rderssimp{r_1 \cdot r_2}{s}$
+	and $\rderssimp{r^*}{s}$ (the most involved
+	inductive cases)
+	into a combination of derivatives of their
+	children regular expressions ($r_1$, $r_2$, and $r$, respectively),
+	which we call the \emph{Closed Forms}
+	of the derivatives (a sketch of the simplest such equality is shown after this list).
+ \item
+ The Closed Forms of the regular expressions
+ are controlled by terms that
+ are easier to deal with.
+	Using the inductive hypothesis, these terms
+	are in turn bounded loosely
+	by a large yet constant number.
\end{itemize}
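+\noindent
+As a taste of what these equalities look like, the simplest one, the
+closed form for alternatives, has for a non-empty string $s$ roughly the
+following shape (writing $\textit{rsimp}$ for the simplification function
+on $\rrexp$s; the precise statements, including the more involved sequence
+and star cases, are given in the sections below):
+\begin{center}
+$\rderssimp{(\sum \, rs)}{s} \;=\; \textit{rsimp}\,\big(\sum \;(\map \; (\lambda r.\; \rderssimp{r}{s}) \; rs)\big)$
+\end{center}
+\noindent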
We give details of these steps in the next sections.
The first step is to use
@@ -159,6 +192,15 @@
annotated regular expressions.
\section{The $\textit{rrexp}$ Datatype and Its Size Functions}
+
+We first recap the new datatype
+defined in \ref{rrexpDef},
+called $\textit{rrexp}$s.
+We chose $\rrexp$s because, compared with
+annotated regular expressions,
+they are more straightforward to tweak
+into the shape we want.
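+As a reminder, $\rrexp$s mirror the annotated regular expressions but carry
+no bitcodes; roughly, their shape is given by the grammar
+\begin{center}
+$r ::= \; \mathbf{0} \;\mid\; \mathbf{1} \;\mid\; \mathbf{c} \;\mid\; r_1 \cdot r_2 \;\mid\; \sum\, rs \;\mid\; r^*$
+\end{center}
+\noindent
+where $rs$ is a list of $\rrexp$s.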
We want to prove the size property about annotated regular expressions.
The size is
written $\llbracket r\rrbracket$, whose intuitive definition is as below
@@ -232,6 +274,10 @@
This does not hold for plain $\rexp$s.
+
+ These operations are
+ almost identical to those of the annotated regular expressions,
+ except that no bitcodes are attached.
Of course, the bits which encode the lexing information would grow linearly with respect
to the input, which should be taken into account when we wish to tackle the runtime complexity.
But at the current stage
@@ -2133,6 +2179,35 @@
$L(A^*B) = L(A\cdot A^* \cdot B + B)$
\end{center}
+There are two problems with this finiteness result, though.
+\begin{itemize}
+ \item
+First, it is not yet a direct formalisation of our lexer's complexity,
+as a complexity proof would require looking into
+the time it takes to execute {\bf all} the operations
+involved in the lexer (simp, collect, decode), not just the derivative.
+\item
+Second, the bound is not yet tight, and we seek to improve $N_a$ so that
+it is polynomial in $\llbracket a \rrbracket$.
+\end{itemize}
+Still, we believe this contribution is worthwhile,
+because
+\begin{itemize}
+	\item
+	The size proof can serve as a cornerstone for a complexity
+	formalisation.
+	Derivatives are the most important phase of our lexer algorithm.
+	Size properties about derivatives cover the majority of the algorithm
+	and are therefore a good indication of the complexity of the entire program.
+ \item
+ The bound is already a strong indication that catastrophic
+ backtracking is much less likely to occur in our $\blexersimp$
+ algorithm.
+ We refine $\blexersimp$ with $\blexerStrong$ in the next chapter
+ so that the bound becomes polynomial.
+\end{itemize}
+
%----------------------------------------------------------------------------------------
% SECTION 4
%----------------------------------------------------------------------------------------