ChengsongTanPhdThesis/Chapters/Finite.tex
changeset 668 3831621d7b14
parent 663 0d1e68268d0f
--- a/ChengsongTanPhdThesis/Chapters/Finite.tex	Tue Jul 25 17:28:29 2023 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Finite.tex	Wed Aug 23 03:02:31 2023 +0100
@@ -1,6 +1,6 @@
 % Chapter Template
 
-\chapter{Finiteness Bound} % Main chapter title
+\chapter{A Formal Proof That $\textit{Blexer}\_\textit{simp}$ will not Grow Unbounded} % Main chapter title
 
 \label{Finite} 
 %  In Chapter 4 \ref{Chapter4} we give the second guarantee
@@ -8,15 +8,18 @@
 %regex's derivatives. 
 %(this is cahpter 5 now)
 
-In this chapter we give a bound in terms of the size of 
+In this chapter we prove a bound in terms of the size of 
 the calculated derivatives: 
-given an annotated regular expression $a$, for any string $s$
+given an annotated regular expression $a$, there exists
+a constant integer $N$, such that for any string $s$
 our algorithm $\blexersimp$'s derivatives
-are finitely bounded
-by a constant that only depends on $a$.
-Formally we show that there exists a constant integer $N_a$ such that
+are bounded
+by $N$. %a constant that only depends on $a$.
+Formally this can be expresssed
+as 
+%we show that there exists a constant integer $N_a$ such that
 \begin{center}
-	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
+	$\llbracket \bderssimp{a}{s} \rrbracket \leq N$
 \end{center}
 \noindent
 where the size ($\llbracket \_ \rrbracket$) of 
@@ -36,7 +39,7 @@
 		save other possibilities on a stack when exploring one possible
 		path of matching. Catastrophic backtracking typically occurs
 		when the number of steps increase exponentially with respect
-		to input. In other words, the runtime is $O((c_r)^n)$ of the input
+		to input. In other words, the complexity is $O((c_r)^n)$ of the input
 		string length $n$, where the base of the exponent is determined by the
 		regular expression $r$.
 		%so that they
@@ -107,7 +110,7 @@
 			$\llbracket \ZERO \rrbracket$ & $\dn$ & $1$ \\
 			$\llbracket _{bs} r_1 \cdot r_2 \rrbracket$ & $\dn$ & $\llbracket r_1 \rrbracket + \llbracket r_2 \rrbracket + 1$\\
 			$\llbracket _{bs}\mathbf{c} \rrbracket $ & $\dn$ & $1$\\
-			$\llbracket _{bs}\sum as \rrbracket $ & $\dn$ & $\map \; (\llbracket \_ \rrbracket)\; as   + 1$\\
+			$\llbracket _{bs}\sum as \rrbracket $ & $\dn$ & $(\textit{sum}\; (\map \; (\llbracket \_ \rrbracket)\; as)  ) + 1$\\
 			$\llbracket _{bs} a^* \rrbracket $ & $\dn$ & $\llbracket a \rrbracket + 1$.
 		\end{tabular}
 	\end{center}
@@ -152,40 +155,40 @@
 Our proof states that all the blue nodes
 stay below a size bound $N_a$ determined by the input $a$.
 
-\noindent
 Sulzmann and Lu's assumed a similar picture of their algorithm,
-though in fact their algorithm's size might be better depicted by the following graph:
-\begin{figure}[H]
-	\begin{tikzpicture}[scale=2,
-		every node/.style={minimum size=11mm},
-		->,>=stealth',shorten >=1pt,auto,thick
-		]
-		\node (r0) [rectangle, draw=black, thick, minimum size = 5mm, draw=blue] {$a$};
-		\node (r1) [rectangle, draw=black, thick, right=of r0, minimum size = 7mm]{$a_1$};
-		\draw[->,line width=0.2mm](r0)--(r1) node[above,midway] {$\backslash c_1$};
-
-		\node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=7mm]{$a_{1s}$};
-		\draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {$\simp'$};
-
-		\node (r2) [rectangle, draw=black, thick,  right=of r1s, minimum size = 17mm]{$a_2$};
-		\draw[->,line width=0.2mm](r1s)--(r2) node[above,midway] {$\backslash c_2$};
-
-		\node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=14mm]{$a_{2s}$};
-		\draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$\simp'$};
-
-		\node (r3) [rectangle, draw = black, thick, right= of r2s, minimum size = 22mm]{$a_3$};
-		\draw[->,line width=0.2mm](r2s)--(r3) node[above,midway] {$\backslash c_3$};
-
-		\node (rns) [right = of r3, draw=blue, minimum size = 20mm]{$a_{3s}$};
-		\draw[->,line width=0.2mm] (r3)--(rns) node [above, midway] {$\simp'$};
-
-		\node (rnn) [right = of rns, minimum size = 1mm]{};
-		\draw[->, dashed] (rns)--(rnn) node [above, midway] {$\ldots$};
-
-	\end{tikzpicture}
-	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{sulzShrinks}
-\end{figure}
-\noindent
+even though it did not work as they expected.
+%though in fact their algorithm's size might be better depicted by the following graph:
+%\begin{figure}[H]
+%	\begin{tikzpicture}[scale=2,
+%		every node/.style={minimum size=11mm},
+%		->,>=stealth',shorten >=1pt,auto,thick
+%		]
+%		\node (r0) [rectangle, draw=black, thick, minimum size = 5mm, draw=blue] {$a$};
+%		\node (r1) [rectangle, draw=black, thick, right=of r0, minimum size = 7mm]{$a_1$};
+%		\draw[->,line width=0.2mm](r0)--(r1) node[above,midway] {$\backslash c_1$};
+%
+%		\node (r1s) [rectangle, draw=blue, thick, right=of r1, minimum size=7mm]{$a_{1s}$};
+%		\draw[->, line width=0.2mm](r1)--(r1s) node[above, midway] {$\simp'$};
+%
+%		\node (r2) [rectangle, draw=black, thick,  right=of r1s, minimum size = 17mm]{$a_2$};
+%		\draw[->,line width=0.2mm](r1s)--(r2) node[above,midway] {$\backslash c_2$};
+%
+%		\node (r2s) [rectangle, draw = blue, thick, right=of r2,minimum size=14mm]{$a_{2s}$};
+%		\draw[->,line width=0.2mm](r2)--(r2s) node[above,midway] {$\simp'$};
+%
+%		\node (r3) [rectangle, draw = black, thick, right= of r2s, minimum size = 22mm]{$a_3$};
+%		\draw[->,line width=0.2mm](r2s)--(r3) node[above,midway] {$\backslash c_3$};
+%
+%		\node (rns) [right = of r3, draw=blue, minimum size = 20mm]{$a_{3s}$};
+%		\draw[->,line width=0.2mm] (r3)--(rns) node [above, midway] {$\simp'$};
+%
+%		\node (rnn) [right = of rns, minimum size = 1mm]{};
+%		\draw[->, dashed] (rns)--(rnn) node [above, midway] {$\ldots$};
+%
+%	\end{tikzpicture}
+%	\caption{Regular expression size change during our $\blexersimp$ algorithm}\label{sulzShrinks}
+%\end{figure}
+%\noindent
 The picture means that in some cases their lexer (where they use $\simpsulz$ 
 as the simplification function)
 will have a size explosion, causing the running time 
@@ -765,10 +768,13 @@
 we have to somehow get a more concrete form to begin.
 We call such more concrete representations the ``closed forms'' of
 string derivatives as opposed to their original definitions.
-The terminology ``closed form'' is borrowed from mathematics,
-which usually describe expressions that are solely comprised of finitely many
-well-known and easy-to-compute operations such as 
-additions, multiplications, and exponential functions.
+The name ``closed from'' was inspired by closed forms in math,
+and the similarity with closed forms here is that they make
+estimating the same term easier.
+%The terminology ``closed form'' is borrowed from mathematics,
+%which usually describe expressions that are solely comprised of finitely many
+%well-known and easy-to-compute operations such as 
+%additions, multiplications, and exponential functions.
 
 We start by proving some basic identities
 involving the simplification functions for r-regular expressions.