lexing: comparison ChengsongTanPhdThesis/Chapters/Finite.tex

equal deleted inserted replaced

-:80cc6dc4c98b
+:bd1354127574
 %  In Chapter 4 \ref{Chapter4} we give the second guarantee
 %of our bitcoded algorithm, that is a finite bound on the size of any
 %regex's derivatives.
-In this chapter we give a bound in terms of size of
+In this chapter we give a bound in terms of the size of
 the calculated derivatives:
 given an annotated regular expression $a$, for any string $s$
 our algorithm $\blexersimp$'s derivatives
 are finitely bounded
-by a constant  that only depends on $a$.
+by a constant that only depends on $a$.
 Formally we show that there exists an $N_a$ such that
 \begin{center}
 	$\llbracket \bderssimp{a}{s} \rrbracket \leq N_a$
 \end{center}
 \noindent
 where the size ($\llbracket \_ \rrbracket$) of
 an annotated regular expression is defined
 in terms of the number of nodes in its
-tree structure (its recursive definition given in the next page).
+tree structure (its recursive definition is given in the next page).
 We believe this size bound
-is important in the context of POSIX lexing, because
+is important in the context of POSIX lexing because
 \begin{itemize}
 	\item
 		It is a stepping stone towards the goal
 		of eliminating ``catastrophic backtracking''.
 		If the internal data structures used by our algorithm
 		be slow.
 		The next step is to refine the bound $N_a$ so that it
 		is not just finite but polynomial in $\llbracket a\rrbracket$.
 	\item
 		Having the finite bound formalised
-		gives us a higher confidence that
+		gives us higher confidence that
-		our simplification algorithm $\simp$ does not ``mis-behave''
+		our simplification algorithm $\simp$ does not ``misbehave''
 		like $\textit{simpSL}$ does.
 		The bound is universal for a given regular expression,
 		which is an advantage over work which
 		only gives empirical evidence on
 		some test cases (see for example Verbatim work \cite{Verbatimpp}).
 After $\simp$ the node shrinks.
 Our proof states that all the blue nodes
 stay below a size bound $N_a$ determined by the input $a$.
 \noindent
-Sulzmann and Lu's assumed a similar picture about their algorithm,
+Sulzmann and Lu's assumed a similar picture of their algorithm,
 though in fact their algorithm's size might be better depicted by the following graph:
 \begin{figure}[H]
 	\begin{tikzpicture}[scale=2,
 		every node/.style={minimum size=11mm},
 		->,>=stealth',shorten >=1pt,auto,thick
 of each derivative step to grow continuously (for example
 in \ref{SulzmannLuLexerTime}).
 They tested out the run time of their
 lexer on particular examples such as $(a+b+ab)^*$
 and claimed that their algorithm is linear w.r.t to the input.
-With our mecahnised proof, we avoid this type of unintentional
+With our mechanised proof, we avoid this type of unintentional
 generalisation.
-Before delving in to the details of the formalisation,
+Before delving into the details of the formalisation,
-we are going to provide an overview of it in the next subsection.
+we are going to provide an overview of it in the following subsection.
 \subsection{Overview of the Proof}
 A high-level overview of the main components of the finiteness proof
 is as follows:
 $\textit{rrexp}$s.
 They are annotated regular expressions without bitcodes,
 allowing a more convenient size bound proof.
 %Of course, the bits which encode the lexing information
 %would grow linearly with respect
-%to the input, which should be taken into account when we wish to tackle the runtime comlexity.
+%to the input, which should be taken into accounte when we wish to tackle the runtime complexity.
 %But for the sake of the structural size
 %we can safely ignore them.\\
 The datatype
 definition of the $\rrexp$, called
 \emph{r-regular expressions},
 	\end{tabular}
 \end{center}
 \noindent
 The $r$ in the subscript of $\llbracket \rrbracket_r$ is to
 differentiate with the same operation for annotated regular expressions.
-Adding $r$ as subscript will be used in
+Similar subscripts will be added for operations like $\rerase{}$:
-other operations as well.\\
-The transformation from an annotated regular expression
-to an r-regular expression is straightforward.
 \begin{center}
 	\begin{tabular}{lcl}
 		$\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
 		$\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
 		$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
 The singleton alternative $\sum [r]$ becomes $r$ during the
 $\erase$ function.
 The  annotated regular expression $\sum[a, b, c]$ would turn into
 $(a+(b+c))$.
 All these operations change the size and structure of
-an annotated regular expressions, adding unnecessary
+an annotated regular expression, adding unnecessary
 complexities to the size bound proof.
 For example, if we define the size of a basic plain regular expression
 in the usual way,
 \begin{center}
 %The way we obtain the bound for $\rrexp$s is by two steps:
 %\begin{itemize}
 %	\item
 %		First, we rewrite $r\backslash s$ into something else that is easier
-%		to bound. This step is especially important for the inductive case
+%		to bound. This step is crucial for the inductive case
 %		$r_1 \cdot r_2$ and $r^*$, where the derivative can grow and bloat in a wild way,
-%		but after simplification they will always be equal or smaller to a form consisting of an alternative
+%		but after simplification, they will always be equal or smaller to a form consisting of an alternative
 %		list of regular expressions $f \; (g\; (\sum rs))$ with some functions applied to it, where each element will be distinct after the function application.
 %	\item
 %		Then, for such a sum  list of regular expressions $f\; (g\; (\sum rs))$, we can control its size
 %		by estimation, since $\distinctBy$ and $\flts$ are well-behaved and working together would only
 %		reduce the size of a regular expression, not adding to it.
 for the set of substrings of $s$).
 The list $\sum rs$ will then be de-duplicated by $\textit{rdistinct}$
 in the simplification, which prevents the $rs$ from growing indefinitely.
 Based on this idea, we develop a proof in two steps.
-First, we show the equality (where
+First, we show the below equality (where
 $f$ and $g$ are functions that do not increase the size of the input)
 \begin{center}
 $r\backslash_{rsimps} s = f\; (\textit{rdistinct} \; (g\; \sum rs))$,
 \end{center}
 where $r = r_1 \cdot r_2$ or $r = r_0^*$ and so on.
 We will describe in detail the first step of the proof
 in the next section.
 \section{Closed Forms}
 In this section we introduce in detail
-how we express the string derivatives
+how to express the string derivatives
 of regular expressions (i.e. $r \backslash_r s$ where $s$ is a string
 rather than a single character) in a different way than
 our previous definition.
 In previous chapters, the derivative of a
 regular expression $r$ w.r.t a string $s$
 $(r_1 \cdot r_2)\backslash s$,
 we have to somehow get a more concrete form to begin.
 We call such more concrete representations the ``closed forms'' of
 string derivatives as opposed to their original definitions.
 The terminology ``closed form'' is borrowed from mathematics,
-which usually describe expressions that are solely comprised of
+which usually describe expressions that are solely comprised of finitely many
 well-known and easy-to-compute operations such as
-additions, multiplications, exponential functions.
+additions, multiplications, and exponential functions.
 We start by proving some basic identities
 involving the simplification functions for r-regular expressions.
 After that we introduce the rewrite relations
 $\rightsquigarrow_h$, $\rightsquigarrow^*_{scf}$
 $\rightsquigarrow_f$ and $\rightsquigarrow_g$.
-These relations involves similar techniques as in chapter \ref{Bitcoded2}
+These relations involve similar techniques as in chapter \ref{Bitcoded2}
 for annotated regular expressions.
 Finally, we use these identities to establish the
 closed forms of the alternative regular expression,
 the sequence regular expression, and the star regular expression.
 %$r_1\cdot r_2$, $r^*$ and $\sum rs$.
 \subsection{Some Basic Identities}
 In what follows we will often convert between lists
 and sets.
-We use Isabelle's $set$ to refere to the
+We use Isabelle's $set$ to refer to the
 function that converts a list $rs$ to the set
 containing all the elements in $rs$.
 \subsubsection{$\textit{rdistinct}$'s Does the Job of De-duplication}
 The $\textit{rdistinct}$ function, as its name suggests, will
 de-duplicate an r-regular expression list.
 It will also remove any elements that
-is already in the accumulator set.
+are already in the accumulator set.
 \begin{lemma}\label{rdistinctDoesTheJob}
 	%The function $\textit{rdistinct}$ satisfies the following
 	%properties:
 	Assume we have the predicate $\textit{isDistinct}$\footnote{We omit its
 	recursive definition here. Its Isabelle counterpart would be $\textit{distinct}$.}
 	\end{itemize}
 \end{lemma}
 \noindent
 \begin{proof}
 	The first part is by an induction on $rs$.
-	The second and third part can be proven by using the
+	The second and third parts can be proven by using the
 	inductive cases of $\textit{rdistinct}$.
 \end{proof}
 \noindent
 \subsubsection{The Properties of $\textit{Rflts}$}
 We give in this subsection some properties
 involving $\backslash_r$, $\backslash_{rsimps}$, $\textit{rflts}$ and
 $\textit{rsimp}_{ALTS} $, together with any non-trivial lemmas that lead to them.
-These will be helpful in later closed form proofs, when
+These will be helpful in later closed-form proofs, when
 we want to transform derivative terms which have
 %the ways in which multiple functions involving
 %those are composed together
-interleaving derivatives and  simplifications applied to them.
+interleaving derivatives and simplifications applied to them.
 \noindent
 %When the function $\textit{Rflts}$
-%is applied to the concatenation of two lists, the output can be calculated by first applying the
+%is applied to the concatenation of two lists; the output can be calculated by first applying the
-%functions on two lists separately, and then concatenating them together.
+%functions on two lists separately and then concatenating them together.
 $\textit{Rflts}$ is composable in terms of concatenation:
 \begin{lemma}\label{rfltsProps}
 	The function $\rflts$ has the properties below:\\
 	\begin{itemize}
 		\item
 \noindent
 We omit the recursive definition of the predicate $\textit{nonAlt}$,
 which evaluates to true when the regular expression is not an
 alternative, and false otherwise.
 The $\good$ property is preserved under $\rsimp_{ALTS}$, provided that
-its non-empty argument list of expressions are all good themsleves, and $\textit{nonAlt}$,
+its non-empty argument list of expressions are all good themselves, and $\textit{nonAlt}$,
 and unique:
 \begin{lemma}\label{rsimpaltsGood}
-	If $rs \neq []$ and forall $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$,
+	If $rs \neq []$ and for all $r \in rs. \textit{nonAlt} \; r$ and $\textit{isDistinct} \; rs$,
 	then $\good \; (\rsimpalts \; rs)$ if and only if forall $r \in rs. \; \good \; r$.
 \end{lemma}
 \noindent
 We also note that
 if a regular expression $r$ is good, then $\rflts$ on the singleton
 \end{lemma}
 \begin{proof}
 	By induction on $r$.
 \end{proof}
 \noindent
-With this we can prove that a regular expressions
+With this we can prove that a regular expression
 after simplification and flattening and de-duplication,
 will not contain any alternative regular expression directly:
 \begin{lemma}\label{nonaltFltsRd}
 	If $x \in \rdistinct{\rflts\; (\map \; \rsimp{} \; rs)}{\varnothing}$
 	then $\textit{nonAlt} \; x$.
 			$\good \; (\rsimp{y})$ or $\rsimp{y} = \RZERO$,
 			then $\good \; x$.
 	\end{itemize}
 \end{lemma}
 \begin{proof}
-	The first part is by induction on $rs$, where the induction
+	The first part is by induction, where the inductive cases
-	rule is the inductive cases for $\rflts$.
+	are the inductive cases of $\rflts$.
 	The second part is a corollary from the first part.
 \end{proof}
 This leads to good structural property of $\rsimp{}$,
 that after simplification, a regular expression is
 	By an induction on $r$. The inductive measure is the size $\llbracket \rrbracket_r$.
 	Lemma \ref{rsimpMono} says that
 	$\llbracket \rsimp{r}\rrbracket_r$ is smaller than or equal to
 	$\llbracket r \rrbracket_r$.
 	Therefore, in the $r_1 \cdot r_2$ and $\sum rs$ case,
-	Inductive hypothesis applies to the children regular expressions
+	The inductive hypothesis applies to the children regular expressions
 	$r_1$, $r_2$, etc. The lemma \ref{flts3Obv}'s precondition is satisfied
 	by that as well.
 	The lemmas \ref{nonnestedRsimp} and  \ref{nonaltFltsRd} are used
 	to ensure that goodness is preserved at the topmost level.
 \end{proof}
 		\item
 			$\rsimp{(\RALTS{rs})} = \rsimp{(\RALTS{\map \; \rsimp{} \; rs})}$
 	\end{itemize}
 \end{corollary}
 \noindent
-This will be useful in the later closed form proof's rewriting steps.
+This will be useful in the later closed-form proof's rewriting steps.
 Similarly, we state the following useful facts below:
 \begin{lemma}
 	The following equalities hold if $r = \rsimp{r'}$ for some $r'$:
 	\begin{itemize}
 		\item
 the first ($r\backslash c$) to the other ($r'\backslash c$).
 \begin{lemma}\label{interleave}
 	If $r \hrewrites r' $ then $\rder{c}{r} \hrewrites \rder{c}{r'}$
 \end{lemma}
 \noindent
-This allows us proving more $\mathbf{rsimp}$-equivalent terms, involving $\backslash_r$.
+This allows us to prove more $\mathbf{rsimp}$-equivalent terms, involving $\backslash_r$.
 \begin{lemma}\label{insideSimpRemoval}
 	$\rsimp{(\rder{c}{(\rsimp{r})})} = \rsimp{(\rder{c}{r})}  $
 \end{lemma}
 \noindent
 \begin{proof}
 		$((r_1 \backslash cc'c'' \cdot r_2 + r_2 \backslash c'') + r_2 \backslash c'c'')
 		+ r_2 \backslash cc'c''$ &
 		$   \longrightarrow_{\backslash c''} \quad \ldots$\\
 	\end{tabular}
 \end{center}
-Roughly speaking $r_1 \cdot r_2 \backslash s$ can be expresssed as
+Roughly speaking $r_1 \cdot r_2 \backslash s$ can be expressed as
 a giant alternative taking a list of terms
 $[r_1 \backslash_r s \cdot r_2, r_2 \backslash_r s'', r_2 \backslash_r s_1'', \ldots]$,
 where the head of the list is always the term
 representing a match involving only $r_1$, and the tail of the list consisting of
 terms of the shape $r_2 \backslash_r s''$, $s''$ being a suffix of $s$.
 In a closed-form one needs to take into account this (because
 closed forms require exact equality rather than language equivalence)
 and only generate the
 $r_2 \backslash_r s''$ terms satisfying the property
 \begin{center}
-$\exists  s'.  such \; that \; s'@s'' = s \;\; \land \;\;
+$\exists s'.  such \; that \; s'@s'' = s \;\; \land \;\;
 r_1 \backslash s' \; is \; nullable$.
 \end{center}
 Given the arguments $s$ and $r_1$, we denote the list of strings
 $s''$ satisfying the above property as $\vsuf{s}{r_1}$.
 The function $\vsuf{\_}{\_}$ is defined recursively on the structure of the string\footnote{
-	Perhaps a better name of it would be ``NullablePrefixSuffix''
+	Perhaps a better name for it would be ``NullablePrefixSuffix''
 	to differentiate with the list of \emph{all} prefixes of $s$, but
 	that is a bit too long for a function name and we are yet to find
 a more concise and easy-to-understand name.}
 \begin{center}
 	\begin{tabular}{lcl}
 When we have a string $s = c :: c' :: c'' \ldots$  such that $r \backslash c$, $r \backslash cc'$, $r \backslash c'$,
 $r \backslash cc'c''$, $r \backslash c'c''$, $r\backslash c''$ etc. are all nullable,
 the number of terms in $r^* \backslash s$ will grow exponentially rather than linearly
 in the sequence case.
 The good news is that the function $\textit{rsimp}$ will again
-ignore the difference between differently nesting patterns of alternatives,
+ignore the difference between different nesting patterns of alternatives,
 and the exponentially growing star derivative like
 \begin{center}
 	$(r_1 \backslash cc'c'' \cdot r^* + r \backslash c'') +
 	(r \backslash c'c'' \cdot r^* + r \backslash c'' \cdot r^*) $
 \end{center}
 		\map \; \hflataux{\_} (\map \; (\_\backslash_r c) \;(\hflataux{r})))$
 	\end{center}
 	holds.
 \end{lemma}
 \begin{proof}
-	By an induction on the inductivev cases of $\textit{createdByStar}$.
+	By an induction on the inductive cases of $\textit{createdByStar}$.
 \end{proof}
 %This is not entirely true for annotated regular expressions:
 %%TODO: bsimp bders \neq bderssimp
 %\begin{center}
 %	$(1+ (c\cdot \ASEQ{bs}{c^*}{c} ))$
 \end{lemma}
 \begin{proof}
 	By using the rewriting relation $\rightsquigarrow$
 \end{proof}
-And from this we obtain that a
+And from this we obtain the following fact: a
 regular expression created by star
 is the same as its flattened version, up to equivalence under $\textit{bsimp}$.
 For example,
 \begin{lemma}\label{hfauRsimpeq2}
 	$\textit{createdByStar} \; r \implies \rsimp{r} = \rsimp{\RALTS{\hflataux{r}}}$
 %\end{center}
 \section{Bounding Closed Forms}
 In this section, we introduce how we formalised the bound
 on closed forms.
-We first show that in general regular expressions up to a certain
+We first show that in general the number of regular expressions up to a certain
-size are finite.
+size is finite.
 Then we prove that functions such as $\rflts$
 will not cause the size of r-regular expressions to grow.
 Putting this together with a general bound
 on the finiteness of distinct regular expressions
-up to a certain size, we obtain a bound on
+up to a specific size, we obtain a bound on
 the closed forms.
 \subsection{Finiteness of Distinct Regular Expressions}
-We define the set of regular expressions whose size are no more than
+We define the set of regular expressions whose size is no more than
 a certain size $N$ as $\textit{sizeNregex} \; N$:
 \[
 	\textit{sizeNregex} \; N \dn \{r\; \mid \;  \llbracket r \rrbracket_r \leq N \}
 \]
 We have that $\textit{sizeNregex} \; N$ is always a finite set:
 \end{lemma}
 \begin{proof}
 	By splitting the set $\textit{sizeNregex} \; (N + 1)$ into
 	subsets by their categories:
 	$\{\ZERO_r, \ONE_r, c\}$, $\{r^* \mid r \in \textit{sizeNregex} \; N\}$,
-	and so on. Each of these subsets are finitely bounded.
+	and so on. Each of these subsets is finitely bounded.
 \end{proof}
 \noindent
 From this we get a corollary that
 if forall $r \in rs$, $\rsize{r} \leq N$, then the output of
 $\rdistinct{rs}{\varnothing}$ is a list of regular
 				\textit{rsimp} \; x))\; \{ \} ) \rrbracket_r
 			\]
 	\end{itemize}
 \end{lemma}
 \begin{proof}
-	Point 1, 3, 4 can be proven by an induction on $rs$.
+	Points 1, 3, and 4 can be proven by an induction on $rs$.
 	Point 2 is by case analysis on $r_1$ and $r_2$.
 	The last part is a corollary of the previous ones.
 \end{proof}
 \noindent
 With the lemmas for each inductive case in place, we are ready to get
 can be described by some inductive sub-structures
 (for example when $r = r_1 \cdot r_2$ then $rs'$
 will be solely comprised of $r_1 \backslash s'$
 and $r_2 \backslash s''$, $s'$ and $s''$ being
 sub-strings of $s$).
-which will each have a size uppder bound
+which will each have a size upper bound
-according to inductive hypothesis, which controls $r \backslash s$.
+according to the inductive hypothesis, which controls $r \backslash s$.
 We elaborate the above reasoning by a series of lemmas
 below, where straightforward proofs are omitted.
 %We want to apply it to our setting $\rsize{\rsimp{\sum rs}}$.
 We show that $\textit{rdistinct}$ and $\rflts$
 	\llbracket \rdistinct{rs}{\varnothing}  \rrbracket_r  $.
 \end{center}
 We need this so that we know the outcome of our real
 simplification is better than or equal to a rough estimate,
 and therefore can be bounded by that estimate.
-This is a bit harder to establish compared with proving
+This is a bit harder to establish compared to proving
 $\textit{flts}$ does not make a list larger (which can
 be proven using routine induction):
 \begin{center}
 	$\llbracket  \textit{rflts}\; rs \rrbracket_r \leq
 	\llbracket  \textit{rs} \rrbracket_r$
 \end{lemma}
 \begin{proof}
 	By using corollary \ref{interactionFltsDB}.
 \end{proof}
 \noindent
-This is a key lemma in establishing the bounds on all the
+This is a key lemma in establishing the bounds of all the
 closed forms.
 With this we are now ready to control the sizes of
 $(r_1 \cdot r_2 )\backslash s$ and $r^* \backslash s$.
 \begin{theorem}\label{rBound}
 	For any regex $r$, $\exists N_r. \forall s. \; \rsize{\rderssimp{r}{s}} \leq N_r$
 		$\llbracket r \rrbracket_r + n$\\
 	\end{tabular}
 \end{center}
 \noindent
 Arguably we should use $\log \; n$ for the size because
-the number of digits increase logarithmically w.r.t $n$.
+the number of digits increases logarithmically w.r.t $n$.
 For simplicity we choose to add the counter directly to the size.
 The derivative w.r.t a bounded regular expression
 is given as
 \begin{center}
 We confirm this point because the correctness theorem would also
 extend without surprise to $\blexersimp$.
 The rewrite rules such as $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$ and so on
 do not need to be changed,
 and only a few lemmas such as lemma \ref{fltsPreserves} need to be adjusted to
-add one more line which can be solved by sledgehammer
+add one more line which can be solved by the Sledgehammer tool
 to solve the $r^{\{n\}}$ inductive case.
 \subsubsection{Finiteness Proof Augmentation}
 The bounded repetitions are
 	By a reverse induction on $s$.
 	For the inductive case, note that if $\cbn \; r$ holds,
 	then $\cbn \; (r\backslash_r c)$ holds.
 \end{proof}
 \noindent
-In addition, for $\cbn$-shaped regular expressioins, one can flatten
+In addition, for $\cbn$-shaped regular expressions, one can flatten
 them:
 \begin{lemma}\label{ntimesHfauPushin}
 	If $\cbn \; r$ holds, then $\hflataux{r \backslash_r c} =
 	\textit{concat} \; (\map \; \hflataux{\map \; (\_\backslash_r c) \;
 	(\hflataux{r})})$
 		$\nString \; (\Some \; (c::s, n))$  & $\dn$ & $ \textit{true}$\\
 	\end{tabular}
 \end{center}
 \begin{lemma}\label{nupdatesNonempty}
 	If for all elements $o \in \textit{set} \; Ss$,
-	$\nString \; o$ holds, the we have that
+	$\nString \; o$ holds, then we have that
 	for all elements $o' \in \textit{set} \; (\nupdates \; s \; r \; Ss)$,
 	$\nString \; o'$ holds.
 \end{lemma}
 \begin{proof}
 	By an induction on $s$, where $Ss$ is set to vary over all possible values.
 We aim to formalise the correctness and size bound
 for constructs like $r^{\{\ldots n\}}$, $r^{\{n \ldots\}}$
 and so on, which is still work in progress.
 They should more or less follow the same recipe described in this section.
-Once we know about how to deal with them recursively using suitable auxiliary
+Once we know how to deal with them recursively using suitable auxiliary
-definitions, we are able to routinely establish the proofs.
+definitions, we can routinely establish the proofs.
 %----------------------------------------------------------------------------------------
 %	SECTION 3
 %----------------------------------------------------------------------------------------
 inflates the size bound of $\llbracket \bderssimp{r_2}{s} \rrbracket$ with the function
 $f(x) = x * 2^x$.
 This means the bound we have will surge up at least
 tower-exponentially with a linear increase of the depth.
-One might be quite skeptical about what this non-elementary
+One might be pretty skepticafl about what this non-elementary
 bound can bring us.
 It turns out that the giant bounds are far from being hit.
 Here we have some test data from randomly generated regular expressions:
 \begin{figure}[H]
 	\begin{tabular}{@{}c@{\hspace{2mm}}c@{\hspace{0mm}}c@{}}
 \end{tikzpicture}\\
 \multicolumn{3}{c}{}
 	\end{tabular}
 \caption{Graphs: size change of 3 randomly generated
 regular expressions $w.r.t.$ input string length.
-The x axis represents the length of input.}
+The x-axis represents the length of the input.}
 \end{figure}
 \noindent
 Most of the regex's sizes seem to stay within a polynomial bound $w.r.t$ the
 original size.
 We will discuss improvements to this bound in the next chapter.
 	\item
 		The size proof can serve as a starting point for a complexity
 		formalisation.
 		Derivatives are the most important phases of our lexer algorithm.
-		Size properties about derivatives covers the majority of the algorithm
+		Size properties about derivatives cover the majority of the algorithm
-		and is therefore a good indication of complexity of the entire program.
+		and is therefore a good indication of the complexity of the entire program.
 	\item
 		The bound is already a strong indication that catastrophic
 		backtracking is much less likely to occur in our $\blexersimp$
 		algorithm.
 		We refine $\blexersimp$ with $\blexerStrong$ in the next chapter
-One might wonder the actual bound rather than the loose bound we gave
+One might wonder about the actual bound rather than the loose bound we gave
-for the convenience of an easier proof.
+for the convenience of a more straightforward proof.
 How much can the regex $r^* \backslash s$ grow?
 As  earlier graphs have shown,
 %TODO: reference that graph where size grows quickly
 they can grow at a maximum speed
 exponential $w.r.t$ the number of characters,
 to express $(a ^ * + (aa) ^ * + (aaa) ^ * + \ldots +
 (\underbrace{a \ldots a}_{\text{n a's}})^*$ in the below discussion.
 The exponential size is triggered by that the regex
 $\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*$
 inside the $(\ldots) ^*$ having exponentially many
-different derivatives, despite those difference being minor.
+different derivatives, despite those differences being minor.
 $(\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)^*\backslash \underbrace{a \ldots a}_{\text{m a's}}$
 will therefore contain the following terms (after flattening out all nested
 alternatives):
 \begin{center}
 $(\sum_{i = 1}^{n}  (\underbrace{a \ldots a}_{\text{((i - (m' \% i))\%i) a's}})\cdot  (\underbrace{a \ldots a}_{\text{i a's}})^* )\cdot (\sum_{i=1}^{n} (\underbrace{a \ldots a}_{\text{i a's}})^*)$\\
 %-----------------------------------
 %	SUBSECTION 1
 %-----------------------------------
 %\subsection{Syntactic Equivalence Under $\simp$}
-%We prove that minor differences can be annhilated
+%We prove that minor differences can be annihilated
 %by $\simp$.
 %For example,
 %\begin{center}
 %	$\simp \;(\simpALTs\; (\map \;(\_\backslash \; x)\; (\distinct \; \mathit{rs}\; \phi))) =
 %	\simp \;(\simpALTs \;(\distinct \;(\map \;(\_ \backslash\; x) \; \mathit{rs}) \; \phi))$

changeset 640	bd1354127574
parent 639	80cc6dc4c98b
child 659	2e05f04ed6b3