lexing: comparison ChengsongTanPhdThesis/Chapters/Bitcoded2.tex

equal deleted inserted replaced

-:80cc6dc4c98b
+:bd1354127574
 In this chapter we introduce simplifications
 for annotated regular expressions that can be applied to
 each intermediate derivative result. This allows
 us to make $\blexer$ much more efficient.
 Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
-but their simplification functions  were inefficient and in some cases needed fixing.
+but their simplification functions could have been more efficient and in some cases needed fixing.
 %We contrast our simplification function
 %with Sulzmann and Lu's, indicating the simplicity of our algorithm.
 %This is another case for the usefulness
 %and reliability of formal proofs on algorithms.
 %These ``aggressive'' simplifications would not be possible in the injection-based
 							     & $\stackrel{\backslash a}{
 	\longrightarrow} $ & $\ldots$\\
 	\end{tabular}
 \end{center}
 \noindent
-As can be seen, there are serveral duplications.
+As can be seen, there are several duplications.
 A simple-minded simplification function cannot simplify
 the third regular expression in the above chain of derivative
 regular expressions, namely
 \begin{center}
 $((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
 \end{center}
 because the duplicates are
-not next to each other and therefore the rule
+not next to each other, and therefore the rule
 $r+ r \rightarrow r$ from $\textit{simp}$ does not fire.
 One would expect a better simplification function to work in the
 following way:
 \begin{gather*}
 	((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* +
 \caption{Lexing the regular expression $(a^*a^*)^*$ against strings of the form
 $\protect\underbrace{aa\ldots a}_\text{n \textit{a}s}
 $ using Sulzmann and Lu's lexer}\label{SulzmannLuLexer}
 \end{figure}
 \noindent
-At $n= 20$ we already get an out of memory error with Scala's normal
+At $n= 20$ we already get an out-of-memory error with Scala's normal
 JVM heap size settings.
 In fact their simplification does not improve much over
 the simple-minded simplifications we have shown in \ref{fig:BetterWaterloo}.
 The time required also grows exponentially:
 \begin{figure}[H]
 It is more desirable
 to flatten
 an entire list to open up possibilities for further simplifications
 with later regular expressions.
 Not flattening the rest of the elements also means that
-the later de-duplication processs
+the later de-duplication process
 does not fully remove further duplicates.
 For example,
 using $\textit{simp}\_{SL}$ we cannot
 simplify
 \begin{center}
 &&$\quad\textit{case} \; (a_1', a_2') \Rightarrow   _{bs}a_1' \cdot a_2'$
 	\end{tabular}
 \end{center}
 \noindent
 The most involved part is the $\sum$ clause, where we first call $\flts$ on
-the simplified children regular expression list $\textit{map}\; \textit{bsimp}\; \textit{as}$.
+the simplified children regular expression list $\textit{map}\; \textit{bsimp}\; \textit{as}$,
-and then call $\distinctBy$ on that list, the predicate determining whether two
+and then call $\distinctBy$ on that list. The predicate used in $\distinctBy$ for determining whether two
 elements are the same is $\rerases \; r_1 = \rerases\; r_2$.
 Finally, depending on whether the regular expression list $as'$ has turned into a
 singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{ALTS}$
 decides whether to keep the current level constructor $\sum$ as it is, and
-removes it when there are less than two elements:
+removes it when there are fewer than two elements:
 \begin{center}
 	\begin{tabular}{lcl}
 		$\textit{bsimp}_{ALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\
 &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
 &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
 \end{figure}
 \noindent
 The rules $LT$ and $LH$ are for rewriting two regular expression lists
 such that one regular expression
 in the left-hand-side list is rewritable in one step
-to the right-hand-side's regular expression at the same position.
+to the right-hand side's regular expression at the same position.
 This helps with defining the ``context rule'' $AL$.
 The reflexive transitive closure of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
 are defined in the usual way:
 \begin{figure}[H]
 			$rs_1 \stackrel{s*}{\rightsquigarrow} rs_2 \implies _{bs} \sum rs_1 \stackrel{*}{\rightsquigarrow} _{bs} \sum rs_2$
 		\item
 			$r \rightsquigarrow^* r' \implies _{bs} \sum (r :: rs)\; \rightsquigarrow^*\;  _{bs} \sum (r' :: rs)$
 		\item
-			The rewriting in many steps property is composible
+			The rewriting in many steps property is composable
 			in terms of the sequence constructor:\\
 			$r_1 \rightsquigarrow^* r_2
 			\implies _{bs} r_1 \cdot r_3 \rightsquigarrow^* \;
 			_{bs} r_2 \cdot r_3 \quad $
 			and
 	the inductive cases of $\stackrel{s*}{\rightsquigarrow}$ and $\rightsquigarrow^*$ respectively.
 	The third and fourth points are
 	by the properties $r_1 \rightsquigarrow r_2 \implies \fuse \; bs \; r_1 \implies \fuse \; bs \; r_2$ and
 	$rs_2 \stackrel{s}{\rightsquigarrow} rs_3
 	\implies \map \; (\fuse \; bs) rs_2 \stackrel{s*}{\rightsquigarrow} \map \; (\fuse \; bs)\; rs_3$,
-	which can be indutively proven by the inductive cases of $\rightsquigarrow$ and
+	which can be inductively proven by the inductive cases of $\rightsquigarrow$ and
 	$\stackrel{s}{\rightsquigarrow}$.
 \end{proof}
 \noindent
 The inference rules of $\stackrel{s}{\rightsquigarrow}$
 are defined in terms of the list cons operation, where
 if we can rewrite from one to the other in finitely
 many steps.
 For convenience,
 we define a predicate for a list of regular expressions
-having at least one nullable regular expressions:
+having at least one nullable regular expression:
 \begin{center}
 	$\textit{bnullables} \; rs \quad \dn \quad \exists r \in rs. \;\; \bnullable \; r$
 \end{center}
 \noindent
 The rewriting relation $\rightsquigarrow$ preserves (b)nullability:
 			\implies \bnullable \; r_1 = \bnullable \; r_2$
 	\end{itemize}
 \end{lemma}
 \begin{proof}
 	By rule induction of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$.
-	The third point is a corollary of the second.
+	The third point is a result of the second.
 \end{proof}
 \noindent
 For convenience again,
 we define $\bmkepss$ on a list $rs$,
 which extracts the bit-codes on the first $\bnullable$ element in $rs$:
 \begin{lemma}
 	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow}
 	(rs_1 @ (\distinctBy \; rs_2 \; \; \rerases \;\; (\map\;\; \rerases \; \; rs_1)))$
 \end{lemma}
 \noindent
-It says that that for a list made of two parts $rs_1 @ rs_2$,
+It says that for a list made of two parts $rs_1 @ rs_2$,
 one can throw away the duplicate
 elements in $rs_2$, as well as those that have appeared in $rs_1$.
 \begin{proof}
 	By induction on $rs_2$, where $rs_1$ is allowed to be arbitrary.
 \end{proof}
 \begin{proof}
 	By induction on $\rightsquigarrow$
 	and $\stackrel{s}{\rightsquigarrow}$, using a number of the previous lemmas.
 \end{proof}
 \noindent
-Now we can prove property 3, as an immediate corollary:
+Now we can prove property 3 as an immediate corollary:
 \begin{corollary}\label{rewritesBder}
 	$r_1 \rightsquigarrow^* r_2 \implies r_1 \backslash c \rightsquigarrow^*
 	r_2 \backslash c$
 \end{corollary}
 \begin{proof}
 	We know that they generate the same bits, if the lexing result is a match:
 	\begin{center}
 		$\bnullable \; (a \backslash s)
 		\implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
 	\end{center}
-	Now that they generate the same bits, we know that they give the same value after decoding.
+	Now that they generate the same bits, we know they also give the same value after decoding.
 	\begin{center}
 		$\bnullable \; (a \backslash s)
 		\implies \decode \; r \; (\bmkeps \; (a \backslash s)) =
 		\decode \; r \; (\bmkeps \; (\bderssimp{a}{s}))$
 	\end{center}
 	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp\;
 	r\;s = \None$.
 \end{corollary}
 \subsection{Comments on the Proof Techniques Used}
-Straightforward and simple as the proof may seem,
+Straightforward as the proof may seem,
 the efforts we spent obtaining it were far from trivial.
 We initially attempted to re-use the argument
 in \cref{flex_retrieve}.
 The problem is that both functions $\inj$ and $\retrieve$ require
 that the annotated regular expressions stay unsimplified,
 	$_{[]}(_{ZZ}\ONE +  _{ZS}c ) $\\
 	and\\
 	$_{Z}(_{Z} \ONE + _{S} c)$
 \end{center}
-as equal, because they were both re-written
+as equal because they were both re-written
 from the same expression.
 The simplification rewriting rules
 given in \ref{rrewriteRules} are by no means
 final,
-one could come up new rules
+one could come up with new rules
 such as
 $\SEQ r_1 \cdot (\SEQ r_1 \cdot r_3) \rightarrow
 \SEQs [r_1, r_2, r_3]$.
 However this does not fit with the proof technique
 of our main theorem, but seem to not violate the POSIX

changeset 640	bd1354127574
parent 639	80cc6dc4c98b
child 649	ef2b8abcbc55