ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
changeset 624 8ffa28fce271
parent 601 ce4e5151a836
child 639 80cc6dc4c98b
equal deleted inserted replaced
623:c0c1ebe09c7d 624:8ffa28fce271
    10 %Chapter 3\ref{Chapter3}. 
    10 %Chapter 3\ref{Chapter3}. 
    11 
    11 
    12 
    12 
    13 
    13 
    14 In this chapter we introduce simplifications
    14 In this chapter we introduce simplifications
    15 on annotated regular expressions that can be applied to 
    15 for annotated regular expressions that can be applied to 
    16 each intermediate derivative result. This allows
    16 each intermediate derivative result. This allows
    17 us to make $\blexer$ much more efficient.
    17 us to make $\blexer$ much more efficient.
    18 Sulzmann and Lu already had some bit-coded simplifications,
    18 Sulzmann and Lu already introduced some simplifications for bitcoded regular expressions,
    19 but their simplification functions  were inefficient.
    19 but their simplification functions  were inefficient and in some cases needs fixing.
    20 We contrast our simplification function 
    20 %We contrast our simplification function 
    21 with Sulzmann and Lu's, indicating the simplicity of our algorithm.
    21 %with Sulzmann and Lu's, indicating the simplicity of our algorithm.
    22 This is another case for the usefulness 
    22 %This is another case for the usefulness 
    23 and reliability of formal proofs on algorithms.
    23 %and reliability of formal proofs on algorithms.
    24 These ``aggressive'' simplifications would not be possible in the injection-based 
    24 %These ``aggressive'' simplifications would not be possible in the injection-based 
    25 lexing we introduced in chapter \ref{Inj}.
    25 %lexing we introduced in chapter \ref{Inj}.
    26 We then prove the correctness with the improved version of 
    26 %We then prove the correctness with the improved version of 
    27 $\blexer$, called $\blexersimp$, by establishing 
    27 %$\blexer$, called $\blexersimp$, by establishing 
    28 $\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
    28 %$\blexer \; r \; s= \blexersimp \; r \; s$ using a term rewriting system.
    29 
    29 %
    30 \section{Simplifications by Sulzmann and Lu}
    30 \section{Simplifications by Sulzmann and Lu}
    31 Consider the derivatives of examples such as $(a^*a^*)^*$
    31 Consider the derivatives of the following example $(a^*a^*)^*$:
    32 and $(a^* + (aa)^*)^*$:
    32 %and $(a^* + (aa)^*)^*$:
    33 \begin{center}
    33 \begin{center}
    34 	$(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} (a^*a^* + a^*)\cdot(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} $\\
    34 	\begin{tabular}{lcl}
    35 	$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^* \stackrel{\backslash a}{\longrightarrow} \ldots$
    35 		$(a^*a^*)^*$ & $ \stackrel{\backslash a}{\longrightarrow}$ & 
    36 \end{center}
    36 		$ (a^*a^* + a^*)\cdot(a^*a^*)^*$\\
    37 \noindent
    37 			     & 
    38 As can be seen, there is a lot of duplication 
    38 		$ \stackrel{\backslash a}{\longrightarrow} $ & 
    39 in the example we have already mentioned in 
    39 	$((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$\\
    40 \ref{eqn:growth2}.
    40 							     & $\stackrel{\backslash a}{
       
    41 	\longrightarrow} $ & $\ldots$\\
       
    42 	\end{tabular}
       
    43 \end{center}
       
    44 \noindent
       
    45 As can be seen, there are serveral duplications.
    41 A simple-minded simplification function cannot simplify
    46 A simple-minded simplification function cannot simplify
    42 the third regular expression in the above chain of derivative
    47 the third regular expression in the above chain of derivative
    43 regular expressions, namely
    48 regular expressions, namely
    44 \begin{center}
    49 \begin{center}
    45 $((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
    50 $((a^*a^* + a^*) + a^*)\cdot(a^*a^*)^* + (a^*a^* + a^*)\cdot(a^*a^*)^*$
    50 One would expect a better simplification function to work in the 
    55 One would expect a better simplification function to work in the 
    51 following way:
    56 following way:
    52 \begin{gather*}
    57 \begin{gather*}
    53 	((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* + 
    58 	((a^*a^* + \underbrace{a^*}_\text{A})+\underbrace{a^*}_\text{duplicate of A})\cdot(a^*a^*)^* + 
    54 	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this}.\\
    59 	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this}.\\
    55 	\bigg\downarrow \\
    60 	\bigg\downarrow (1) \\
    56 	(a^*a^* + a^* 
    61 	(a^*a^* + a^* 
    57 	\color{gray} + a^* \color{black})\cdot(a^*a^*)^* + 
    62 	\color{gray} + a^* \color{black})\cdot(a^*a^*)^* + 
    58 	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this} \\
    63 	\underbrace{(a^*a^* + a^*)\cdot(a^*a^*)^*}_\text{further simp removes this} \\
    59 	\bigg\downarrow \\
    64 	\bigg\downarrow (2) \\
    60 	(a^*a^* + a^* 
    65 	(a^*a^* + a^* 
    61 	)\cdot(a^*a^*)^*  
    66 	)\cdot(a^*a^*)^*  
    62 	\color{gray} + (a^*a^* + a^*) \cdot(a^*a^*)^*\\
    67 	\color{gray} + (a^*a^* + a^*) \cdot(a^*a^*)^*\\
    63 	\bigg\downarrow \\
    68 	\bigg\downarrow (3) \\
    64 	(a^*a^* + a^* 
    69 	(a^*a^* + a^* 
    65 	)\cdot(a^*a^*)^*  
    70 	)\cdot(a^*a^*)^*  
    66 \end{gather*}
    71 \end{gather*}
    67 \noindent
    72 \noindent
    68 In the first step, the nested alternative regular expression
    73 In the first step, the nested alternative regular expression
    69 $(a^*a^* + a^*) + a^*$ is flattened into $a^*a^* + a^* + a^*$.
    74 $(a^*a^* + a^*) + a^*$ is flattened into $a^*a^* + a^* + a^*$.
    70 Now the third term $a^*$ is clearly identified as a duplicate
    75 Now the third term $a^*$ can clearly be identified as a duplicate
    71 and therefore removed in the second step. This causes the two
    76 and therefore removed in the second step. 
       
    77 This causes the two
    72 top-level terms to become the same and the second $(a^*a^*+a^*)\cdot(a^*a^*)^*$ 
    78 top-level terms to become the same and the second $(a^*a^*+a^*)\cdot(a^*a^*)^*$ 
    73 removed in the final step.\\
    79 removed in the final step.
    74 This motivating example is from testing Sulzmann and Lu's 
    80 Sulzmann and Lu's simplification function (using our notations) can achieve this
    75 algorithm: their simplification does 
    81 simplification:
    76 not work!
       
    77 Consider their simplification (using our notations):
       
    78 \begin{center}
    82 \begin{center}
    79 	\begin{tabular}{lcl}
    83 	\begin{tabular}{lcl}
    80 		$\simpsulz \; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ & 
    84 		$\textit{simp}\_{SL} \; _{bs}(_{bs'}\ONE \cdot r)$ & $\dn$ & 
    81 		$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
    85 		$\textit{if} \; (\textit{zeroable} \; r)\; \textit{then} \;\; \ZERO$\\
    82 						   & &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
    86 						   & &$\textit{else}\;\; \fuse \; (bs@ bs') \; r$\\
    83 		$\simpsulz \;(_{bs}r_1\cdot r_2)$ & $\dn$ & $\textit{if} 
    87 		$\textit{simp}\_{SL} \;(_{bs}r_1\cdot r_2)$ & $\dn$ & $\textit{if} 
    84 		\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
    88 		\; (\textit{zeroable} \; r_1 \; \textit{or} \; \textit{zeroable}\; r_2)\;
    85 		\textit{then} \;\; \ZERO$\\
    89 		\textit{then} \;\; \ZERO$\\
    86 					     & & $\textit{else}\;\;_{bs}((\simpsulz \;r_1)\cdot
    90 							    & & $\textit{else}\;\;_{bs}((\textit{simp}\_{SL} \;r_1)\cdot
    87 					     (\simpsulz \; r_2))$\\
    91 							    (\textit{simp}\_{SL} \; r_2))$\\
    88 		$\simpsulz  \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
    92 		$\textit{simp}\_{SL}  \; _{bs}\sum []$ & $\dn$ & $\ZERO$\\
    89 		$\simpsulz  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
    93 		$\textit{simp}\_{SL}  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
    90 		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
    94 		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
    91 		$\simpsulz  \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\simpsulz  \; r)$\\
    95 		$\textit{simp}\_{SL}  \; _{bs}\sum[r]$ & $\dn$ & $\fuse \; bs \; (\textit{simp}\_{SL}  \; r)$\\
    92 		$\simpsulz  \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum 
    96 		$\textit{simp}\_{SL}  \; _{bs}\sum(r::rs)$ & $\dn$ & $_{bs}\sum 
    93 		(\nub \; (\filter \; (\not \circ \zeroable)\;((\simpsulz  \; r) :: \map \; \simpsulz  \; rs)))$\\ 
    97 		(\nub \; (\filter \; (\not \circ \zeroable)\;((\textit{simp}\_{SL}  \; r) :: \map \; \textit{simp}\_{SL}  \; rs)))$\\ 
    94 		
    98 		
    95 	\end{tabular}
    99 	\end{tabular}
    96 \end{center}
   100 \end{center}
    97 \noindent
   101 \noindent
    98 where the $\textit{zeroable}$ predicate 
   102 The $\textit{zeroable}$ predicate 
    99 tests whether the regular expression
   103 tests whether the regular expression
   100 is equivalent to $\ZERO$,
   104 is equivalent to $\ZERO$, and
   101 can be defined as:
   105 can be defined as:
   102 \begin{center}
   106 \begin{center}
   103 	\begin{tabular}{lcl}
   107 	\begin{tabular}{lcl}
   104 		$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
   108 		$\zeroable \; _{bs}\sum (r::rs)$ & $\dn$ & $\zeroable \; r\;\; \land \;\;
   105 		\zeroable \;_{[]}\sum\;rs $\\
   109 		\zeroable \;_{[]}\sum\;rs $\\
   109 		$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
   113 		$\zeroable\;_{bs}\ONE$ & $\dn$ & $\textit{false}$\\
   110 		$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
   114 		$\zeroable\;_{bs}\ZERO$ & $\dn$ & $\textit{true}$
   111 	\end{tabular}
   115 	\end{tabular}
   112 \end{center}
   116 \end{center}
   113 \noindent
   117 \noindent
   114 They suggested that the $\simpsulz $ function should be
   118 The 
       
   119 \begin{center}
       
   120 	\begin{tabular}{lcl}
       
   121 		$\textit{simp}\_{SL}  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
       
   122 		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
       
   123 	\end{tabular}
       
   124 \end{center}
       
   125 \noindent
       
   126 clause does flatten the alternative as required in step (1),
       
   127 but $\textit{simp}\_{SL}$ is insufficient if we want to do steps (2) and (3),
       
   128 as these ``identical'' terms have different bit-annotations.
       
   129 They also suggested that the $\textit{simp}\_{SL} $ function should be
   115 applied repeatedly until a fixpoint is reached.
   130 applied repeatedly until a fixpoint is reached.
   116 We call this construction $\textit{sulzSimp}$:
   131 We call this construction $\textit{SLSimp}$:
   117 \begin{center}
   132 \begin{center}
   118 	\begin{tabular}{lcl}
   133 	\begin{tabular}{lcl}
   119 		$\textit{sulzSimp} \; r$ & $\dn$ & 
   134 		$\textit{SLSimp} \; r$ & $\dn$ & 
   120 		$\textit{while}((\simpsulz  \; r)\; \cancel{=} \; r)$ \\
   135 		$\textit{while}((\textit{simp}\_{SL}  \; r)\; \cancel{=} \; r)$ \\
   121 		& & $\quad r := \simpsulz  \; r$\\
   136 					 & & $\quad r := \textit{simp}\_{SL}  \; r$\\
   122 		& & $\textit{return} \; r$
   137 		& & $\textit{return} \; r$
   123 	\end{tabular}
   138 	\end{tabular}
   124 \end{center}
   139 \end{center}
   125 We call the operation of alternatingly 
   140 We call the operation of alternatingly 
   126 applying derivatives and simplifications
   141 applying derivatives and simplifications
   127 (until the string is exhausted) Sulz-simp-derivative,
   142 (until the string is exhausted) Sulz-simp-derivative,
   128 written $\backslash_{sulzSimp}$:
   143 written $\backslash_{SLSimp}$:
   129 \begin{center}
   144 \begin{center}
   130 \begin{tabular}{lcl}
   145 \begin{tabular}{lcl}
   131 	$r \backslash_{sulzSimp} (c\!::\!s) $ & $\dn$ & $(\textit{sulzSimp} \; (r \backslash c)) \backslash_{sulzSimp}\, s$ \\
   146 	$r \backslash_{SLSimp} (c\!::\!s) $ & $\dn$ & $(\textit{SLSimp} \; (r \backslash c)) \backslash_{SLSimp}\, s$ \\
   132 $r \backslash_{sulzSimp} [\,] $ & $\dn$ & $r$
   147 $r \backslash_{SLSimp} [\,] $ & $\dn$ & $r$
   133 \end{tabular}
   148 \end{tabular}
   134 \end{center}
   149 \end{center}
   135 \noindent
   150 \noindent
   136 After the derivatives have been taken, the bitcodes
   151 After the derivatives have been taken, the bitcodes
   137 are extracted and decoded in the same manner
   152 are extracted and decoded in the same manner
   138 as $\blexer$:
   153 as $\blexer$:
   139 \begin{center}
   154 \begin{center}
   140 \begin{tabular}{lcl}
   155 \begin{tabular}{lcl}
   141   $\textit{blexer\_sulzSimp}\;r\,s$ & $\dn$ &
   156   $\textit{blexer\_SLSimp}\;r\,s$ & $\dn$ &
   142       $\textit{let}\;a = (r^\uparrow)\backslash_{sulzSimp}\, s\;\textit{in}$\\                
   157       $\textit{let}\;a = (r^\uparrow)\backslash_{SLSimp}\, s\;\textit{in}$\\                
   143   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
   158   & & $\;\;\textit{if}\; \textit{bnullable}(a)$\\
   144   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
   159   & & $\;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,a)\,r$\\
   145   & & $\;\;\textit{else}\;\textit{None}$
   160   & & $\;\;\textit{else}\;\textit{None}$
   146 \end{tabular}
   161 \end{tabular}
   147 \end{center}
   162 \end{center}
   148 \noindent
   163 \noindent
   149 We implemented this lexing algorithm in Scala, 
   164 We implemented this lexing algorithm in Scala, 
   150 and found that the final derivative regular expression
   165 and found that the final derivative regular expression
   151 size grows exponentially fast:
   166 size grows exponentially fast (note the logarithmic scale):
   152 \begin{figure}[H]
   167 \begin{figure}[H]
   153 	\centering
   168 	\centering
   154 \begin{tikzpicture}
   169 \begin{tikzpicture}
   155 \begin{axis}[
   170 \begin{axis}[
   156     xlabel={$n$},
   171     xlabel={$n$},
   167 $ using Sulzmann and Lu's lexer}\label{SulzmannLuLexer}
   182 $ using Sulzmann and Lu's lexer}\label{SulzmannLuLexer}
   168 \end{figure}
   183 \end{figure}
   169 \noindent
   184 \noindent
   170 At $n= 20$ we already get an out of memory error with Scala's normal 
   185 At $n= 20$ we already get an out of memory error with Scala's normal 
   171 JVM heap size settings.
   186 JVM heap size settings.
   172 In fact their simplification does not improve over
   187 In fact their simplification does not improve much over
   173 the simple-minded simplifications we have shown in \ref{fig:BetterWaterloo}.
   188 the simple-minded simplifications we have shown in \ref{fig:BetterWaterloo}.
   174 The time required also grows exponentially:
   189 The time required also grows exponentially:
   175 \begin{figure}[H]
   190 \begin{figure}[H]
   176 	\centering
   191 	\centering
   177 \begin{tikzpicture}
   192 \begin{tikzpicture}
   197 simp, fuse, mkEpsBC and isPhi leads to subcalls whose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input. 
   212 simp, fuse, mkEpsBC and isPhi leads to subcalls whose number is bound by the size of the regular expression involved. We claim that thanks to aggressively applying simp this size remains finite. Hence, we can argue that the above mentioned functions/operations have constant time complexity which implies that we can incrementally compute bit-coded parse trees in linear time in the size of the input. 
   198 \end{quote}
   213 \end{quote}
   199 \noindent
   214 \noindent
   200 The assumption that the size of the regular expressions
   215 The assumption that the size of the regular expressions
   201 in the algorithm
   216 in the algorithm
   202 would stay below a finite constant is not ture.
   217 would stay below a finite constant is not true, not in the
   203 The main reason behind this is that (i) The $\textit{nub}$
   218 examples we considered.
       
   219 The main reason behind this is that (i) the $\textit{nub}$
   204 function requires identical annotations between two 
   220 function requires identical annotations between two 
   205 annotated regular expressions to qualify as duplicates,
   221 annotated regular expressions to qualify as duplicates,
   206 and cannot simplify the cases like $_{SZZ}a^*+_{SZS}a^*$
   222 and therefore cannot simplify cases like $_{SZZ}a^*+_{SZS}a^*$
   207 even if both $a^*$ denote the same language.
   223 even if both $a^*$ denote the same language, and
   208 (ii) The ``flattening'' only applies to the head of the list
   224 (ii) the ``flattening'' only applies to the head of the list
   209 in the 
   225 in the 
   210 \begin{center}
   226 \begin{center}
   211 	\begin{tabular}{lcl}
   227 	\begin{tabular}{lcl}
   212 
   228 
   213 		$\simpsulz  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
   229 		$\textit{simp}\_{SL}  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2)$ & $\dn$ &
   214 		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
   230 		$_{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$\\
   215 	\end{tabular}
   231 	\end{tabular}
   216 \end{center}
   232 \end{center}
   217 \noindent
   233 \noindent
   218 clause, and therefore is not thorough enough to simplify all
   234 clause, and therefore is not strong enough to simplify all
   219 needed parts of the regular expression.\\
   235 needed parts of the regular expression. Moreover, even if the regular expressions size
   220 In addition to that, even if the regular expressions size
       
   221 do stay finite, one has to take into account that
   236 do stay finite, one has to take into account that
   222 the $\simpsulz$ function is applied many times
   237 the $\textit{simp}\_{SL}$ function is applied many times
   223 in each derivative step, and that number is not necessarily
   238 in each derivative step, and that number is not necessarily
   224 a constant with respect to the size of the regular expression.
   239 a constant with respect to the size of the regular expression.
   225 To not get ``caught off guard'' by
   240 %To not get ``caught off guard'' by
   226 these counterexamples,
   241 %these counterexamples,
   227 one needs to be more careful when designing the
   242 %one needs to be more careful when designing the
   228 simplification function and making claims about them.
   243 %simplification function and making claims about them.
   229 
   244 
   230 \section{Our $\textit{Simp}$ Function}
   245 \section{Our $\textit{Simp}$ Function}
   231 We will now introduce our simplification function,
   246 We will now introduce our simplification function.
   232 by making a contrast with $\simpsulz$.
   247 %by making a contrast with $\textit{simp}\_{SL}$.
   233 We describe
   248 We also describe
   234 the ideas behind components in their algorithm 
   249 the ideas behind Sulzmann and Lu's $\textit{simp}\_{SL}$
   235 and why they fail to achieve the desired effect, followed
   250 algorithm 
   236 by our solution. These solutions come with correctness
   251 and why it fails to achieve the desired effect. 
   237 statements that are backed up by formal proofs.
   252 Our simplification function comes with a formal
       
   253 correctness proof.
   238 \subsection{Flattening Nested Alternatives}
   254 \subsection{Flattening Nested Alternatives}
   239 The idea behind the clause
   255 The idea behind the clause
   240 \begin{center}
   256 \begin{center}
   241 $\simpsulz  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2) \quad \dn \quad
   257 	$\textit{simp}\_{SL}  \; _{bs}\sum ((_{bs'}\sum rs_1) :: rs_2) \quad \dn \quad
   242 	       _{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$
   258 	       _{bs}\sum ((\map \; (\fuse \; bs')\; rs_1) @ rs_2)$
   243 \end{center}
   259 \end{center}
   244 is that it allows
   260 is that it allows
   245 duplicate removal of regular expressions at different
   261 duplicate removal of regular expressions at different
   246 ``levels'' of alternatives.
   262 ``levels'' of alternatives.
   248 following simplification:
   264 following simplification:
   249 
   265 
   250 \begin{center}
   266 \begin{center}
   251 $(a+r)+r \longrightarrow a+r$
   267 $(a+r)+r \longrightarrow a+r$
   252 \end{center}
   268 \end{center}
   253 The problem here is that only the head element
   269 The problem is that only the head element
   254 is ``spilled out'',
   270 is ``spilled out''.
   255 whereas we would want to flatten
   271 It is more desirable
   256 an entire list to open up possibilities for further simplifications.
   272 to flatten
       
   273 an entire list to open up possibilities for further simplifications
       
   274 with later regular expressions.
   257 Not flattening the rest of the elements also means that
   275 Not flattening the rest of the elements also means that
   258 the later de-duplication processs 
   276 the later de-duplication processs 
   259 does not fully remove further duplicates.
   277 does not fully remove further duplicates.
   260 For example,
   278 For example,
   261 using $\simpsulz$ we could not 
   279 using $\textit{simp}\_{SL}$ we cannot
   262 simplify
   280 simplify
   263 \begin{center}
   281 \begin{center}
   264 	$((a^* a^*)+\underline{(a^* + a^*)})\cdot (a^*a^*)^*+
   282 	$((a^* a^*)+\underline{(a^* + a^*)})\cdot (a^*a^*)^*+
   265 ((a^*a^*)+a^*)\cdot (a^*a^*)^*$
   283 ((a^*a^*)+a^*)\cdot (a^*a^*)^*$
   266 \end{center}
   284 \end{center}
   267 due to the underlined part not in the first element
   285 due to the underlined part not being in the head 
   268 of the alternative.\\
   286 of the alternative.
   269 We define a flatten operation that flattens not only 
   287 
   270 the first regular expression of an alternative,
   288 We define our flatten operation so that it flattens 
   271 but the entire list: 
   289 the entire list: 
   272  \begin{center}
   290  \begin{center}
   273   \begin{tabular}{@{}lcl@{}}
   291   \begin{tabular}{@{}lcl@{}}
   274   $\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
   292   $\textit{flts} \; (_{bs}\sum \textit{as}) :: \textit{as'}$ & $\dn$ & $(\textit{map} \;
   275      (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
   293      (\textit{fuse}\;bs)\; \textit{as}) \; @ \; \textit{flts} \; as' $ \\
   276   $\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \;  \textit{as'} $ \\
   294   $\textit{flts} \; \ZERO :: as'$ & $\dn$ & $ \textit{flts} \;  \textit{as'} $ \\
   280 \noindent
   298 \noindent
   281 Our $\flts$ operation 
   299 Our $\flts$ operation 
   282 also throws away $\ZERO$s
   300 also throws away $\ZERO$s
   283 as they do not contribute to a lexing result.
   301 as they do not contribute to a lexing result.
   284 \subsection{Duplicate Removal}
   302 \subsection{Duplicate Removal}
   285 After flattening is done, we are ready to deduplicate.
   303 After flattening is done, we can deduplicate.
   286 The de-duplicate function is called $\distinctBy$,
   304 The de-duplicate function is called $\distinctBy$,
   287 and that is where we make our second improvement over
   305 and that is where we make our second improvement over
   288 Sulzmann and Lu's.
   306 Sulzmann and Lu's simplification method.
   289 The process goes as follows:
   307 The process goes as follows:
   290 \begin{center}
   308 \begin{center}
   291 $rs \stackrel{\textit{flts}}{\longrightarrow} 
   309 $rs \stackrel{\textit{flts}}{\longrightarrow} 
   292 rs_{flat} 
   310 rs_{flat} 
   293 \xrightarrow{\distinctBy \; 
   311 \xrightarrow{\distinctBy \; 
   304 	\end{tabular}
   322 	\end{tabular}
   305 \end{center}
   323 \end{center}
   306 \noindent
   324 \noindent
   307 The reason we define a distinct function under a mapping $f$ is because
   325 The reason we define a distinct function under a mapping $f$ is because
   308 we want to eliminate regular expressions that are syntactically the same,
   326 we want to eliminate regular expressions that are syntactically the same,
   309 but with different bit-codes.
   327 but have different bit-codes.
   310 For example, we can remove the second $a^*a^*$ from
   328 For example, we can remove the second $a^*a^*$ from
   311 $_{ZSZ}a^*a^* + _{SZZ}a^*a^*$, because it
   329 $_{ZSZ}a^*a^* + _{SZZ}a^*a^*$, because it
   312 represents a match with shorter initial sub-match 
   330 represents a match with shorter initial sub-match 
   313 (and therefore is definitely not POSIX),
   331 (and therefore is definitely not POSIX),
   314 and will be discarded by $\bmkeps$ later.
   332 and will be discarded by $\bmkeps$ later.
   325 regular expressions we have already seen
   343 regular expressions we have already seen
   326 will definitely not contribute to a POSIX value,
   344 will definitely not contribute to a POSIX value,
   327 even if they are attached with different bitcodes.
   345 even if they are attached with different bitcodes.
   328 These duplicates therefore need to be removed.
   346 These duplicates therefore need to be removed.
   329 To achieve this, we call $\rerases$ as the function $f$ during the distinction
   347 To achieve this, we call $\rerases$ as the function $f$ during the distinction
   330 operation.\\
   348 operation. The function
   331 $\rerases$ is very similar to $\erase$, except that it preserves the structure
   349 $\rerases$ is very similar to $\erase$, except that it preserves the structure
   332 when erasing an alternative regular expression.
   350 when erasing an alternative regular expression.
   333 The reason why we use $\rerases$ instead of $\erase$ is that
   351 The reason why we use $\rerases$ instead of $\erase$ is that
   334 it keeps the structures of alternative 
   352 it keeps the structures of alternative 
   335 annotated regular expressions
   353 annotated regular expressions
   336 whereas $\erase$ would turn it back into a binary structure.
   354 whereas $\erase$ would turn it back into a binary  tree structure.
   337 Not having to mess with the structure 
   355 Not having to mess with the structure 
   338 greatly simplifies the finiteness proof in chapter 
   356 greatly simplifies the finiteness proof in chapter 
   339 \ref{Finite} (we will follow up with more details there).
   357 \ref{Finite}.
   340 We give the definitions of $\rerases$ here together with
   358 We give the definitions of $\rerases$ here together with
   341 the new datatype used by $\rerases$ (as our plain
   359 the new datatype used by $\rerases$ (as our plain
   342 regular expression datatype does not allow non-binary alternatives).
   360 regular expression datatype does not allow non-binary alternatives).
   343 For the moment the reader can just think of 
   361 For now we can think of 
   344 $\rerases$ as $\erase$ and $\rrexp$ as plain regular expressions.
   362 $\rerases$ as the function $(\_)_\downarrow$ defined in chapter \ref{Bitcoded1}
       
   363 and $\rrexp$ as plain regular expressions, but having a general list constructor
       
   364 for alternatives:
   345 \begin{figure}[H]
   365 \begin{figure}[H]
   346 \begin{center}	
   366 \begin{center}	
   347 	$\rrexp ::=   \RZERO \mid  \RONE
   367 	$\rrexp ::=   \RZERO \mid  \RONE
   348 			 \mid  \RCHAR{c}  
   368 			 \mid  \RCHAR{c}  
   349 			 \mid  \RSEQ{r_1}{r_2}
   369 			 \mid  \RSEQ{r_1}{r_2}
   351 			 \mid \RSTAR{r}        $
   371 			 \mid \RSTAR{r}        $
   352 \end{center}
   372 \end{center}
   353 \caption{$\rrexp$: plain regular expressions, but with $\sum$ alternative 
   373 \caption{$\rrexp$: plain regular expressions, but with $\sum$ alternative 
   354 constructor}\label{rrexpDef}
   374 constructor}\label{rrexpDef}
   355 \end{figure}
   375 \end{figure}
   356 The notation of $\rerases$ also follows that of $\erase$,
   376 The function $\rerases$ we define as follows:
   357 which is a postfix operator written as a subscript,
       
   358 except that it has an \emph{r} attached to it to distinguish against $\erase$:
       
   359 \begin{center}
   377 \begin{center}
   360 \begin{tabular}{lcl}
   378 \begin{tabular}{lcl}
   361 $\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
   379 $\rerase{\ZERO}$ & $\dn$ & $\RZERO$\\
   362 $\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
   380 $\rerase{_{bs}\ONE}$ & $\dn$ & $\RONE$\\
   363 	$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
   381 	$\rerase{_{bs}\mathbf{c}}$ & $\dn$ & $\RCHAR{c}$\\
   366 $\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a}^*$
   384 $\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a}^*$
   367 \end{tabular}
   385 \end{tabular}
   368 \end{center}
   386 \end{center}
   369 
   387 
   370 \subsection{Putting Things Together}
   388 \subsection{Putting Things Together}
   371 A recursive definition of our  simplification function 
   389 We can now give the definition of our  simplification function:
   372 is given below:
       
   373 %that looks somewhat similar to our Scala code is 
   390 %that looks somewhat similar to our Scala code is 
   374 \begin{center}
   391 \begin{center}
   375   \begin{tabular}{@{}lcl@{}}
   392   \begin{tabular}{@{}lcl@{}}
   376    
   393    
   377 	  $\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp}  \; a_2)  $ \\
   394 	  $\textit{bsimp} \; (_{bs}a_1\cdot a_2)$ & $\dn$ & $ \textit{bsimp}_{ASEQ} \; bs \;(\textit{bsimp} \; a_1) \; (\textit{bsimp}  \; a_2)  $ \\
   402 The most involved part is the $\sum$ clause, where we first call $\flts$ on
   419 The most involved part is the $\sum$ clause, where we first call $\flts$ on
   403 the simplified children regular expression list $\textit{map}\; \textit{bsimp}\; \textit{as}$.
   420 the simplified children regular expression list $\textit{map}\; \textit{bsimp}\; \textit{as}$.
   404 and then call $\distinctBy$ on that list, the predicate determining whether two 
   421 and then call $\distinctBy$ on that list, the predicate determining whether two 
   405 elements are the same is $\rerases \; r_1 = \rerases\; r_2$.
   422 elements are the same is $\rerases \; r_1 = \rerases\; r_2$.
   406 Finally, depending on whether the regular expression list $as'$ has turned into a
   423 Finally, depending on whether the regular expression list $as'$ has turned into a
   407 singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{AALTS}$
   424 singleton or empty list after $\flts$ and $\distinctBy$, $\textit{bsimp}_{ALTS}$
   408 decides whether to keep the current level constructor $\sum$ as it is, and 
   425 decides whether to keep the current level constructor $\sum$ as it is, and 
   409 removes it when there are less than two elements:
   426 removes it when there are less than two elements:
   410 \begin{center}
   427 \begin{center}
   411 	\begin{tabular}{lcl}
   428 	\begin{tabular}{lcl}
   412 		$\textit{bsimp}_{AALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\		
   429 		$\textit{bsimp}_{ALTS} \; bs \; as'$ & $ \dn$ & $ as' \; \textit{match}$\\		
   413   &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
   430   &&$\quad\textit{case} \; [] \Rightarrow  \ZERO$ \\
   414    &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
   431    &&$\quad\textit{case} \; a :: [] \Rightarrow  \textit{fuse bs a}$ \\
   415    &&$\quad\textit{case} \;  as' \Rightarrow _{bs}\sum \textit{as'}$\\ 
   432    &&$\quad\textit{case} \;  as' \Rightarrow _{bs}\sum \textit{as'}$\\ 
   416 	\end{tabular}
   433 	\end{tabular}
   417 	
   434 	
   418 \end{center}
   435 \end{center}
   419 Having defined the $\bsimp$ function,
   436 Having defined the $\textit{bsimp}$ function,
   420 we add it as a phase after a derivative is taken,
   437 we add it as a phase after a derivative is taken.
   421 so it stays small:
       
   422 \begin{center}
   438 \begin{center}
   423 	\begin{tabular}{lcl}
   439 	\begin{tabular}{lcl}
   424 		$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
   440 		$r \backslash_{bsimp} s$ & $\dn$ & $\textit{bsimp}(r \backslash s)$
   425 	\end{tabular}
   441 	\end{tabular}
   426 \end{center}
   442 \end{center}
   427 %Following previous notations
   443 %Following previous notations
   428 %when extending from derivatives w.r.t.~character to derivative
   444 %when extending from derivatives w.r.t.~character to derivative
   429 %w.r.t.~string, we define the derivative that nests simplifications 
   445 %w.r.t.~string, we define the derivative that nests simplifications 
   430 %with derivatives:%\comment{simp in  the [] case?}
   446 %with derivatives:%\comment{simp in  the [] case?}
   431 We extend this from character to string:
   447 We extend this from characters to strings:
   432 \begin{center}
   448 \begin{center}
   433 \begin{tabular}{lcl}
   449 \begin{tabular}{lcl}
   434 $r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
   450 $r \backslash_{bsimps} (c\!::\!s) $ & $\dn$ & $(r \backslash_{bsimp}\, c) \backslash_{bsimps}\, s$ \\
   435 $r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
   451 $r \backslash_{bsimps} [\,] $ & $\dn$ & $r$
   436 \end{tabular}
   452 \end{tabular}
   456 
   472 
   457 \subsection{Examples $(a+aa)^*$ and $(a^*\cdot a^*)^*$
   473 \subsection{Examples $(a+aa)^*$ and $(a^*\cdot a^*)^*$
   458 After Simplification}
   474 After Simplification}
   459 Recall the
   475 Recall the
   460 previous $(a^*a^*)^*$ example
   476 previous $(a^*a^*)^*$ example
   461 where $\simpsulz$ could not
   477 where $\textit{simp}\_{SL}$ could not
   462 prevent the fast growth (over
   478 prevent the fast growth (over
   463 3 million nodes just below $20$ input length)
   479 3 million nodes just below $20$ input length)
   464 will be reduced to just 15 and stays constant no matter how long the
   480 will be reduced to just 15 and stays constant no matter how long the
   465 input string is.
   481 input string is.
   466 This is shown in the graphs below.
   482 This is shown in the graphs below.
   484 \begin{axis}[
   500 \begin{axis}[
   485     xlabel={$n$},
   501     xlabel={$n$},
   486     ylabel={derivative size},
   502     ylabel={derivative size},
   487     width = 7cm,
   503     width = 7cm,
   488     height = 4cm,
   504     height = 4cm,
   489     legend entries={Lexer with $\simpsulz$},  
   505     legend entries={Lexer with $\textit{simp}\_{SL}$},  
   490     legend pos=  north west,
   506     legend pos=  north west,
   491     legend cell align=left]
   507     legend cell align=left]
   492 \addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
   508 \addplot[red,mark=*, mark options={fill=white}] table {BetterWaterloo.data};
   493 \end{axis}
   509 \end{axis}
   494 \end{tikzpicture} 
   510 \end{tikzpicture} 
   497 \caption{Our Improvement over Sulzmann and Lu's in terms of size}
   513 \caption{Our Improvement over Sulzmann and Lu's in terms of size}
   498 \end{figure}
   514 \end{figure}
   499 \noindent
   515 \noindent
   500 Given the size difference, it is not
   516 Given the size difference, it is not
   501 surprising that our $\blexersimp$ significantly outperforms
   517 surprising that our $\blexersimp$ significantly outperforms
   502 $\textit{blexer\_sulzSimp}$.
   518 $\textit{blexer\_SLSimp}$.
   503 In the next section we are going to establish the
   519 In the next section we are going to establish that our
   504 first important property of our lexer--the correctness.
   520 simplification preserves the correctness of the algorithm.
   505 %----------------------------------------------------------------------------------------
   521 %----------------------------------------------------------------------------------------
   506 %	SECTION rewrite relation
   522 %	SECTION rewrite relation
   507 %----------------------------------------------------------------------------------------
   523 %----------------------------------------------------------------------------------------
   508 \section{Correctness of $\blexersimp$}
   524 \section{Correctness of $\blexersimp$}
   509 In this section we give details
       
   510 of the correctness proof of $\blexersimp$,
       
   511 one of the contributions of this thesis.\\
       
   512 We first introduce the rewriting relation \emph{rrewrite}
   525 We first introduce the rewriting relation \emph{rrewrite}
   513 ($\rrewrite$) between two regular expressions,
   526 ($\rrewrite$) between two regular expressions,
   514 which expresses an atomic
   527 which stands for an atomic
   515 simplification.
   528 simplification.
   516 We then prove properties about
   529 We then prove properties about
   517 this rewriting relation and its reflexive transitive closure.
   530 this rewriting relation and its reflexive transitive closure.
   518 Finally we leverage these properties to show
   531 Finally we leverage these properties to show
   519 an equivalence between the internal data structures of 
   532 an equivalence between the results generated by
   520 $\blexer$ and $\blexersimp$.
   533 $\blexer$ and $\blexersimp$.
   521 
   534 
   522 \subsection{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
   535 \subsection{The Rewriting Relation $\rrewrite$($\rightsquigarrow$)}
   523 In the $\blexer$'s correctness proof, we
   536 In the $\blexer$'s correctness proof, we
   524 did not directly derive the fact that $\blexer$ generates the POSIX value,
   537 did not directly derive the fact that $\blexer$ generates the POSIX value,
   525 but first proved that $\blexer$ is linked with $\lexer$.
   538 but first proved that $\blexer$ generates the same result as $\lexer$.
   526 Then we re-use
   539 Then we re-use
   527 the correctness of $\lexer$
   540 the correctness of $\lexer$
   528 to obtain
   541 to obtain 
   529 \begin{center}
   542 \begin{center}
   530 	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = v$.
   543 	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = v$\\
   531 \end{center}
   544 	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer\;
       
   545 	r\;s = \None$.
       
   546 \end{center}
       
   547 %\begin{center}
       
   548 %	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer \; r \;s = v$.
       
   549 %\end{center}
   532 Here we apply this
   550 Here we apply this
   533 modularised technique again
   551 modularised technique again
   534 by first proving that
   552 by first proving that
   535 $\blexersimp \; r \; s $ 
   553 $\blexersimp \; r \; s $ 
   536 produces the same output as $\blexer \; r\; s$,
   554 produces the same output as $\blexer \; r\; s$,
   537 and then piecing it together with 
   555 and then piecing it together with 
   538 $\blexer$'s correctness to achieve our main
   556 $\blexer$'s correctness to achieve our main
   539 theorem:\footnote{ The case when 
   557 theorem:
   540 $s$ is not in $L \; r$, is routine to establish.}
   558 \begin{center}
   541 \begin{center}
   559 	$(r, s) \rightarrow v \; \;   \textit{iff} \;\;  \blexersimp \; r \; s = \Some \;v$
   542 	$(r, s) \rightarrow v \; \;   \textit{iff} \;\;  \blexersimp \; r \; s = v$
   560 	\\
       
   561 	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp\;
       
   562 	r\;s = \None$
   543 \end{center}
   563 \end{center}
   544 \noindent
   564 \noindent
   545 The overall idea for the proof
   565 The overall idea for the proof
   546 of $\blexer \;r \;s = \blexersimp \; r \;s$ 
   566 of $\blexer \;r \;s = \blexersimp \; r \;s$ 
   547 is that the transition from $r$ to $\textit{bsimp}\; r$ can be
   567 is that the transition from $r$ to $\textit{bsimp}\; r$ can be
   548 broken down into finitely many rewrite steps:
   568 broken down into smaller rewrite steps of the form:
   549 \begin{center}
   569 \begin{center}
   550 	$r \rightsquigarrow^* \textit{bsimp} \; r$
   570 	$r \rightsquigarrow^* \textit{bsimp} \; r$
   551 \end{center}
   571 \end{center}
   552 where each rewrite step, written $\rightsquigarrow$,
   572 where each rewrite step, written $\rightsquigarrow$,
   553 is an ``atomic'' simplification that
   573 is an ``atomic'' simplification that
   554 is similar to a small-step reduction in operational semantics:
   574 is similar to a small-step reduction in operational semantics (
       
   575 see figure \ref{rrewriteRules} for the rules):
   555 \begin{figure}[H]
   576 \begin{figure}[H]
   556 \begin{mathpar}
   577 \begin{mathpar}
   557 	\inferrule * [Right = $S\ZERO_l$]{\vspace{0em}}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}
   578 	\inferrule * [Right = $S\ZERO_l$]{\vspace{0em}}{_{bs} \ZERO \cdot r_2 \rightsquigarrow \ZERO\\}
   558 
   579 
   559 	\inferrule * [Right = $S\ZERO_r$]{\vspace{0em}}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}
   580 	\inferrule * [Right = $S\ZERO_r$]{\vspace{0em}}{_{bs} r_1 \cdot \ZERO \rightsquigarrow \ZERO\\}
   594 to be removed provided a regular expression 
   615 to be removed provided a regular expression 
   595 earlier in the list can match the same strings.
   616 earlier in the list can match the same strings.
   596 }\label{rrewriteRules}
   617 }\label{rrewriteRules}
   597 \end{figure}
   618 \end{figure}
   598 \noindent
   619 \noindent
   599 The rules such as $LT$ and $LH$ are for rewriting between two regular expression lists
   620 The rules $LT$ and $LH$ are for rewriting two regular expression lists
   600 such that one regular expression
   621 such that one regular expression
   601 in the left-hand-side list is rewritable in one step
   622 in the left-hand-side list is rewritable in one step
   602 to the right-hand-side's regular expression at the same position.
   623 to the right-hand-side's regular expression at the same position.
   603 This helps with defining the ``context rules'' such as $AL$.\\
   624 This helps with defining the ``context rule'' $AL$.\\
   604 The reflexive transitive closure of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
   625 The reflexive transitive closure of $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$
   605 are defined in the usual way:
   626 are defined in the usual way:
   606 \begin{figure}[H]
   627 \begin{figure}[H]
   607 	\centering
   628 	\centering
   608 \begin{mathpar}
   629 \begin{mathpar}
   617 \caption{The Reflexive Transitive Closure of 
   638 \caption{The Reflexive Transitive Closure of 
   618 $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$}\label{transClosure}
   639 $\rightsquigarrow$ and $\stackrel{s}{\rightsquigarrow}$}\label{transClosure}
   619 \end{figure}
   640 \end{figure}
   620 %Two rewritable terms will remain rewritable to each other
   641 %Two rewritable terms will remain rewritable to each other
   621 %even after a derivative is taken:
   642 %even after a derivative is taken:
   622 Rewriting is preserved under derivatives,
   643 The main point of our rewriting relation
       
   644 is that it is preserved under derivatives,
   623 namely
   645 namely
   624 \begin{center}
   646 \begin{center}
   625 	$r_1 \rightsquigarrow r_2 \implies (r_1 \backslash c) \rightsquigarrow^* (r_2 \backslash c)$
   647 	$r_1 \rightsquigarrow r_2 \implies (r_1 \backslash c) \rightsquigarrow^* (r_2 \backslash c)$
   626 \end{center}
   648 \end{center}
   627 And finally, if two terms are rewritable to each other,
   649 And also, if two terms are rewritable to each other,
   628 then they produce the same bitcodes:
   650 then they produce the same bitcodes:
   629 \begin{center}
   651 \begin{center}
   630 	$r \rightsquigarrow^* r' \;\; \textit{then} \; \; \bmkeps \; r = \bmkeps \; r'$
   652 	$r \rightsquigarrow^* r' \;\; \textit{then} \; \; \bmkeps \; r = \bmkeps \; r'$
   631 \end{center}
   653 \end{center}
   632 The decoding phase of both $\blexer$ and $\blexersimp$
   654 The decoding phase of both $\blexer$ and $\blexersimp$
   633 are the same, which means that if they get the same
   655 are the same, which means that if they receive the same
   634 bitcodes before the decoding phase,
   656 bitcodes before the decoding phase,
   635 they get the same value after decoding is done.
   657 they generate the same value after decoding is done.
   636 We will prove the three properties 
   658 We will prove the three properties 
   637 we mentioned above in the next sub-section.
   659 we mentioned above in the next sub-section.
   638 \subsection{Important Properties of $\rightsquigarrow$}
   660 \subsection{Important Properties of $\rightsquigarrow$}
   639 First we prove some basic facts 
   661 First we prove some basic facts 
   640 about $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$, 
   662 about $\rightsquigarrow$, $\stackrel{s}{\rightsquigarrow}$, 
   683 	which can be indutively proven by the inductive cases of $\rightsquigarrow$ and 
   705 	which can be indutively proven by the inductive cases of $\rightsquigarrow$ and 
   684 	$\stackrel{s}{\rightsquigarrow}$.
   706 	$\stackrel{s}{\rightsquigarrow}$.
   685 \end{proof}
   707 \end{proof}
   686 \noindent
   708 \noindent
   687 The inference rules of $\stackrel{s}{\rightsquigarrow}$
   709 The inference rules of $\stackrel{s}{\rightsquigarrow}$
   688 are defined in terms of list cons operation, here
   710 are defined in terms of the list cons operation, where
   689 we establish that the 
   711 we establish that the 
   690 $\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ 
   712 $\stackrel{s}{\rightsquigarrow}$ and $\stackrel{s*}{\rightsquigarrow}$ 
   691 relation is also preserved w.r.t appending and prepending of a list.
   713 relation is also preserved w.r.t appending and prepending of a list.
   692 In addition, we
   714 In addition, we
   693 also prove some relations 
   715 also prove some relations 
   715 		\item
   737 		\item
   716 		
   738 		
   717 			$rs_3 \stackrel{s*}{\rightsquigarrow} rs_4 \land r_1 \rightsquigarrow^* r_2 \implies
   739 			$rs_3 \stackrel{s*}{\rightsquigarrow} rs_4 \land r_1 \rightsquigarrow^* r_2 \implies
   718 			r_2 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
   740 			r_2 :: rs_3 \stackrel{s*}{\rightsquigarrow} r_2 :: rs_4$
   719 		\item			
   741 		\item			
   720 			If we could rewrite a regular expression 
   742 			If we can rewrite a regular expression 
   721 			in many steps to $\ZERO$, then 
   743 			in many steps to $\ZERO$, then 
   722 			we could also rewrite any sequence containing it to $\ZERO$:\\
   744 			we can also rewrite any sequence containing it to $\ZERO$:\\
   723 			$r_1 \rightsquigarrow^* \ZERO 
   745 			$r_1 \rightsquigarrow^* \ZERO 
   724 			\implies _{bs}r_1\cdot r_2 \rightsquigarrow^* \ZERO$
   746 			\implies _{bs}r_1\cdot r_2 \rightsquigarrow^* \ZERO$
   725 	\end{itemize}
   747 	\end{itemize}
   726 \end{lemma}
   748 \end{lemma}
   727 \begin{proof}
   749 \begin{proof}
   734 	$\stackrel{s*}{\rightsquigarrow}$ and using part one to three. 
   756 	$\stackrel{s*}{\rightsquigarrow}$ and using part one to three. 
   735 	The fifth part is a corollary of part four.
   757 	The fifth part is a corollary of part four.
   736 	The last part is proven by rule induction again on $\rightsquigarrow^*$.
   758 	The last part is proven by rule induction again on $\rightsquigarrow^*$.
   737 \end{proof}
   759 \end{proof}
   738 \noindent
   760 \noindent
   739 Now we are ready to give the proofs of the below properties:
   761 Now we are ready to give the proofs of the following properties:
   740 \begin{itemize}
   762 \begin{itemize}
   741 	\item
   763 	\item
   742 		$(r \rightsquigarrow^* r'\land \bnullable \; r_1) 
   764 		$r \rightsquigarrow^* r'\land \bnullable \; r_1 
   743 		\implies \bmkeps \; r = \bmkeps \; r'$. \\
   765 		\implies \bmkeps \; r = \bmkeps \; r'$. \\
   744 	\item
   766 	\item
   745 		$r \rightsquigarrow^* \textit{bsimp} \;r$.\\
   767 		$r \rightsquigarrow^* \textit{bsimp} \;r$.\\
   746 	\item
   768 	\item
   747 		$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow^* r'\backslash c$.\\
   769 		$r \rightsquigarrow r' \implies r \backslash c \rightsquigarrow^* r'\backslash c$.\\
   748 \end{itemize}
   770 \end{itemize}
   749 These properties would work together towards the correctness theorem.
   771 
   750 \subsubsection{Property 1: $(r \rightsquigarrow^* r'\land \bnullable \; r_1) 
   772 \subsubsection{Property 1: $r \rightsquigarrow^* r'\land \bnullable \; r_1 
   751 		\implies \bmkeps \; r = \bmkeps \; r'$}
   773 		\implies \bmkeps \; r = \bmkeps \; r'$}
   752 Intuitively, this property says we can 
   774 Intuitively, this property says we can 
   753 extract the same bitcodes using $\bmkeps$ from the nullable
   775 extract the same bitcodes using $\bmkeps$ from the nullable
   754 components of two regular expressions $r$ and $r'$,
   776 components of two regular expressions $r$ and $r'$,
   755 if we can rewrite from one to the other in finitely
   777 if we can rewrite from one to the other in finitely
   759 having at least one nullable regular expressions:
   781 having at least one nullable regular expressions:
   760 \begin{center}
   782 \begin{center}
   761 	$\textit{bnullables} \; rs \quad \dn \quad \exists r \in rs. \;\; \bnullable \; r$
   783 	$\textit{bnullables} \; rs \quad \dn \quad \exists r \in rs. \;\; \bnullable \; r$
   762 \end{center}
   784 \end{center}
   763 \noindent
   785 \noindent
   764 The rewriting relation $\rightsquigarrow$ preserves nullability:
   786 The rewriting relation $\rightsquigarrow$ preserves (b)nullability:
   765 \begin{lemma}\label{rewritesBnullable}
   787 \begin{lemma}\label{rewritesBnullable}
   766 	\hspace{0em}
   788 	\hspace{0em}
   767 	\begin{itemize}
   789 	\begin{itemize}
   768 		\item
   790 		\item
   769 			$\text{If} \; r_1 \rightsquigarrow r_2, \; 
   791 			$\text{If} \; r_1 \rightsquigarrow r_2, \; 
   810 \end{lemma}
   832 \end{lemma}
   811 \begin{proof}
   833 \begin{proof}
   812 	By rule induction over the cases that lead to $r_1 \rightsquigarrow r_2$.
   834 	By rule induction over the cases that lead to $r_1 \rightsquigarrow r_2$.
   813 \end{proof}
   835 \end{proof}
   814 \noindent
   836 \noindent
   815 With lemma \ref{rewriteBmkepsAux} we are ready to prove its
   837 With lemma \ref{rewriteBmkepsAux} in place we are ready to prove its
   816 many-step version: 
   838 many-step version: 
   817 \begin{lemma}
   839 \begin{lemma}
   818 	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r, \;\;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
   840 	$\text{If} \;\; r \stackrel{*}{\rightsquigarrow} r' \;\; \text{and} \;\; \bnullable \; r, \;\;\; \text{then} \;\; \bmkeps \; r = \bmkeps \; r'$
   819 \end{lemma}
   841 \end{lemma}
   820 \begin{proof}
   842 \begin{proof}
   821 	By rule induction of $\stackrel{*}{\rightsquigarrow} $.
   843 	By rule induction of $\stackrel{*}{\rightsquigarrow} $. Lemma 
   822 	$\ref{rewritesBnullable}$ tells us both $r$ and $r'$ are nullable.
   844 	$\ref{rewritesBnullable}$ gives us both $r$ and $r'$ are nullable.
   823 	\ref{rewriteBmkepsAux} solves the inductive case.
   845 	The lemma \ref{rewriteBmkepsAux} solves the inductive case.
   824 \end{proof}
   846 \end{proof}
   825 
   847 
   826 \subsubsection{Property 2: $r \stackrel{*}{\rightsquigarrow} \bsimp{r}$}
   848 \subsubsection{Property 2: $r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$}
   827 Now we get to the ``meaty'' part of the proof, 
   849 Now we get to the key part of the proof, 
   828 which says that our simplification's helper functions 
   850 which says that our simplification's helper functions 
   829 such as $\distinctBy$ and $\flts$ conform to 
   851 such as $\distinctBy$ and $\flts$ describe
   830 the $\stackrel{s*}{\rightsquigarrow}$ and 
   852 reducts of $\stackrel{s*}{\rightsquigarrow}$ and 
   831 $\rightsquigarrow^* $ rewriting relations.\\
   853 $\rightsquigarrow^* $.\\
   832 The first lemma to prove is a more general version of 
   854 The first lemma to prove is a more general version of 
   833 $rs_ 1 \rightsquigarrow^* \distinctBy \; rs_1 \; \phi$:
   855 $rs_ 1 \rightsquigarrow^* \distinctBy \; rs_1 \; \phi$:
   834 \begin{lemma}
   856 \begin{lemma}
   835 	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow} 
   857 	$rs_1 @ rs_2 \stackrel{s*}{\rightsquigarrow} 
   836 	(rs_1 @ (\distinctBy \; rs_2 \; \; \rerases \;\; (\map\;\; \rerases \; \; rs_1)))$
   858 	(rs_1 @ (\distinctBy \; rs_2 \; \; \rerases \;\; (\map\;\; \rerases \; \; rs_1)))$
   847 we get the corollary
   869 we get the corollary
   848 \begin{corollary}\label{dBPreserves}
   870 \begin{corollary}\label{dBPreserves}
   849 	$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \phi$.
   871 	$rs_1 \stackrel{s*}{\rightsquigarrow} \distinctBy \; rs_1 \; \phi$.
   850 \end{corollary}
   872 \end{corollary}
   851 \noindent
   873 \noindent
   852 The flatten function $\flts$ conforms to
   874 Similarly the flatten function $\flts$ describes a reduct of
   853 $\stackrel{s*}{\rightsquigarrow}$ as well:
   875 $\stackrel{s*}{\rightsquigarrow}$ as well:
   854 
   876 
   855 \begin{lemma}\label{fltsPreserves}
   877 \begin{lemma}\label{fltsPreserves}
   856 	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
   878 	$rs \stackrel{s*}{\rightsquigarrow} \flts \; rs$
   857 \end{lemma}
   879 \end{lemma}
   863 \begin{lemma}\label{bsimpaltsPreserves}
   885 \begin{lemma}\label{bsimpaltsPreserves}
   864 	$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; _{bs} \; rs$
   886 	$_{bs} \sum rs \stackrel{*}{\rightsquigarrow} \bsimpalts \; _{bs} \; rs$
   865 \end{lemma}
   887 \end{lemma}
   866 \noindent
   888 \noindent
   867 The simplification function
   889 The simplification function
   868 $\textit{bsimp}$ only transforms the regex $r$ using steps specified by 
   890 $\textit{bsimp}$ only transforms the regular expression  using steps specified by 
   869 $\rightsquigarrow^*$ and nothing else.
   891 $\rightsquigarrow^*$ and nothing else:
   870 \begin{lemma}
   892 \begin{lemma}
   871 	$r \stackrel{*}{\rightsquigarrow} \bsimp{r}$
   893 	$r \stackrel{*}{\rightsquigarrow} \textit{bsimp} \; r$
   872 \end{lemma}
   894 \end{lemma}
   873 \begin{proof}
   895 \begin{proof}
   874 	By an induction on $r$.
   896 	By an induction on $r$.
   875 	The most involved case would be the alternative, 
   897 	The most involved case is the alternative, 
   876 	where we use lemmas \ref{bsimpaltsPreserves},
   898 	where we use lemmas \ref{bsimpaltsPreserves},
   877 	\ref{fltsPreserves} and \ref{dBPreserves} to do a series of rewriting:\\
   899 	\ref{fltsPreserves} and \ref{dBPreserves} to do a series of rewriting:\\
   878 	\begin{center}
   900 	\begin{center}
   879 		\begin{tabular}{lcl}
   901 		\begin{tabular}{lcl}
   880 			$rs$ &  $\stackrel{s*}{\rightsquigarrow}$ & $ \map \; \textit{bsimp} \; rs$\\
   902 			$rs$ &  $\stackrel{s*}{\rightsquigarrow}$ & $ \map \; \textit{bsimp} \; rs$\\
   881 			     &  $\stackrel{s*}{\rightsquigarrow}$ & $ \flts \; (\map \; \textit{bsimp} \; rs)$\\
   903 			     &  $\stackrel{s*}{\rightsquigarrow}$ & $ \flts \; (\map \; \textit{bsimp} \; rs)$\\
   882 			     &  $\stackrel{s*}{\rightsquigarrow}$ & $ \distinctBy \; 
   904 			     &  $\stackrel{s*}{\rightsquigarrow}$ & $ \distinctBy \; 
   883 			(\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerases \; \phi$\\
   905 			(\flts \; (\map \; \textit{bsimp}\; rs)) \; \rerases \; \phi$\\
   884 		\end{tabular}
   906 		\end{tabular}
   885 	\end{center}
   907 	\end{center}
   886 	Using this we derive the following rewrite relation:\\
   908 	Using this we can derive the following rewrite sequence:\\
   887 	\begin{center}
   909 	\begin{center}
   888 		\begin{tabular}{lcl}
   910 		\begin{tabular}{lcl}
   889 			$r$ & $=$ & $_{bs}\sum rs$\\[1.5ex]
   911 			$r$ & $=$ & $_{bs}\sum rs$\\[1.5ex]
   890 			    & $\rightsquigarrow^*$ & $\bsimpalts \; bs \; rs$ \\[1.5ex]
   912 			    & $\rightsquigarrow^*$ & $\bsimpalts \; bs \; rs$ \\[1.5ex]
   891 			    & $\rightsquigarrow^*$ & $\ldots$ \\ [1.5ex]
   913 			    & $\rightsquigarrow^*$ & $\ldots$ \\ [1.5ex]
   898 			    & $\rightsquigarrow^*$ & $\textit{bsimp} \; r$\\[1.5ex]
   920 			    & $\rightsquigarrow^*$ & $\textit{bsimp} \; r$\\[1.5ex]
   899 		\end{tabular}
   921 		\end{tabular}
   900 	\end{center}	
   922 	\end{center}	
   901 \end{proof}
   923 \end{proof}
   902 \subsubsection{Property 3: $r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$}
   924 \subsubsection{Property 3: $r_1 \stackrel{*}{\rightsquigarrow}  r_2 \implies r_1 \backslash c \stackrel{*}{\rightsquigarrow} r_2 \backslash c$}
   903 The rewritability relation 
   925 The rewrite relation 
   904 $\rightsquigarrow$ is preserved under derivatives--
   926 $\rightsquigarrow$ changes into $\stackrel{*}{\rightsquigarrow}$
   905 it is just that we might need multiple steps 
   927 after derivatives are taken on both sides:
   906 where originally only one step was needed:
       
   907 \begin{lemma}\label{rewriteBder}
   928 \begin{lemma}\label{rewriteBder}
   908 	\hspace{0em}
   929 	\hspace{0em}
   909 	\begin{itemize}
   930 	\begin{itemize}
   910 		\item
   931 		\item
   911 			If $r_1 \rightsquigarrow r_2$, then $r_1 \backslash c 
   932 			If $r_1 \rightsquigarrow r_2$, then $r_1 \backslash c 
   925 \begin{corollary}\label{rewritesBder}
   946 \begin{corollary}\label{rewritesBder}
   926 	$r_1 \rightsquigarrow^* r_2 \implies r_1 \backslash c \rightsquigarrow^*   
   947 	$r_1 \rightsquigarrow^* r_2 \implies r_1 \backslash c \rightsquigarrow^*   
   927 	r_2 \backslash c$
   948 	r_2 \backslash c$
   928 \end{corollary}
   949 \end{corollary}
   929 \begin{proof}
   950 \begin{proof}
   930 	By rule induction of $\stackrel{*}{\rightsquigarrow} $ and using the previous lemma \ref{rewriteBder}.
   951 	By rule induction of $\stackrel{*}{\rightsquigarrow} $ and   lemma \ref{rewriteBder}.
   931 \end{proof}
   952 \end{proof}
   932 \noindent
   953 \noindent
   933 This can be extended and combined with $r \rightsquigarrow^* \textit{bsimp} \; r$
   954 This can be extended and combined with $r \rightsquigarrow^* \textit{bsimp} \; r$
   934 to obtain the rewritability between
   955 to obtain the correspondence between
   935 $\blexer$ and $\blexersimp$'s intermediate
   956 $\blexer$ and $\blexersimp$'s intermediate
   936 derivative regular expressions 
   957 derivative regular expressions 
   937 \begin{lemma}\label{bderBderssimp}
   958 \begin{lemma}\label{bderBderssimp}
   938 	$a \backslash s \rightsquigarrow^* \bderssimp{a}{s} $
   959 	$a \backslash s \rightsquigarrow^* \bderssimp{a}{s} $
   939 \end{lemma}
   960 \end{lemma}
   940 \begin{proof}
   961 \begin{proof}
   941 	By an induction on $s$.
   962 	By an induction on $s$.
   942 \end{proof}
   963 \end{proof}
   943 \subsection{Main Theorem}
   964 \subsection{Main Theorem}
   944 Now with \ref{bderBderssimp} we are ready for the main theorem.
   965 Now with \ref{bderBderssimp} in place we are ready for the main theorem.
   945 \begin{theorem}
   966 \begin{theorem}
   946 	$\blexer \; r \; s = \blexersimp{r}{s}$
   967 	$\blexer \; r \; s = \blexersimp{r}{s}$
   947 \end{theorem}
   968 \end{theorem}
   948 \noindent
   969 \noindent
   949 \begin{proof}
   970 \begin{proof}
   950 	One can rewrite in many steps from the original lexer's 
   971 	We can rewrite in many steps from the original lexer's 
   951 	derivative regular expressions to the 
   972 	derivative regular expressions to the 
   952 	lexer with simplification applied (by lemma \ref{bderBderssimp}):
   973 	lexer with simplification applied (by lemma \ref{bderBderssimp}):
   953 	\begin{center}
   974 	\begin{center}
   954 		$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $.
   975 		$a \backslash s \stackrel{*}{\rightsquigarrow} \bderssimp{a}{s} $.
   955 	\end{center}
   976 	\end{center}
   956 	we know that they give out the same bits, if the lexing result is a match:
   977 	We know that they generate the same bits, if the lexing result is a match:
   957 	\begin{center}
   978 	\begin{center}
   958 		$\bnullable \; (a \backslash s) 
   979 		$\bnullable \; (a \backslash s) 
   959 		\implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
   980 		\implies \bmkeps \; (a \backslash s) = \bmkeps \; (\bderssimp{a}{s})$
   960 	\end{center}
   981 	\end{center}
   961 	Now that they give out the same bits, we know that they give the same value after decoding.
   982 	Now that they generate the same bits, we know that they give the same value after decoding.
   962 	\begin{center}
   983 	\begin{center}
   963 		$\bnullable \; (a \backslash s) 
   984 		$\bnullable \; (a \backslash s) 
   964 		\implies \decode \; r \; (\bmkeps \; (a \backslash s)) = 
   985 		\implies \decode \; r \; (\bmkeps \; (a \backslash s)) = 
   965 		\decode \; r \; (\bmkeps \; (\bderssimp{a}{s}))$
   986 		\decode \; r \; (\bmkeps \; (\bderssimp{a}{s}))$
   966 	\end{center}
   987 	\end{center}
   967 	Which is equivalent to our proof goal:
   988 	Which is required by our proof goal:
   968 	\begin{center}
   989 	\begin{center}
   969 		$\blexer \; r \; s = \blexersimp \; r \; s$.
   990 		$\blexer \; r \; s = \blexersimp \; r \; s$.
   970 	\end{center}	
   991 	\end{center}	
   971 \end{proof}
   992 \end{proof}
   972 \noindent
   993 \noindent
   973 As a corollary,
   994 As a corollary,
   974 we link this result with the lemma we proved earlier that 
   995 we can link this result with the lemma we proved earlier that 
   975 \begin{center}
   996 \begin{center}
   976 	$(r, s) \rightarrow v \;\; \textit{iff}\;\; \blexer \; r \; s = v$
   997 	$(r, s) \rightarrow v \;\; \textit{iff}\;\; \blexer \; r \; s = \Some \;v$\\
       
   998 	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexer\;
       
   999 	r\;s = \None$.
   977 \end{center}
  1000 \end{center}
   978 and obtain the corollary that the bit-coded lexer with simplification is
  1001 and obtain the corollary that the bit-coded lexer with simplification is
   979 indeed correctly outputting POSIX lexing result, if such a result exists.
  1002 indeed correctly outputting POSIX lexing result, if such a result exists.
   980 \begin{corollary}
  1003 \begin{corollary}
   981 	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp \; r\; s $
  1004 	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp \; r\; s = \Some \; v$\\
       
  1005 	$\nexists v. \; (r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp\;
       
  1006 	r\;s = \None$.
   982 \end{corollary}
  1007 \end{corollary}
   983 
  1008 
   984 \subsection{Comments on the Proof Techniques Used}
  1009 \subsection{Comments on the Proof Techniques Used}
   985 Straightforward and simple as the proof may seem,
  1010 Straightforward and simple as the proof may seem,
   986 the efforts we spent obtaining it were far from trivial.\\
  1011 the efforts we spent obtaining it were far from trivial.
   987 We initially attempted to re-use the argument 
  1012 We initially attempted to re-use the argument 
   988 in \cref{flex_retrieve}. 
  1013 in \cref{flex_retrieve}. 
   989 The problem was that both functions $\inj$ and $\retrieve$ require 
  1014 The problem is that both functions $\inj$ and $\retrieve$ require 
   990 that the annotated regular expressions stay unsimplified, 
  1015 that the annotated regular expressions stay unsimplified, 
   991 so that one can 
  1016 so that one can 
   992 correctly compare $v_{i+1}$ and $r_i$  and $v_i$ 
  1017 correctly compare $v_{i+1}$ and $r_i$  and $v_i$ 
   993 in diagram \ref{graph:inj} and 
  1018 in diagram \ref{graph:inj} and 
   994 ``fit the key into the lock hole''.
  1019 ``fit the key into the lock hole''.
  1039 final,
  1064 final,
  1040 one could come up new rules
  1065 one could come up new rules
  1041 such as 
  1066 such as 
  1042 $\SEQ r_1 \cdot (\SEQ r_1 \cdot r_3) \rightarrow
  1067 $\SEQ r_1 \cdot (\SEQ r_1 \cdot r_3) \rightarrow
  1043 \SEQs [r_1, r_2, r_3]$.
  1068 \SEQs [r_1, r_2, r_3]$.
  1044 This does not fit with the proof technique
  1069 However this does not fit with the proof technique
  1045 of our main theorem, but seem to not violate the POSIX
  1070 of our main theorem, but seem to not violate the POSIX
  1046 property.\\
  1071 property.
  1047 Having correctness property is good. 
  1072 
  1048 But we would also a guarantee that the lexer is not slow in 
  1073 Having established the correctness of our
  1049 some sense, for exampe, not grinding to a halt regardless of the input.
  1074 $\blexersimp$, in the next chapter we shall prove that with our $\simp$ function,
  1050 As we have already seen, Sulzmann and Lu's simplification function
  1075 for a given $r$, the derivative size is always
  1051 $\simpsulz$ cannot achieve this, because their claim that
       
  1052 the regular expression size does not grow arbitrary large
       
  1053 was not true. 
       
  1054 In the next chapter we shall prove that with our $\simp$, 
       
  1055 for a given $r$, the internal derivative size is always
       
  1056 finitely bounded by a constant.
  1076 finitely bounded by a constant.