ChengsongTanPhdThesis/Chapters/Finite.tex
changeset 659 2e05f04ed6b3
parent 640 bd1354127574
child 660 eddc4eaba7c4
equal deleted inserted replaced
658:273c176d9027 659:2e05f04ed6b3
   206 \begin{itemize}
   206 \begin{itemize}
   207 	\item
   207 	\item
   208 		We first introduce the operations such as 
   208 		We first introduce the operations such as 
   209 		derivatives, simplification, size calculation, etc.
   209 		derivatives, simplification, size calculation, etc.
   210 		associated with $\rrexp$s, which we have introduced
   210 		associated with $\rrexp$s, which we have introduced
   211 		in chapter \ref{Bitcoded2}.
   211 		in chapter \ref{Bitcoded2}. As promised we will discuss
       
   212 		why they are needed in \ref{whyRerase}.
   212 		The operations on $\rrexp$s are identical to those on
   213 		The operations on $\rrexp$s are identical to those on
   213 		annotated regular expressions except that they dispense with
   214 		annotated regular expressions except that they dispense with
   214 		bitcodes. This means that all proofs about size of $\rrexp$s will apply to
   215 		bitcodes. This means that all proofs about size of $\rrexp$s will apply to
   215 		annotated regular expressions, because the size of a regular
   216 		annotated regular expressions, because the size of a regular
   216 		expression is independent of the bitcodes.
   217 		expression is independent of the bitcodes.
   279 		$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
   280 		$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
   280 		$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a} ^*$
   281 		$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a} ^*$
   281 	\end{tabular}
   282 	\end{tabular}
   282 \end{center}
   283 \end{center}
   283 
   284 
   284 \subsection{Why a New Datatype?}
   285 \subsection{Why a New Datatype?}\label{whyRerase}
   285 The reason we
   286 \marginpar{\em added label so this section can be referenced by other parts of the thesis
   286 define a new datatype is that 
   287 so that interested readers can jump to/be reassured that there will explanations.}
   287 the $\erase$ function 
   288 Originally the erase operation $(\_)_\downarrow$ was
   288 does not preserve the structure of annotated
   289 used by Ausaf et al. in their proofs related to $\blexer$.
   289 regular expressions.
   290 This function was not part of the lexing algorithm, and the sole purpose was to
   290 We initially started by using 
   291 bridge the gap between the $r$
   291 plain regular expressions and tried to prove
   292 %$\textit{rexp}$ 
   292 lemma \ref{rsizeAsize},
   293 (un-annotated) and $\textit{arexp}$ (annotated)
   293 however the $\erase$ function messes with the structure of the 
   294 regular expression datatypes so as to leverage the correctness
   294 annotated regular expression.
   295 theorem of $\lexer$.%to establish the correctness of $\blexer$.
   295 The $+$ constructor
   296 For example, lemma \ref{retrieveStepwise} %and \ref{bmkepsRetrieve} 
   296 of basic regular expressions is only binary, whereas $\sum$ 
   297 uses $\erase$ to convert an annotated regular expression $a$ into
   297 takes a list. Therefore we need to convert between
   298 a plain one so that it can be used by $\inj$ to create the desired value
   298 annotated and normal regular expressions as follows:
   299 $\inj\; (a)_\downarrow \; c \; v$.
       
   300 
       
   301 Ideally $\erase$ should only remove the auxiliary information not related to the
       
   302 structure--the 
       
   303 bitcodes. However there exists a complication
       
   304 where the alternative constructors have different arity for $\textit{arexp}$
       
   305 and $\textit{r}$:
   299 \begin{center}
   306 \begin{center}
   300 	\begin{tabular}{lcl}
   307 	\begin{tabular}{lcl}
   301 		$\erase \; _{bs}\sum [] $ & $\dn$ & $\ZERO$\\
   308 		$\textit{r}$ & $::=$ & $\ldots \;|\; (\_ + \_) \; ::\; "\textit{r} \Rightarrow \textit{r} \Rightarrow \textit{r}" | \ldots$\\
   302 		$\erase \; _{bs}\sum [a]$ & $\dn$ & $a$\\
   309 		$\textit{arexp}$ & $::=$ & $\ldots\; |\; (\Sigma \_ ) \; ::\; "\textit{arexp} \; list \Rightarrow \textit{arexp}" | \ldots$
   303 		$\erase \; _{bs}\sum a :: as$ & $\dn$ & $a + (\erase \; _{[]} \sum as)\quad \text{if $as$ length over 1}$
   310 	\end{tabular}
   304 	\end{tabular}
   311 \end{center}
   305 \end{center}
   312 \noindent
   306 \noindent
   313 To convert between the two
   307 As can be seen, alternative regular expressions with an empty argument list
   314 $\erase$ has to recursively disassemble a list into nested binary applications of the 
   308 will be turned into a $\ZERO$.
   315 $(\_ + \_)$ operator,
   309 The singleton alternative $\sum [r]$ becomes $r$ during the
   316 handling corner cases like empty or
   310 $\erase$ function.
   317 singleton alternative lists:
   311 The  annotated regular expression $\sum[a, b, c]$ would turn into
   318 %becomes $r$ during the
   312 $(a+(b+c))$.
   319 %$\erase$ function.
   313 All these operations change the size and structure of
   320 %The  annotated regular expression $\sum[a, b, c]$ would turn into
   314 an annotated regular expression, adding unnecessary 
   321 %$(a+(b+c))$.
   315 complexities to the size bound proof.
   322 \begin{center}
   316 
   323 	\begin{tabular}{lcl}
       
   324 		$ (_{bs}\sum [])_\downarrow $ & $\dn$ & $\ZERO$\\
       
   325 		$ (_{bs}\sum [a])_\downarrow$ & $\dn$ & $a$\\
       
   326 		$ (_{bs}\sum a_1 :: a_2)_\downarrow$ & $\dn$ & $(a_1)_\downarrow + (a_2)_\downarrow)$\\
       
   327 		$ (_{bs}\sum a :: as)_\downarrow$ & $\dn$ & $a_\downarrow + (\erase \; _{[]} \sum as)$
       
   328 	\end{tabular}
       
   329 \end{center}
       
   330 \noindent
       
   331 These operations inevitably change the structure and size of
       
   332 an annotated regular expression. For example,
       
   333 $a_1 = \sum _{Z}[x]$ has size 2, but $(a_1)_\downarrow = x$ 
       
   334 only has size 1.
       
   335 %adding unnecessary 
       
   336 %complexities to the size bound proof.
       
   337 %The reason we
       
   338 %define a new datatype is that 
       
   339 %the $\erase$ function 
       
   340 %does not preserve the structure of annotated
       
   341 %regular expressions.
       
   342 %We initially started by using 
       
   343 %plain regular expressions and tried to prove
       
   344 %lemma \ref{rsizeAsize},
       
   345 %however the $\erase$ function messes with the structure of the 
       
   346 %annotated regular expression.
       
   347 %The $+$ constructor
       
   348 %of basic regular expressions is only binary, whereas $\sum$ 
       
   349 %takes a list. Therefore we need to convert between
       
   350 %annotated and normal regular expressions as follows:
   317 For example, if we define the size of a basic plain regular expression 
   351 For example, if we define the size of a basic plain regular expression 
   318 in the usual way,
   352 in the usual way,
   319 \begin{center}
   353 \begin{center}
   320 	\begin{tabular}{lcl}
   354 	\begin{tabular}{lcl}
   321 		$\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\
   355 		$\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\
   322 		$\llbracket \ZERO \rrbracket_p$ & $\dn$ & $1$ \\
   356 		$\llbracket \ZERO \rrbracket_p$ & $\dn$ & $1$ \\
   323 		$\llbracket r_1 \cdot r_2 \rrbracket_p$ & $\dn$ & $\llbracket r_1 \rrbracket_p + \llbracket r_2 \rrbracket_p + 1$\\
   357 		$\llbracket r_1 + r_2 \rrbracket_p$ & $\dn$ & $\llbracket r_1 \rrbracket_p + \llbracket r_2 \rrbracket_p + 1$\\
   324 		$\llbracket \mathbf{c} \rrbracket_p $ & $\dn$ & $1$\\
   358 		$\llbracket \mathbf{c} \rrbracket_p $ & $\dn$ & $1$\\
   325 		$\llbracket r_1 \cdot r_2 \rrbracket_p $ & $\dn$ & $\llbracket r_1 \rrbracket_p \; + \llbracket r_2 \rrbracket_p + 1$\\
   359 		$\llbracket r_1 \cdot r_2 \rrbracket_p $ & $\dn$ & $\llbracket r_1 \rrbracket_p \; + \llbracket r_2 \rrbracket_p + 1$\\
   326 		$\llbracket a^* \rrbracket_p $ & $\dn$ & $\llbracket a \rrbracket_p + 1$
   360 		$\llbracket a^* \rrbracket_p $ & $\dn$ & $\llbracket a \rrbracket_p + 1$
   327 	\end{tabular}
   361 	\end{tabular}
   328 \end{center}
   362 \end{center}
   330 Then the property
   364 Then the property
   331 \begin{center}
   365 \begin{center}
   332 	$\llbracket a \rrbracket \stackrel{?}{=} \llbracket a_\downarrow \rrbracket_p$
   366 	$\llbracket a \rrbracket \stackrel{?}{=} \llbracket a_\downarrow \rrbracket_p$
   333 \end{center}
   367 \end{center}
   334 does not hold.
   368 does not hold.
   335 With $\textit{rerase}$, however, 
   369 %With $\textit{rerase}$, however, 
   336 only the bitcodes are thrown away.
   370 %only the bitcodes are thrown away.
   337 Everything about the structure remains intact.
   371 That leads to us defining the new regular expression datatype without
   338 Therefore it does not change the size
   372 bitcodes but with a list alternative constructor, and defining a new erase function
   339 of an annotated regular expression and we have:
   373 in a strictly structure-preserving manner:
   340 \begin{lemma}\label{rsizeAsize}
   374 \begin{center}
   341 	$\rsize{\rerase a} = \asize a$
   375 	\begin{tabular}{lcl}
   342 \end{lemma}
   376 		$\textit{rrexp}$ & $::=$ & $\ldots\; |\; (\sum \_ ) \; ::\; "\textit{rrexp} \; list \Rightarrow \textit{rrexp}" | \ldots$\\
   343 \begin{proof}
   377 		$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
   344 	By routine structural induction on $a$.
   378 	\end{tabular}
   345 \end{proof}
   379 \end{center}
       
   380 \noindent
       
   381 %But
       
   382 %Everything about the structure remains intact.
       
   383 %Therefore it does not change the size
       
   384 %of an annotated regular expression and we have:
   346 \noindent
   385 \noindent
   347 One might be able to prove an inequality such as
   386 One might be able to prove an inequality such as
   348 $\llbracket a \rrbracket  \leq \llbracket  a_\downarrow \rrbracket_p $
   387 $\llbracket a \rrbracket  \leq \llbracket  a_\downarrow \rrbracket_p $
   349 and then estimate $\llbracket  a_\downarrow \rrbracket_p$,
   388 and then estimate $\llbracket  a_\downarrow \rrbracket_p$,
   350 but we found our approach more straightforward.\\
   389 but we found our approach more straightforward.\\
   473 Everything about the size of annotated regular expressions after the application
   512 Everything about the size of annotated regular expressions after the application
   474 of function $\bsimp$ and $\backslash_{simps}$
   513 of function $\bsimp$ and $\backslash_{simps}$
   475 can be calculated via the size of r-regular expressions after the application
   514 can be calculated via the size of r-regular expressions after the application
   476 of $\rsimp$ and $\backslash_{rsimps}$:
   515 of $\rsimp$ and $\backslash_{rsimps}$:
   477 \begin{lemma}\label{sizeRelations}
   516 \begin{lemma}\label{sizeRelations}
   478 	The following two equalities hold:
   517 	The following equalities hold:
   479 	\begin{itemize}
   518 	\begin{itemize}
       
   519 		\item
       
   520 			$\rsize{\rerase a} = \asize a$
   480 		\item
   521 		\item
   481 			$\asize{\bsimps \; a} = \rsize{\rsimp{ \rerase{a}}}$
   522 			$\asize{\bsimps \; a} = \rsize{\rsimp{ \rerase{a}}}$
   482 		\item
   523 		\item
   483 			$\asize{\bderssimp{a}{s}} =  \rsize{\rderssimp{\rerase{a}}{s}}$
   524 			$\asize{\bderssimp{a}{s}} =  \rsize{\rderssimp{\rerase{a}}{s}}$
   484 	\end{itemize}
   525 	\end{itemize}
   485 \end{lemma}
   526 \end{lemma}
   486 \begin{proof}
   527 \begin{proof}
   487 	The first part is by induction on the inductive cases
   528 	First part follows from the definition of $(\_)_{\downarrow_r}$.
       
   529 	The second part is by induction on the inductive cases
   488 	of $\textit{bsimp}$.
   530 	of $\textit{bsimp}$.
   489 	The second part is by induction on the string $s$,
   531 	The third part is by induction on the string $s$,
   490 	where the inductive step follows from part one.
   532 	where the inductive step follows from part one.
   491 \end{proof}
   533 \end{proof}
   492 \noindent
   534 \noindent
   493 With lemma \ref{sizeRelations},
   535 With lemma \ref{sizeRelations},
   494 we will be able to focus on 
   536 we will be able to focus on