lexing: comparison ChengsongTanPhdThesis/Chapters/Finite.tex

-:273c176d9027
+:2e05f04ed6b3
 \begin{itemize}
 	\item
 		We first introduce the operations such as
 		derivatives, simplification, size calculation, etc.
 		associated with $\rrexp$s, which we have introduced
-		in chapter \ref{Bitcoded2}.
+		in chapter \ref{Bitcoded2}. As promised, we will discuss
+		why they are needed in section \ref{whyRerase}.
 		The operations on $\rrexp$s are identical to those on
 		annotated regular expressions except that they dispense with
 		bitcodes. This means that all proofs about size of $\rrexp$s will apply to
 		annotated regular expressions, because the size of a regular
 		expression is independent of the bitcodes.
 		$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
 		$\rerase{_{bs} a ^*}$ & $\dn$ & $\rerase{a} ^*$
 	\end{tabular}
 \end{center}
-\subsection{Why a New Datatype?}
+\subsection{Why a New Datatype?}\label{whyRerase}
-The reason we
+\marginpar{\em added label so this section can be referenced by other parts of the thesis
-define a new datatype is that
+so that interested readers can jump to/be reassured that there will be an explanation.}
-the $\erase$ function
+Originally the erase operation $(\_)_\downarrow$ was
-does not preserve the structure of annotated
+used by Ausaf et al. in their proofs related to $\blexer$.
-regular expressions.
+This function was not part of the lexing algorithm, and its sole purpose was to
-We initially started by using
+bridge the gap between the $r$
-plain regular expressions and tried to prove
+%$\textit{rexp}$
-lemma \ref{rsizeAsize},
+(un-annotated) and $\textit{arexp}$ (annotated)
-however the $\erase$ function messes with the structure of the
+regular expression datatypes so as to leverage the correctness
-annotated regular expression.
+theorem of $\lexer$.%to establish the correctness of $\blexer$.
-The $+$ constructor
+For example, lemma \ref{retrieveStepwise} %and \ref{bmkepsRetrieve}
-of basic regular expressions is only binary, whereas $\sum$
+uses $\erase$ to convert an annotated regular expression $a$ into
-takes a list. Therefore we need to convert between
+a plain one so that it can be used by $\inj$ to create the desired value
-annotated and normal regular expressions as follows:
+$\inj\; (a)_\downarrow \; c \; v$.
+Ideally, $\erase$ should only remove the auxiliary information unrelated to the
+structure, namely the
+bitcodes. However, there is a complication:
+the alternative constructors of $\textit{arexp}$
+and $\textit{r}$ have different arities:
 \begin{center}
 	\begin{tabular}{lcl}
-		$\erase \; _{bs}\sum [] $ & $\dn$ & $\ZERO$\\
+		$\textit{r}$ & $::=$ & $\ldots \;|\; (\_ + \_) \; ::\; "\textit{r} \Rightarrow \textit{r} \Rightarrow \textit{r}" | \ldots$\\
-		$\erase \; _{bs}\sum [a]$ & $\dn$ & $a$\\
+		$\textit{arexp}$ & $::=$ & $\ldots\; |\; (\Sigma \_ ) \; ::\; "\textit{arexp} \; list \Rightarrow \textit{arexp}" | \ldots$
-		$\erase \; _{bs}\sum a :: as$ & $\dn$ & $a + (\erase \; _{[]} \sum as)\quad \text{if $as$ length over 1}$
+	\end{tabular}
-	\end{tabular}
+\end{center}
-\end{center}
+\noindent
-\noindent
+To convert between the two
-As can be seen, alternative regular expressions with an empty argument list
+$\erase$ has to recursively disassemble a list into nested binary applications of the
-will be turned into a $\ZERO$.
+$(\_ + \_)$ operator,
-The singleton alternative $\sum [r]$ becomes $r$ during the
+handling corner cases like empty or
-$\erase$ function.
+singleton alternative lists:
-The  annotated regular expression $\sum[a, b, c]$ would turn into
+%becomes $r$ during the
-$(a+(b+c))$.
+%$\erase$ function.
-All these operations change the size and structure of
+%The  annotated regular expression $\sum[a, b, c]$ would turn into
-an annotated regular expression, adding unnecessary
+%$(a+(b+c))$.
-complexities to the size bound proof.
+\begin{center}
+	\begin{tabular}{lcl}
+		$ (_{bs}\sum [])_\downarrow $ & $\dn$ & $\ZERO$\\
+		$ (_{bs}\sum [a])_\downarrow$ & $\dn$ & $a_\downarrow$\\
+		$ (_{bs}\sum [a_1, a_2])_\downarrow$ & $\dn$ & $(a_1)_\downarrow + (a_2)_\downarrow$\\
+		$ (_{bs}\sum a :: as)_\downarrow$ & $\dn$ & $a_\downarrow + (_{[]} \sum as)_\downarrow \quad \text{if $as$ has length greater than 1}$
+	\end{tabular}
+\end{center}
+\noindent
+These operations inevitably change the structure and size of
+an annotated regular expression. For example,
+$a_1 = {}_{Z}\sum [x]$ has size 2, but $(a_1)_\downarrow = x$
+only has size 1.
+%adding unnecessary
+%complexities to the size bound proof.
+%The reason we
+%define a new datatype is that
+%the $\erase$ function
+%does not preserve the structure of annotated
+%regular expressions.
+%We initially started by using
+%plain regular expressions and tried to prove
+%lemma \ref{rsizeAsize},
+%however the $\erase$ function messes with the structure of the
+%annotated regular expression.
+%The $+$ constructor
+%of basic regular expressions is only binary, whereas $\sum$
+%takes a list. Therefore we need to convert between
+%annotated and normal regular expressions as follows:
 For example, if we define the size of a basic plain regular expression
 in the usual way,
 \begin{center}
 	\begin{tabular}{lcl}
 		$\llbracket \ONE \rrbracket_p$ & $\dn$ & $1$\\
 		$\llbracket \ZERO \rrbracket_p$ & $\dn$ & $1$ \\
-		$\llbracket r_1 \cdot r_2 \rrbracket_p$ & $\dn$ & $\llbracket r_1 \rrbracket_p + \llbracket r_2 \rrbracket_p + 1$\\
+		$\llbracket r_1 + r_2 \rrbracket_p$ & $\dn$ & $\llbracket r_1 \rrbracket_p + \llbracket r_2 \rrbracket_p + 1$\\
 		$\llbracket \mathbf{c} \rrbracket_p $ & $\dn$ & $1$\\
 		$\llbracket r_1 \cdot r_2 \rrbracket_p $ & $\dn$ & $\llbracket r_1 \rrbracket_p \; + \llbracket r_2 \rrbracket_p + 1$\\
 		$\llbracket a^* \rrbracket_p $ & $\dn$ & $\llbracket a \rrbracket_p + 1$
 	\end{tabular}
 \end{center}
 Then the property
 \begin{center}
 	$\llbracket a \rrbracket \stackrel{?}{=} \llbracket a_\downarrow \rrbracket_p$
 \end{center}
 does not hold.
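+\noindent
+As a small worked instance of this mismatch, take three annotated
+characters $x$, $y$, $z$ whose erased forms are the plain characters
+$c_1$, $c_2$, $c_3$. We assume here (it is not restated in this section)
+that the annotated size counts one for the $\sum$ node plus the sizes of
+its children, which agrees with the singleton example above:
+\begin{center}
+	\begin{tabular}{lcl}
+		$\llbracket {}_{bs}\sum [x, y, z] \rrbracket$ & $=$ & $1 + (1 + 1 + 1) = 4$\\
+		$({}_{bs}\sum [x, y, z])_\downarrow$ & $=$ & $c_1 + (c_2 + c_3)$\\
+		$\llbracket c_1 + (c_2 + c_3) \rrbracket_p$ & $=$ & $1 + (1 + 1 + 1) + 1 = 5$
+	\end{tabular}
+\end{center}
+\noindent
+Under that assumption, erasing the singleton alternative shrinks the size,
+while erasing the three-element alternative enlarges it.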
-With $\textit{rerase}$, however,
+%With $\textit{rerase}$, however,
-only the bitcodes are thrown away.
+%only the bitcodes are thrown away.
-Everything about the structure remains intact.
+This is why we ended up defining a new regular expression datatype without
-Therefore it does not change the size
+bitcodes but with a list alternative constructor, and defining a new erase function
-of an annotated regular expression and we have:
+in a strictly structure-preserving manner:
-\begin{lemma}\label{rsizeAsize}
+\begin{center}
-	$\rsize{\rerase a} = \asize a$
+	\begin{tabular}{lcl}
-\end{lemma}
+		$\textit{rrexp}$ & $::=$ & $\ldots\; |\; (\sum \_ ) \; ::\; "\textit{rrexp} \; list \Rightarrow \textit{rrexp}" | \ldots$\\
-\begin{proof}
+		$\rerase{_{bs}\sum as}$ & $\dn$ & $\RALTS{\map \; \rerase{\_} \; as}$\\
-	By routine structural induction on $a$.
+	\end{tabular}
-\end{proof}
+\end{center}
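+\noindent
+To illustrate, let $x$, $y$, $z$ be three annotated characters, and assume
+(as a sketch, since the size functions are not restated here) that both the
+annotated size and the r-size count one for the list alternative node plus
+the sizes of its children. Then the list structure, and with it the size,
+is kept intact:
+\begin{center}
+	\begin{tabular}{lcl}
+		$\rerase{_{bs}\sum [x, y, z]}$ & $=$ & $\RALTS{[\rerase{x}, \rerase{y}, \rerase{z}]}$\\
+		$\rsize{\RALTS{[\rerase{x}, \rerase{y}, \rerase{z}]}}$ & $=$ & $1 + (1 + 1 + 1) = 4$
+	\end{tabular}
+\end{center}
+\noindent
+Under the same counting convention the annotated expression
+$_{bs}\sum [x, y, z]$ also has size $4$, since only the bitcodes were dropped.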
+\noindent
+%But
+%Everything about the structure remains intact.
+%Therefore it does not change the size
+%of an annotated regular expression and we have:
 \noindent
 One might be able to prove an inequality such as
 $\llbracket a \rrbracket  \leq \llbracket  a_\downarrow \rrbracket_p $
 and then estimate $\llbracket  a_\downarrow \rrbracket_p$,
 but we found our approach more straightforward.\\
 Everything about the size of annotated regular expressions after the application
 of function $\bsimp$ and $\backslash_{simps}$
 can be calculated via the size of r-regular expressions after the application
 of $\rsimp$ and $\backslash_{rsimps}$:
 \begin{lemma}\label{sizeRelations}
-	The following two equalities hold:
+	The following equalities hold:
 	\begin{itemize}
+		\item
+			$\rsize{\rerase a} = \asize a$
 		\item
 			$\asize{\bsimps \; a} = \rsize{\rsimp{ \rerase{a}}}$
 		\item
 			$\asize{\bderssimp{a}{s}} =  \rsize{\rderssimp{\rerase{a}}{s}}$
 	\end{itemize}
 \end{lemma}
 \begin{proof}
-	The first part is by induction on the inductive cases
+	The first part is by routine structural induction on $a$, using the definition of $(\_)_{\downarrow_r}$.
+	The second part is by induction on the inductive cases
 	of $\textit{bsimp}$.
-	The second part is by induction on the string $s$,
+	The third part is by induction on the string $s$,
 	where the inductive step follows from part two.
 \end{proof}
 \noindent
 With lemma \ref{sizeRelations},
 we will be able to focus on

changeset 659	2e05f04ed6b3
parent 640	bd1354127574
child 660	eddc4eaba7c4