lexing: comparison ChengsongTanPhdThesis/Chapters/Bitcoded2.tex


-:80e1114d6421
+:86e0203db2da
 \end{proof}
 \noindent
 As a corollary,
 we link this result with the lemma we proved earlier that
 \begin{center}
-	$(r, s) \rightarrow v \implies \blexer \; r \; s = v$
+	$(r, s) \rightarrow v \;\; \textit{iff}\;\; \blexer \; r \; s = v$
 \end{center}
 and obtain the corollary that the bit-coded lexer with simplification
 indeed correctly outputs the POSIX lexing result, if such a result exists.
 \begin{corollary}
-	$(r, s) \rightarrow v \implies \blexersimp{r}{s}$
+	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp \; r\; s = v$
 \end{corollary}
 \subsection{Comments on the Proof Techniques Used}
-The non-trivial part of proving the correctness of the algorithm with simplification
+Straightforward and simple as the proof may seem,
-compared with not having simplification is that we can no longer use the argument
+the effort we spent obtaining it was far from trivial.\\
-in \cref{flex_retrieve}.
+We initially attempted to re-use the argument
-The function \retrieve needs the cumbersome structure of the (umsimplified)
+in \cref{flex_retrieve}.
-annotated regular expression to
+The problem was that both functions $\inj$ and $\retrieve$ require
-agree with the structure of the value, but simplification will always mess with the
+that the annotated regular expressions stay unsimplified,
-structure.
+so that one can
+correctly compare $v_{i+1}$, $r_i$ and $v_i$
-We also tried to prove $\bsimp{\bderssimp{a}{s}} = \bsimp{a\backslash s}$,
+in diagram \ref{graph:inj} and
-but this turns out to be not true, A counterexample of this being
+``fit the key into the lock hole''.
-\[ r = [(1+c)\cdot [aa \cdot (1+c)]] \land s = aa
+\noindent
+We also tried to prove
+\begin{center}
+$\textit{bsimp} \;\; (\bderssimp{a}{s}) =
+\textit{bsimp} \;\;  (a\backslash s)$,
+\end{center}
+but this turns out not to be true.
+A counterexample would be
+\[ a = [(_{Z}1+_{S}c)\cdot [bb \cdot (_{Z}1+_{S}c)]] \;\;
+	\text{and} \;\; s = bb.
 \]
+\noindent
-Then we would have $\bsimp{a \backslash s}$ being
+Then we would have
-$_{[]}(_{ZZ}\ONE +  _{ZS}c ) $
+\begin{center}
-whereas $\bsimp{\bderssimp{a}{s}}$ would be
+	$\textit{bsimp}\;\; ( a \backslash s ) =$
-$_{Z}(_{Z} \ONE + _{S} c)$.
+	$_{[]}(_{ZZ}\ONE +  _{ZS}c ) $
-Unfortunately if we apply $\textit{bsimp}$ at different
+\end{center}
-stages we will always have this discrepancy, due to
+\noindent
-whether the $\map \; (\fuse\; bs) \; as$ operation in $\textit{bsimp}$
+whereas
-is taken at some points will be entirely dependant on when the simplification
+\begin{center}
-take place whether there is a larger alternative structure surrounding the
+	$\textit{bsimp} \;\;( \bderssimp{a}{s} ) =$
-alternative being simplified.
+	$_{Z}(_{Z} \ONE + _{S} c)$.
-The good thing about $\stackrel{*}{\rightsquigarrow} $ is that it allows
+\end{center}
-us not specify how exactly the "atomic" simplification steps $\rightsquigarrow$
+Unfortunately,
-are taken, but simply say that they can be taken to make two similar
+if we apply $\textit{bsimp}$ at different stages,
-regular expressions equal, and can be done after interleaving derivatives
+we will always have this discrepancy.
-and simplifications.
+This is due to
+the $\map \; (\fuse\; bs) \; as$ operation
+happening at different locations in the regular expression.\\
-Having correctness property is good. But we would also like the lexer to be efficient in
+The rewriting relation
-some sense, for exampe, not grinding to a halt at certain cases.
+$\rightsquigarrow^*$
-In the next chapter we shall prove that for a given $r$, the internal derivative size is always
+allows us to ignore this discrepancy
+and view the expressions
+\begin{center}
+	$_{[]}(_{ZZ}\ONE +  _{ZS}c ) $\\
+	and\\
+	$_{Z}(_{Z} \ONE + _{S} c)$
+\end{center}
+as equal, because they were both re-written
+from the same expression.\\
+Having a correctness property is good.
+But we would also like a guarantee that the lexer is not slow in
+some sense, for example, that it never grinds to a halt, whatever the input.
+As we have already seen, Sulzmann and Lu's simplification function
+$\simpsulz$ cannot achieve this, because their claim that
+the regular expression size does not grow arbitrarily large
+is not true.
+In the next chapter we shall prove that with our $\simp$,
+for a given $r$, the internal derivative size is always
 finitely bounded by a constant.
-we would expect in the

changeset 589	86e0203db2da
parent 588	80e1114d6421
child 590	988e92a70704