ChengsongTanPhdThesis/Chapters/Bitcoded2.tex
changeset 589 86e0203db2da
parent 588 80e1114d6421
child 590 988e92a70704
equal deleted inserted replaced
588:80e1114d6421 589:86e0203db2da
   940 \end{proof}
   940 \end{proof}
   941 \noindent
   941 \noindent
   942 As a corollary,
   942 As a corollary,
   943 we link this result with the lemma we proved earlier that 
   943 we link this result with the lemma we proved earlier that 
   944 \begin{center}
   944 \begin{center}
   945 	$(r, s) \rightarrow v \implies \blexer \; r \; s = v$
   945 	$(r, s) \rightarrow v \;\; \textit{iff}\;\; \blexer \; r \; s = v$
   946 \end{center}
   946 \end{center}
   947 and obtain the corollary that the bit-coded lexer with simplification is
   947 and obtain the corollary that the bit-coded lexer with simplification is
   948 indeed correctly outputting POSIX lexing result, if such a result exists.
   948 indeed correctly outputting POSIX lexing result, if such a result exists.
   949 \begin{corollary}
   949 \begin{corollary}
   950 	$(r, s) \rightarrow v \implies \blexersimp{r}{s}$
   950 	$(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp \; r\; s $
   951 \end{corollary}
   951 \end{corollary}
   952 
   952 
   953 \subsection{Comments on the Proof Techniques Used}
   953 \subsection{Comments on the Proof Techniques Used}
   954 The non-trivial part of proving the correctness of the algorithm with simplification
   954 Straightforward and simple as the proof may seem,
   955 compared with not having simplification is that we can no longer use the argument 
   955 the efforts we spent obtaining it was far from trivial.\\
   956 in \cref{flex_retrieve}.
   956 We initially attempted to re-use the argument 
   957 The function \retrieve needs the cumbersome structure of the (umsimplified)
   957 in \cref{flex_retrieve}. 
   958 annotated regular expression to 
   958 The problem was that both functions $\inj$ and $\retrieve$ require 
   959 agree with the structure of the value, but simplification will always mess with the 
   959 that the annotated regular expressions stay unsimplified, 
   960 structure.
   960 so that one can 
   961 
   961 correctly compare $v_{i+1}$ and $r_i$  and $v_i$ 
   962 We also tried to prove $\bsimp{\bderssimp{a}{s}} = \bsimp{a\backslash s}$,
   962 in diagram \ref{graph:inj} and 
   963 but this turns out to be not true, A counterexample of this being
   963 ``fit the key into the lock hole''.
   964 \[ r = [(1+c)\cdot [aa \cdot (1+c)]] \land s = aa
   964 
       
   965 \noindent
       
   966 We also tried to prove 
       
   967 \begin{center}
       
   968 $\textit{bsimp} \;\; (\bderssimp{a}{s}) = 
       
   969 \textit{bsimp} \;\;  (a\backslash s)$,
       
   970 \end{center}
       
   971 but this turns out to be not true.
       
   972 A counterexample would be
       
   973 \[ a = [(_{Z}1+_{S}c)\cdot [bb \cdot (_{Z}1+_{S}c)]] \;\; 
       
   974 	\text{and} \;\; s = bb.
   965 \]
   975 \]
   966 
   976 \noindent
   967 Then we would have $\bsimp{a \backslash s}$ being 
   977 Then we would have 
   968 $_{[]}(_{ZZ}\ONE +  _{ZS}c ) $
   978 \begin{center}
   969 whereas $\bsimp{\bderssimp{a}{s}}$ would be 
   979 	$\textit{bsimp}\;\; ( a \backslash s )$ =
   970 $_{Z}(_{Z} \ONE + _{S} c)$.
   980 	$_{[]}(_{ZZ}\ONE +  _{ZS}c ) $
   971 Unfortunately if we apply $\textit{bsimp}$ at different
   981 \end{center}
   972 stages we will always have this discrepancy, due to 
   982 \noindent
   973 whether the $\map \; (\fuse\; bs) \; as$ operation in $\textit{bsimp}$
   983 whereas 
   974 is taken at some points will be entirely dependant on when the simplification 
   984 \begin{center}
   975 take place whether there is a larger alternative structure surrounding the 
   985 	$\textit{bsimp} \;\;( \bderssimp{a}{s} )$ =  
   976 alternative being simplified.
   986 	$_{Z}(_{Z} \ONE + _{S} c)$.
   977 The good thing about $\stackrel{*}{\rightsquigarrow} $ is that it allows
   987 \end{center}
   978 us not specify how exactly the "atomic" simplification steps $\rightsquigarrow$
   988 Unfortunately, 
   979 are taken, but simply say that they can be taken to make two similar 
   989 if we apply $\textit{bsimp}$ differently
   980 regular expressions equal, and can be done after interleaving derivatives
   990 we will always have this discrepancy. 
   981 and simplifications.
   991 This is due to 
   982 
   992 the $\map \; (\fuse\; bs) \; as$ operation 
   983 
   993 happening at different locations in the regular expression.\\
   984 Having correctness property is good. But we would also like the lexer to be efficient in 
   994 The rewriting relation 
   985 some sense, for exampe, not grinding to a halt at certain cases.
   995 $\rightsquigarrow^*$ 
   986 In the next chapter we shall prove that for a given $r$, the internal derivative size is always
   996 allows us to ignore this discrepancy
       
   997 and view the expressions 
       
   998 \begin{center}
       
   999 	$_{[]}(_{ZZ}\ONE +  _{ZS}c ) $\\
       
  1000 	and\\
       
  1001 	$_{Z}(_{Z} \ONE + _{S} c)$
       
  1002 
       
  1003 \end{center}
       
  1004 as equal, because they were both re-written
       
  1005 from the same expression.\\
       
  1006 Having correctness property is good. 
       
  1007 But we would also a guarantee that the lexer is not slow in 
       
  1008 some sense, for exampe, not grinding to a halt regardless of the input.
       
  1009 As we have already seen, Sulzmann and Lu's simplification function
       
  1010 $\simpsulz$ cannot achieve this, because their claim that
       
  1011 the regular expression size does not grow arbitrary large
       
  1012 was not true. 
       
  1013 In the next chapter we shall prove that with our $\simp$, 
       
  1014 for a given $r$, the internal derivative size is always
   987 finitely bounded by a constant.
  1015 finitely bounded by a constant.
   988 we would expect in the