940 \end{proof} |
940 \end{proof} |
941 \noindent |
941 \noindent |
942 As a corollary, |
942 As a corollary, |
943 we link this result with the lemma we proved earlier that |
943 we link this result with the lemma we proved earlier that |
944 \begin{center} |
944 \begin{center} |
945 $(r, s) \rightarrow v \implies \blexer \; r \; s = v$ |
945 $(r, s) \rightarrow v \;\; \textit{iff}\;\; \blexer \; r \; s = v$ |
946 \end{center} |
946 \end{center} |
947 and obtain the corollary that the bit-coded lexer with simplification is |
947 and obtain the corollary that the bit-coded lexer with simplification is |
948 indeed correctly outputting POSIX lexing result, if such a result exists. |
948 indeed correctly outputting POSIX lexing result, if such a result exists. |
949 \begin{corollary} |
949 \begin{corollary} |
950 $(r, s) \rightarrow v \implies \blexersimp{r}{s}$ |
950 $(r, s) \rightarrow v \;\; \textit{iff} \;\; \blexersimp \; r\; s $ |
951 \end{corollary} |
951 \end{corollary} |
952 |
952 |
953 \subsection{Comments on the Proof Techniques Used} |
953 \subsection{Comments on the Proof Techniques Used} |
954 The non-trivial part of proving the correctness of the algorithm with simplification |
954 Straightforward and simple as the proof may seem, |
955 compared with not having simplification is that we can no longer use the argument |
955 the efforts we spent obtaining it was far from trivial.\\ |
956 in \cref{flex_retrieve}. |
956 We initially attempted to re-use the argument |
957 The function \retrieve needs the cumbersome structure of the (umsimplified) |
957 in \cref{flex_retrieve}. |
958 annotated regular expression to |
958 The problem was that both functions $\inj$ and $\retrieve$ require |
959 agree with the structure of the value, but simplification will always mess with the |
959 that the annotated regular expressions stay unsimplified, |
960 structure. |
960 so that one can |
961 |
961 correctly compare $v_{i+1}$ and $r_i$ and $v_i$ |
962 We also tried to prove $\bsimp{\bderssimp{a}{s}} = \bsimp{a\backslash s}$, |
962 in diagram \ref{graph:inj} and |
963 but this turns out to be not true, A counterexample of this being |
963 ``fit the key into the lock hole''. |
964 \[ r = [(1+c)\cdot [aa \cdot (1+c)]] \land s = aa |
964 |
|
965 \noindent |
|
966 We also tried to prove |
|
967 \begin{center} |
|
968 $\textit{bsimp} \;\; (\bderssimp{a}{s}) = |
|
969 \textit{bsimp} \;\; (a\backslash s)$, |
|
970 \end{center} |
|
971 but this turns out to be not true. |
|
972 A counterexample would be |
|
973 \[ a = [(_{Z}1+_{S}c)\cdot [bb \cdot (_{Z}1+_{S}c)]] \;\; |
|
974 \text{and} \;\; s = bb. |
965 \] |
975 \] |
966 |
976 \noindent |
967 Then we would have $\bsimp{a \backslash s}$ being |
977 Then we would have |
968 $_{[]}(_{ZZ}\ONE + _{ZS}c ) $ |
978 \begin{center} |
969 whereas $\bsimp{\bderssimp{a}{s}}$ would be |
979 $\textit{bsimp}\;\; ( a \backslash s )$ = |
970 $_{Z}(_{Z} \ONE + _{S} c)$. |
980 $_{[]}(_{ZZ}\ONE + _{ZS}c ) $ |
971 Unfortunately if we apply $\textit{bsimp}$ at different |
981 \end{center} |
972 stages we will always have this discrepancy, due to |
982 \noindent |
973 whether the $\map \; (\fuse\; bs) \; as$ operation in $\textit{bsimp}$ |
983 whereas |
974 is taken at some points will be entirely dependant on when the simplification |
984 \begin{center} |
975 take place whether there is a larger alternative structure surrounding the |
985 $\textit{bsimp} \;\;( \bderssimp{a}{s} )$ = |
976 alternative being simplified. |
986 $_{Z}(_{Z} \ONE + _{S} c)$. |
977 The good thing about $\stackrel{*}{\rightsquigarrow} $ is that it allows |
987 \end{center} |
978 us not specify how exactly the "atomic" simplification steps $\rightsquigarrow$ |
988 Unfortunately, |
979 are taken, but simply say that they can be taken to make two similar |
989 if we apply $\textit{bsimp}$ differently |
980 regular expressions equal, and can be done after interleaving derivatives |
990 we will always have this discrepancy. |
981 and simplifications. |
991 This is due to |
982 |
992 the $\map \; (\fuse\; bs) \; as$ operation |
983 |
993 happening at different locations in the regular expression.\\ |
984 Having correctness property is good. But we would also like the lexer to be efficient in |
994 The rewriting relation |
985 some sense, for exampe, not grinding to a halt at certain cases. |
995 $\rightsquigarrow^*$ |
986 In the next chapter we shall prove that for a given $r$, the internal derivative size is always |
996 allows us to ignore this discrepancy |
|
997 and view the expressions |
|
998 \begin{center} |
|
999 $_{[]}(_{ZZ}\ONE + _{ZS}c ) $\\ |
|
1000 and\\ |
|
1001 $_{Z}(_{Z} \ONE + _{S} c)$ |
|
1002 |
|
1003 \end{center} |
|
1004 as equal, because they were both re-written |
|
1005 from the same expression.\\ |
|
1006 Having correctness property is good. |
|
1007 But we would also a guarantee that the lexer is not slow in |
|
1008 some sense, for exampe, not grinding to a halt regardless of the input. |
|
1009 As we have already seen, Sulzmann and Lu's simplification function |
|
1010 $\simpsulz$ cannot achieve this, because their claim that |
|
1011 the regular expression size does not grow arbitrary large |
|
1012 was not true. |
|
1013 In the next chapter we shall prove that with our $\simp$, |
|
1014 for a given $r$, the internal derivative size is always |
987 finitely bounded by a constant. |
1015 finitely bounded by a constant. |
988 we would expect in the |
|