thys2/Paper/Paper.thy
changeset 425 14c558ae0b09
parent 424 2416fdec6396
child 426 5b77220fdf01
equal deleted inserted replaced
424:2416fdec6396 425:14c558ae0b09
    25 notation (latex output)
    25 notation (latex output)
    26   der_syn ("_\\_" [79, 1000] 76) and
    26   der_syn ("_\\_" [79, 1000] 76) and
    27   ders_syn ("_\\_" [79, 1000] 76) and
    27   ders_syn ("_\\_" [79, 1000] 76) and
    28   bder_syn ("_\\_" [79, 1000] 76) and
    28   bder_syn ("_\\_" [79, 1000] 76) and
    29   bders ("_\\_" [79, 1000] 76) and
    29   bders ("_\\_" [79, 1000] 76) and
    30   bders_simp ("_\\\<^sub>s\<^sub>i\<^sub>m\<^sub>p _" [79, 1000] 76) and
    30   bders_simp ("_\\\<^sub>b\<^sub>s\<^sub>i\<^sub>m\<^sub>p _" [79, 1000] 76) and
    31 
    31 
    32   ZERO ("\<^bold>0" 81) and 
    32   ZERO ("\<^bold>0" 81) and 
    33   ONE ("\<^bold>1" 81) and 
    33   ONE ("\<^bold>1" 81) and 
    34   CH ("_" [1000] 80) and
    34   CH ("_" [1000] 80) and
    35   ALT ("_ + _" [77,77] 78) and
    35   ALT ("_ + _" [77,77] 78) and
    68   bsimp_ASEQ ("bsimpSEQ _ _ _" [10,1000,1000] 80) and
    68   bsimp_ASEQ ("bsimpSEQ _ _ _" [10,1000,1000] 80) and
    69   bmkeps ("bmkeps _" [1000] 80) and
    69   bmkeps ("bmkeps _" [1000] 80) and
    70 
    70 
    71   srewrite ("_\<^latex>\<open>\\mbox{$\\,\\stackrel{s}{\\leadsto}$}\<close> _" [71, 71] 80) and
    71   srewrite ("_\<^latex>\<open>\\mbox{$\\,\\stackrel{s}{\\leadsto}$}\<close> _" [71, 71] 80) and
    72   rrewrites ("_ \<^latex>\<open>\\mbox{$\\,\\leadsto^*$}\<close> _" [71, 71] 80) and
    72   rrewrites ("_ \<^latex>\<open>\\mbox{$\\,\\leadsto^*$}\<close> _" [71, 71] 80) and
       
    73   srewrites ("_ \<^latex>\<open>\\mbox{$\\,\\stackrel{s}{\\leadsto}^*$}\<close> _" [71, 71] 80) and
    73   blexer_simp ("blexer\<^sup>+" 1000) 
    74   blexer_simp ("blexer\<^sup>+" 1000) 
    74 
    75 
    75 
    76 
    76 lemma better_retrieve:
    77 lemma better_retrieve:
    77    shows "rs \<noteq> Nil ==> retrieve (AALTs bs (r#rs)) (Left v) = bs @ retrieve r v"
    78    shows "rs \<noteq> Nil ==> retrieve (AALTs bs (r#rs)) (Left v) = bs @ retrieve r v"
   186 does not achieve the intended objective with bitcoded regular expressions.  The
   187 does not achieve the intended objective with bitcoded regular expressions.  The
   187 reason is that in the bitcoded setting, each copy generally has a
   188 reason is that in the bitcoded setting, each copy generally has a
   188 different bitcode annotation---so @{text nub} would never ``fire''.
   189 different bitcode annotation---so @{text nub} would never ``fire''.
   189 Inspired by Scala's library for lists, we shall instead use a @{text
   190 Inspired by Scala's library for lists, we shall instead use a @{text
   190 distinctBy} function that finds duplicates under an erasing function
   191 distinctBy} function that finds duplicates under an erasing function
   191 that deletes bitcodes.
   192 which deletes bitcodes.
   192 We shall also introduce our own argument and definitions for
   193 We shall also introduce our own argument and definitions for
   193 establishing the correctness of the bitcoded algorithm when 
   194 establishing the correctness of the bitcoded algorithm when 
   194 simplifications are included.\medskip
   195 simplifications are included.\medskip
   195 
   196 
   196 \noindent In this paper, we shall first briefly introduce the basic notions
   197 \noindent In this paper, we shall first briefly introduce the basic notions
   356 	lexing acquired its name from the fact that the corresponding
   357 	lexing acquired its name from the fact that the corresponding
   357 	rules were described as part of the POSIX specification for
   358 	rules were described as part of the POSIX specification for
   358 	Unix-like operating systems \cite{POSIX}.} Sometimes these
   359 	Unix-like operating systems \cite{POSIX}.} Sometimes these
   359   informal rules are called \emph{maximal much rule} and \emph{rule priority}.
   360   informal rules are called \emph{maximal much rule} and \emph{rule priority}.
   360   One contribution of our earlier paper is to give a convenient
   361   One contribution of our earlier paper is to give a convenient
   361  specification for what a POSIX value is (the inductive rules are shown in
   362  specification for what POSIX values are (the inductive rules are shown in
   362   Figure~\ref{POSIXrules}).
   363   Figure~\ref{POSIXrules}).
   363 
   364 
   364 \begin{figure}[t]
   365 \begin{figure}[t]
   365   \begin{center}
   366   \begin{center}
   366   \begin{tabular}{c}
   367   \begin{tabular}{c}
   434   where the path from the left to the right involving @{term derivatives}/@{const
   435   where the path from the left to the right involving @{term derivatives}/@{const
   435   nullable} is the first phase of the algorithm (calculating successive
   436   nullable} is the first phase of the algorithm (calculating successive
   436   \Brz's derivatives) and @{const mkeps}/@{text inj}, the path from right to
   437   \Brz's derivatives) and @{const mkeps}/@{text inj}, the path from right to
   437   left, the second phase. The picture above shows the steps required when a
   438   left, the second phase. The picture above shows the steps required when a
   438   regular expression, say @{text "r\<^sub>1"}, matches the string @{term
   439   regular expression, say @{text "r\<^sub>1"}, matches the string @{term
   439   "[a,b,c]"}. The lexing algorithm by Sulzmann and Lu can be defined as:
   440   "[a,b,c]"}. The first lexing algorithm by Sulzmann and Lu can be defined as:
   440 
   441 
   441   \begin{figure}[t]
   442   \begin{figure}[t]
   442 \begin{center}
   443 \begin{center}
   443 \begin{tikzpicture}[scale=2,node distance=1.3cm,
   444 \begin{tikzpicture}[scale=2,node distance=1.3cm,
   444                     every node/.style={minimum size=6mm}]
   445                     every node/.style={minimum size=6mm}]
   511 
   512 
   512   \noindent
   513   \noindent
   513   In fact we have shown that in the success case the generated POSIX value $v$ is
   514   In fact we have shown that in the success case the generated POSIX value $v$ is
   514   unique and in the failure case that there is no POSIX value $v$ that satisfies
   515   unique and in the failure case that there is no POSIX value $v$ that satisfies
   515   $(s, r) \rightarrow v$. While the algorithm is correct, it is excrutiatingly
   516   $(s, r) \rightarrow v$. While the algorithm is correct, it is excrutiatingly
   516   slow in cases where the derivatives grow arbitrarily (see example from the
   517   slow in cases where the derivatives grow arbitrarily (recall the example from the
   517   Introduction). However it can be used as a convenient reference point for the correctness
   518   Introduction). However it can be used as a convenient reference point for the correctness
   518   proof of the second algorithm by Sulzmann and Lu, which we shall describe next.
   519   proof of the second algorithm by Sulzmann and Lu, which we shall describe next.
   519   
   520   
   520 *}
   521 *}
   521 
   522 
   542   \noindent where @{text bs} stands for bitsequences; @{text r},
   543   \noindent where @{text bs} stands for bitsequences; @{text r},
   543   @{text "r\<^sub>1"} and @{text "r\<^sub>2"} for bitcoded regular
   544   @{text "r\<^sub>1"} and @{text "r\<^sub>2"} for bitcoded regular
   544   expressions; and @{text rs} for lists of bitcoded regular
   545   expressions; and @{text rs} for lists of bitcoded regular
   545   expressions. The binary alternative @{text "ALT bs r\<^sub>1 r\<^sub>2"}
   546   expressions. The binary alternative @{text "ALT bs r\<^sub>1 r\<^sub>2"}
   546   is just an abbreviation for \mbox{@{text "ALTs bs [r\<^sub>1, r\<^sub>2]"}}. 
   547   is just an abbreviation for \mbox{@{text "ALTs bs [r\<^sub>1, r\<^sub>2]"}}. 
   547   For bitsequences we just use lists made up of the
   548   For bitsequences we use lists made up of the
   548   constants @{text Z} and @{text S}.  The idea with bitcoded regular
   549   constants @{text Z} and @{text S}.  The idea with bitcoded regular
   549   expressions is to incrementally generate the value information (for
   550   expressions is to incrementally generate the value information (for
   550   example @{text Left} and @{text Right}) as bitsequences. For this 
   551   example @{text Left} and @{text Right}) as bitsequences. For this 
   551   Sulzmann and Lu define a coding
   552   Sulzmann and Lu define a coding
   552   function for how values can be coded into bitsequences.
   553   function for how values can be coded into bitsequences.
   627   for any bit-sequence $bs$ and $\vdash v : r$. This property can be
   628   for any bit-sequence $bs$ and $\vdash v : r$. This property can be
   628   easily proved by induction on $\vdash v : r$.
   629   easily proved by induction on $\vdash v : r$.
   629 \end{proof}  
   630 \end{proof}  
   630 
   631 
   631   Sulzmann and Lu define the function \emph{internalise}
   632   Sulzmann and Lu define the function \emph{internalise}
   632   in order to transform standard regular expressions into annotated
   633   in order to transform (standard) regular expressions into annotated
   633   regular expressions. We write this operation as $r^\uparrow$.
   634   regular expressions. We write this operation as $r^\uparrow$.
   634   This internalisation uses the following
   635   This internalisation uses the following
   635   \emph{fuse} function.	     
   636   \emph{fuse} function.	     
   636 
   637 
   637   \begin{center}
   638   \begin{center}
   650   \end{tabular}    
   651   \end{tabular}    
   651   \end{center}    
   652   \end{center}    
   652 
   653 
   653   \noindent
   654   \noindent
   654   A regular expression can then be \emph{internalised} into a bitcoded
   655   A regular expression can then be \emph{internalised} into a bitcoded
   655   regular expression as follows.
   656   regular expression as follows:
   656 
   657 
   657   \begin{center}
   658   \begin{center}
   658   \begin{tabular}{lcl}
   659   \begin{tabular}{lcl}
   659   $(\ZERO)^\uparrow$ & $\dn$ & $\textit{ZERO}$\\
   660   $(\ZERO)^\uparrow$ & $\dn$ & $\textit{ZERO}$\\
   660   $(\ONE)^\uparrow$ & $\dn$ & $\textit{ONE}\,[]$\\
   661   $(\ONE)^\uparrow$ & $\dn$ & $\textit{ONE}\,[]$\\
   674   transforms a bitcoded regular expression into a (standard) regular
   675   transforms a bitcoded regular expression into a (standard) regular
   675   expression by just erasing the annotated bitsequences. We omit the
   676   expression by just erasing the annotated bitsequences. We omit the
   676   straightforward definition. For defining the algorithm, we also need
   677   straightforward definition. For defining the algorithm, we also need
   677   the functions \textit{bnullable} and \textit{bmkeps}, which are the
   678   the functions \textit{bnullable} and \textit{bmkeps}, which are the
   678   ``lifted'' versions of \textit{nullable} and \textit{mkeps} acting on
   679   ``lifted'' versions of \textit{nullable} and \textit{mkeps} acting on
   679   bitcoded regular expressions, instead of regular expressions.
   680   bitcoded regular expressions.
   680   %
   681   %
   681   \begin{center}
   682   \begin{center}
   682   \begin{tabular}{@ {}c@ {}c@ {}}
   683   \begin{tabular}{@ {}c@ {}c@ {}}
   683   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
   684   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
   684   $\textit{bnullable}\,(\textit{ZERO})$ & $\dn$ & $\textit{false}$\\
   685   $\textit{bnullable}\,(\textit{ZERO})$ & $\dn$ & $\textit{false}$\\
   708  
   709  
   709 
   710 
   710   \noindent
   711   \noindent
   711   The key function in the bitcoded algorithm is the derivative of a
   712   The key function in the bitcoded algorithm is the derivative of a
   712   bitcoded regular expression. This derivative calculates the
   713   bitcoded regular expression. This derivative calculates the
   713   derivative but at the same time also the incremental part of bitsequences
   714   derivative but at the same time also the incremental part of the bitsequences
   714   that contribute to constructing a POSIX value.	
   715   that contribute to constructing a POSIX value.	
   715 
   716 
   716   \begin{center}
   717   \begin{center}
   717   \begin{tabular}{@ {}lcl@ {}}
   718   \begin{tabular}{@ {}lcl@ {}}
   718   $(\textit{ZERO})\backslash c$ & $\dn$ & $\textit{ZERO}$ \\  
   719   $(\textit{ZERO})\backslash c$ & $\dn$ & $\textit{ZERO}$ \\  
   809 \end{proof}
   810 \end{proof}
   810 
   811 
   811 \noindent
   812 \noindent
   812 The only difficulty left for the correctness proof is that the bitcoded algorithm
   813 The only difficulty left for the correctness proof is that the bitcoded algorithm
   813 has only a ``forward phase'' where POSIX values are generated incrementally.
   814 has only a ``forward phase'' where POSIX values are generated incrementally.
   814 We can achieve the same effect with @{text lexer} by stacking up injection
   815 We can achieve the same effect with @{text lexer} (which has two phases) by stacking up injection
   815 functions during the forward phase. An auxiliary function, called $\textit{flex}$,
   816 functions during the forward phase. An auxiliary function, called $\textit{flex}$,
   816 allows us to recast the rules of $\lexer$ (with its two phases) in terms of a single
   817 allows us to recast the rules of $\lexer$ in terms of a single
   817 phase and stacked up injection functions.
   818 phase and stacked up injection functions.
   818 
   819 
   819 \begin{center}
   820 \begin{center}
   820 \begin{tabular}{lcl}
   821 \begin{tabular}{lcl}
   821   $\textit{flex}\;r\,f\,[]$ & $\dn$ & $f$\\
   822   $\textit{flex}\;r\,f\,[]$ & $\dn$ & $f$\\
   824 \end{tabular}    
   825 \end{tabular}    
   825 \end{center}    
   826 \end{center}    
   826 
   827 
   827 \noindent
   828 \noindent
   828 The point of this function is that when
   829 The point of this function is that when
   829 reaching the end of the string, we just need to apply the stacked
   830 reaching the end of the string, we just need to apply the stacked up
   830 injection functions to the value generated by @{text mkeps}.
   831 injection functions to the value generated by @{text mkeps}.
   831 Using this function we can recast the success case in @{text lexer} 
   832 Using this function we can recast the success case in @{text lexer} 
   832 as follows:
   833 as follows:
   833 
   834 
   834 \begin{proposition}\label{flex}
   835 \begin{proposition}\label{flex}
   841 value generated by \textit{lexer} can also be obtained by a different
   842 value generated by \textit{lexer} can also be obtained by a different
   842 method. While this different method is not efficient (we essentially
   843 method. While this different method is not efficient (we essentially
   843 need to traverse the string $s$ twice, once for building the
   844 need to traverse the string $s$ twice, once for building the
   844 derivative $r\backslash s$ and another time for stacking up injection
   845 derivative $r\backslash s$ and another time for stacking up injection
   845 functions using \textit{flex}), it helps us with proving
   846 functions using \textit{flex}), it helps us with proving
   846 that incrementally building up values generates the same result.
   847 that incrementally building up values in @{text blexer} generates the same result.
   847 
   848 
   848 This brings us to our main lemma in this section: if we calculate a
   849 This brings us to our main lemma in this section: if we calculate a
   849 derivative, say $r\backslash s$ and have a value, say $v$, inhabited
   850 derivative, say $r\backslash s$, and have a value, say $v$, inhabited
   850 by this derivative, then we can produce the result $\lexer$ generates
   851 by this derivative, then we can produce the result @{text lexer} generates
   851 by applying this value to the stacked-up injection functions
   852 by applying this value to the stacked-up injection functions
   852 that $\textit{flex}$ assembles. The lemma establishes that this is the same
   853 that $\textit{flex}$ assembles. The lemma establishes that this is the same
   853 value as if we build the annotated derivative $r^\uparrow\backslash s$
   854 value as if we build the annotated derivative $r^\uparrow\backslash s$
   854 and then retrieve the corresponding bitcoded version, followed by a
   855 and then retrieve the corresponding bitcoded version, followed by a
   855 decoding step.
   856 decoding step.
   898 \noindent
   899 \noindent
   899 With this lemma in place, we can prove the correctness of \textit{blexer}---it indeed
   900 With this lemma in place, we can prove the correctness of \textit{blexer}---it indeed
   900 produces the same result as \textit{lexer}.
   901 produces the same result as \textit{lexer}.
   901 
   902 
   902 
   903 
   903 \begin{theorem}
   904 \begin{theorem}\label{thmone}
   904 $\textit{lexer}\,r\,s = \textit{blexer}\,r\,s$
   905 $\textit{lexer}\,r\,s = \textit{blexer}\,r\,s$
   905 \end{theorem}  
   906 \end{theorem}  
   906 
   907 
   907 \begin{proof}
   908 \begin{proof}
   908   We can first expand both sides using Prop.~\ref{flex} and the
   909   We can first expand both sides using Prop.~\ref{flex} and the
   967       \underbrace{\mbox{$(\ONE + \ONE{}a) \cdot (a + aa)^*$}}_{r} \;+\;
   968       \underbrace{\mbox{$(\ONE + \ONE{}a) \cdot (a + aa)^*$}}_{r} \;+\;
   968      ((a + aa)^* + \underbrace{\mbox{$(\ONE + \ONE{}a) \cdot (a + aa)^*$}}_{r})
   969      ((a + aa)^* + \underbrace{\mbox{$(\ONE + \ONE{}a) \cdot (a + aa)^*$}}_{r})
   969      \end{equation}
   970      \end{equation}
   970 
   971 
   971      \noindent This is a simpler derivative, but unfortunately we
   972      \noindent This is a simpler derivative, but unfortunately we
   972      cannot make further simplifications. This is a problem because
   973      cannot make any further simplifications. This is a problem because
   973      the outermost alternatives contains two copies of the same
   974      the outermost alternatives contains two copies of the same
   974      regular expression (underlined with $r$). The copies will
   975      regular expression (underlined with $r$). These copies will
   975      spawn new copies in later steps and they in turn further copies. This
   976      spawn new copies in later derivative steps and they in turn even more copies. This
   976      destroys an hope of taming the size of the derivatives.  But the
   977      destroys any hope of taming the size of the derivatives.  But the
   977      second copy of $r$ in \eqref{derivex} will never contribute to a
   978      second copy of $r$ in \eqref{derivex} will never contribute to a
   978      value, because POSIX lexing will always prefer matching a string
   979      value, because POSIX lexing will always prefer matching a string
   979      with the first copy. So in principle it could be removed.
   980      with the first copy. So it could be safely removed without affecting the correctness of the algorithm.
   980      The dilemma with the simple-minded
   981      The dilemma with the simple-minded
   981      simplification rules above is that the rule $r + r \Rightarrow r$
   982      simplification rules above is that the rule $r + r \Rightarrow r$
   982      will never be applicable because as can be seen in this example the
   983      will never be applicable because as can be seen in this example the
   983      regular expressions are separated by another sub-regular expression.
   984      regular expressions are not next to each other but separated by another regular expression.
   984 
   985 
   985      But here is where Sulzmann and Lu's representation of generalised
   986      But here is where Sulzmann and Lu's representation of generalised
   986      alternatives in the bitcoded algorithm shines: in @{term
   987      alternatives in the bitcoded algorithm shines: in @{term
   987      "ALTs bs rs"} we can define a more aggressive simplification by
   988      "ALTs bs rs"} we can define a more aggressive simplification by
   988      recursively simplifying all regular expressions in @{text rs} and
   989      recursively simplifying all regular expressions in @{text rs} and
   989      then analyse the resulting list and remove any duplicates.
   990      then analyse the resulting list and remove any duplicates.
   990      Another advantage is that the bitsequences in  bitcoded regular
   991      Another advantage with the bitsequences in  bitcoded regular
   991      expressions can be easily modified such that simplification does not
   992      expressions is that they can be easily modified such that simplification does not
   992      interfere with the value constructions. For example we can ``flatten'', or
   993      interfere with the value constructions. For example we can ``flatten'', or
   993      de-nest, @{text ALTs} as follows
   994      de-nest, @{text ALTs} as follows
   994      %
   995      %
   995      \[
   996      \[
   996      @{term "ALTs bs\<^sub>1 ((ALTs bs\<^sub>2 rs\<^sub>2) # rs\<^sub>1)"}
   997      @{term "ALTs bs\<^sub>1 ((ALTs bs\<^sub>2 rs\<^sub>2) # rs\<^sub>1)"}
  1004      ensure that the correct value corresponding to the original (unsimplified)
  1005      ensure that the correct value corresponding to the original (unsimplified)
  1005      regular expression can still be extracted. %In this way the value construction
  1006      regular expression can still be extracted. %In this way the value construction
  1006      %is not affected by simplification. 
  1007      %is not affected by simplification. 
  1007 
  1008 
  1008      However there is one problem with the definition for the more
  1009      However there is one problem with the definition for the more
  1009      aggressive simlification rules by Sulzmann and Lu. Recasting
  1010      aggressive simlification rules described by Sulzmann and Lu. Recasting
  1010      their definition with our syntax they define the step of removing
  1011      their definition with our syntax they define the step of removing
  1011      duplicates as
  1012      duplicates as
  1012      %
  1013      %
  1013      \[ @{text "bsimp (ALTs bs rs)"} \dn @{text "ALTs
  1014      \[ @{text "bsimp (ALTs bs rs)"} \dn @{text "ALTs
  1014      bs (nup (map bsimp rs))"}
  1015      bs (nup (map bsimp rs))"}
  1018      expressions in @{text rs} (using @{text map}) and then use
  1019      expressions in @{text rs} (using @{text map}) and then use
  1019      Haskell's @{text nub}-function to remove potential
  1020      Haskell's @{text nub}-function to remove potential
  1020      duplicates. While this makes sense when considering the example
  1021      duplicates. While this makes sense when considering the example
  1021      shown in \eqref{derivex}, @{text nub} is the inappropriate
  1022      shown in \eqref{derivex}, @{text nub} is the inappropriate
  1022      function in the case of bitcoded regular expressions. The reason
  1023      function in the case of bitcoded regular expressions. The reason
  1023      is that in general the n elements in @{text rs} will have a
  1024      is that in general the elements in @{text rs} will have a
  1024      different bitsequence annotated to it and in this way @{text nub}
  1025      different annotated bitsequence and in this way @{text nub}
  1025      will never find a duplicate to be removed. The correct way to
  1026      will never find a duplicate to be removed. The correct way to
  1026      handle this situation is to first \emph{erase} the regular
  1027      handle this situation is to first \emph{erase} the regular
  1027      expressions when comparing potential duplicates. This is inspired
  1028      expressions when comparing potential duplicates. This is inspired
  1028      by Scala's list functions of the form \mbox{@{text "distinctBy rs f
  1029      by Scala's list functions of the form \mbox{@{text "distinctBy rs f
  1029      acc"}} where a function is applied first before two elements
  1030      acc"}} where a function is applied first before two elements
  1035      @{thm (lhs) distinctBy.simps(2)} & $\dn$ & @{thm (rhs) distinctBy.simps(2)}
  1036      @{thm (lhs) distinctBy.simps(2)} & $\dn$ & @{thm (rhs) distinctBy.simps(2)}
  1036      \end{tabular}
  1037      \end{tabular}
  1037      \end{center}
  1038      \end{center}
  1038 
  1039 
  1039      \noindent where we scan the list from left to right (because we
  1040      \noindent where we scan the list from left to right (because we
  1040      have to remove later copies). In this function, @{text f} is a
  1041      have to remove later copies). In @{text distinctBy}, @{text f} is a
  1041      function and @{text acc} is an accumulator for regular
  1042      function and @{text acc} is an accumulator for regular
  1042      expressions---essentially a set of elements we have already seen
  1043      expressions---essentially a set of regular expression that we have already seen
  1043      while scanning the list. Therefore we delete an element, say @{text x},
  1044      while scanning the list. Therefore we delete an element, say @{text x},
  1044      from the list provided @{text "f x"} is already in the accumulator;
  1045      from the list provided @{text "f x"} is already in the accumulator;
  1045      otherwise we keep @{text x} and scan the rest of the list but now
  1046      otherwise we keep @{text x} and scan the rest of the list but 
  1046      add @{text "f x"} as another element to @{text acc}. We will use
  1047      add @{text "f x"} as another ``seen'' element to @{text acc}. We will use
  1047      @{term distinctBy} where @{text f} is our erase functions, @{term "erase (DUMMY)"},
  1048      @{term distinctBy} where @{text f} is the erase functions, @{term "erase (DUMMY)"},
  1048      that deletes bitsequences from bitcoded regular expressions.
  1049      that deletes bitsequences from bitcoded regular expressions.
  1049      This is clearly a computationally more expensive operation, than @{text nub},
  1050      This is clearly a computationally more expensive operation, than @{text nub},
  1050      but is needed in order to make the removal of unnecessary copies
  1051      but is needed in order to make the removal of unnecessary copies
  1051      to work.
  1052      to work properly.
  1052 
  1053 
  1053      Our simplification function depends on three helper functions, one is called
  1054      Our simplification function depends on three helper functions, one is called
  1054      @{text flts} and defined as follows:
  1055      @{text flts} and analyses lists of regular expressions coming from alternatives.
       
  1056      It is defined as follows:
  1055 
  1057 
  1056      \begin{center}
  1058      \begin{center}
  1057      \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  1059      \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
  1058      @{thm (lhs) flts.simps(1)} & $\dn$ & @{thm (rhs) flts.simps(1)}\\
  1060      @{thm (lhs) flts.simps(1)} & $\dn$ & @{thm (rhs) flts.simps(1)}\\
  1059      @{thm (lhs) flts.simps(2)} & $\dn$ & @{thm (rhs) flts.simps(2)}\\
  1061      @{thm (lhs) flts.simps(2)} & $\dn$ & @{thm (rhs) flts.simps(2)}\\
  1060      @{thm (lhs) flts.simps(3)[of "bs'" "rs'"]} & $\dn$ & @{thm (rhs) flts.simps(3)[of "bs'" "rs'"]}\\
  1062      @{thm (lhs) flts.simps(3)[of "bs'" "rs'"]} & $\dn$ & @{thm (rhs) flts.simps(3)[of "bs'" "rs'"]}\\
  1061      \end{tabular}
  1063      \end{tabular}
  1062      \end{center}
  1064      \end{center}
  1063 
  1065 
  1064      \noindent
  1066      \noindent
  1065      The second clause removes all instances of @{text ZERO} in alternatives and
  1067      The second clause of @{text flts} removes all instances of @{text ZERO} in alternatives and
  1066      the second ``spills'' out nested alternatives (but retaining the
  1068      the second ``spills'' out nested alternatives (but retaining the
  1067      bitsequence @{text "bs'"} accumulated in the inner alternative). There are
  1069      bitsequence @{text "bs'"} accumulated in the inner alternative). There are
  1068      some corner cases to be considered when the resulting list inside an alternative is
  1070      some corner cases to be considered when the resulting list inside an alternative is
  1069      empty or a singleton list. We take care of those cases in the
  1071      empty or a singleton list. We take care of those cases in the
  1070      @{text "bsimpALTs"} function; similarly we define a helper function that simplifies
  1072      @{text "bsimpALTs"} function; similarly we define a helper function that simplifies
  1101      \end{tabular}
  1103      \end{tabular}
  1102      \end{center}
  1104      \end{center}
  1103 
  1105 
  1104      \noindent
  1106      \noindent
  1105      As far as we can see, our recursive function @{term bsimp} simplifies regular
  1107      As far as we can see, our recursive function @{term bsimp} simplifies regular
  1106      expressions as intended by Sulzmann and Lu. There is no point to apply the
  1108      expressions as intended by Sulzmann and Lu. There is no point in applying the
  1107      @{text bsimp}
  1109      @{text bsimp} function repeatedly (like the simplification in their paper which needs to be
  1108      function repeatedly (like the simplification in their paper which is applied
  1110      applied until a fixpoint is reached) because we can show that @{term bsimp} is idempotent,
  1109      until a fixpoint is reached), because we can show that it is idempotent, that is
  1111      that is
  1110 
  1112 
  1111      \begin{proposition}
  1113      \begin{proposition}
  1112      ???
  1114      @{term "bsimp (bsimp r) = bsimp r"}
  1113      \end{proposition}
  1115      \end{proposition}
  1114 
  1116 
       
  1117      \noindent
       
  1118      This can be proved by induction on @{text r} but requires a detailed analysis
       
  1119      that the de-nesting of alternatives always results in a flat list of regular
       
  1120      expressions. We omit the details since it does not concern the correctness proof.
       
  1121      
       
  1122      Next we can include simplification after each derivative step leading to the
       
  1123      following notion of bitcoded derivatives:
       
  1124      
       
  1125      \begin{center}
       
  1126       \begin{tabular}{cc}
       
  1127       \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
       
  1128       @{thm (lhs) bders_simp.simps(1)} & $\dn$ & @{thm (rhs) bders_simp.simps(1)}
       
  1129       \end{tabular}
       
  1130       &
       
  1131       \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
       
  1132       @{thm (lhs) bders_simp.simps(2)} & $\dn$ & @{thm (rhs) bders_simp.simps(2)}
       
  1133       \end{tabular}
       
  1134       \end{tabular}
       
  1135       \end{center}
       
  1136 
       
  1137       \noindent
       
  1138       and use it in the improved lexing algorithm defined as
       
  1139 
       
  1140      \begin{center}
       
  1141 \begin{tabular}{lcl}
       
  1142   $\textit{blexer}^+\;r\,s$ & $\dn$ &
       
  1143       $\textit{let}\;r_{der} = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\                
       
  1144   & & $\;\;\;\;\textit{if}\; \textit{bnullable}(r_{der}) \;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,r_{der})\,r
       
  1145        \;\;\textit{else}\;\textit{None}$
       
  1146 \end{tabular}
       
  1147 \end{center}
       
  1148 
       
  1149        \noindent The remaining task is to show that @{term blexer} and
       
  1150        @{term "blexer_simp"} generate the same answers.
       
  1151 
       
  1152        When we first
       
  1153        attempted this proof we encountered a problem with the idea
       
  1154        in Sulzmann and Lu's paper where the argument seems to be to appeal
       
  1155        again to the @{text retrieve}-function defined for the unsimplified version
       
  1156        of the algorithm. But
       
  1157        this does not work, because desirable properties such as
       
  1158      %
       
  1159      \[
       
  1160      @{text "retrieve r v = retrieve (bsimp r) v"}
       
  1161      \]
       
  1162 
       
  1163      \noindent do not hold under simplification---this property
       
  1164      essentially purports that we can retrieve the same value from a
       
  1165      simplified version of the regular expression. To start with @{text retrieve}
       
  1166      depends on the fact that the value @{text v} correspond to the
       
  1167      structure of the regular exprssions---but the whole point of simplification
       
  1168      is to ``destroy'' this structure by making the regular expression simpler.
       
  1169      To see this consider the regular expression @{text "r = r' + 0"} and a corresponding
       
  1170      value @{text "v = Left v'"}. If we annotate bitcodes to @{text "r"}, then 
       
  1171      we can use @{text retrieve} and @{text v} in order to extract a corresponding
       
  1172      bitsequence. The reason that this works is that @{text r} is an alternative
       
  1173      regular expression and @{text v} a corresponding value. However, if we simplify
       
  1174      @{text r}, then @{text v} does not correspond to the shape of the regular 
       
  1175      expression anymore. So unless one can somehow
       
  1176      synchronise the change in the simplified regular expressions with
       
  1177      the original POSIX value, there is no hope of appealing to @{text retrieve} in the
       
  1178      correctness argument for @{term blexer_simp}.
       
  1179 
       
  1180      We found it more helpful to introduce the rewriting systems shown in
       
  1181      Figure~\ref{SimpRewrites}. The idea is to generate 
       
  1182      simplified regular expressions in small steps (unlike the @{text bsimp}-function which
       
  1183      does the same in a big step), and show that each of
       
  1184      the small steps preserves the bitcodes that lead to the final POSIX value.
       
  1185      The rewrite system is organised such that $\leadsto$ is for bitcode regular
       
  1186      expressions and $\stackrel{s}{\leadsto}$ for lists of bitcoded regular
       
  1187      expressions. The former essentially implements the simplifications of
       
  1188      @{text "bsimpSEQ"} and @{text flts}; while the latter implements the
       
  1189      simplifications in @{text "bsimpALTs"}. We can show that any bitcoded
       
  1190      regular expression reduces in one or more steps to the simplified
       
  1191      regular expression generated by @{text bsimp}:
       
  1192 
       
  1193      \begin{lemma}\label{lemone}
       
  1194      @{thm[mode=IfThen] rewrites_to_bsimp}
       
  1195      \end{lemma}
       
  1196 
       
  1197      \begin{proof}
       
  1198      By induction on @{text r}. For this we can use the properties
       
  1199      @{thm fltsfrewrites} and @{thm ss6_stronger}. The latter uses
       
  1200      repeated applications of the $LD$ rule which allows the removal
       
  1201      of duplicates that can recognise the same strings. 
       
  1202      \end{proof}
       
  1203 
       
  1204      \noindent
       
  1205      We can show that this rewrite system preserves @{term bnullable}, that 
       
  1206      is simplification, essentially, does not affect nullability:
  1115 
  1207 
  1116      \begin{lemma}
  1208      \begin{lemma}
  1117      @{thm[mode=IfThen] bnullable0(1)[of "r\<^sub>1" "r\<^sub>2"]}
  1209      @{thm[mode=IfThen] bnullable0(1)[of "r\<^sub>1" "r\<^sub>2"]}
  1118      \end{lemma}
  1210      \end{lemma}
  1119 
  1211 
  1120 
  1212      \begin{proof}
  1121      \begin{lemma}
  1213      Straightforward mutual induction on the definition of $\leadsto$ and $\stackrel{s}{\leadsto}$.
       
  1214      The only interesting case is the rule $LD$ where the property holds since by the side-conditions of that rule the empty string will
       
  1215      be in both @{text "L (rs\<^sub>a @ [r\<^sub>1] @ rs\<^sub>b @ [r\<^sub>2] @ rs\<^sub>c)"} and
       
  1216      @{text "L (rs\<^sub>a @ [r\<^sub>1] @ rs\<^sub>b @ rs\<^sub>c)"}.
       
  1217      \end{proof}
       
  1218 
       
  1219      \noindent
       
  1220      From this, we can show that @{text bmkeps} will producte the same bitsequence
       
  1221      as long as one of the bitcoded regular expressions in $\leadsto$ is nullable (this lemma
       
  1222      establishes the missing fact we were not able to establish using @{text retrieve}, as suggested
       
  1223      in the paper by Sulzmannn and Lu). 
       
  1224 
       
  1225 
       
  1226      \begin{lemma}\label{lemthree}
  1122      @{thm[mode=IfThen] rewrite_bmkeps_aux(1)[of "r\<^sub>1" "r\<^sub>2"]}
  1227      @{thm[mode=IfThen] rewrite_bmkeps_aux(1)[of "r\<^sub>1" "r\<^sub>2"]}
  1123      \end{lemma}
  1228      \end{lemma}
  1124 
  1229 
  1125      \begin{lemma}
  1230      \begin{proof}
  1126      @{thm[mode=IfThen] rewrites_to_bsimp}
  1231      By straightforward mutual induction on the definition of $\leadsto$ and $\stackrel{s}{\leadsto}$.
  1127      \end{lemma}
  1232      Again the only interesting case is the rule $LD$ where we need to ensure that
       
  1233      \[
       
  1234      @{text "bmkeps (rs\<^sub>a @ [r\<^sub>1] @ rs\<^sub>b @ [r\<^sub>2] @ rs\<^sub>c) =
       
  1235         bmkeps (rs\<^sub>a @ [r\<^sub>1] @ rs\<^sub>b @ rs\<^sub>c)"}	
       
  1236      \]
       
  1237 
       
  1238      \noindent holds. This is indeed the case because according to the POSIX rules the
       
  1239      generated bitsequence is determined by the first alternative that can match the
       
  1240      string (in this case being nullable).
       
  1241      \end{proof}
       
  1242 
       
  1243      \noindent
       
  1244      Crucial is also the fact that derivative steps and simplification steps can be interleaved,
       
  1245      which is shown by the fact that $\leadsto$ is preserved under derivatives.
  1128 
  1246 
  1129      \begin{lemma}
  1247      \begin{lemma}
  1130      @{thm[mode=IfThen] rewrite_preserves_bder(1)[of "r\<^sub>1" "r\<^sub>2"]}
  1248      @{thm[mode=IfThen] rewrite_preserves_bder(1)[of "r\<^sub>1" "r\<^sub>2"]}
  1131      \end{lemma}
  1249      \end{lemma}
  1132 
  1250 
  1133      \begin{lemma}
  1251      \begin{proof}
       
  1252      By straightforward mutual induction on the definition of $\leadsto$ and $\stackrel{s}{\leadsto}$.
       
  1253      The case for $LD$ holds because @{term "L (erase (bder c r\<^sub>2)) \<subseteq> L (erase (bder c r\<^sub>1))"}
       
  1254      if and only if @{term "L (erase (r\<^sub>2)) \<subseteq> L (erase (r\<^sub>1))"}.
       
  1255      \end{proof}
       
  1256 
       
  1257 
       
  1258      \noindent
       
  1259      Using this fact together with Lemma~\ref{lemone} allows us to prove the central lemma
       
  1260      that the unsimplified
       
  1261      derivative (with a string @{term s}) reduces to the simplified derivative (with the same string).
       
  1262      
       
  1263      \begin{lemma}\label{lemtwo}
  1134      @{thm[mode=IfThen] central}
  1264      @{thm[mode=IfThen] central}
  1135      \end{lemma}
  1265      \end{lemma}
  1136 
  1266 
       
  1267      \begin{proof}
       
  1268      By reverse induction on @{term s} generalising over @{text r}.
       
  1269      \end{proof}
       
  1270 
       
  1271      \noindent
       
  1272      With these lemmas in place we can finaly establish that @{term "blexer_simp"} and @{term "blexer"}
       
  1273      generate the same value, and using Theorem~\ref{thmone} from the previous section that this value
       
  1274      is indeed the POSIX value.
       
  1275      
  1137      \begin{theorem}
  1276      \begin{theorem}
  1138      @{thm[mode=IfThen] main_blexer_simp}
  1277      @{thm[mode=IfThen] main_blexer_simp}
  1139      \end{theorem}
  1278      \end{theorem}
  1140 
  1279 
  1141      Sulzmann \& Lu apply simplification via a fixpoint operation
  1280      \begin{proof}
  1142 
  1281      By unfolding the definitions and using Lemmas~\ref{lemtwo} and \ref{lemthree}. 	
  1143      ; also does not use erase to filter out duplicates.
  1282      \end{proof}
  1144   
  1283      
  1145    not direct correspondence with PDERs, because of example
  1284      \noindent
  1146    problem with retrieve 
  1285      This completes the correctness proof for the second POSIX lexing algorithm by Sulzmann and Lu.
  1147 
  1286      The interesting point of this algorithm is that the sizes of derivatives do not grow arbitrarily, which
  1148    correctness
  1287      we shall show next.
  1149 
       
  1150    
       
  1151     
       
  1152 
       
  1153 
       
  1154 
  1288 
  1155    \begin{figure}[t]
  1289    \begin{figure}[t]
  1156   \begin{center}
  1290   \begin{center}
  1157   \begin{tabular}{c}
  1291   \begin{tabular}{c}
  1158   @{thm[mode=Axiom] bs1[of _ "r\<^sub>2"]}\qquad
  1292   @{thm[mode=Axiom] bs1[of _ "r\<^sub>2"]}$S\ZERO{}_l$\qquad
  1159   @{thm[mode=Axiom] bs2[of _ "r\<^sub>1"]}\qquad
  1293   @{thm[mode=Axiom] bs2[of _ "r\<^sub>1"]}$S\ZERO{}_r$\\
  1160   @{thm[mode=Axiom] bs3[of "bs\<^sub>1" "bs\<^sub>2"]}\\
  1294   @{thm[mode=Axiom] bs3[of "bs\<^sub>1" "bs\<^sub>2"]}$S\ONE$\\
  1161   @{thm[mode=Rule] bs4[of "r\<^sub>1" "r\<^sub>2" _ "r\<^sub>3"]}\qquad
  1295   @{thm[mode=Rule] bs4[of "r\<^sub>1" "r\<^sub>2" _ "r\<^sub>3"]}SL\qquad
  1162   @{thm[mode=Rule] bs5[of "r\<^sub>3" "r\<^sub>4" _ "r\<^sub>1"]}\\
  1296   @{thm[mode=Rule] bs5[of "r\<^sub>3" "r\<^sub>4" _ "r\<^sub>1"]}SR\\
  1163   @{thm[mode=Axiom] bs6}\qquad
  1297   @{thm[mode=Axiom] bs6}$A0$\qquad
  1164   @{thm[mode=Axiom] bs7}\\
  1298   @{thm[mode=Axiom] bs7}$A1$\\
  1165   @{thm[mode=Rule] bs8[of "rs\<^sub>1" "rs\<^sub>2"]}\\
  1299   @{thm[mode=Rule] bs8[of "rs\<^sub>1" "rs\<^sub>2"]}$AL$\\
  1166   %@ { t hm[mode=Axiom] ss1}\qquad
  1300   @{thm[mode=Rule] ss2[of "rs\<^sub>1" "rs\<^sub>2"]}$LH$\qquad
  1167   @{thm[mode=Rule] ss2[of "rs\<^sub>1" "rs\<^sub>2"]}\qquad
  1301   @{thm[mode=Rule] ss3[of "r\<^sub>1" "r\<^sub>2"]}$LT$\\
  1168   @{thm[mode=Rule] ss3[of "r\<^sub>1" "r\<^sub>2"]}\\
  1302   @{thm[mode=Axiom] ss4}$L\ZERO$\qquad
  1169   @{thm[mode=Axiom] ss4}\qquad
  1303   @{thm[mode=Axiom] ss5[of "bs" "rs\<^sub>1" "rs\<^sub>2"]}$LS$\medskip\\
  1170   @{thm[mode=Axiom] ss5[of "bs" "rs\<^sub>1" "rs\<^sub>2"]}\\
  1304   @{thm[mode=Rule] ss6[of "r\<^sub>2" "r\<^sub>1" "rs\<^sub>1" "rs\<^sub>2" "rs\<^sub>3"]}$LD$\\
  1171   @{thm[mode=Rule] ss6[of "r\<^sub>2" "r\<^sub>1" "rs\<^sub>1" "rs\<^sub>2" "rs\<^sub>3"]}\\
       
  1172   \end{tabular}
  1305   \end{tabular}
  1173   \end{center}
  1306   \end{center}
  1174   \caption{???}\label{SimpRewrites}
  1307   \caption{The rewrite rules that generate simplified regular expressions
       
  1308   in small steps: @{term "rrewrite r\<^sub>1 r\<^sub>2"} is for bitcoded regular
       
  1309   expressions and @{term "rrewrites rs\<^sub>1 rs\<^sub>2"} for \emph{lists} of bitcoded
       
  1310   regular expressions. Interesting is the $LD$ rule that allows copies of regular
       
  1311   expressions be removed provided a regular expression earlier in the list can
       
  1312   match the same strings.}\label{SimpRewrites}
  1175   \end{figure}
  1313   \end{figure}
  1176 *}
  1314 *}
  1177 
  1315 
  1178 section {* Bound - NO *}
  1316 section {* Bound - NO *}
  1179 
  1317 
  1180 
  1318 
  1181 section {* Conclusion *}
  1319 section {* Conclusion *}
  1182 
  1320 
  1183 text {*
  1321 text {*
  1184 
  1322 
       
  1323 We set out in this work to implement in Isabelle/HOL the secod lexing
       
  1324    algorithm by Sulzmann and Lu \cite{Sulzmann2014}. This algorithm is
       
  1325    interesting because it includes an ``aggressive'' simplification
       
  1326    function that keeps the sizes of derivatives finite.
       
  1327 
       
  1328    Having proved the correctness of the POSIX lexing algorithm in
       
  1329    [53], which lessons have we learned? Well, we feel this is a
       
  1330    perfect example where formal proofs give further insight into the
       
  1331    matter at hand. Our proofs were done both done by hand and
       
  1332    checked in Isabelle/HOL. The experience of doing our proofs in this
       
  1333    way has been that the mechanical checking was absolutely essential:
       
  1334    despite the apparent maturity, this subject area has hidden
       
  1335    snares.
  1185 
  1336 
  1186 
  1337 
  1187 %%\bibliographystyle{plain}
  1338 %%\bibliographystyle{plain}
  1188 \bibliography{root}
  1339 \bibliography{root}
  1189 *}
  1340 *}