--- a/thys2/Paper/Paper.thy Wed Feb 09 00:29:04 2022 +0000
+++ b/thys2/Paper/Paper.thy Wed Feb 09 11:21:28 2022 +0000
@@ -27,7 +27,7 @@
ders_syn ("_\\_" [79, 1000] 76) and
bder_syn ("_\\_" [79, 1000] 76) and
bders ("_\\_" [79, 1000] 76) and
- bders_simp ("_\\\<^sub>s\<^sub>i\<^sub>m\<^sub>p _" [79, 1000] 76) and
+ bders_simp ("_\\\<^sub>b\<^sub>s\<^sub>i\<^sub>m\<^sub>p _" [79, 1000] 76) and
ZERO ("\<^bold>0" 81) and
ONE ("\<^bold>1" 81) and
@@ -70,6 +70,7 @@
srewrite ("_\<^latex>\<open>\\mbox{$\\,\\stackrel{s}{\\leadsto}$}\<close> _" [71, 71] 80) and
rrewrites ("_ \<^latex>\<open>\\mbox{$\\,\\leadsto^*$}\<close> _" [71, 71] 80) and
+ srewrites ("_ \<^latex>\<open>\\mbox{$\\,\\stackrel{s}{\\leadsto}^*$}\<close> _" [71, 71] 80) and
blexer_simp ("blexer\<^sup>+" 1000)
@@ -188,7 +189,7 @@
different bitcode annotation---so @{text nub} would never ``fire''.
Inspired by Scala's library for lists, we shall instead use a @{text
distinctBy} function that finds duplicates under an erasing function
-that deletes bitcodes.
+which deletes bitcodes.
We shall also introduce our own argument and definitions for
establishing the correctness of the bitcoded algorithm when
simplifications are included.\medskip
@@ -358,7 +359,7 @@
Unix-like operating systems \cite{POSIX}.} Sometimes these
 informal rules are called \emph{maximal munch rule} and \emph{rule priority}.
One contribution of our earlier paper is to give a convenient
- specification for what a POSIX value is (the inductive rules are shown in
+ specification for what POSIX values are (the inductive rules are shown in
Figure~\ref{POSIXrules}).
\begin{figure}[t]
@@ -436,7 +437,7 @@
\Brz's derivatives) and @{const mkeps}/@{text inj}, the path from right to
left, the second phase. The picture above shows the steps required when a
regular expression, say @{text "r\<^sub>1"}, matches the string @{term
- "[a,b,c]"}. The lexing algorithm by Sulzmann and Lu can be defined as:
+ "[a,b,c]"}. The first lexing algorithm by Sulzmann and Lu can be defined as:
\begin{figure}[t]
\begin{center}
@@ -513,7 +514,7 @@
In fact we have shown that in the success case the generated POSIX value $v$ is
unique and in the failure case that there is no POSIX value $v$ that satisfies
 $(s, r) \rightarrow v$. While the algorithm is correct, it is excruciatingly
- slow in cases where the derivatives grow arbitrarily (see example from the
+ slow in cases where the derivatives grow arbitrarily (recall the example from the
Introduction). However it can be used as a convenient reference point for the correctness
proof of the second algorithm by Sulzmann and Lu, which we shall describe next.
@@ -544,7 +545,7 @@
expressions; and @{text rs} for lists of bitcoded regular
expressions. The binary alternative @{text "ALT bs r\<^sub>1 r\<^sub>2"}
is just an abbreviation for \mbox{@{text "ALTs bs [r\<^sub>1, r\<^sub>2]"}}.
- For bitsequences we just use lists made up of the
+ For bitsequences we use lists made up of the
constants @{text Z} and @{text S}. The idea with bitcoded regular
expressions is to incrementally generate the value information (for
example @{text Left} and @{text Right}) as bitsequences. For this
@@ -629,7 +630,7 @@
\end{proof}
Sulzmann and Lu define the function \emph{internalise}
- in order to transform standard regular expressions into annotated
+ in order to transform (standard) regular expressions into annotated
regular expressions. We write this operation as $r^\uparrow$.
This internalisation uses the following
\emph{fuse} function.
@@ -652,7 +653,7 @@
\noindent
A regular expression can then be \emph{internalised} into a bitcoded
- regular expression as follows.
+ regular expression as follows:
\begin{center}
\begin{tabular}{lcl}
@@ -676,7 +677,7 @@
straightforward definition. For defining the algorithm, we also need
the functions \textit{bnullable} and \textit{bmkeps}, which are the
``lifted'' versions of \textit{nullable} and \textit{mkeps} acting on
- bitcoded regular expressions, instead of regular expressions.
+ bitcoded regular expressions.
%
\begin{center}
\begin{tabular}{@ {}c@ {}c@ {}}
@@ -710,7 +711,7 @@
\noindent
The key function in the bitcoded algorithm is the derivative of a
bitcoded regular expression. This derivative calculates the
- derivative but at the same time also the incremental part of bitsequences
+ derivative but at the same time also the incremental part of the bitsequences
that contribute to constructing a POSIX value.
\begin{center}
@@ -811,9 +812,9 @@
\noindent
The only difficulty left for the correctness proof is that the bitcoded algorithm
has only a ``forward phase'' where POSIX values are generated incrementally.
-We can achieve the same effect with @{text lexer} by stacking up injection
+We can achieve the same effect with @{text lexer} (which has two phases) by stacking up injection
functions during the forward phase. An auxiliary function, called $\textit{flex}$,
-allows us to recast the rules of $\lexer$ (with its two phases) in terms of a single
+allows us to recast the rules of $\lexer$ in terms of a single
phase and stacked up injection functions.
\begin{center}
@@ -826,7 +827,7 @@
\noindent
The point of this function is that when
-reaching the end of the string, we just need to apply the stacked
+reaching the end of the string, we just need to apply the stacked up
injection functions to the value generated by @{text mkeps}.
Using this function we can recast the success case in @{text lexer}
as follows:
@@ -843,11 +844,11 @@
need to traverse the string $s$ twice, once for building the
derivative $r\backslash s$ and another time for stacking up injection
functions using \textit{flex}), it helps us with proving
-that incrementally building up values generates the same result.
+that incrementally building up values in @{text blexer} generates the same result.
This brings us to our main lemma in this section: if we calculate a
-derivative, say $r\backslash s$ and have a value, say $v$, inhabited
-by this derivative, then we can produce the result $\lexer$ generates
+derivative, say $r\backslash s$, and have a value, say $v$, inhabited
+by this derivative, then we can produce the result @{text lexer} generates
by applying this value to the stacked-up injection functions
that $\textit{flex}$ assembles. The lemma establishes that this is the same
value as if we build the annotated derivative $r^\uparrow\backslash s$
@@ -900,7 +901,7 @@
produces the same result as \textit{lexer}.
-\begin{theorem}
+\begin{theorem}\label{thmone}
$\textit{lexer}\,r\,s = \textit{blexer}\,r\,s$
\end{theorem}
@@ -969,26 +970,26 @@
\end{equation}
\noindent This is a simpler derivative, but unfortunately we
- cannot make further simplifications. This is a problem because
+ cannot make any further simplifications. This is a problem because
 the outermost alternative contains two copies of the same
- regular expression (underlined with $r$). The copies will
- spawn new copies in later steps and they in turn further copies. This
- destroys an hope of taming the size of the derivatives. But the
+ regular expression (underlined with $r$). These copies will
+  spawn new copies in later derivative steps and these in turn spawn even more copies. This
+ destroys any hope of taming the size of the derivatives. But the
second copy of $r$ in \eqref{derivex} will never contribute to a
value, because POSIX lexing will always prefer matching a string
- with the first copy. So in principle it could be removed.
+ with the first copy. So it could be safely removed without affecting the correctness of the algorithm.
The dilemma with the simple-minded
simplification rules above is that the rule $r + r \Rightarrow r$
will never be applicable because as can be seen in this example the
- regular expressions are separated by another sub-regular expression.
+ regular expressions are not next to each other but separated by another regular expression.
But here is where Sulzmann and Lu's representation of generalised
alternatives in the bitcoded algorithm shines: in @{term
"ALTs bs rs"} we can define a more aggressive simplification by
recursively simplifying all regular expressions in @{text rs} and
then analyse the resulting list and remove any duplicates.
- Another advantage is that the bitsequences in bitcoded regular
- expressions can be easily modified such that simplification does not
+  Another advantage of the bitsequences in bitcoded regular
+ expressions is that they can be easily modified such that simplification does not
interfere with the value constructions. For example we can ``flatten'', or
de-nest, @{text ALTs} as follows
%
@@ -1006,7 +1007,7 @@
%is not affected by simplification.
However there is one problem with the definition for the more
- aggressive simlification rules by Sulzmann and Lu. Recasting
+  aggressive simplification rules described by Sulzmann and Lu. Recasting
their definition with our syntax they define the step of removing
duplicates as
%
@@ -1020,8 +1021,8 @@
duplicates. While this makes sense when considering the example
shown in \eqref{derivex}, @{text nub} is the inappropriate
function in the case of bitcoded regular expressions. The reason
- is that in general the n elements in @{text rs} will have a
- different bitsequence annotated to it and in this way @{text nub}
+  is that in general the elements in @{text rs} will each have a
+  different bitsequence annotated to them and in this way @{text nub}
will never find a duplicate to be removed. The correct way to
handle this situation is to first \emph{erase} the regular
expressions when comparing potential duplicates. This is inspired
@@ -1037,21 +1038,22 @@
\end{center}
\noindent where we scan the list from left to right (because we
- have to remove later copies). In this function, @{text f} is a
+ have to remove later copies). In @{text distinctBy}, @{text f} is a
function and @{text acc} is an accumulator for regular
- expressions---essentially a set of elements we have already seen
+  expressions---essentially a set of regular expressions that we have already seen
while scanning the list. Therefore we delete an element, say @{text x},
from the list provided @{text "f x"} is already in the accumulator;
- otherwise we keep @{text x} and scan the rest of the list but now
- add @{text "f x"} as another element to @{text acc}. We will use
- @{term distinctBy} where @{text f} is our erase functions, @{term "erase (DUMMY)"},
+ otherwise we keep @{text x} and scan the rest of the list but
+ add @{text "f x"} as another ``seen'' element to @{text acc}. We will use
+  @{term distinctBy} where @{text f} is the erase function, @{term "erase (DUMMY)"},
that deletes bitsequences from bitcoded regular expressions.
  This is clearly a computationally more expensive operation than @{text nub},
but is needed in order to make the removal of unnecessary copies
- to work.
+ to work properly.
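+
+  \noindent
+  Purely as an illustration (and not as part of the formalisation), the idea behind
+  @{text distinctBy} can be sketched in Scala, in the spirit of the list-library
+  function that inspired it; here the key function @{text f} plays the role of the
+  erase function and @{text acc} collects the keys seen so far:
+
+  \begin{verbatim}
+  // sketch: keep the first occurrence of each key f(x), drop later copies
+  def distinctBy[A, B](xs: List[A], f: A => B, acc: Set[B] = Set()): List[A] =
+    xs match {
+      case Nil => Nil
+      case x :: rest =>
+        if (acc.contains(f(x))) distinctBy(rest, f, acc)    // f(x) seen: delete x
+        else x :: distinctBy(rest, f, acc + f(x))           // keep x, remember f(x)
+    }
+  \end{verbatim}
+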
Our simplification function depends on three helper functions, one is called
- @{text flts} and defined as follows:
+ @{text flts} and analyses lists of regular expressions coming from alternatives.
+ It is defined as follows:
\begin{center}
\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
@@ -1062,7 +1064,7 @@
\end{center}
\noindent
- The second clause removes all instances of @{text ZERO} in alternatives and
+ The second clause of @{text flts} removes all instances of @{text ZERO} in alternatives and
 the third ``spills'' out nested alternatives (but retaining the
bitsequence @{text "bs'"} accumulated in the inner alternative). There are
some corner cases to be considered when the resulting list inside an alternative is
@@ -1103,75 +1105,211 @@
\noindent
As far as we can see, our recursive function @{term bsimp} simplifies regular
- expressions as intended by Sulzmann and Lu. There is no point to apply the
- @{text bsimp}
- function repeatedly (like the simplification in their paper which is applied
- until a fixpoint is reached), because we can show that it is idempotent, that is
+ expressions as intended by Sulzmann and Lu. There is no point in applying the
+ @{text bsimp} function repeatedly (like the simplification in their paper which needs to be
+ applied until a fixpoint is reached) because we can show that @{term bsimp} is idempotent,
+ that is
\begin{proposition}
- ???
+ @{term "bsimp (bsimp r) = bsimp r"}
\end{proposition}
+ \noindent
+  This can be proved by induction on @{text r} but requires a detailed analysis
+  showing that the de-nesting of alternatives always results in a flat list of regular
+  expressions. We omit the details since they do not concern the correctness proof.
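+
+  \noindent
+  As an aside, idempotency is also the kind of property that is easy to check on
+  sample inputs before attempting a proof. A small Scala sketch of such a check,
+  parameterised over an assumed simplification function, might be:
+
+  \begin{verbatim}
+  // sketch: check f(f(r)) == f(r) for a list of sample inputs
+  def isIdempotentOn[R](f: R => R, samples: List[R]): Boolean =
+    samples.forall(r => f(f(r)) == f(r))
+  \end{verbatim}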
+
+ Next we can include simplification after each derivative step leading to the
+ following notion of bitcoded derivatives:
+
+ \begin{center}
+ \begin{tabular}{cc}
+ \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
+ @{thm (lhs) bders_simp.simps(1)} & $\dn$ & @{thm (rhs) bders_simp.simps(1)}
+ \end{tabular}
+ &
+ \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
+ @{thm (lhs) bders_simp.simps(2)} & $\dn$ & @{thm (rhs) bders_simp.simps(2)}
+ \end{tabular}
+ \end{tabular}
+ \end{center}
+
+ \noindent
+ and use it in the improved lexing algorithm defined as
+
+ \begin{center}
+\begin{tabular}{lcl}
+ $\textit{blexer}^+\;r\,s$ & $\dn$ &
+ $\textit{let}\;r_{der} = (r^\uparrow)\backslash_{bsimp}\, s\;\textit{in}$\\
+ & & $\;\;\;\;\textit{if}\; \textit{bnullable}(r_{der}) \;\;\textit{then}\;\textit{decode}\,(\textit{bmkeps}\,r_{der})\,r
+ \;\;\textit{else}\;\textit{None}$
+\end{tabular}
+\end{center}
+
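+  \noindent
+  Again purely as an illustration, the interleaving of derivative and simplification
+  steps behind $\backslash_{bsimp}$ and $\textit{blexer}^+$ can be sketched in
+  Scala; the code below is parameterised over the primitive operations, which we
+  simply assume to be given (the names are ours, not the formal definitions):
+
+  \begin{verbatim}
+  // sketch: simplify after every single derivative step
+  def bdersSimp[R](r: R, s: List[Char],
+                   bder: (Char, R) => R, bsimp: R => R): R =
+    s match {
+      case Nil => r
+      case c :: cs => bdersSimp(bsimp(bder(c, r)), cs, bder, bsimp)
+    }
+
+  // sketch: the lexer on top of the simplifying derivatives, where
+  // mkepsDecode stands for the combined bmkeps/decode step
+  def blexerSimp[R, V](r: R, s: List[Char],
+                       bder: (Char, R) => R, bsimp: R => R,
+                       bnullable: R => Boolean,
+                       mkepsDecode: R => Option[V]): Option[V] = {
+    val rder = bdersSimp(r, s, bder, bsimp)
+    if (bnullable(rder)) mkepsDecode(rder) else None
+  }
+  \end{verbatim}
+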
+ \noindent The remaining task is to show that @{term blexer} and
+ @{term "blexer_simp"} generate the same answers.
+
+ When we first
+ attempted this proof we encountered a problem with the idea
+ in Sulzmann and Lu's paper where the argument seems to be to appeal
+ again to the @{text retrieve}-function defined for the unsimplified version
+ of the algorithm. But
+ this does not work, because desirable properties such as
+ %
+ \[
+ @{text "retrieve r v = retrieve (bsimp r) v"}
+ \]
+
+ \noindent do not hold under simplification---this property
+ essentially purports that we can retrieve the same value from a
+  simplified version of the regular expression. To start with, @{text retrieve}
+  depends on the fact that the value @{text v} corresponds to the
+  structure of the regular expression---but the whole point of simplification
+ is to ``destroy'' this structure by making the regular expression simpler.
+ To see this consider the regular expression @{text "r = r' + 0"} and a corresponding
+ value @{text "v = Left v'"}. If we annotate bitcodes to @{text "r"}, then
+ we can use @{text retrieve} and @{text v} in order to extract a corresponding
+ bitsequence. The reason that this works is that @{text r} is an alternative
+ regular expression and @{text v} a corresponding value. However, if we simplify
+ @{text r}, then @{text v} does not correspond to the shape of the regular
+ expression anymore. So unless one can somehow
+ synchronise the change in the simplified regular expressions with
+ the original POSIX value, there is no hope of appealing to @{text retrieve} in the
+ correctness argument for @{term blexer_simp}.
+
+ We found it more helpful to introduce the rewriting systems shown in
+ Figure~\ref{SimpRewrites}. The idea is to generate
+ simplified regular expressions in small steps (unlike the @{text bsimp}-function which
+ does the same in a big step), and show that each of
+ the small steps preserves the bitcodes that lead to the final POSIX value.
+  The rewrite system is organised such that $\leadsto$ is for bitcoded regular
+ expressions and $\stackrel{s}{\leadsto}$ for lists of bitcoded regular
+ expressions. The former essentially implements the simplifications of
+  @{text "bsimpSEQ"} and @{text flts}, while the latter implements the
+ simplifications in @{text "bsimpALTs"}. We can show that any bitcoded
+ regular expression reduces in one or more steps to the simplified
+ regular expression generated by @{text bsimp}:
+
+ \begin{lemma}\label{lemone}
+ @{thm[mode=IfThen] rewrites_to_bsimp}
+ \end{lemma}
+
+ \begin{proof}
+ By induction on @{text r}. For this we can use the properties
+ @{thm fltsfrewrites} and @{thm ss6_stronger}. The latter uses
+ repeated applications of the $LD$ rule which allows the removal
+ of duplicates that can recognise the same strings.
+ \end{proof}
+
+ \noindent
+  We can show that this rewrite system preserves @{term bnullable}, that
+  is, simplification essentially does not affect nullability:
\begin{lemma}
@{thm[mode=IfThen] bnullable0(1)[of "r\<^sub>1" "r\<^sub>2"]}
\end{lemma}
+ \begin{proof}
+ Straightforward mutual induction on the definition of $\leadsto$ and $\stackrel{s}{\leadsto}$.
+ The only interesting case is the rule $LD$ where the property holds since by the side-conditions of that rule the empty string will
+ be in both @{text "L (rs\<^sub>a @ [r\<^sub>1] @ rs\<^sub>b @ [r\<^sub>2] @ rs\<^sub>c)"} and
+ @{text "L (rs\<^sub>a @ [r\<^sub>1] @ rs\<^sub>b @ rs\<^sub>c)"}.
+ \end{proof}
- \begin{lemma}
+ \noindent
+  From this, we can show that @{text bmkeps} will produce the same bitsequence
+ as long as one of the bitcoded regular expressions in $\leadsto$ is nullable (this lemma
+ establishes the missing fact we were not able to establish using @{text retrieve}, as suggested
+  in the paper by Sulzmann and Lu).
+
+
+ \begin{lemma}\label{lemthree}
@{thm[mode=IfThen] rewrite_bmkeps_aux(1)[of "r\<^sub>1" "r\<^sub>2"]}
\end{lemma}
- \begin{lemma}
- @{thm[mode=IfThen] rewrites_to_bsimp}
- \end{lemma}
+ \begin{proof}
+ By straightforward mutual induction on the definition of $\leadsto$ and $\stackrel{s}{\leadsto}$.
+ Again the only interesting case is the rule $LD$ where we need to ensure that
+ \[
+ @{text "bmkeps (rs\<^sub>a @ [r\<^sub>1] @ rs\<^sub>b @ [r\<^sub>2] @ rs\<^sub>c) =
+ bmkeps (rs\<^sub>a @ [r\<^sub>1] @ rs\<^sub>b @ rs\<^sub>c)"}
+ \]
+
+ \noindent holds. This is indeed the case because according to the POSIX rules the
+ generated bitsequence is determined by the first alternative that can match the
+ string (in this case being nullable).
+ \end{proof}
+
+ \noindent
+  Crucial is also the fact that derivative steps and simplification steps can be interleaved,
+  as shown by $\leadsto$ being preserved under derivatives.
\begin{lemma}
@{thm[mode=IfThen] rewrite_preserves_bder(1)[of "r\<^sub>1" "r\<^sub>2"]}
\end{lemma}
- \begin{lemma}
+ \begin{proof}
+ By straightforward mutual induction on the definition of $\leadsto$ and $\stackrel{s}{\leadsto}$.
+ The case for $LD$ holds because @{term "L (erase (bder c r\<^sub>2)) \<subseteq> L (erase (bder c r\<^sub>1))"}
+ if and only if @{term "L (erase (r\<^sub>2)) \<subseteq> L (erase (r\<^sub>1))"}.
+ \end{proof}
+
+
+ \noindent
+ Using this fact together with Lemma~\ref{lemone} allows us to prove the central lemma
+ that the unsimplified
+ derivative (with a string @{term s}) reduces to the simplified derivative (with the same string).
+
+ \begin{lemma}\label{lemtwo}
@{thm[mode=IfThen] central}
\end{lemma}
+ \begin{proof}
+ By reverse induction on @{term s} generalising over @{text r}.
+ \end{proof}
+
+ \noindent
+  With these lemmas in place we can finally establish that @{term "blexer_simp"} and @{term "blexer"}
+ generate the same value, and using Theorem~\ref{thmone} from the previous section that this value
+ is indeed the POSIX value.
+
\begin{theorem}
@{thm[mode=IfThen] main_blexer_simp}
\end{theorem}
- Sulzmann \& Lu apply simplification via a fixpoint operation
-
- ; also does not use erase to filter out duplicates.
-
- not direct correspondence with PDERs, because of example
- problem with retrieve
-
- correctness
-
-
-
-
-
+ \begin{proof}
+ By unfolding the definitions and using Lemmas~\ref{lemtwo} and \ref{lemthree}.
+ \end{proof}
+
+ \noindent
+ This completes the correctness proof for the second POSIX lexing algorithm by Sulzmann and Lu.
+ The interesting point of this algorithm is that the sizes of derivatives do not grow arbitrarily, which
+ we shall show next.
\begin{figure}[t]
\begin{center}
\begin{tabular}{c}
- @{thm[mode=Axiom] bs1[of _ "r\<^sub>2"]}\qquad
- @{thm[mode=Axiom] bs2[of _ "r\<^sub>1"]}\qquad
- @{thm[mode=Axiom] bs3[of "bs\<^sub>1" "bs\<^sub>2"]}\\
- @{thm[mode=Rule] bs4[of "r\<^sub>1" "r\<^sub>2" _ "r\<^sub>3"]}\qquad
- @{thm[mode=Rule] bs5[of "r\<^sub>3" "r\<^sub>4" _ "r\<^sub>1"]}\\
- @{thm[mode=Axiom] bs6}\qquad
- @{thm[mode=Axiom] bs7}\\
- @{thm[mode=Rule] bs8[of "rs\<^sub>1" "rs\<^sub>2"]}\\
- %@ { t hm[mode=Axiom] ss1}\qquad
- @{thm[mode=Rule] ss2[of "rs\<^sub>1" "rs\<^sub>2"]}\qquad
- @{thm[mode=Rule] ss3[of "r\<^sub>1" "r\<^sub>2"]}\\
- @{thm[mode=Axiom] ss4}\qquad
- @{thm[mode=Axiom] ss5[of "bs" "rs\<^sub>1" "rs\<^sub>2"]}\\
- @{thm[mode=Rule] ss6[of "r\<^sub>2" "r\<^sub>1" "rs\<^sub>1" "rs\<^sub>2" "rs\<^sub>3"]}\\
+ @{thm[mode=Axiom] bs1[of _ "r\<^sub>2"]}$S\ZERO{}_l$\qquad
+ @{thm[mode=Axiom] bs2[of _ "r\<^sub>1"]}$S\ZERO{}_r$\\
+ @{thm[mode=Axiom] bs3[of "bs\<^sub>1" "bs\<^sub>2"]}$S\ONE$\\
+ @{thm[mode=Rule] bs4[of "r\<^sub>1" "r\<^sub>2" _ "r\<^sub>3"]}SL\qquad
+ @{thm[mode=Rule] bs5[of "r\<^sub>3" "r\<^sub>4" _ "r\<^sub>1"]}SR\\
+ @{thm[mode=Axiom] bs6}$A0$\qquad
+ @{thm[mode=Axiom] bs7}$A1$\\
+ @{thm[mode=Rule] bs8[of "rs\<^sub>1" "rs\<^sub>2"]}$AL$\\
+ @{thm[mode=Rule] ss2[of "rs\<^sub>1" "rs\<^sub>2"]}$LH$\qquad
+ @{thm[mode=Rule] ss3[of "r\<^sub>1" "r\<^sub>2"]}$LT$\\
+ @{thm[mode=Axiom] ss4}$L\ZERO$\qquad
+ @{thm[mode=Axiom] ss5[of "bs" "rs\<^sub>1" "rs\<^sub>2"]}$LS$\medskip\\
+ @{thm[mode=Rule] ss6[of "r\<^sub>2" "r\<^sub>1" "rs\<^sub>1" "rs\<^sub>2" "rs\<^sub>3"]}$LD$\\
\end{tabular}
\end{center}
- \caption{???}\label{SimpRewrites}
+ \caption{The rewrite rules that generate simplified regular expressions
+ in small steps: @{term "rrewrite r\<^sub>1 r\<^sub>2"} is for bitcoded regular
+  expressions and @{term "srewrite rs\<^sub>1 rs\<^sub>2"} for \emph{lists} of bitcoded
+ regular expressions. Interesting is the $LD$ rule that allows copies of regular
+  expressions to be removed provided a regular expression earlier in the list can
+ match the same strings.}\label{SimpRewrites}
\end{figure}
*}
@@ -1182,6 +1320,19 @@
text {*
+We set out in this work to implement in Isabelle/HOL the second lexing
+ algorithm by Sulzmann and Lu \cite{Sulzmann2014}. This algorithm is
+ interesting because it includes an ``aggressive'' simplification
+ function that keeps the sizes of derivatives finitely bounded.
+
+ Having proved the correctness of the POSIX lexing algorithm in
+ [53], what lessons have we learned? Well, we feel this is a
+ perfect example where formal proofs give further insight into the
+ matter at hand. Our proofs were both done by hand and
+ checked in Isabelle/HOL. The experience of doing our proofs in this
+ way has been that the mechanical checking was absolutely essential:
+ despite the apparent maturity of this subject area, it still has hidden
+ snares.
%%\bibliographystyle{plain}