diff -r a730a5a0bab9 -r 89e6605c4ca4 thys/Journal/Paper.thy --- a/thys/Journal/Paper.thy Tue Jul 23 21:21:49 2019 +0100 +++ b/thys/Journal/Paper.thy Mon Jul 29 09:37:20 2019 +0100 @@ -132,14 +132,14 @@ -section {* Introduction *} +section \Introduction\ -text {* +text \ Brzozowski \cite{Brzozowski1964} introduced the notion of the {\em -derivative} @{term "der c r"} of a regular expression @{text r} w.r.t.\ -a character~@{text c}, and showed that it gave a simple solution to the +derivative} @{term "der c r"} of a regular expression \r\ w.r.t.\ +a character~\c\, and showed that it gave a simple solution to the problem of matching a string @{term s} with a regular expression @{term r}: if the derivative of @{term r} w.r.t.\ (in succession) all the characters of the string matches the empty string, then @{term r} @@ -175,8 +175,7 @@ into a sequence of tokens, POSIX is the more natural disambiguation strategy for what programmers consider basic syntactic building blocks in their programs. These building blocks are often specified by some -regular expressions, say @{text "r\<^bsub>key\<^esub>"} and @{text -"r\<^bsub>id\<^esub>"} for recognising keywords and identifiers, +regular expressions, say \r\<^bsub>key\<^esub>\ and \r\<^bsub>id\<^esub>\ for recognising keywords and identifiers, respectively. There are a few underlying (informal) rules behind tokenising a string in a POSIX \cite{POSIX} fashion: @@ -196,23 +195,22 @@ be longer than no match at all. \end{itemize} -\noindent Consider for example a regular expression @{text -"r\<^bsub>key\<^esub>"} for recognising keywords such as @{text "if"}, -@{text "then"} and so on; and @{text "r\<^bsub>id\<^esub>"} +\noindent Consider for example a regular expression \r\<^bsub>key\<^esub>\ for recognising keywords such as \if\, +\then\ and so on; and \r\<^bsub>id\<^esub>\ recognising identifiers (say, a single character followed by characters or numbers). Then we can form the regular expression -@{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\"} -and use POSIX matching to tokenise strings, say @{text "iffoo"} and -@{text "if"}. For @{text "iffoo"} we obtain by the Longest Match Rule +\(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\\ +and use POSIX matching to tokenise strings, say \iffoo\ and +\if\. For \iffoo\ we obtain by the Longest Match Rule a single identifier token, not a keyword followed by an -identifier. For @{text "if"} we obtain by the Priority Rule a keyword -token, not an identifier token---even if @{text "r\<^bsub>id\<^esub>"} -matches also. By the Star Rule we know @{text "(r\<^bsub>key\<^esub> + -r\<^bsub>id\<^esub>)\<^sup>\"} matches @{text "iffoo"}, -respectively @{text "if"}, in exactly one `iteration' of the star. The +identifier. For \if\ we obtain by the Priority Rule a keyword +token, not an identifier token---even if \r\<^bsub>id\<^esub>\ +matches also. By the Star Rule we know \(r\<^bsub>key\<^esub> + +r\<^bsub>id\<^esub>)\<^sup>\\ matches \iffoo\, +respectively \if\, in exactly one `iteration' of the star. The Empty String Rule is for cases where, for example, the regular expression -@{text "(a\<^sup>\)\<^sup>\"} matches against the -string @{text "bc"}. Then the longest initial matched substring is the +\(a\<^sup>\)\<^sup>\\ matches against the +string \bc\. Then the longest initial matched substring is the empty string, which is matched by both the whole regular expression and the parenthesised subexpression. @@ -225,25 +223,24 @@ expression matches a string, values encode the information of \emph{how} the string is matched by the regular expression---that is, which part of the string is matched by which part of the regular -expression. For this consider again the string @{text "xy"} and -the regular expression \mbox{@{text "(x + (y + xy))\<^sup>\"}} +expression. For this consider again the string \xy\ and +the regular expression \mbox{\(x + (y + xy))\<^sup>\\} (this time fully parenthesised). We can view this regular expression -as tree and if the string @{text xy} is matched by two Star -`iterations', then the @{text x} is matched by the left-most -alternative in this tree and the @{text y} by the right-left alternative. This +as tree and if the string \xy\ is matched by two Star +`iterations', then the \x\ is matched by the left-most +alternative in this tree and the \y\ by the right-left alternative. This suggests to record this matching as \begin{center} @{term "Stars [Left(Char x), Right(Left(Char y))]"} \end{center} -\noindent where @{const Stars}, @{text Left}, @{text Right} and @{text -Char} are constructors for values. @{text Stars} records how many -iterations were used; @{text Left}, respectively @{text Right}, which +\noindent where @{const Stars}, \Left\, \Right\ and \Char\ are constructors for values. \Stars\ records how many +iterations were used; \Left\, respectively \Right\, which alternative is used. This `tree view' leads naturally to the idea that regular expressions act as types and values as inhabiting those types (see, for example, \cite{HosoyaVouillonPierce2005}). The value for -matching @{text "xy"} in a single `iteration', i.e.~the POSIX value, +matching \xy\ in a single `iteration', i.e.~the POSIX value, would look as follows \begin{center} @@ -316,11 +313,11 @@ We extend our results to ??? Bitcoded version?? -*} +\ -section {* Preliminaries *} +section \Preliminaries\ -text {* \noindent Strings in Isabelle/HOL are lists of characters with +text \\noindent Strings in Isabelle/HOL are lists of characters with the empty string being represented by the empty list, written @{term "[]"}, and list-cons being written as @{term "DUMMY # DUMMY"}. Often we use the usual bracket notation for lists also for strings; for @@ -333,7 +330,7 @@ inductive datatype: \begin{center} - @{text "r :="} + \r :=\ @{const "ZERO"} $\mid$ @{const "ONE"} $\mid$ @{term "CHAR c"} $\mid$ @@ -365,8 +362,8 @@ DUMMY"} for the concatenation of two languages (it is also list-append for strings). We use the star-notation for regular expressions and for languages (in the last clause above). The star for languages is defined - inductively by two clauses: @{text "(i)"} the empty string being in - the star of a language and @{text "(ii)"} if @{term "s\<^sub>1"} is in a + inductively by two clauses: \(i)\ the empty string being in + the star of a language and \(ii)\ if @{term "s\<^sub>1"} is in a language and @{term "s\<^sub>2"} in the star of this language, then also @{term "s\<^sub>1 @ s\<^sub>2"} is in the star of this language. It will also be convenient to use the following notion of a \emph{semantic derivative} (or \emph{left @@ -459,11 +456,11 @@ \cite{Sulzmann2014} is to append another phase to this algorithm in order to calculate a [lexical] value. We will explain the details next. -*} +\ -section {* POSIX Regular Expression Matching\label{posixsec} *} +section \POSIX Regular Expression Matching\label{posixsec}\ -text {* +text \ There have been many previous works that use values for encoding \emph{how} a regular expression matches a string. @@ -473,7 +470,7 @@ are defined as the inductive datatype \begin{center} - @{text "v :="} + \v :=\ @{const "Void"} $\mid$ @{term "val.Char c"} $\mid$ @{term "Left v"} $\mid$ @@ -532,8 +529,8 @@ \end{center} \noindent where in the clause for @{const "Stars"} we use the - notation @{term "v \ set vs"} for indicating that @{text v} is a - member in the list @{text vs}. We require in this rule that every + notation @{term "v \ set vs"} for indicating that \v\ is a + member in the list \vs\. We require in this rule that every value in @{term vs} flattens to a non-empty string. The idea is that @{term "Stars"}-values satisfy the informal Star Rule (see Introduction) where the $^\star$ does not match the empty string unless this is @@ -549,9 +546,9 @@ \end{proposition} \noindent - Given a regular expression @{text r} and a string @{text s}, we define the - set of all \emph{Lexical Values} inhabited by @{text r} with the underlying string - being @{text s}:\footnote{Okui and Suzuki refer to our lexical values + Given a regular expression \r\ and a string \s\, we define the + set of all \emph{Lexical Values} inhabited by \r\ with the underlying string + being \s\:\footnote{Okui and Suzuki refer to our lexical values as \emph{canonical values} in \cite{OkuiSuzuki2010}. The notion of \emph{non-problematic values} by Cardelli and Frisch \cite{Frisch2004} is related, but not identical to our lexical values.} @@ -573,7 +570,7 @@ infinitely many values, but according to our more restricted definition only a single value, namely @{thm LV_STAR_ONE_empty}. - If a regular expression @{text r} matches a string @{text s}, then + If a regular expression \r\ matches a string \s\, then generally the set @{term "LV r s"} is not just a singleton set. In case of POSIX matching the problem is to calculate the unique lexical value that satisfies the (informal) POSIX rules from the Introduction. @@ -582,9 +579,9 @@ path from the left to the right involving @{term derivatives}/@{const nullable} is the first phase of the algorithm (calculating successive \Brz's derivatives) and @{const - mkeps}/@{text inj}, the path from right to left, the second + mkeps}/\inj\, the path from right to left, the second phase. This picture shows the steps required when a regular - expression, say @{text "r\<^sub>1"}, matches the string @{term + expression, say \r\<^sub>1\, matches the string @{term "[a,b,c]"}. We first build the three derivatives (according to @{term a}, @{term b} and @{term c}). We then use @{const nullable} to find out whether the resulting derivative regular expression @@ -609,11 +606,11 @@ \node (v4) [below=of r4]{@{term "v\<^sub>4"}}; \draw[->,line width=1mm](r4) -- (v4); \node (v3) [left=of v4] {@{term "v\<^sub>3"}}; -\draw[->,line width=1mm](v4)--(v3) node[below,midway] {@{text "inj r\<^sub>3 c"}}; +\draw[->,line width=1mm](v4)--(v3) node[below,midway] {\inj r\<^sub>3 c\}; \node (v2) [left=of v3]{@{term "v\<^sub>2"}}; -\draw[->,line width=1mm](v3)--(v2) node[below,midway] {@{text "inj r\<^sub>2 b"}}; +\draw[->,line width=1mm](v3)--(v2) node[below,midway] {\inj r\<^sub>2 b\}; \node (v1) [left=of v2] {@{term "v\<^sub>1"}}; -\draw[->,line width=1mm](v2)--(v1) node[below,midway] {@{text "inj r\<^sub>1 a"}}; +\draw[->,line width=1mm](v2)--(v1) node[below,midway] {\inj r\<^sub>1 a\}; \draw (r4) node[anchor=north west] {\;\raisebox{-8mm}{@{term "mkeps"}}}; \end{tikzpicture} \end{center} @@ -647,8 +644,7 @@ makes some subtle choices leading to a POSIX value: for example if an alternative regular expression, say @{term "ALT r\<^sub>1 r\<^sub>2"}, can match the empty string and furthermore @{term "r\<^sub>1"} can match the - empty string, then we return a @{text Left}-value. The @{text - Right}-value will only be returned if @{term "r\<^sub>1"} cannot match the empty + empty string, then we return a \Left\-value. The \Right\-value will only be returned if @{term "r\<^sub>1"} cannot match the empty string. The most interesting idea from Sulzmann and Lu \cite{Sulzmann2014} is @@ -690,25 +686,25 @@ might be instructive to look first at the three sequence cases (clauses \textit{(4)} -- \textit{(6)}). In each case we need to construct an ``injected value'' for @{term "SEQ r\<^sub>1 r\<^sub>2"}. This must be a value of the form @{term - "Seq DUMMY DUMMY"}\,. Recall the clause of the @{text derivative}-function + "Seq DUMMY DUMMY"}\,. Recall the clause of the \derivative\-function for sequence regular expressions: \begin{center} @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} $\dn$ @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} \end{center} - \noindent Consider first the @{text "else"}-branch where the derivative is @{term + \noindent Consider first the \else\-branch where the derivative is @{term "SEQ (der c r\<^sub>1) r\<^sub>2"}. The corresponding value must therefore be of the form @{term "Seq v\<^sub>1 v\<^sub>2"}, which matches the left-hand - side in clause~\textit{(4)} of @{term inj}. In the @{text "if"}-branch the derivative is an + side in clause~\textit{(4)} of @{term inj}. In the \if\-branch the derivative is an alternative, namely @{term "ALT (SEQ (der c r\<^sub>1) r\<^sub>2) (der c - r\<^sub>2)"}. This means we either have to consider a @{text Left}- or - @{text Right}-value. In case of the @{text Left}-value we know further it + r\<^sub>2)"}. This means we either have to consider a \Left\- or + \Right\-value. In case of the \Left\-value we know further it must be a value for a sequence regular expression. Therefore the pattern we match in the clause \textit{(5)} is @{term "Left (Seq v\<^sub>1 v\<^sub>2)"}, while in \textit{(6)} it is just @{term "Right v\<^sub>2"}. One more interesting point is in the right-hand side of clause \textit{(6)}: since in this case the - regular expression @{text "r\<^sub>1"} does not ``contribute'' to + regular expression \r\<^sub>1\ does not ``contribute'' to matching the string, that means it only matches the empty string, we need to call @{const mkeps} in order to construct a value for how @{term "r\<^sub>1"} can match this empty string. A similar argument applies for why we can @@ -728,7 +724,7 @@ value has a prepended character @{term c}; the second part shows that the underlying string of an @{const mkeps}-value is always the empty string (given the regular expression is nullable since otherwise - @{text mkeps} might not be defined). + \mkeps\ might not be defined). \begin{lemma}\mbox{}\smallskip\\\label{Prf_injval_flat} \begin{tabular}{ll} @@ -743,16 +739,16 @@ an induction on @{term r}. There are no interesting cases.\qed \end{proof} - Having defined the @{const mkeps} and @{text inj} function we can extend + Having defined the @{const mkeps} and \inj\ function we can extend \Brz's matcher so that a value is constructed (assuming the regular expression matches the string). The clauses of the Sulzmann and Lu lexer are \begin{center} \begin{tabular}{lcl} @{thm (lhs) lexer.simps(1)} & $\dn$ & @{thm (rhs) lexer.simps(1)}\\ - @{thm (lhs) lexer.simps(2)} & $\dn$ & @{text "case"} @{term "lexer (der c r) s"} @{text of}\\ - & & \phantom{$|$} @{term "None"} @{text "\"} @{term None}\\ - & & $|$ @{term "Some v"} @{text "\"} @{term "Some (injval r c v)"} + @{thm (lhs) lexer.simps(2)} & $\dn$ & \case\ @{term "lexer (der c r) s"} \of\\\ + & & \phantom{$|$} @{term "None"} \\\ @{term None}\\ + & & $|$ @{term "Some v"} \\\ @{term "Some (injval r c v)"} \end{tabular} \end{center} @@ -784,24 +780,24 @@ \begin{figure}[t] \begin{center} \begin{tabular}{c} - @{thm[mode=Axiom] Posix.intros(1)}@{text "P"}@{term "ONE"} \qquad - @{thm[mode=Axiom] Posix.intros(2)}@{text "P"}@{term "c"}\medskip\\ - @{thm[mode=Rule] Posix.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}@{text "P+L"}\qquad - @{thm[mode=Rule] Posix.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}@{text "P+R"}\medskip\\ + @{thm[mode=Axiom] Posix.intros(1)}\P\@{term "ONE"} \qquad + @{thm[mode=Axiom] Posix.intros(2)}\P\@{term "c"}\medskip\\ + @{thm[mode=Rule] Posix.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\P+L\\qquad + @{thm[mode=Rule] Posix.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\P+R\\medskip\\ $\mprset{flushleft} \inferrule {@{thm (prem 1) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]} \qquad @{thm (prem 2) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]} \\\\ @{thm (prem 3) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}} - {@{thm (concl) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}$@{text "PS"}\\ - @{thm[mode=Axiom] Posix.intros(7)}@{text "P[]"}\medskip\\ + {@{thm (concl) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}$\PS\\\ + @{thm[mode=Axiom] Posix.intros(7)}\P[]\\medskip\\ $\mprset{flushleft} \inferrule {@{thm (prem 1) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]} \qquad @{thm (prem 2) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]} \qquad @{thm (prem 3) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]} \\\\ @{thm (prem 4) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}} - {@{thm (concl) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}}$@{text "P\"} + {@{thm (concl) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}}$\P\\ \end{tabular} \end{center} \caption{Our inductive definition of POSIX values.}\label{POSIXrules} @@ -825,13 +821,12 @@ \noindent We claim that our @{term "s \ r \ v"} relation captures the idea behind the four informal POSIX rules shown in the Introduction: Consider for example the - rules @{text "P+L"} and @{text "P+R"} where the POSIX value for a string + rules \P+L\ and \P+R\ where the POSIX value for a string and an alternative regular expression, that is @{term "(s, ALT r\<^sub>1 r\<^sub>2)"}, - is specified---it is always a @{text "Left"}-value, \emph{except} when the + is specified---it is always a \Left\-value, \emph{except} when the string to be matched is not in the language of @{term "r\<^sub>1"}; only then it - is a @{text Right}-value (see the side-condition in @{text "P+R"}). - Interesting is also the rule for sequence regular expressions (@{text - "PS"}). The first two premises state that @{term "v\<^sub>1"} and @{term "v\<^sub>2"} + is a \Right\-value (see the side-condition in \P+R\). + Interesting is also the rule for sequence regular expressions (\PS\). The first two premises state that @{term "v\<^sub>1"} and @{term "v\<^sub>2"} are the POSIX values for @{term "(s\<^sub>1, r\<^sub>1)"} and @{term "(s\<^sub>2, r\<^sub>2)"} respectively. Consider now the third premise and note that the POSIX value of this rule should match the string \mbox{@{term "s\<^sub>1 @ s\<^sub>2"}}. According to the @@ -841,21 +836,20 @@ \emph{exist} an @{term "s\<^sub>3"} and @{term "s\<^sub>4"} such that @{term "s\<^sub>2"} can be split up into a non-empty string @{term "s\<^sub>3"} and a possibly empty string @{term "s\<^sub>4"}. Moreover the longer string @{term "s\<^sub>1 @ s\<^sub>3"} can be - matched by @{text "r\<^sub>1"} and the shorter @{term "s\<^sub>4"} can still be + matched by \r\<^sub>1\ and the shorter @{term "s\<^sub>4"} can still be matched by @{term "r\<^sub>2"}. In this case @{term "s\<^sub>1"} would \emph{not} be the longest initial split of \mbox{@{term "s\<^sub>1 @ s\<^sub>2"}} and therefore @{term "Seq v\<^sub>1 v\<^sub>2"} cannot be a POSIX value for @{term "(s\<^sub>1 @ s\<^sub>2, SEQ r\<^sub>1 r\<^sub>2)"}. The main point is that our side-condition ensures the Longest Match Rule is satisfied. - A similar condition is imposed on the POSIX value in the @{text - "P\"}-rule. Also there we want that @{term "s\<^sub>1"} is the longest initial + A similar condition is imposed on the POSIX value in the \P\\-rule. Also there we want that @{term "s\<^sub>1"} is the longest initial split of @{term "s\<^sub>1 @ s\<^sub>2"} and furthermore the corresponding value @{term v} cannot be flattened to the empty string. In effect, we require that in each ``iteration'' of the star, some non-empty substring needs to be ``chipped'' away; only in case of the empty string we accept @{term "Stars []"} as the POSIX value. Indeed we can show that our POSIX values - are lexical values which exclude those @{text Stars} that contain subvalues + are lexical values which exclude those \Stars\ that contain subvalues that flatten to the empty string. \begin{lemma}\label{LVposix} @@ -879,7 +873,7 @@ \end{proof} \noindent - The central lemma for our POSIX relation is that the @{text inj}-function + The central lemma for our POSIX relation is that the \inj\-function preserves POSIX values. \begin{lemma}\label{Posix2} @@ -887,17 +881,17 @@ \end{lemma} \begin{proof} - By induction on @{text r}. We explain two cases. + By induction on \r\. We explain two cases. \begin{itemize} \item[$\bullet$] Case @{term "r = ALT r\<^sub>1 r\<^sub>2"}. There are - two subcases, namely @{text "(a)"} \mbox{@{term "v = Left v'"}} and @{term - "s \ der c r\<^sub>1 \ v'"}; and @{text "(b)"} @{term "v = Right v'"}, @{term - "s \ L (der c r\<^sub>1)"} and @{term "s \ der c r\<^sub>2 \ v'"}. In @{text "(a)"} we + two subcases, namely \(a)\ \mbox{@{term "v = Left v'"}} and @{term + "s \ der c r\<^sub>1 \ v'"}; and \(b)\ @{term "v = Right v'"}, @{term + "s \ L (der c r\<^sub>1)"} and @{term "s \ der c r\<^sub>2 \ v'"}. In \(a)\ we know @{term "s \ der c r\<^sub>1 \ v'"}, from which we can infer @{term "(c # s) \ r\<^sub>1 \ injval r\<^sub>1 c v'"} by induction hypothesis and hence @{term "(c # s) \ ALT r\<^sub>1 r\<^sub>2 \ injval (ALT r\<^sub>1 r\<^sub>2) c (Left v')"} as needed. Similarly - in subcase @{text "(b)"} where, however, in addition we have to use + in subcase \(b)\ where, however, in addition we have to use Proposition~\ref{derprop}(2) in order to infer @{term "c # s \ L r\<^sub>1"} from @{term "s \ L (der c r\<^sub>1)"}.\smallskip @@ -905,13 +899,13 @@ \begin{quote} \begin{description} - \item[@{text "(a)"}] @{term "v = Left (Seq v\<^sub>1 v\<^sub>2)"} and @{term "nullable r\<^sub>1"} - \item[@{text "(b)"}] @{term "v = Right v\<^sub>1"} and @{term "nullable r\<^sub>1"} - \item[@{text "(c)"}] @{term "v = Seq v\<^sub>1 v\<^sub>2"} and @{term "\ nullable r\<^sub>1"} + \item[\(a)\] @{term "v = Left (Seq v\<^sub>1 v\<^sub>2)"} and @{term "nullable r\<^sub>1"} + \item[\(b)\] @{term "v = Right v\<^sub>1"} and @{term "nullable r\<^sub>1"} + \item[\(c)\] @{term "v = Seq v\<^sub>1 v\<^sub>2"} and @{term "\ nullable r\<^sub>1"} \end{description} \end{quote} - \noindent For @{text "(a)"} we know @{term "s\<^sub>1 \ der c r\<^sub>1 \ v\<^sub>1"} and + \noindent For \(a)\ we know @{term "s\<^sub>1 \ der c r\<^sub>1 \ v\<^sub>1"} and @{term "s\<^sub>2 \ r\<^sub>2 \ v\<^sub>2"} as well as % \[@{term "\ (\s\<^sub>3 s\<^sub>4. s\<^sub>3 \ [] \ s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \ s\<^sub>1 @ s\<^sub>3 \ L (der c r\<^sub>1) \ s\<^sub>4 \ L r\<^sub>2)"}\] @@ -920,12 +914,12 @@ % \[@{term "\ (\s\<^sub>3 s\<^sub>4. s\<^sub>3 \ [] \ s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \ (c # s\<^sub>1) @ s\<^sub>3 \ L r\<^sub>1 \ s\<^sub>4 \ L r\<^sub>2)"}\] - \noindent We can use the induction hypothesis for @{text "r\<^sub>1"} to obtain + \noindent We can use the induction hypothesis for \r\<^sub>1\ to obtain @{term "(c # s\<^sub>1) \ r\<^sub>1 \ injval r\<^sub>1 c v\<^sub>1"}. Putting this all together allows us to infer - @{term "((c # s\<^sub>1) @ s\<^sub>2) \ SEQ r\<^sub>1 r\<^sub>2 \ Seq (injval r\<^sub>1 c v\<^sub>1) v\<^sub>2"}. The case @{text "(c)"} + @{term "((c # s\<^sub>1) @ s\<^sub>2) \ SEQ r\<^sub>1 r\<^sub>2 \ Seq (injval r\<^sub>1 c v\<^sub>1) v\<^sub>2"}. The case \(c)\ is similar. - For @{text "(b)"} we know @{term "s \ der c r\<^sub>2 \ v\<^sub>1"} and + For \(b)\ we know @{term "s \ der c r\<^sub>2 \ v\<^sub>1"} and @{term "s\<^sub>1 @ s\<^sub>2 \ L (SEQ (der c r\<^sub>1) r\<^sub>2)"}. From the former we have @{term "(c # s) \ r\<^sub>2 \ (injval r\<^sub>2 c v\<^sub>1)"} by induction hypothesis for @{term "r\<^sub>2"}. From the latter we can infer @@ -979,11 +973,11 @@ In the next section we show that our specification coincides with another one given by Okui and Suzuki using a different technique. -*} +\ -section {* Ordering of Values according to Okui and Suzuki*} +section \Ordering of Values according to Okui and Suzuki\ -text {* +text \ While in the previous section we have defined POSIX values directly in terms of a ternary relation (see inference rules in Figure~\ref{POSIXrules}), @@ -1014,54 +1008,51 @@ \end{center} \noindent - At position @{text "[0,1]"} of this value is the - subvalue @{text "Char y"} and at position @{text "[1]"} the + At position \[0,1]\ of this value is the + subvalue \Char y\ and at position \[1]\ the subvalue @{term "Char z"}. At the `root' position, or empty list - @{term "[]"}, is the whole value @{term v}. Positions such as @{text - "[0,1,0]"} or @{text "[2]"} are outside of @{text - v}. If it exists, the subvalue of @{term v} at a position @{text - p}, written @{term "at v p"}, can be recursively defined by + @{term "[]"}, is the whole value @{term v}. Positions such as \[0,1,0]\ or \[2]\ are outside of \v\. If it exists, the subvalue of @{term v} at a position \p\, written @{term "at v p"}, can be recursively defined by \begin{center} \begin{tabular}{r@ {\hspace{0mm}}lcl} - @{term v} & @{text "\\<^bsub>[]\<^esub>"} & @{text "\"}& @{thm (rhs) at.simps(1)}\\ - @{term "Left v"} & @{text "\\<^bsub>0::ps\<^esub>"} & @{text "\"}& @{thm (rhs) at.simps(2)}\\ - @{term "Right v"} & @{text "\\<^bsub>1::ps\<^esub>"} & @{text "\"} & + @{term v} & \\\<^bsub>[]\<^esub>\ & \\\& @{thm (rhs) at.simps(1)}\\ + @{term "Left v"} & \\\<^bsub>0::ps\<^esub>\ & \\\& @{thm (rhs) at.simps(2)}\\ + @{term "Right v"} & \\\<^bsub>1::ps\<^esub>\ & \\\ & @{thm (rhs) at.simps(3)[simplified Suc_0_fold]}\\ - @{term "Seq v\<^sub>1 v\<^sub>2"} & @{text "\\<^bsub>0::ps\<^esub>"} & @{text "\"} & + @{term "Seq v\<^sub>1 v\<^sub>2"} & \\\<^bsub>0::ps\<^esub>\ & \\\ & @{thm (rhs) at.simps(4)[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} \\ - @{term "Seq v\<^sub>1 v\<^sub>2"} & @{text "\\<^bsub>1::ps\<^esub>"} - & @{text "\"} & + @{term "Seq v\<^sub>1 v\<^sub>2"} & \\\<^bsub>1::ps\<^esub>\ + & \\\ & @{thm (rhs) at.simps(5)[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2", simplified Suc_0_fold]} \\ - @{term "Stars vs"} & @{text "\\<^bsub>n::ps\<^esub>"} & @{text "\"}& @{thm (rhs) at.simps(6)}\\ + @{term "Stars vs"} & \\\<^bsub>n::ps\<^esub>\ & \\\& @{thm (rhs) at.simps(6)}\\ \end{tabular} \end{center} \noindent In the last clause we use Isabelle's notation @{term "vs ! n"} for the - @{text n}th element in a list. The set of positions inside a value @{text v}, + \n\th element in a list. The set of positions inside a value \v\, written @{term "Pos v"}, is given by \begin{center} \begin{tabular}{lcl} - @{thm (lhs) Pos.simps(1)} & @{text "\"} & @{thm (rhs) Pos.simps(1)}\\ - @{thm (lhs) Pos.simps(2)} & @{text "\"} & @{thm (rhs) Pos.simps(2)}\\ - @{thm (lhs) Pos.simps(3)} & @{text "\"} & @{thm (rhs) Pos.simps(3)}\\ - @{thm (lhs) Pos.simps(4)} & @{text "\"} & @{thm (rhs) Pos.simps(4)}\\ + @{thm (lhs) Pos.simps(1)} & \\\ & @{thm (rhs) Pos.simps(1)}\\ + @{thm (lhs) Pos.simps(2)} & \\\ & @{thm (rhs) Pos.simps(2)}\\ + @{thm (lhs) Pos.simps(3)} & \\\ & @{thm (rhs) Pos.simps(3)}\\ + @{thm (lhs) Pos.simps(4)} & \\\ & @{thm (rhs) Pos.simps(4)}\\ @{thm (lhs) Pos.simps(5)[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} - & @{text "\"} + & \\\ & @{thm (rhs) Pos.simps(5)[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]}\\ - @{thm (lhs) Pos_stars} & @{text "\"} & @{thm (rhs) Pos_stars}\\ + @{thm (lhs) Pos_stars} & \\\ & @{thm (rhs) Pos_stars}\\ \end{tabular} \end{center} \noindent - whereby @{text len} in the last clause stands for the length of a list. Clearly + whereby \len\ in the last clause stands for the length of a list. Clearly for every position inside a value there exists a subvalue at that position. To help understanding the ordering of Okui and Suzuki, consider again the earlier value - @{text v} and compare it with the following @{text w}: + \v\ and compare it with the following \w\: \begin{center} \begin{tabular}{l} @@ -1070,16 +1061,16 @@ \end{tabular} \end{center} - \noindent Both values match the string @{text "xyz"}, that means if + \noindent Both values match the string \xyz\, that means if we flatten these values at their respective root position, we obtain - @{text "xyz"}. However, at position @{text "[0]"}, @{text v} matches - @{text xy} whereas @{text w} matches only the shorter @{text x}. So - according to the Longest Match Rule, we should prefer @{text v}, - rather than @{text w} as POSIX value for string @{text xyz} (and + \xyz\. However, at position \[0]\, \v\ matches + \xy\ whereas \w\ matches only the shorter \x\. So + according to the Longest Match Rule, we should prefer \v\, + rather than \w\ as POSIX value for string \xyz\ (and corresponding regular expression). In order to formalise this idea, Okui and Suzuki introduce a measure for - subvalues at position @{text p}, called the \emph{norm} of @{text v} - at position @{text p}. We can define this measure in Isabelle as an + subvalues at position \p\, called the \emph{norm} of \v\ + at position \p\. We can define this measure in Isabelle as an integer as follows \begin{center} @@ -1087,10 +1078,10 @@ \end{center} \noindent where we take the length of the flattened value at - position @{text p}, provided the position is inside @{text v}; if - not, then the norm is @{text "-1"}. The default for outside + position \p\, provided the position is inside \v\; if + not, then the norm is \-1\. The default for outside positions is crucial for the POSIX requirement of preferring a - @{text Left}-value over a @{text Right}-value (if they can match the + \Left\-value over a \Right\-value (if they can match the same string---see the Priority Rule from the Introduction). For this consider @@ -1098,17 +1089,17 @@ @{term "v == Left (Char x)"} \qquad and \qquad @{term "w == Right (Char x)"} \end{center} - \noindent Both values match @{text x}. At position @{text "[0]"} - the norm of @{term v} is @{text 1} (the subvalue matches @{text x}), - but the norm of @{text w} is @{text "-1"} (the position is outside - @{text w} according to how we defined the `inside' positions of - @{text Left}- and @{text Right}-values). Of course at position - @{text "[1]"}, the norms @{term "pflat_len v [1]"} and @{term + \noindent Both values match \x\. At position \[0]\ + the norm of @{term v} is \1\ (the subvalue matches \x\), + but the norm of \w\ is \-1\ (the position is outside + \w\ according to how we defined the `inside' positions of + \Left\- and \Right\-values). Of course at position + \[1]\, the norms @{term "pflat_len v [1]"} and @{term "pflat_len w [1]"} are reversed, but the point is that subvalues will be analysed according to lexicographically ordered - positions. According to this ordering, the position @{text "[0]"} - takes precedence over @{text "[1]"} and thus also @{text v} will be - preferred over @{text w}. The lexicographic ordering of positions, written + positions. According to this ordering, the position \[0]\ + takes precedence over \[1]\ and thus also \v\ will be + preferred over \w\. The lexicographic ordering of positions, written @{term "DUMMY \lex DUMMY"}, can be conveniently formalised by three inference rules @@ -1123,18 +1114,18 @@ With the norm and lexicographic order in place, we can state the key definition of Okui and Suzuki - \cite{OkuiSuzuki2010}: a value @{term "v\<^sub>1"} is \emph{smaller at position @{text p}} than + \cite{OkuiSuzuki2010}: a value @{term "v\<^sub>1"} is \emph{smaller at position \p\} than @{term "v\<^sub>2"}, written @{term "v\<^sub>1 \val p v\<^sub>2"}, - if and only if $(i)$ the norm at position @{text p} is + if and only if $(i)$ the norm at position \p\ is greater in @{term "v\<^sub>1"} (that is the string @{term "flat (at v\<^sub>1 p)"} is longer than @{term "flat (at v\<^sub>2 p)"}) and $(ii)$ all subvalues at positions that are inside @{term "v\<^sub>1"} or @{term "v\<^sub>2"} and that are - lexicographically smaller than @{text p}, we have the same norm, namely + lexicographically smaller than \p\, we have the same norm, namely \begin{center} \begin{tabular}{c} @{thm (lhs) PosOrd_def[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} - @{text "\"} + \\\ $\begin{cases} (i) & @{term "pflat_len v\<^sub>1 p > pflat_len v\<^sub>2 p"} \quad\text{and}\smallskip \\ (ii) & @{term "(\q \ Pos v\<^sub>1 \ Pos v\<^sub>2. q \lex p --> pflat_len v\<^sub>1 q = pflat_len v\<^sub>2 q)"} @@ -1142,11 +1133,9 @@ \end{tabular} \end{center} - \noindent The position @{text p} in this definition acts as the - \emph{first distinct position} of @{text "v\<^sub>1"} and @{text - "v\<^sub>2"}, where both values match strings of different length - \cite{OkuiSuzuki2010}. Since at @{text p} the values @{text - "v\<^sub>1"} and @{text "v\<^sub>2"} match different strings, the + \noindent The position \p\ in this definition acts as the + \emph{first distinct position} of \v\<^sub>1\ and \v\<^sub>2\, where both values match strings of different length + \cite{OkuiSuzuki2010}. Since at \p\ the values \v\<^sub>1\ and \v\<^sub>2\ match different strings, the ordering is irreflexive. Derived from the definition above are the following two orderings: @@ -1168,11 +1157,8 @@ @{thm [mode=IfThen] PosOrd_trans[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2" and ?v3.0="v\<^sub>3"]} \end{lemma} - \begin{proof} From the assumption we obtain two positions @{text p} - and @{text q}, where the values @{text "v\<^sub>1"} and @{text - "v\<^sub>2"} (respectively @{text "v\<^sub>2"} and @{text - "v\<^sub>3"}) are `distinct'. Since @{text - "\\<^bsub>lex\<^esub>"} is trichotomous, we need to consider + \begin{proof} From the assumption we obtain two positions \p\ + and \q\, where the values \v\<^sub>1\ and \v\<^sub>2\ (respectively \v\<^sub>2\ and \v\<^sub>3\) are `distinct'. Since \\\<^bsub>lex\<^esub>\ is trichotomous, we need to consider three cases, namely @{term "p = q"}, @{term "p \lex q"} and @{term "q \lex p"}. Let us look at the first case. Clearly @{term "pflat_len v\<^sub>2 p < pflat_len v\<^sub>1 p"} and @{term @@ -1184,7 +1170,7 @@ v\<^sub>1"}, then we can infer from the first assumption that @{term "pflat_len v\<^sub>1 p' = pflat_len v\<^sub>2 p'"}. But this means that @{term "p'"} must be in @{term "Pos v\<^sub>2"} too (the norm - cannot be @{text "-1"} given @{term "p' \ Pos v\<^sub>1"}). + cannot be \-1\ given @{term "p' \ Pos v\<^sub>1"}). Hence we can use the second assumption and infer @{term "pflat_len v\<^sub>2 p' = pflat_len v\<^sub>3 p'"}, which concludes this case with @{term "v\<^sub>1 :\val @@ -1193,15 +1179,15 @@ \noindent The proof for $\preccurlyeq$ is similar and omitted. - It is also straightforward to show that @{text "\"} and + It is also straightforward to show that \\\ and $\preccurlyeq$ are partial orders. Okui and Suzuki furthermore show that they are linear orderings for lexical values \cite{OkuiSuzuki2010} of a given regular expression and given string, but we have not formalised this in Isabelle. It is not essential for our results. What we are going to show below is - that for a given @{text r} and @{text s}, the orderings have a unique + that for a given \r\ and \s\, the orderings have a unique minimal element on the set @{term "LV r s"}, which is the POSIX value we defined in the previous section. We start with two properties that - show how the length of a flattened value relates to the @{text "\"}-ordering. + show how the length of a flattened value relates to the \\\-ordering. \begin{proposition}\mbox{}\smallskip\\\label{ordlen} \begin{tabular}{@ {}ll} @@ -1259,8 +1245,7 @@ \noindent One might prefer that statements \textit{(4)} and \textit{(5)} (respectively \textit{(6)} and \textit{(7)}) - are combined into a single \textit{iff}-statement (like the ones for @{text - Left} and @{text Right}). Unfortunately this cannot be done easily: such + are combined into a single \textit{iff}-statement (like the ones for \Left\ and \Right\). Unfortunately this cannot be done easily: such a single statement would require an additional assumption about the two values @{term "Seq v\<^sub>1 v\<^sub>2"} and @{term "Seq w\<^sub>1 w\<^sub>2"} being inhabited by the same regular expression. The @@ -1271,10 +1256,8 @@ Next we establish how Okui and Suzuki's orderings relate to our - definition of POSIX values. Given a @{text POSIX} value @{text "v\<^sub>1"} - for @{text r} and @{text s}, then any other lexical value @{text - "v\<^sub>2"} in @{term "LV r s"} is greater or equal than @{text - "v\<^sub>1"}, namely: + definition of POSIX values. Given a \POSIX\ value \v\<^sub>1\ + for \r\ and \s\, then any other lexical value \v\<^sub>2\ in @{term "LV r s"} is greater or equal than \v\<^sub>1\, namely: \begin{theorem}\label{orderone} @@ -1283,53 +1266,53 @@ \begin{proof} By induction on our POSIX rules. By Theorem~\ref{posixdeterm} and the definition of @{const LV}, it is clear - that @{text "v\<^sub>1"} and @{text "v\<^sub>2"} have the same + that \v\<^sub>1\ and \v\<^sub>2\ have the same underlying string @{term s}. The three base cases are straightforward: for example for @{term "v\<^sub>1 = Void"}, we have that @{term "v\<^sub>2 \ LV ONE []"} must also be of the form \mbox{@{term "v\<^sub>2 = Void"}}. Therefore we have @{term "v\<^sub>1 :\val v\<^sub>2"}. The inductive cases for - @{text r} being of the form @{term "ALT r\<^sub>1 r\<^sub>2"} and + \r\ being of the form @{term "ALT r\<^sub>1 r\<^sub>2"} and @{term "SEQ r\<^sub>1 r\<^sub>2"} are as follows: \begin{itemize} - \item[$\bullet$] Case @{text "P+L"} with @{term "s \ (ALT r\<^sub>1 r\<^sub>2) + \item[$\bullet$] Case \P+L\ with @{term "s \ (ALT r\<^sub>1 r\<^sub>2) \ (Left w\<^sub>1)"}: In this case the value @{term "v\<^sub>2"} is either of the form @{term "Left w\<^sub>2"} or @{term "Right w\<^sub>2"}. In the latter case we can immediately conclude with \mbox{@{term "v\<^sub>1 - :\val v\<^sub>2"}} since a @{text Left}-value with the - same underlying string @{text s} is always smaller than a - @{text Right}-value by Proposition~\ref{ordintros}\textit{(1)}. + :\val v\<^sub>2"}} since a \Left\-value with the + same underlying string \s\ is always smaller than a + \Right\-value by Proposition~\ref{ordintros}\textit{(1)}. In the former case we have @{term "w\<^sub>2 \ LV r\<^sub>1 s"} and can use the induction hypothesis to infer @{term "w\<^sub>1 :\val w\<^sub>2"}. Because @{term "w\<^sub>1"} and @{term "w\<^sub>2"} have the same underlying string - @{text s}, we can conclude with @{term "Left w\<^sub>1 + \s\, we can conclude with @{term "Left w\<^sub>1 :\val Left w\<^sub>2"} using Proposition~\ref{ordintros}\textit{(2)}.\smallskip - \item[$\bullet$] Case @{text "P+R"} with @{term "s \ (ALT r\<^sub>1 r\<^sub>2) + \item[$\bullet$] Case \P+R\ with @{term "s \ (ALT r\<^sub>1 r\<^sub>2) \ (Right w\<^sub>1)"}: This case similar to the previous case, except that we additionally know @{term "s \ L r\<^sub>1"}. This is needed when @{term "v\<^sub>2"} is of the form \mbox{@{term "Left w\<^sub>2"}}. Since \mbox{@{term "flat v\<^sub>2 = flat - w\<^sub>2"} @{text "= s"}} and @{term "\ w\<^sub>2 : + w\<^sub>2"} \= s\} and @{term "\ w\<^sub>2 : r\<^sub>1"}, we can derive a contradiction for \mbox{@{term "s \ L r\<^sub>1"}} using Proposition~\ref{inhabs}. So also in this case \mbox{@{term "v\<^sub>1 :\val v\<^sub>2"}}.\smallskip - \item[$\bullet$] Case @{text "PS"} with @{term "(s\<^sub>1 @ + \item[$\bullet$] Case \PS\ with @{term "(s\<^sub>1 @ s\<^sub>2) \ (SEQ r\<^sub>1 r\<^sub>2) \ (Seq w\<^sub>1 w\<^sub>2)"}: We can assume @{term "v\<^sub>2 = Seq (u\<^sub>1) (u\<^sub>2)"} with @{term "\ u\<^sub>1 : r\<^sub>1"} and \mbox{@{term "\ u\<^sub>2 : r\<^sub>2"}}. We have @{term "s\<^sub>1 @ s\<^sub>2 = (flat u\<^sub>1) @ (flat u\<^sub>2)"}. By the side-condition of the - @{text PS}-rule we know that either @{term "s\<^sub>1 = flat + \PS\-rule we know that either @{term "s\<^sub>1 = flat u\<^sub>1"} or that @{term "flat u\<^sub>1"} is a strict prefix of @{term "s\<^sub>1"}. In the latter case we can infer @{term "w\<^sub>1 :\val u\<^sub>1"} by @@ -1348,19 +1331,18 @@ \end{itemize} - \noindent The case for @{text "P\"} is similar to the @{text PS}-case and omitted.\qed + \noindent The case for \P\\ is similar to the \PS\-case and omitted.\qed \end{proof} - \noindent This theorem shows that our @{text POSIX} value for a - regular expression @{text r} and string @{term s} is in fact a - minimal element of the values in @{text "LV r s"}. By + \noindent This theorem shows that our \POSIX\ value for a + regular expression \r\ and string @{term s} is in fact a + minimal element of the values in \LV r s\. By Proposition~\ref{ordlen}\textit{(2)} we also know that any value in - @{text "LV r s'"}, with @{term "s'"} being a strict prefix, cannot be - smaller than @{text "v\<^sub>1"}. The next theorem shows the + \LV r s'\, with @{term "s'"} being a strict prefix, cannot be + smaller than \v\<^sub>1\. The next theorem shows the opposite---namely any minimal element in @{term "LV r s"} must be a - @{text POSIX} value. This can be established by induction on @{text - r}, but the proof can be drastically simplified by using the fact - from the previous section about the existence of a @{text POSIX} value + \POSIX\ value. This can be established by induction on \r\, but the proof can be drastically simplified by using the fact + from the previous section about the existence of a \POSIX\ value whenever a string @{term "s \ L r"}. @@ -1372,7 +1354,7 @@ If @{thm (prem 1) PosOrd_Posix[where ?v1.0="v\<^sub>1"]} then @{term "s \ L r"} by Proposition~\ref{inhabs}. Hence by Theorem~\ref{lexercorrect}(2) there exists a - @{text POSIX} value @{term "v\<^sub>P"} with @{term "s \ r \ v\<^sub>P"} + \POSIX\ value @{term "v\<^sub>P"} with @{term "s \ r \ v\<^sub>P"} and by Lemma~\ref{LVposix} we also have \mbox{@{term "v\<^sub>P \ LV r s"}}. By Theorem~\ref{orderone} we therefore have @{term "v\<^sub>P :\val v\<^sub>1"}. If @{term "v\<^sub>P = v\<^sub>1"} then @@ -1393,17 +1375,17 @@ \noindent To sum up, we have shown that the (unique) minimal elements - of the ordering by Okui and Suzuki are exactly the @{text POSIX} + of the ordering by Okui and Suzuki are exactly the \POSIX\ values we defined inductively in Section~\ref{posixsec}. This provides - an independent confirmation that our ternary relation formalise the + an independent confirmation that our ternary relation formalises the informal POSIX rules. -*} +\ -section {* Bitcoded Lexing *} +section \Bitcoded Lexing\ -text {* +text \ Incremental calculation of the value. To simplify the proof we first define the function @{const flex} which calculates the ``iterated'' injection function. With this we can @@ -1502,7 +1484,7 @@ @{thm [mode=IfThen] bder_retrieve} -By induction on @{text r} +By induction on \r\ \begin{theorem}[Main Lemma]\mbox{}\\ @{thm [mode=IfThen] MAIN_decode} @@ -1518,11 +1500,11 @@ @{thm blexer_correctness} \end{theorem} -*} +\ -section {* Optimisations *} +section \Optimisations\ -text {* +text \ Derivatives as calculated by \Brz's method are usually more complex regular expressions than the initial one; the result is that the @@ -1538,10 +1520,10 @@ \begin{equation}\label{Simpl} \begin{array}{lcllcllcllcl} - @{term "ALT ZERO r"} & @{text "\"} & @{term r} \hspace{8mm}%\\ - @{term "ALT r ZERO"} & @{text "\"} & @{term r} \hspace{8mm}%\\ - @{term "SEQ ONE r"} & @{text "\"} & @{term r} \hspace{8mm}%\\ - @{term "SEQ r ONE"} & @{text "\"} & @{term r} + @{term "ALT ZERO r"} & \\\ & @{term r} \hspace{8mm}%\\ + @{term "ALT r ZERO"} & \\\ & @{term r} \hspace{8mm}%\\ + @{term "SEQ ONE r"} & \\\ & @{term r} \hspace{8mm}%\\ + @{term "SEQ r ONE"} & \\\ & @{term r} \end{array} \end{equation} @@ -1558,15 +1540,15 @@ \begin{center} \begin{tabular}{lcl} - @{thm (lhs) F_RIGHT.simps(1)} & $\dn$ & @{text "Right (f v)"}\\ - @{thm (lhs) F_LEFT.simps(1)} & $\dn$ & @{text "Left (f v)"}\\ + @{thm (lhs) F_RIGHT.simps(1)} & $\dn$ & \Right (f v)\\\ + @{thm (lhs) F_LEFT.simps(1)} & $\dn$ & \Left (f v)\\\ - @{thm (lhs) F_ALT.simps(1)} & $\dn$ & @{text "Right (f\<^sub>2 v)"}\\ - @{thm (lhs) F_ALT.simps(2)} & $\dn$ & @{text "Left (f\<^sub>1 v)"}\\ + @{thm (lhs) F_ALT.simps(1)} & $\dn$ & \Right (f\<^sub>2 v)\\\ + @{thm (lhs) F_ALT.simps(2)} & $\dn$ & \Left (f\<^sub>1 v)\\\ - @{thm (lhs) F_SEQ1.simps(1)} & $\dn$ & @{text "Seq (f\<^sub>1 ()) (f\<^sub>2 v)"}\\ - @{thm (lhs) F_SEQ2.simps(1)} & $\dn$ & @{text "Seq (f\<^sub>1 v) (f\<^sub>2 ())"}\\ - @{thm (lhs) F_SEQ.simps(1)} & $\dn$ & @{text "Seq (f\<^sub>1 v\<^sub>1) (f\<^sub>2 v\<^sub>2)"}\medskip\\ + @{thm (lhs) F_SEQ1.simps(1)} & $\dn$ & \Seq (f\<^sub>1 ()) (f\<^sub>2 v)\\\ + @{thm (lhs) F_SEQ2.simps(1)} & $\dn$ & \Seq (f\<^sub>1 v) (f\<^sub>2 ())\\\ + @{thm (lhs) F_SEQ.simps(1)} & $\dn$ & \Seq (f\<^sub>1 v\<^sub>1) (f\<^sub>2 v\<^sub>2)\\medskip\\ %\end{tabular} % %\begin{tabular}{lcl} @@ -1580,7 +1562,7 @@ \end{center} \noindent - The functions @{text "simp\<^bsub>Alt\<^esub>"} and @{text "simp\<^bsub>Seq\<^esub>"} encode the simplification rules + The functions \simp\<^bsub>Alt\<^esub>\ and \simp\<^bsub>Seq\<^esub>\ encode the simplification rules in \eqref{Simpl} and compose the rectification functions (simplifications can occur deep inside the regular expression). The main simplification function is then @@ -1602,17 +1584,17 @@ \begin{tabular}{lcl} @{thm (lhs) slexer.simps(1)} & $\dn$ & @{thm (rhs) slexer.simps(1)}\\ @{thm (lhs) slexer.simps(2)} & $\dn$ & - @{text "let (r\<^sub>s, f\<^sub>r) = simp (r "}$\backslash$@{text " c) in"}\\ - & & @{text "case"} @{term "slexer r\<^sub>s s"} @{text of}\\ - & & \phantom{$|$} @{term "None"} @{text "\"} @{term None}\\ - & & $|$ @{term "Some v"} @{text "\"} @{text "Some (inj r c (f\<^sub>r v))"} + \let (r\<^sub>s, f\<^sub>r) = simp (r \$\backslash$\ c) in\\\ + & & \case\ @{term "slexer r\<^sub>s s"} \of\\\ + & & \phantom{$|$} @{term "None"} \\\ @{term None}\\ + & & $|$ @{term "Some v"} \\\ \Some (inj r c (f\<^sub>r v))\ \end{tabular} \end{center} \noindent In the second clause we first calculate the derivative @{term "der c r"} and then simplify the result. This gives us a simplified derivative - @{text "r\<^sub>s"} and a rectification function @{text "f\<^sub>r"}. The lexer + \r\<^sub>s\ and a rectification function \f\<^sub>r\. The lexer is then recursively called with the simplified derivative, but before we inject the character @{term c} into the value @{term v}, we need to rectify @{term v} (that is construct @{term "f\<^sub>r v"}). Before we can establish the correctness @@ -1627,7 +1609,7 @@ \end{tabular} \end{lemma} - \begin{proof} Both are by induction on @{text r}. There is no + \begin{proof} Both are by induction on \r\. There is no interesting case for the first statement. For the second statement, of interest are the @{term "r = ALT r\<^sub>1 r\<^sub>2"} and @{term "r = SEQ r\<^sub>1 r\<^sub>2"} cases. In each case we have to analyse four subcases whether @@ -1639,7 +1621,7 @@ and by IH also (*) @{term "s \ r\<^sub>2 \ (snd (simp r\<^sub>2) v)"}. Given @{term "fst (simp r\<^sub>1) = ZERO"} we know @{term "L (fst (simp r\<^sub>1)) = {}"}. By the first statement @{term "L r\<^sub>1"} is the empty set, meaning (**) @{term "s \ L r\<^sub>1"}. - Taking (*) and (**) together gives by the \mbox{@{text "P+R"}}-rule + Taking (*) and (**) together gives by the \mbox{\P+R\}-rule @{term "s \ ALT r\<^sub>1 r\<^sub>2 \ Right (snd (simp r\<^sub>2) v)"}. In turn this gives @{term "s \ ALT r\<^sub>1 r\<^sub>2 \ snd (simp (ALT r\<^sub>1 r\<^sub>2)) v"} as we need to show. The other cases are similar.\qed @@ -1683,12 +1665,12 @@ \end{proof} -*} +\ -section {* HERE *} +section \HERE\ -text {* +text \ \begin{center} \begin{tabular}{llcl} 1) & @{thm (lhs) erase.simps(1)} & $\dn$ & @{thm (rhs) erase.simps(1)}\\ @@ -1751,7 +1733,7 @@ We can move out the @{term "fuse [Z]"} and then use the IH to show that left-hand side and right-hand side are equal. This completes the proof. \end{proof} -*} +\