diff -r a730a5a0bab9 -r 89e6605c4ca4 thys/Journal/Paper.thy --- a/thys/Journal/Paper.thy Tue Jul 23 21:21:49 2019 +0100 +++ b/thys/Journal/Paper.thy Mon Jul 29 09:37:20 2019 +0100 @@ -132,14 +132,14 @@ -section {* Introduction *} +section \<open>Introduction\<close> -text {* +text \<open> Brzozowski \cite{Brzozowski1964} introduced the notion of the {\em -derivative} @{term "der c r"} of a regular expression @{text r} w.r.t.\ -a character~@{text c}, and showed that it gave a simple solution to the +derivative} @{term "der c r"} of a regular expression \<open>r\<close> w.r.t.\ +a character~\<open>c\<close>, and showed that it gave a simple solution to the problem of matching a string @{term s} with a regular expression @{term r}: if the derivative of @{term r} w.r.t.\ (in succession) all the characters of the string matches the empty string, then @{term r} @@ -175,8 +175,7 @@ into a sequence of tokens, POSIX is the more natural disambiguation strategy for what programmers consider basic syntactic building blocks in their programs. These building blocks are often specified by some -regular expressions, say @{text "r\<^bsub>key\<^esub>"} and @{text -"r\<^bsub>id\<^esub>"} for recognising keywords and identifiers, +regular expressions, say \<open>r\<^bsub>key\<^esub>\<close> and \<open>r\<^bsub>id\<^esub>\<close> for recognising keywords and identifiers, respectively. There are a few underlying (informal) rules behind tokenising a string in a POSIX \cite{POSIX} fashion: @@ -196,23 +195,22 @@ be longer than no match at all. \end{itemize} -\noindent Consider for example a regular expression @{text -"r\<^bsub>key\<^esub>"} for recognising keywords such as @{text "if"}, -@{text "then"} and so on; and @{text "r\<^bsub>id\<^esub>"} +\noindent Consider for example a regular expression \<open>r\<^bsub>key\<^esub>\<close> for recognising keywords such as \<open>if\<close>, +\<open>then\<close> and so on; and \<open>r\<^bsub>id\<^esub>\<close> recognising identifiers (say, a single character followed by characters or numbers).
Then we can form the regular expression -@{text "(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>"} -and use POSIX matching to tokenise strings, say @{text "iffoo"} and -@{text "if"}. For @{text "iffoo"} we obtain by the Longest Match Rule +\<open>(r\<^bsub>key\<^esub> + r\<^bsub>id\<^esub>)\<^sup>\<star>\<close> +and use POSIX matching to tokenise strings, say \<open>iffoo\<close> and +\<open>if\<close>. For \<open>iffoo\<close> we obtain by the Longest Match Rule a single identifier token, not a keyword followed by an -identifier. For @{text "if"} we obtain by the Priority Rule a keyword -token, not an identifier token---even if @{text "r\<^bsub>id\<^esub>"} -matches also. By the Star Rule we know @{text "(r\<^bsub>key\<^esub> + -r\<^bsub>id\<^esub>)\<^sup>\<star>"} matches @{text "iffoo"}, -respectively @{text "if"}, in exactly one `iteration' of the star. The +identifier. For \<open>if\<close> we obtain by the Priority Rule a keyword +token, not an identifier token---even if \<open>r\<^bsub>id\<^esub>\<close> +matches also. By the Star Rule we know \<open>(r\<^bsub>key\<^esub> + +r\<^bsub>id\<^esub>)\<^sup>\<star>\<close> matches \<open>iffoo\<close>, +respectively \<open>if\<close>, in exactly one `iteration' of the star. The Empty String Rule is for cases where, for example, the regular expression -@{text "(a\<^sup>\<star>)\<^sup>\<star>"} matches against the -string @{text "bc"}. Then the longest initial matched substring is the +\<open>(a\<^sup>\<star>)\<^sup>\<star>\<close> matches against the +string \<open>bc\<close>. Then the longest initial matched substring is the empty string, which is matched by both the whole regular expression and the parenthesised subexpression. @@ -225,25 +223,24 @@ expression matches a string, values encode the information of \emph{how} the string is matched by the regular expression---that is, which part of the string is matched by which part of the regular -expression.
For this consider again the string @{text "xy"} and -the regular expression \mbox{@{text "(x + (y + xy))\<^sup>\<star>"}} +expression. For this consider again the string \<open>xy\<close> and +the regular expression \mbox{\<open>(x + (y + xy))\<^sup>\<star>\<close>} (this time fully parenthesised). We can view this regular expression -as tree and if the string @{text xy} is matched by two Star -`iterations', then the @{text x} is matched by the left-most -alternative in this tree and the @{text y} by the right-left alternative. This +as tree and if the string \<open>xy\<close> is matched by two Star +`iterations', then the \<open>x\<close> is matched by the left-most +alternative in this tree and the \<open>y\<close> by the right-left alternative. This suggests to record this matching as \begin{center} @{term "Stars [Left(Char x), Right(Left(Char y))]"} \end{center} -\noindent where @{const Stars}, @{text Left}, @{text Right} and @{text -Char} are constructors for values. @{text Stars} records how many -iterations were used; @{text Left}, respectively @{text Right}, which +\noindent where @{const Stars}, \<open>Left\<close>, \<open>Right\<close> and \<open>Char\<close> are constructors for values. \<open>Stars\<close> records how many +iterations were used; \<open>Left\<close>, respectively \<open>Right\<close>, which alternative is used. This `tree view' leads naturally to the idea that regular expressions act as types and values as inhabiting those types (see, for example, \cite{HosoyaVouillonPierce2005}). The value for -matching @{text "xy"} in a single `iteration', i.e.~the POSIX value, +matching \<open>xy\<close> in a single `iteration', i.e.~the POSIX value, would look as follows \begin{center} @@ -316,11 +313,11 @@ We extend our results to ??? Bitcoded version?? -*} +\<close> -section {* Preliminaries *} +section \<open>Preliminaries\<close> -text {* \noindent Strings in Isabelle/HOL are lists of characters with +text \<open>\noindent Strings in Isabelle/HOL are lists of characters with the empty string being represented by the empty list, written @{term "[]"}, and list-cons being written as @{term "DUMMY # DUMMY"}.
Often we use the usual bracket notation for lists also for strings; for @@ -333,7 +330,7 @@ inductive datatype: \begin{center} - @{text "r :="} + \<open>r :=\<close> @{const "ZERO"} $\mid$ @{const "ONE"} $\mid$ @{term "CHAR c"} $\mid$ @@ -365,8 +362,8 @@ DUMMY"} for the concatenation of two languages (it is also list-append for strings). We use the star-notation for regular expressions and for languages (in the last clause above). The star for languages is defined - inductively by two clauses: @{text "(i)"} the empty string being in - the star of a language and @{text "(ii)"} if @{term "s\<^sub>1"} is in a + inductively by two clauses: \<open>(i)\<close> the empty string being in + the star of a language and \<open>(ii)\<close> if @{term "s\<^sub>1"} is in a language and @{term "s\<^sub>2"} in the star of this language, then also @{term "s\<^sub>1 @ s\<^sub>2"} is in the star of this language. It will also be convenient to use the following notion of a \emph{semantic derivative} (or \emph{left @@ -459,11 +456,11 @@ \cite{Sulzmann2014} is to append another phase to this algorithm in order to calculate a [lexical] value. We will explain the details next. -*} +\<close> -section {* POSIX Regular Expression Matching\label{posixsec} *} +section \<open>POSIX Regular Expression Matching\label{posixsec}\<close> -text {* +text \<open> There have been many previous works that use values for encoding \emph{how} a regular expression matches a string. @@ -473,7 +470,7 @@ are defined as the inductive datatype \begin{center} - @{text "v :="} + \<open>v :=\<close> @{const "Void"} $\mid$ @{term "val.Char c"} $\mid$ @{term "Left v"} $\mid$ @@ -532,8 +529,8 @@ \end{center} \noindent where in the clause for @{const "Stars"} we use the - notation @{term "v \<in> set vs"} for indicating that @{text v} is a - member in the list @{text vs}. We require in this rule that every + notation @{term "v \<in> set vs"} for indicating that \<open>v\<close> is a + member in the list \<open>vs\<close>. We require in this rule that every value in @{term vs} flattens to a non-empty string.
The idea is that @{term "Stars"}-values satisfy the informal Star Rule (see Introduction) where the $^\star$ does not match the empty string unless this is @@ -549,9 +546,9 @@ \end{proposition} \noindent - Given a regular expression @{text r} and a string @{text s}, we define the - set of all \emph{Lexical Values} inhabited by @{text r} with the underlying string - being @{text s}:\footnote{Okui and Suzuki refer to our lexical values + Given a regular expression \<open>r\<close> and a string \<open>s\<close>, we define the + set of all \emph{Lexical Values} inhabited by \<open>r\<close> with the underlying string + being \<open>s\<close>:\footnote{Okui and Suzuki refer to our lexical values as \emph{canonical values} in \cite{OkuiSuzuki2010}. The notion of \emph{non-problematic values} by Cardelli and Frisch \cite{Frisch2004} is related, but not identical to our lexical values.} @@ -573,7 +570,7 @@ infinitely many values, but according to our more restricted definition only a single value, namely @{thm LV_STAR_ONE_empty}. - If a regular expression @{text r} matches a string @{text s}, then + If a regular expression \<open>r\<close> matches a string \<open>s\<close>, then generally the set @{term "LV r s"} is not just a singleton set. In case of POSIX matching the problem is to calculate the unique lexical value that satisfies the (informal) POSIX rules from the Introduction. @@ -582,9 +579,9 @@ path from the left to the right involving @{term derivatives}/@{const nullable} is the first phase of the algorithm (calculating successive \Brz's derivatives) and @{const - mkeps}/@{text inj}, the path from right to left, the second + mkeps}/\<open>inj\<close>, the path from right to left, the second phase. This picture shows the steps required when a regular - expression, say @{text "r\<^sub>1"}, matches the string @{term + expression, say \<open>r\<^sub>1\<close>, matches the string @{term "[a,b,c]"}. We first build the three derivatives (according to @{term a}, @{term b} and @{term c}).
We then use @{const nullable} to find out whether the resulting derivative regular expression @@ -609,11 +606,11 @@ \node (v4) [below=of r4]{@{term "v\<^sub>4"}}; \draw[->,line width=1mm](r4) -- (v4); \node (v3) [left=of v4] {@{term "v\<^sub>3"}}; -\draw[->,line width=1mm](v4)--(v3) node[below,midway] {@{text "inj r\<^sub>3 c"}}; +\draw[->,line width=1mm](v4)--(v3) node[below,midway] {\<open>inj r\<^sub>3 c\<close>}; \node (v2) [left=of v3]{@{term "v\<^sub>2"}}; -\draw[->,line width=1mm](v3)--(v2) node[below,midway] {@{text "inj r\<^sub>2 b"}}; +\draw[->,line width=1mm](v3)--(v2) node[below,midway] {\<open>inj r\<^sub>2 b\<close>}; \node (v1) [left=of v2] {@{term "v\<^sub>1"}}; -\draw[->,line width=1mm](v2)--(v1) node[below,midway] {@{text "inj r\<^sub>1 a"}}; +\draw[->,line width=1mm](v2)--(v1) node[below,midway] {\<open>inj r\<^sub>1 a\<close>}; \draw (r4) node[anchor=north west] {\;\raisebox{-8mm}{@{term "mkeps"}}}; \end{tikzpicture} \end{center} @@ -647,8 +644,7 @@ makes some subtle choices leading to a POSIX value: for example if an alternative regular expression, say @{term "ALT r\<^sub>1 r\<^sub>2"}, can match the empty string and furthermore @{term "r\<^sub>1"} can match the - empty string, then we return a @{text Left}-value. The @{text - Right}-value will only be returned if @{term "r\<^sub>1"} cannot match the empty + empty string, then we return a \<open>Left\<close>-value. The \<open>Right\<close>-value will only be returned if @{term "r\<^sub>1"} cannot match the empty string. The most interesting idea from Sulzmann and Lu \cite{Sulzmann2014} is @@ -690,25 +686,25 @@ might be instructive to look first at the three sequence cases (clauses \textit{(4)} -- \textit{(6)}). In each case we need to construct an ``injected value'' for @{term "SEQ r\<^sub>1 r\<^sub>2"}. This must be a value of the form @{term - "Seq DUMMY DUMMY"}\,.
Recall the clause of the @{text derivative}-function + "Seq DUMMY DUMMY"}\,. Recall the clause of the \<open>derivative\<close>-function for sequence regular expressions: \begin{center} @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} $\dn$ @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} \end{center} - \noindent Consider first the @{text "else"}-branch where the derivative is @{term + \noindent Consider first the \<open>else\<close>-branch where the derivative is @{term "SEQ (der c r\<^sub>1) r\<^sub>2"}. The corresponding value must therefore be of the form @{term "Seq v\<^sub>1 v\<^sub>2"}, which matches the left-hand - side in clause~\textit{(4)} of @{term inj}. In the @{text "if"}-branch the derivative is an + side in clause~\textit{(4)} of @{term inj}. In the \<open>if\<close>-branch the derivative is an alternative, namely @{term "ALT (SEQ (der c r\<^sub>1) r\<^sub>2) (der c - r\<^sub>2)"}. This means we either have to consider a @{text Left}- or - @{text Right}-value. In case of the @{text Left}-value we know further it + r\<^sub>2)"}. This means we either have to consider a \<open>Left\<close>- or + \<open>Right\<close>-value. In case of the \<open>Left\<close>-value we know further it must be a value for a sequence regular expression. Therefore the pattern we match in the clause \textit{(5)} is @{term "Left (Seq v\<^sub>1 v\<^sub>2)"}, while in \textit{(6)} it is just @{term "Right v\<^sub>2"}. One more interesting point is in the right-hand side of clause \textit{(6)}: since in this case the - regular expression @{text "r\<^sub>1"} does not ``contribute'' to + regular expression \<open>r\<^sub>1\<close> does not ``contribute'' to matching the string, that means it only matches the empty string, we need to call @{const mkeps} in order to construct a value for how @{term "r\<^sub>1"} can match this empty string.
A similar argument applies for why we can @@ -728,7 +724,7 @@ value has a prepended character @{term c}; the second part shows that the underlying string of an @{const mkeps}-value is always the empty string (given the regular expression is nullable since otherwise - @{text mkeps} might not be defined). + \<open>mkeps\<close> might not be defined). \begin{lemma}\mbox{}\smallskip\\\label{Prf_injval_flat} \begin{tabular}{ll} @@ -743,16 +739,16 @@ an induction on @{term r}. There are no interesting cases.\qed \end{proof} - Having defined the @{const mkeps} and @{text inj} function we can extend + Having defined the @{const mkeps} and \<open>inj\<close> function we can extend \Brz's matcher so that a value is constructed (assuming the regular expression matches the string). The clauses of the Sulzmann and Lu lexer are \begin{center} \begin{tabular}{lcl} @{thm (lhs) lexer.simps(1)} & $\dn$ & @{thm (rhs) lexer.simps(1)}\\ - @{thm (lhs) lexer.simps(2)} & $\dn$ & @{text "case"} @{term "lexer (der c r) s"} @{text of}\\ - & & \phantom{$|$} @{term "None"} @{text "\<Rightarrow>"} @{term None}\\ - & & $|$ @{term "Some v"} @{text "\<Rightarrow>"} @{term "Some (injval r c v)"} + @{thm (lhs) lexer.simps(2)} & $\dn$ & \<open>case\<close> @{term "lexer (der c r) s"} \<open>of\<close>\\ + & & \phantom{$|$} @{term "None"} \<open>\<Rightarrow>\<close> @{term None}\\ + & & $|$ @{term "Some v"} \<open>\<Rightarrow>\<close> @{term "Some (injval r c v)"} \end{tabular} \end{center} @@ -784,24 +780,24 @@ \begin{figure}[t] \begin{center} \begin{tabular}{c} - @{thm[mode=Axiom] Posix.intros(1)}@{text "P"}@{term "ONE"} \qquad - @{thm[mode=Axiom] Posix.intros(2)}@{text "P"}@{term "c"}\medskip\\ - @{thm[mode=Rule] Posix.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}@{text "P+L"}\qquad - @{thm[mode=Rule] Posix.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}@{text "P+R"}\medskip\\ + @{thm[mode=Axiom] Posix.intros(1)}\<open>P\<close>@{term "ONE"} \qquad + @{thm[mode=Axiom] Posix.intros(2)}\<open>P\<close>@{term "c"}\medskip\\ + @{thm[mode=Rule] Posix.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\<open>P+L\<close>\qquad + @{thm[mode=Rule] Posix.intros(4)[of "s"
"r\<^sub>2" "v" "r\<^sub>1"]}\P+R\\medskip\\ $\mprset{flushleft} \inferrule {@{thm (prem 1) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]} \qquad @{thm (prem 2) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]} \\\\ @{thm (prem 3) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}} - {@{thm (concl) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}$@{text "PS"}\\ - @{thm[mode=Axiom] Posix.intros(7)}@{text "P[]"}\medskip\\ + {@{thm (concl) Posix.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}$\PS\\\ + @{thm[mode=Axiom] Posix.intros(7)}\P[]\\medskip\\ $\mprset{flushleft} \inferrule {@{thm (prem 1) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]} \qquad @{thm (prem 2) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]} \qquad @{thm (prem 3) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]} \\\\ @{thm (prem 4) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}} - {@{thm (concl) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}}$@{text "P\"} + {@{thm (concl) Posix.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}}$\P\\ \end{tabular} \end{center} \caption{Our inductive definition of POSIX values.}\label{POSIXrules} @@ -825,13 +821,12 @@ \noindent We claim that our @{term "s \ r \ v"} relation captures the idea behind the four informal POSIX rules shown in the Introduction: Consider for example the - rules @{text "P+L"} and @{text "P+R"} where the POSIX value for a string + rules \P+L\ and \P+R\ where the POSIX value for a string and an alternative regular expression, that is @{term "(s, ALT r\<^sub>1 r\<^sub>2)"}, - is specified---it is always a @{text "Left"}-value, \emph{except} when the + is specified---it is always a \Left\-value, \emph{except} when the string to be matched is not in the language of @{term 
"r\<^sub>1"}; only then it - is a @{text Right}-value (see the side-condition in @{text "P+R"}). - Interesting is also the rule for sequence regular expressions (@{text - "PS"}). The first two premises state that @{term "v\<^sub>1"} and @{term "v\<^sub>2"} + is a \Right\-value (see the side-condition in \P+R\). + Interesting is also the rule for sequence regular expressions (\PS\). The first two premises state that @{term "v\<^sub>1"} and @{term "v\<^sub>2"} are the POSIX values for @{term "(s\<^sub>1, r\<^sub>1)"} and @{term "(s\<^sub>2, r\<^sub>2)"} respectively. Consider now the third premise and note that the POSIX value of this rule should match the string \mbox{@{term "s\<^sub>1 @ s\<^sub>2"}}. According to the @@ -841,21 +836,20 @@ \emph{exist} an @{term "s\<^sub>3"} and @{term "s\<^sub>4"} such that @{term "s\<^sub>2"} can be split up into a non-empty string @{term "s\<^sub>3"} and a possibly empty string @{term "s\<^sub>4"}. Moreover the longer string @{term "s\<^sub>1 @ s\<^sub>3"} can be - matched by @{text "r\<^sub>1"} and the shorter @{term "s\<^sub>4"} can still be + matched by \r\<^sub>1\ and the shorter @{term "s\<^sub>4"} can still be matched by @{term "r\<^sub>2"}. In this case @{term "s\<^sub>1"} would \emph{not} be the longest initial split of \mbox{@{term "s\<^sub>1 @ s\<^sub>2"}} and therefore @{term "Seq v\<^sub>1 v\<^sub>2"} cannot be a POSIX value for @{term "(s\<^sub>1 @ s\<^sub>2, SEQ r\<^sub>1 r\<^sub>2)"}. The main point is that our side-condition ensures the Longest Match Rule is satisfied. - A similar condition is imposed on the POSIX value in the @{text - "P\"}-rule. Also there we want that @{term "s\<^sub>1"} is the longest initial + A similar condition is imposed on the POSIX value in the \P\\-rule. Also there we want that @{term "s\<^sub>1"} is the longest initial split of @{term "s\<^sub>1 @ s\<^sub>2"} and furthermore the corresponding value @{term v} cannot be flattened to the empty string. 
In effect, we require that in each ``iteration'' of the star, some non-empty substring needs to be ``chipped'' away; only in case of the empty string we accept @{term "Stars []"} as the POSIX value. Indeed we can show that our POSIX values - are lexical values which exclude those @{text Stars} that contain subvalues + are lexical values which exclude those \<open>Stars\<close> that contain subvalues that flatten to the empty string. \begin{lemma}\label{LVposix} @@ -879,7 +873,7 @@ \end{proof} \noindent - The central lemma for our POSIX relation is that the @{text inj}-function + The central lemma for our POSIX relation is that the \<open>inj\<close>-function preserves POSIX values. \begin{lemma}\label{Posix2} @@ -887,17 +881,17 @@ \end{lemma} \begin{proof} - By induction on @{text r}. We explain two cases. + By induction on \<open>r\<close>. We explain two cases. \begin{itemize} \item[$\bullet$] Case @{term "r = ALT r\<^sub>1 r\<^sub>2"}. There are - two subcases, namely @{text "(a)"} \mbox{@{term "v = Left v'"}} and @{term - "s \<in> der c r\<^sub>1 \<rightarrow> v'"}; and @{text "(b)"} @{term "v = Right v'"}, @{term - "s \<notin> L (der c r\<^sub>1)"} and @{term "s \<in> der c r\<^sub>2 \<rightarrow> v'"}. In @{text "(a)"} we + two subcases, namely \<open>(a)\<close> \mbox{@{term "v = Left v'"}} and @{term + "s \<in> der c r\<^sub>1 \<rightarrow> v'"}; and \<open>(b)\<close> @{term "v = Right v'"}, @{term + "s \<notin> L (der c r\<^sub>1)"} and @{term "s \<in> der c r\<^sub>2 \<rightarrow> v'"}. In \<open>(a)\<close> we know @{term "s \<in> der c r\<^sub>1 \<rightarrow> v'"}, from which we can infer @{term "(c # s) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v'"} by induction hypothesis and hence @{term "(c # s) \<in> ALT r\<^sub>1 r\<^sub>2 \<rightarrow> injval (ALT r\<^sub>1 r\<^sub>2) c (Left v')"} as needed.
Similarly - in subcase @{text "(b)"} where, however, in addition we have to use + in subcase \<open>(b)\<close> where, however, in addition we have to use Proposition~\ref{derprop}(2) in order to infer @{term "c # s \<notin> L r\<^sub>1"} from @{term "s \<notin> L (der c r\<^sub>1)"}.\smallskip @@ -905,13 +899,13 @@ \begin{quote} \begin{description} - \item[@{text "(a)"}] @{term "v = Left (Seq v\<^sub>1 v\<^sub>2)"} and @{term "nullable r\<^sub>1"} - \item[@{text "(b)"}] @{term "v = Right v\<^sub>1"} and @{term "nullable r\<^sub>1"} - \item[@{text "(c)"}] @{term "v = Seq v\<^sub>1 v\<^sub>2"} and @{term "\<not> nullable r\<^sub>1"} + \item[\<open>(a)\<close>] @{term "v = Left (Seq v\<^sub>1 v\<^sub>2)"} and @{term "nullable r\<^sub>1"} + \item[\<open>(b)\<close>] @{term "v = Right v\<^sub>1"} and @{term "nullable r\<^sub>1"} + \item[\<open>(c)\<close>] @{term "v = Seq v\<^sub>1 v\<^sub>2"} and @{term "\<not> nullable r\<^sub>1"} \end{description} \end{quote} - \noindent For @{text "(a)"} we know @{term "s\<^sub>1 \<in> der c r\<^sub>1 \<rightarrow> v\<^sub>1"} and + \noindent For \<open>(a)\<close> we know @{term "s\<^sub>1 \<in> der c r\<^sub>1 \<rightarrow> v\<^sub>1"} and @{term "s\<^sub>2 \<in> r\<^sub>2 \<rightarrow> v\<^sub>2"} as well as % \[@{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> s\<^sub>1 @ s\<^sub>3 \<in> L (der c r\<^sub>1) \<and> s\<^sub>4 \<in> L r\<^sub>2)"}\] @@ -920,12 +914,12 @@ % \[@{term "\<not> (\<exists>s\<^sub>3 s\<^sub>4. s\<^sub>3 \<noteq> [] \<and> s\<^sub>3 @ s\<^sub>4 = s\<^sub>2 \<and> (c # s\<^sub>1) @ s\<^sub>3 \<in> L r\<^sub>1 \<and> s\<^sub>4 \<in> L r\<^sub>2)"}\] - \noindent We can use the induction hypothesis for @{text "r\<^sub>1"} to obtain + \noindent We can use the induction hypothesis for \<open>r\<^sub>1\<close> to obtain @{term "(c # s\<^sub>1) \<in> r\<^sub>1 \<rightarrow> injval r\<^sub>1 c v\<^sub>1"}. Putting this all together allows us to infer - @{term "((c # s\<^sub>1) @ s\<^sub>2) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (injval r\<^sub>1 c v\<^sub>1) v\<^sub>2"}.
The case @{text "(c)"} + @{term "((c # s\<^sub>1) @ s\<^sub>2) \<in> SEQ r\<^sub>1 r\<^sub>2 \<rightarrow> Seq (injval r\<^sub>1 c v\<^sub>1) v\<^sub>2"}. The case \<open>(c)\<close> is similar. - For @{text "(b)"} we know @{term "s \<in> der c r\<^sub>2 \<rightarrow> v\<^sub>1"} and + For \<open>(b)\<close> we know @{term "s \<in> der c r\<^sub>2 \<rightarrow> v\<^sub>1"} and @{term "s\<^sub>1 @ s\<^sub>2 \<notin> L (SEQ (der c r\<^sub>1) r\<^sub>2)"}. From the former we have @{term "(c # s) \<in> r\<^sub>2 \<rightarrow> (injval r\<^sub>2 c v\<^sub>1)"} by induction hypothesis for @{term "r\<^sub>2"}. From the latter we can infer @@ -979,11 +973,11 @@ In the next section we show that our specification coincides with another one given by Okui and Suzuki using a different technique. -*} +\<close> -section {* Ordering of Values according to Okui and Suzuki*} +section \<open>Ordering of Values according to Okui and Suzuki\<close> -text {* +text \<open> While in the previous section we have defined POSIX values directly in terms of a ternary relation (see inference rules in Figure~\ref{POSIXrules}), @@ -1014,54 +1008,51 @@ \end{center} \noindent - At position @{text "[0,1]"} of this value is the - subvalue @{text "Char y"} and at position @{text "[1]"} the + At position \<open>[0,1]\<close> of this value is the + subvalue \<open>Char y\<close> and at position \<open>[1]\<close> the subvalue @{term "Char z"}. At the `root' position, or empty list - @{term "[]"}, is the whole value @{term v}. Positions such as @{text - "[0,1,0]"} or @{text "[2]"} are outside of @{text - v}. If it exists, the subvalue of @{term v} at a position @{text - p}, written @{term "at v p"}, can be recursively defined by + @{term "[]"}, is the whole value @{term v}. Positions such as \<open>[0,1,0]\<close> or \<open>[2]\<close> are outside of \<open>v\<close>.
If it exists, the subvalue of @{term v} at a position \p\, written @{term "at v p"}, can be recursively defined by \begin{center} \begin{tabular}{r@ {\hspace{0mm}}lcl} - @{term v} & @{text "\\<^bsub>[]\<^esub>"} & @{text "\"}& @{thm (rhs) at.simps(1)}\\ - @{term "Left v"} & @{text "\\<^bsub>0::ps\<^esub>"} & @{text "\"}& @{thm (rhs) at.simps(2)}\\ - @{term "Right v"} & @{text "\\<^bsub>1::ps\<^esub>"} & @{text "\"} & + @{term v} & \\\<^bsub>[]\<^esub>\ & \\\& @{thm (rhs) at.simps(1)}\\ + @{term "Left v"} & \\\<^bsub>0::ps\<^esub>\ & \\\& @{thm (rhs) at.simps(2)}\\ + @{term "Right v"} & \\\<^bsub>1::ps\<^esub>\ & \\\ & @{thm (rhs) at.simps(3)[simplified Suc_0_fold]}\\ - @{term "Seq v\<^sub>1 v\<^sub>2"} & @{text "\\<^bsub>0::ps\<^esub>"} & @{text "\"} & + @{term "Seq v\<^sub>1 v\<^sub>2"} & \\\<^bsub>0::ps\<^esub>\ & \\\ & @{thm (rhs) at.simps(4)[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} \\ - @{term "Seq v\<^sub>1 v\<^sub>2"} & @{text "\\<^bsub>1::ps\<^esub>"} - & @{text "\"} & + @{term "Seq v\<^sub>1 v\<^sub>2"} & \\\<^bsub>1::ps\<^esub>\ + & \\\ & @{thm (rhs) at.simps(5)[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2", simplified Suc_0_fold]} \\ - @{term "Stars vs"} & @{text "\\<^bsub>n::ps\<^esub>"} & @{text "\"}& @{thm (rhs) at.simps(6)}\\ + @{term "Stars vs"} & \\\<^bsub>n::ps\<^esub>\ & \\\& @{thm (rhs) at.simps(6)}\\ \end{tabular} \end{center} \noindent In the last clause we use Isabelle's notation @{term "vs ! n"} for the - @{text n}th element in a list. The set of positions inside a value @{text v}, + \n\th element in a list. 
The set of positions inside a value \v\, written @{term "Pos v"}, is given by \begin{center} \begin{tabular}{lcl} - @{thm (lhs) Pos.simps(1)} & @{text "\"} & @{thm (rhs) Pos.simps(1)}\\ - @{thm (lhs) Pos.simps(2)} & @{text "\"} & @{thm (rhs) Pos.simps(2)}\\ - @{thm (lhs) Pos.simps(3)} & @{text "\"} & @{thm (rhs) Pos.simps(3)}\\ - @{thm (lhs) Pos.simps(4)} & @{text "\"} & @{thm (rhs) Pos.simps(4)}\\ + @{thm (lhs) Pos.simps(1)} & \\\ & @{thm (rhs) Pos.simps(1)}\\ + @{thm (lhs) Pos.simps(2)} & \\\ & @{thm (rhs) Pos.simps(2)}\\ + @{thm (lhs) Pos.simps(3)} & \\\ & @{thm (rhs) Pos.simps(3)}\\ + @{thm (lhs) Pos.simps(4)} & \\\ & @{thm (rhs) Pos.simps(4)}\\ @{thm (lhs) Pos.simps(5)[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} - & @{text "\"} + & \\\ & @{thm (rhs) Pos.simps(5)[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]}\\ - @{thm (lhs) Pos_stars} & @{text "\"} & @{thm (rhs) Pos_stars}\\ + @{thm (lhs) Pos_stars} & \\\ & @{thm (rhs) Pos_stars}\\ \end{tabular} \end{center} \noindent - whereby @{text len} in the last clause stands for the length of a list. Clearly + whereby \len\ in the last clause stands for the length of a list. Clearly for every position inside a value there exists a subvalue at that position. To help understanding the ordering of Okui and Suzuki, consider again the earlier value - @{text v} and compare it with the following @{text w}: + \v\ and compare it with the following \w\: \begin{center} \begin{tabular}{l} @@ -1070,16 +1061,16 @@ \end{tabular} \end{center} - \noindent Both values match the string @{text "xyz"}, that means if + \noindent Both values match the string \xyz\, that means if we flatten these values at their respective root position, we obtain - @{text "xyz"}. However, at position @{text "[0]"}, @{text v} matches - @{text xy} whereas @{text w} matches only the shorter @{text x}. So - according to the Longest Match Rule, we should prefer @{text v}, - rather than @{text w} as POSIX value for string @{text xyz} (and + \xyz\. 
However, at position \[0]\, \v\ matches + \xy\ whereas \w\ matches only the shorter \x\. So + according to the Longest Match Rule, we should prefer \v\, + rather than \w\ as POSIX value for string \xyz\ (and corresponding regular expression). In order to formalise this idea, Okui and Suzuki introduce a measure for - subvalues at position @{text p}, called the \emph{norm} of @{text v} - at position @{text p}. We can define this measure in Isabelle as an + subvalues at position \p\, called the \emph{norm} of \v\ + at position \p\. We can define this measure in Isabelle as an integer as follows \begin{center} @@ -1087,10 +1078,10 @@ \end{center} \noindent where we take the length of the flattened value at - position @{text p}, provided the position is inside @{text v}; if - not, then the norm is @{text "-1"}. The default for outside + position \p\, provided the position is inside \