thys/Journal/Paper.thy
changeset 273 e85099ac4c6c
parent 272 f16019b11179
child 274 692b62426677
    17 
    17 
    18 syntax (latex output)
    18 syntax (latex output)
    19   "_Collect" :: "pttrn => bool => 'a set"              ("(1{_ \<^raw:\mbox{\boldmath$\mid$}> _})")
    19   "_Collect" :: "pttrn => bool => 'a set"              ("(1{_ \<^raw:\mbox{\boldmath$\mid$}> _})")
    20   "_CollectIn" :: "pttrn => 'a set => bool => 'a set"   ("(1{_ \<in> _ |e _})")
    20   "_CollectIn" :: "pttrn => 'a set => bool => 'a set"   ("(1{_ \<in> _ |e _})")
    21 
    21 
       
    22 syntax
       
    23   "_Not_Ex" :: "idts \<Rightarrow> bool \<Rightarrow> bool"  ("(3\<nexists>_.a./ _)" [0, 10] 10)
       
    24   "_Not_Ex1" :: "pttrn \<Rightarrow> bool \<Rightarrow> bool"  ("(3\<nexists>!_.a./ _)" [0, 10] 10)
       
    25 
    22 
    26 
    23 abbreviation 
    27 abbreviation 
    24   "der_syn r c \<equiv> der c r"
    28   "der_syn r c \<equiv> der c r"
    25 
    29 
    26 abbreviation 
    30 abbreviation 
    35 
    39 
    36 notation (latex output)
    40 notation (latex output)
    37   If  ("(\<^raw:\textrm{>if\<^raw:}> (_)/ \<^raw:\textrm{>then\<^raw:}> (_)/ \<^raw:\textrm{>else\<^raw:}> (_))" 10) and
    41   If  ("(\<^raw:\textrm{>if\<^raw:}> (_)/ \<^raw:\textrm{>then\<^raw:}> (_)/ \<^raw:\textrm{>else\<^raw:}> (_))" 10) and
    38   Cons ("_\<^raw:\mbox{$\,$}>::\<^raw:\mbox{$\,$}>_" [75,73] 73) and  
    42   Cons ("_\<^raw:\mbox{$\,$}>::\<^raw:\mbox{$\,$}>_" [75,73] 73) and  
    39 
    43 
    40   ZERO ("\<^bold>0" 78) and 
    44   ZERO ("\<^bold>0" 81) and 
    41   ONE ("\<^bold>1" 1000) and 
    45   ONE ("\<^bold>1" 81) and 
    42   CHAR ("_" [1000] 80) and
    46   CHAR ("_" [1000] 80) and
    43   ALT ("_ + _" [77,77] 78) and
    47   ALT ("_ + _" [77,77] 78) and
    44   SEQ ("_ \<cdot> _" [77,77] 78) and
    48   SEQ ("_ \<cdot> _" [77,77] 78) and
    45   STAR ("_\<^sup>\<star>" [1000] 78) and
    49   STAR ("_\<^sup>\<star>" [78] 78) and
    46   
    50   
    47   val.Void ("Empty" 78) and
    51   val.Void ("Empty" 78) and
    48   val.Char ("Char _" [1000] 78) and
    52   val.Char ("Char _" [1000] 78) and
    49   val.Left ("Left _" [79] 78) and
    53   val.Left ("Left _" [79] 78) and
    50   val.Right ("Right _" [1000] 78) and
    54   val.Right ("Right _" [1000] 78) and
    54   L ("L'(_')" [10] 78) and
    58   L ("L'(_')" [10] 78) and
    55   LV ("LV _ _" [80,73] 78) and
    59   LV ("LV _ _" [80,73] 78) and
    56   der_syn ("_\\_" [79, 1000] 76) and  
    60   der_syn ("_\\_" [79, 1000] 76) and  
    57   ders_syn ("_\\_" [79, 1000] 76) and
    61   ders_syn ("_\\_" [79, 1000] 76) and
    58   flat ("|_|" [75] 74) and
    62   flat ("|_|" [75] 74) and
       
    63   flats ("|_|" [72] 74) and
    59   Sequ ("_ @ _" [78,77] 63) and
    64   Sequ ("_ @ _" [78,77] 63) and
    60   injval ("inj _ _ _" [79,77,79] 76) and 
    65   injval ("inj _ _ _" [79,77,79] 76) and 
    61   mkeps ("mkeps _" [79] 76) and 
    66   mkeps ("mkeps _" [79] 76) and 
    62   length ("len _" [73] 73) and
    67   length ("len _" [73] 73) and
    63   intlen ("len _" [73] 73) and
    68   intlen ("len _" [73] 73) and
   104 p9. The condition "not exists s3 s4..." appears often enough (in particular in
   109 p9. The condition "not exists s3 s4..." appears often enough (in particular in
   105 the proof of Lemma 3) to warrant a definition.
   110 the proof of Lemma 3) to warrant a definition.
   106 
   111 
   107 *)
   112 *)
   108 
   113 
       
   114 
   109 (*>*)
   115 (*>*)
   110 
   116 
   111 
   117 
   112 
   118 
   113 section {* Introduction *}
   119 section {* Introduction *}
   185 identifier. For @{text "if"} we obtain by the Priority Rule a keyword
   191 identifier. For @{text "if"} we obtain by the Priority Rule a keyword
   186 token, not an identifier token---even if @{text "r\<^bsub>id\<^esub>"}
   192 token, not an identifier token---even if @{text "r\<^bsub>id\<^esub>"}
    187 also matches. By the Star Rule we know @{text "(r\<^bsub>key\<^esub> +
    193 also matches. By the Star Rule we know @{text "(r\<^bsub>key\<^esub> +
   188 r\<^bsub>id\<^esub>)\<^sup>\<star>"} matches @{text "iffoo"},
   194 r\<^bsub>id\<^esub>)\<^sup>\<star>"} matches @{text "iffoo"},
   189 respectively @{text "if"}, in exactly one `iteration' of the star. The
   195 respectively @{text "if"}, in exactly one `iteration' of the star. The
   190 Empty String Rule is for cases where, for example, @{text
   196 Empty String Rule is for cases where, for example, the regular expression 
   191 "(a\<^sup>\<star>)\<^sup>\<star>"} matches against the
   197 @{text "(a\<^sup>\<star>)\<^sup>\<star>"} matches against the
   192 string @{text "bc"}. Then the longest initial matched substring is the
   198 string @{text "bc"}. Then the longest initial matched substring is the
   193 empty string, which is matched by both the whole regular expression
   199 empty string, which is matched by both the whole regular expression
   194 and the parenthesised subexpression.
   200 and the parenthesised subexpression.
   195 
   201 
   196 
   202 
   200 to allow generation not just of a YES/NO answer but of an actual
   206 to allow generation not just of a YES/NO answer but of an actual
   201 matching, called a [lexical] {\em value}. Assuming a regular
   207 matching, called a [lexical] {\em value}. Assuming a regular
   202 expression matches a string, values encode the information of
   208 expression matches a string, values encode the information of
   203 \emph{how} the string is matched by the regular expression---that is,
   209 \emph{how} the string is matched by the regular expression---that is,
   204 which part of the string is matched by which part of the regular
   210 which part of the string is matched by which part of the regular
   205 expression. For this consider again the the string @{text "xy"} and
   211 expression. For this consider again the string @{text "xy"} and
   206 the regular expression \mbox{@{text "(x + (y +
   212 the regular expression \mbox{@{text "(x + (y + xy))\<^sup>\<star>"}}
   207 xy))\<^sup>\<star>"}}. The POSIX value, which corresponds to using the
   213 (this time fully parenthesised). We can view this regular expression
   208 star in only one repetition,
    214 as a tree and if the string @{text xy} is matched by two Star
   209 
   215 `iterations', then the @{text x} is matched by the left-most
   210 
    216 alternative in this tree and the @{text y} by the left alternative within the right branch. This
   211 \marginpar{explain values; who introduced them} 
    217 suggests recording this matching as
       
   218 
       
   219 \begin{center}
       
   220 @{term "Stars [Left(Char x), Right(Left(Char y))]"}
       
   221 \end{center}
       
   222 
       
   223 \noindent where @{const Stars}, @{text Left}, @{text Right} and @{text
       
   224 Char} are constructors for values. @{text Stars} records how many
       
   225 iterations were used; @{text Left}, respectively @{text Right}, which
       
   226 alternative is used. This `tree view' leads naturally to the
       
   227 idea that regular expressions act as types and values as inhabiting
       
   228 those types. This view was first put forward by ???. The value for the
       
   229 single `iteration', i.e.~the POSIX value, would look as follows
       
   230 
       
   231 \begin{center}
       
   232 @{term "Stars [Seq (Char x) (Char y)]"}
       
   233 \end{center}
       
   234 
       
   235 \noindent where @{const Stars} has only a single-element list for the
       
   236 single iteration and @{const Seq} indicates that @{term xy} is matched 
       
   237 by a sequence regular expression, which we will in what follows 
       
   238 write more formally as @{term "SEQ x y"}.
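To make the discussion of values more concrete, the following Haskell sketch (an
illustration only, not part of the Isabelle formalisation; the names Val, Chr,
Sq, Lft, Rgt and Stars are chosen for this sketch) shows one possible datatype
for values together with the two values just mentioned:

  -- values record how a regular expression matches a string
  data Val = Empty            -- for the ONE regular expression
           | Chr Char         -- for a single character
           | Sq Val Val       -- for sequence regular expressions
           | Lft Val          -- the left alternative was taken
           | Rgt Val          -- the right alternative was taken
           | Stars [Val]      -- one value per `iteration' of a star
           deriving Show

  -- matching xy with (x + (y + xy))* in two, respectively one, iteration(s)
  twoIterations, oneIteration :: Val
  twoIterations = Stars [Lft (Chr 'x'), Rgt (Lft (Chr 'y'))]
  oneIteration  = Stars [Sq (Chr 'x') (Chr 'y')]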
   212 
   239 
   213 
   240 
   214 Sulzmann and Lu give a simple algorithm to calculate a value that
   241 Sulzmann and Lu give a simple algorithm to calculate a value that
   215 appears to be the value associated with POSIX matching.  The challenge
   242 appears to be the value associated with POSIX matching.  The challenge
   216 then is to specify that value, in an algorithm-independent fashion,
   243 then is to specify that value, in an algorithm-independent fashion,
   278 
   305 
   279 *}
   306 *}
   280 
   307 
   281 section {* Preliminaries *}
   308 section {* Preliminaries *}
   282 
   309 
   283 text {* \noindent Strings in Isabelle/HOL are lists of characters with the
   310 text {* \noindent Strings in Isabelle/HOL are lists of characters with
   284 empty string being represented by the empty list, written @{term "[]"}, and
   311 the empty string being represented by the empty list, written @{term
   285 list-cons being written as @{term "DUMMY # DUMMY"}. Often we use the usual
   312 "[]"}, and list-cons being written as @{term "DUMMY # DUMMY"}. Often
   286 bracket notation for lists also for strings; for example a string consisting
   313 we use the usual bracket notation for lists also for strings; for
   287 of just a single character @{term c} is written @{term "[c]"}. By using the
   314 example a string consisting of just a single character @{term c} is
       
   315 written @{term "[c]"}. We use the usual definitions for 
       
   316 \emph{prefixes} and \emph{strict prefixes} of strings.  By using the
   288 type @{type char} for characters we have a supply of finitely many
   317 type @{type char} for characters we have a supply of finitely many
   289 characters roughly corresponding to the ASCII character set. Regular
   318 characters roughly corresponding to the ASCII character set. Regular
   290 expressions are defined as usual as the elements of the following inductive
   319 expressions are defined as usual as the elements of the following
   291 datatype:
   320 inductive datatype:
   292 
   321 
   293   \begin{center}
   322   \begin{center}
   294   @{text "r :="}
   323   @{text "r :="}
   295   @{const "ZERO"} $\mid$
   324   @{const "ZERO"} $\mid$
   296   @{const "ONE"} $\mid$
   325   @{const "ONE"} $\mid$
   306   language of a regular expression is also defined as usual by the
   335   language of a regular expression is also defined as usual by the
   307   recursive function @{term L} with the six clauses:
   336   recursive function @{term L} with the six clauses:
   308 
   337 
   309   \begin{center}
   338   \begin{center}
   310   \begin{tabular}{l@ {\hspace{4mm}}rcl}
   339   \begin{tabular}{l@ {\hspace{4mm}}rcl}
   311   (1) & @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\
   340   \textit{(1)} & @{thm (lhs) L.simps(1)} & $\dn$ & @{thm (rhs) L.simps(1)}\\
   312   (2) & @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\
   341   \textit{(2)} & @{thm (lhs) L.simps(2)} & $\dn$ & @{thm (rhs) L.simps(2)}\\
   313   (3) & @{thm (lhs) L.simps(3)} & $\dn$ & @{thm (rhs) L.simps(3)}\\
   342   \textit{(3)} & @{thm (lhs) L.simps(3)} & $\dn$ & @{thm (rhs) L.simps(3)}\\
   314   (4) & @{thm (lhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
   343   \textit{(4)} & @{thm (lhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & 
   315   (5) & @{thm (lhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\
   344         @{thm (rhs) L.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
   316   (6) & @{thm (lhs) L.simps(6)} & $\dn$ & @{thm (rhs) L.simps(6)}\\
   345   \textit{(5)} & @{thm (lhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & 
       
   346         @{thm (rhs) L.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\
       
   347   \textit{(6)} & @{thm (lhs) L.simps(6)} & $\dn$ & @{thm (rhs) L.simps(6)}\\
   317   \end{tabular}
   348   \end{tabular}
   318   \end{center}
   349   \end{center}
   319   
   350   
   320   \noindent In clause (4) we use the operation @{term "DUMMY ;;
   351   \noindent In clause \textit{(4)} we use the operation @{term "DUMMY ;;
   321   DUMMY"} for the concatenation of two languages (it is also list-append for
   352   DUMMY"} for the concatenation of two languages (it is also list-append for
   322   strings). We use the star-notation for regular expressions and for
   353   strings). We use the star-notation for regular expressions and for
   323   languages (in the last clause above). The star for languages is defined
   354   languages (in the last clause above). The star for languages is defined
   324   inductively by two clauses: @{text "(i)"} the empty string being in
   355   inductively by two clauses: @{text "(i)"} the empty string being in
   325   the star of a language and @{text "(ii)"} if @{term "s\<^sub>1"} is in a
   356   the star of a language and @{text "(ii)"} if @{term "s\<^sub>1"} is in a
   360   @{thm (lhs) nullable.simps(1)} & $\dn$ & @{thm (rhs) nullable.simps(1)}\\
   391   @{thm (lhs) nullable.simps(1)} & $\dn$ & @{thm (rhs) nullable.simps(1)}\\
   361   @{thm (lhs) nullable.simps(2)} & $\dn$ & @{thm (rhs) nullable.simps(2)}\\
   392   @{thm (lhs) nullable.simps(2)} & $\dn$ & @{thm (rhs) nullable.simps(2)}\\
   362   @{thm (lhs) nullable.simps(3)} & $\dn$ & @{thm (rhs) nullable.simps(3)}\\
   393   @{thm (lhs) nullable.simps(3)} & $\dn$ & @{thm (rhs) nullable.simps(3)}\\
   363   @{thm (lhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
   394   @{thm (lhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(4)[of "r\<^sub>1" "r\<^sub>2"]}\\
   364   @{thm (lhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\
   395   @{thm (lhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) nullable.simps(5)[of "r\<^sub>1" "r\<^sub>2"]}\\
   365   @{thm (lhs) nullable.simps(6)} & $\dn$ & @{thm (rhs) nullable.simps(6)}%\medskip\\
   396   @{thm (lhs) nullable.simps(6)} & $\dn$ & @{thm (rhs) nullable.simps(6)}\medskip\\
   366   \end{tabular}
   397 
   367   \end{center}
   398 %  \end{tabular}
   368 
   399 %  \end{center}
   369   \begin{center}
   400 
   370   \begin{tabular}{lcl}
   401 %  \begin{center}
       
   402 %  \begin{tabular}{lcl}
       
   403 
   371   @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\
   404   @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\
   372   @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\
   405   @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\
   373   @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\
   406   @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\
   374   @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\
   407   @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\
   375   @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}\\
   408   @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}\\
   390   \noindent Given the equations in \eqref{SemDer}, it is a relatively easy
   423   \noindent Given the equations in \eqref{SemDer}, it is a relatively easy
   391   exercise in mechanical reasoning to establish that
   424   exercise in mechanical reasoning to establish that
   392 
   425 
   393   \begin{proposition}\label{derprop}\mbox{}\\ 
   426   \begin{proposition}\label{derprop}\mbox{}\\ 
   394   \begin{tabular}{ll}
   427   \begin{tabular}{ll}
   395   @{text "(1)"} & @{thm (lhs) nullable_correctness} if and only if
   428   \textit{(1)} & @{thm (lhs) nullable_correctness} if and only if
   396   @{thm (rhs) nullable_correctness}, and \\ 
   429   @{thm (rhs) nullable_correctness}, and \\ 
   397   @{text "(2)"} & @{thm[mode=IfThen] der_correctness}.
   430   \textit{(2)} & @{thm[mode=IfThen] der_correctness}.
   398   \end{tabular}
   431   \end{tabular}
   399   \end{proposition}
   432   \end{proposition}
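Both functions, and the matcher introduced next, are easily implemented in a
functional language. The following Haskell sketch is only an illustration under
assumed names (Rexp, CHR, matches, and so on); the clause of der for STAR is the
standard one, which is also quoted later in the text.

  data Rexp = ZERO | ONE | CHR Char
            | ALT Rexp Rexp | SEQ Rexp Rexp | STAR Rexp

  -- can the regular expression match the empty string?
  nullable :: Rexp -> Bool
  nullable ZERO        = False
  nullable ONE         = True
  nullable (CHR _)     = False
  nullable (ALT r1 r2) = nullable r1 || nullable r2
  nullable (SEQ r1 r2) = nullable r1 && nullable r2
  nullable (STAR _)    = True

  -- Brzozowski derivative of a regular expression w.r.t. the character c
  der :: Char -> Rexp -> Rexp
  der _ ZERO        = ZERO
  der _ ONE         = ZERO
  der c (CHR d)     = if c == d then ONE else ZERO
  der c (ALT r1 r2) = ALT (der c r1) (der c r2)
  der c (SEQ r1 r2) = if nullable r1
                      then ALT (SEQ (der c r1) r2) (der c r2)
                      else SEQ (der c r1) r2
  der c (STAR r)    = SEQ (der c r) (STAR r)

  -- derivative w.r.t. a string, and the resulting matcher
  ders :: String -> Rexp -> Rexp
  ders s r = foldl (flip der) r s

  matches :: Rexp -> String -> Bool
  matches r s = nullable (ders s r)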
   400 
   433 
   401   \noindent With this in place it is also very routine to prove that the
   434   \noindent With this in place it is also very routine to prove that the
   402   regular expression matcher defined as
   435   regular expression matcher defined as
   455   @{thm (lhs) flat.simps(6)} & $\dn$ & @{thm (rhs) flat.simps(6)}\\
   488   @{thm (lhs) flat.simps(6)} & $\dn$ & @{thm (rhs) flat.simps(6)}\\
   456   @{thm (lhs) flat.simps(7)} & $\dn$ & @{thm (rhs) flat.simps(7)}\\
   489   @{thm (lhs) flat.simps(7)} & $\dn$ & @{thm (rhs) flat.simps(7)}\\
   457   \end{tabular}
   490   \end{tabular}
   458   \end{center}
   491   \end{center}
   459 
   492 
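Continuing the Haskell sketch (an illustration only, using the assumed Val
datatype from above), the flattening function, and the flats variant for lists
of values mentioned just below, could be written as:

  -- the underlying string of a value ("flattening")
  flat :: Val -> String
  flat Empty      = []
  flat (Chr c)    = [c]
  flat (Lft v)    = flat v
  flat (Rgt v)    = flat v
  flat (Sq v1 v2) = flat v1 ++ flat v2
  flat (Stars vs) = flats vs

  -- flattening a list of values and concatenating the results
  flats :: [Val] -> String
  flats = concatMap flat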
   460   \noindent Sulzmann and Lu also define inductively an inhabitation relation
   493   \noindent We will sometimes refer to the underlying string of a
   461   that associates values to regular expressions. We define this relation as 
    494   value as the \emph{flattened value}.  We will also overload our notation and 
   462   follows:\footnote{Note that the rule for @{term Stars} differs from our 
   495   use @{term "flats vs"} for flattening a list of values and concatenating
   463   earlier paper \cite{AusafDyckhoffUrban2016}. There we used the original
   496   the resulting strings.
   464   definition by Sulzmann and Lu which does not require that the values @{term "v \<in> set vs"}
   497   
   465   flatten to a non-empty string. The reason for introducing the 
   498   Sulzmann and Lu define
   466   more restricted version of lexical values is convenience later 
   499   inductively an \emph{inhabitation relation} that associates values to
   467   on when reasoning about 
   500   regular expressions. We define this relation as
   468   an ordering relation for values.} 
   501   follows:\footnote{Note that the rule for @{term Stars} differs from
       
   502   our earlier paper \cite{AusafDyckhoffUrban2016}. There we used the
       
   503   original definition by Sulzmann and Lu which does not require that
       
   504   the values @{term "v \<in> set vs"} flatten to a non-empty
       
   505   string. The reason for introducing the more restricted version of
       
   506   lexical values is convenience later on when reasoning about an
       
   507   ordering relation for values.}
   469 
   508 
   470   \begin{center}
   509   \begin{center}
   471   \begin{tabular}{c@ {\hspace{12mm}}c}
   510   \begin{tabular}{c@ {\hspace{12mm}}c}
   472   \\[-8mm]
   511   \\[-8mm]
   473   @{thm[mode=Axiom] Prf.intros(4)} & 
   512   @{thm[mode=Axiom] Prf.intros(4)} & 
   499   \noindent
   538   \noindent
   500   Given a regular expression @{text r} and a string @{text s}, we define the 
   539   Given a regular expression @{text r} and a string @{text s}, we define the 
   501   set of all \emph{Lexical Values} inhabited by @{text r} with the underlying string 
   540   set of all \emph{Lexical Values} inhabited by @{text r} with the underlying string 
   502   being @{text s}:\footnote{Okui and Suzuki refer to our lexical values 
   541   being @{text s}:\footnote{Okui and Suzuki refer to our lexical values 
   503   as \emph{canonical values} in \cite{OkuiSuzuki2010}. The notion of \emph{non-problematic
   542   as \emph{canonical values} in \cite{OkuiSuzuki2010}. The notion of \emph{non-problematic
   504   values} by Cardelli and Frisch \cite{Frisch2004} is similar, but not identical
   543   values} by Cardelli and Frisch \cite{Frisch2004} is related, but not identical
   505   to our lexical values.}
   544   to our lexical values.}
   506   
   545   
   507   \begin{center}
   546   \begin{center}
   508   @{thm LV_def}
   547   @{thm LV_def}
   509   \end{center}
   548   \end{center}
   517   \noindent This finiteness property does not hold in general if we
   556   \noindent This finiteness property does not hold in general if we
   518   remove the side-condition about @{term "flat v \<noteq> []"} in the
   557   remove the side-condition about @{term "flat v \<noteq> []"} in the
   519   @{term Stars}-rule above. For example using Sulzmann and Lu's
   558   @{term Stars}-rule above. For example using Sulzmann and Lu's
   520   less restrictive definition, @{term "LV (STAR ONE) []"} would contain
   559   less restrictive definition, @{term "LV (STAR ONE) []"} would contain
   521   infinitely many values, but according to our more restricted
   560   infinitely many values, but according to our more restricted
   522   definition @{thm LV_STAR_ONE_empty}.
   561   definition only a single value, namely @{thm LV_STAR_ONE_empty}.
   523 
   562 
   524   If a regular expression @{text r} matches a string @{text s}, then
   563   If a regular expression @{text r} matches a string @{text s}, then
   525   generally the set @{term "LV r s"} is not just a singleton set.  In
   564   generally the set @{term "LV r s"} is not just a singleton set.  In
   526   case of POSIX matching the problem is to calculate the unique lexical value
   565   case of POSIX matching the problem is to calculate the unique lexical value
   527   that satisfies the (informal) POSIX rules from the Introduction.
   566   that satisfies the (informal) POSIX rules from the Introduction.
   616   expressions and by analysing the shape of values (corresponding to 
   655   expressions and by analysing the shape of values (corresponding to 
   617   the derivative regular expressions).
   656   the derivative regular expressions).
   618   %
   657   %
   619   \begin{center}
   658   \begin{center}
   620   \begin{tabular}{l@ {\hspace{5mm}}lcl}
   659   \begin{tabular}{l@ {\hspace{5mm}}lcl}
   621   (1) & @{thm (lhs) injval.simps(1)} & $\dn$ & @{thm (rhs) injval.simps(1)}\\
   660   \textit{(1)} & @{thm (lhs) injval.simps(1)} & $\dn$ & @{thm (rhs) injval.simps(1)}\\
   622   (2) & @{thm (lhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]} & $\dn$ & 
   661   \textit{(2)} & @{thm (lhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]} & $\dn$ & 
   623       @{thm (rhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]}\\
   662       @{thm (rhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]}\\
   624   (3) & @{thm (lhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ & 
   663   \textit{(3)} & @{thm (lhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ & 
   625       @{thm (rhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\
   664       @{thm (rhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\
   626   (4) & @{thm (lhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ 
   665   \textit{(4)} & @{thm (lhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ 
   627       & @{thm (rhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\
   666       & @{thm (rhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\
   628   (5) & @{thm (lhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ 
   667   \textit{(5)} & @{thm (lhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ 
   629       & @{thm (rhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\
   668       & @{thm (rhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\
   630   (6) & @{thm (lhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ 
   669   \textit{(6)} & @{thm (lhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ 
   631       & @{thm (rhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\
   670       & @{thm (rhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\
   632   (7) & @{thm (lhs) injval.simps(7)[of "r" "c" "v" "vs"]} & $\dn$ 
   671   \textit{(7)} & @{thm (lhs) injval.simps(7)[of "r" "c" "v" "vs"]} & $\dn$ 
   633       & @{thm (rhs) injval.simps(7)[of "r" "c" "v" "vs"]}\\
   672       & @{thm (rhs) injval.simps(7)[of "r" "c" "v" "vs"]}\\
   634   \end{tabular}
   673   \end{tabular}
   635   \end{center}
   674   \end{center}
   636 
   675 
   637   \noindent To better understand what is going on in this definition it
   676   \noindent To better understand what is going on in this definition it
   638   might be instructive to look first at the three sequence cases (clauses
   677   might be instructive to look first at the three sequence cases (clauses
   639   (4)--(6)). In each case we need to construct an ``injected value'' for
   678   \textit{(4)} -- \textit{(6)}). In each case we need to construct an ``injected value'' for
   640   @{term "SEQ r\<^sub>1 r\<^sub>2"}. This must be a value of the form @{term
   679   @{term "SEQ r\<^sub>1 r\<^sub>2"}. This must be a value of the form @{term
   641   "Seq DUMMY DUMMY"}\,. Recall the clause of the @{text derivative}-function
   680   "Seq DUMMY DUMMY"}\,. Recall the clause of the @{text derivative}-function
   642   for sequence regular expressions:
   681   for sequence regular expressions:
   643 
   682 
   644   \begin{center}
   683   \begin{center}
   646   \end{center}
   685   \end{center}
   647 
   686 
   648   \noindent Consider first the @{text "else"}-branch where the derivative is @{term
   687   \noindent Consider first the @{text "else"}-branch where the derivative is @{term
   649   "SEQ (der c r\<^sub>1) r\<^sub>2"}. The corresponding value must therefore
   688   "SEQ (der c r\<^sub>1) r\<^sub>2"}. The corresponding value must therefore
   650   be of the form @{term "Seq v\<^sub>1 v\<^sub>2"}, which matches the left-hand
   689   be of the form @{term "Seq v\<^sub>1 v\<^sub>2"}, which matches the left-hand
   651   side in clause~(4) of @{term inj}. In the @{text "if"}-branch the derivative is an
   690   side in clause~\textit{(4)} of @{term inj}. In the @{text "if"}-branch the derivative is an
   652   alternative, namely @{term "ALT (SEQ (der c r\<^sub>1) r\<^sub>2) (der c
   691   alternative, namely @{term "ALT (SEQ (der c r\<^sub>1) r\<^sub>2) (der c
   653   r\<^sub>2)"}. This means we either have to consider a @{text Left}- or
   692   r\<^sub>2)"}. This means we either have to consider a @{text Left}- or
   654   @{text Right}-value. In case of the @{text Left}-value we know further it
   693   @{text Right}-value. In case of the @{text Left}-value we know further it
   655   must be a value for a sequence regular expression. Therefore the pattern
   694   must be a value for a sequence regular expression. Therefore the pattern
   656   we match in the clause (5) is @{term "Left (Seq v\<^sub>1 v\<^sub>2)"},
   695   we match in the clause \textit{(5)} is @{term "Left (Seq v\<^sub>1 v\<^sub>2)"},
   657   while in (6) it is just @{term "Right v\<^sub>2"}. One more interesting
   696   while in \textit{(6)} it is just @{term "Right v\<^sub>2"}. One more interesting
   658   point is in the right-hand side of clause (6): since in this case the
   697   point is in the right-hand side of clause \textit{(6)}: since in this case the
   659   regular expression @{text "r\<^sub>1"} does not ``contribute'' to
   698   regular expression @{text "r\<^sub>1"} does not ``contribute'' to
    660   matching the string (it can only match the empty string), we need to
    699   matching the string (it can only match the empty string), we need to
   661   call @{const mkeps} in order to construct a value for how @{term "r\<^sub>1"}
   700   call @{const mkeps} in order to construct a value for how @{term "r\<^sub>1"}
   662   can match this empty string. A similar argument applies for why we can
   701   can match this empty string. A similar argument applies for why we can
   663   expect in the left-hand side of clause (7) that the value is of the form
   702   expect in the left-hand side of clause \textit{(7)} that the value is of the form
   664   @{term "Seq v (Stars vs)"}---the derivative of a star is @{term "SEQ (der c r)
   703   @{term "Seq v (Stars vs)"}---the derivative of a star is @{term "SEQ (der c r)
    665   (STAR r)"}. Finally, the reason why we can ignore the second argument
    704   (STAR r)"}. Finally, the reason why we can ignore the second argument
   666   in clause (1) of @{term inj} is that it will only ever be called in cases
   705   in clause \textit{(1)} of @{term inj} is that it will only ever be called in cases
   667   where @{term "c=d"}, but the usual linearity restrictions in patterns do
   706   where @{term "c=d"}, but the usual linearity restrictions in patterns do
   668   not allow us to build this constraint explicitly into our function
   707   not allow us to build this constraint explicitly into our function
   669   definition.\footnote{Sulzmann and Lu state this clause as @{thm (lhs)
   708   definition.\footnote{Sulzmann and Lu state this clause as @{thm (lhs)
   670   injval.simps(1)[of "c" "c"]} $\dn$ @{thm (rhs) injval.simps(1)[of "c"]},
   709   injval.simps(1)[of "c" "c"]} $\dn$ @{thm (rhs) injval.simps(1)[of "c"]},
   671   but our deviation is harmless.}
   710   but our deviation is harmless.}
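As a further illustration, clauses (1)--(7) of the injection function, together
with the mkeps function used in clause (6), can be rendered in the Haskell
sketch from above as follows (the clauses of mkeps are the standard ones by
Sulzmann and Lu; all names remain assumptions of this sketch):

  -- mkeps constructs a value showing how a nullable r matches the empty string
  mkeps :: Rexp -> Val
  mkeps ONE         = Empty
  mkeps (ALT r1 r2) = if nullable r1 then Lft (mkeps r1) else Rgt (mkeps r2)
  mkeps (SEQ r1 r2) = Sq (mkeps r1) (mkeps r2)
  mkeps (STAR _)    = Stars []

  -- inj r c v injects the character c back into the value v
  inj :: Rexp -> Char -> Val -> Val
  inj (CHR d)     _ Empty             = Chr d
  inj (ALT r1 _)  c (Lft v1)          = Lft (inj r1 c v1)
  inj (ALT _ r2)  c (Rgt v2)          = Rgt (inj r2 c v2)
  inj (SEQ r1 _)  c (Sq v1 v2)        = Sq (inj r1 c v1) v2
  inj (SEQ r1 _)  c (Lft (Sq v1 v2))  = Sq (inj r1 c v1) v2
  inj (SEQ r1 r2) c (Rgt v2)          = Sq (mkeps r1) (inj r2 c v2)
  inj (STAR r)    c (Sq v (Stars vs)) = Stars (inj r c v : vs)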
   710   virtue of this algorithm is that it can be implemented with ease in any
   749   virtue of this algorithm is that it can be implemented with ease in any
   711   functional programming language and also in Isabelle/HOL. In the remaining
   750   functional programming language and also in Isabelle/HOL. In the remaining
   712   part of this section we prove that this algorithm is correct.
   751   part of this section we prove that this algorithm is correct.
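Since the functions above are already in functional style, the top-level lexer
can be sketched in Haskell directly (again only under the assumed names from the
earlier sketches):

  -- build derivatives while consuming the string, then inject the characters back
  lexer :: Rexp -> String -> Maybe Val
  lexer r []     = if nullable r then Just (mkeps r) else Nothing
  lexer r (c:cs) = case lexer (der c r) cs of
                     Nothing -> Nothing
                     Just v  -> Just (inj r c v)

For instance, applying lexer to the regular expression (x + (y + xy))* and the
string xy is expected to return the single-iteration value oneIteration from the
sketch in the Introduction.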
   713 
   752 
   714   The well-known idea of POSIX matching is informally defined by some
   753   The well-known idea of POSIX matching is informally defined by some
   715   rules such as the longest match and priority rule (see
   754   rules such as the Longest Match and Priority Rules (see
   716   Introduction); as correctly argued in \cite{Sulzmann2014}, this
   755   Introduction); as correctly argued in \cite{Sulzmann2014}, this
   717   needs formal specification. Sulzmann and Lu define an ``ordering
   756   needs formal specification. Sulzmann and Lu define an ``ordering
   718   relation'' between values and argue that there is a maximum value,
   757   relation'' between values and argue that there is a maximum value,
   719   as given by the derivative-based algorithm.  In contrast, we shall
   758   as given by the derivative-based algorithm.  In contrast, we shall
   720   introduce a simple inductive definition that specifies directly what
   759   introduce a simple inductive definition that specifies directly what
   721   a \emph{POSIX value} is, incorporating the POSIX-specific choices
   760   a \emph{POSIX value} is, incorporating the POSIX-specific choices
   722   into the side-conditions of our rules. Our definition is inspired by
   761   into the side-conditions of our rules. Our definition is inspired by
   723   the matching relation given by Vansummeren
   762   the matching relation given by Vansummeren~\cite{Vansummeren2006}. 
   724   \cite{Vansummeren2006}. The relation we define is ternary and
   763   The relation we define is ternary and
   725   written as \mbox{@{term "s \<in> r \<rightarrow> v"}}, relating
   764   written as \mbox{@{term "s \<in> r \<rightarrow> v"}}, relating
   726   strings, regular expressions and values; the inductive rules are given in 
   765   strings, regular expressions and values; the inductive rules are given in 
   727   Figure~\ref{POSIXrules}.
   766   Figure~\ref{POSIXrules}.
   728   We can prove that given a string @{term s} and regular expression @{term
   767   We can prove that given a string @{term s} and regular expression @{term
   729    r}, the POSIX value @{term v} is uniquely determined by @{term "s \<in> r \<rightarrow> v"}.
   768    r}, the POSIX value @{term v} is uniquely determined by @{term "s \<in> r \<rightarrow> v"}.
   800   "P\<star>"}-rule. Also there we want that @{term "s\<^sub>1"} is the longest initial
   839   "P\<star>"}-rule. Also there we want that @{term "s\<^sub>1"} is the longest initial
   801   split of @{term "s\<^sub>1 @ s\<^sub>2"} and furthermore the corresponding value
   840   split of @{term "s\<^sub>1 @ s\<^sub>2"} and furthermore the corresponding value
   802   @{term v} cannot be flattened to the empty string. In effect, we require
   841   @{term v} cannot be flattened to the empty string. In effect, we require
   803   that in each ``iteration'' of the star, some non-empty substring needs to
   842   that in each ``iteration'' of the star, some non-empty substring needs to
   804   be ``chipped'' away; only in case of the empty string we accept @{term
   843   be ``chipped'' away; only in case of the empty string we accept @{term
   805   "Stars []"} as the POSIX value. Indeed we can show that our POSIX value
   844   "Stars []"} as the POSIX value. Indeed we can show that our POSIX values
   806   is a lexical value which excludes those @{text Stars} containing subvalues 
   845   are lexical values which exclude those @{text Stars} that contain subvalues 
   807   that flatten to the empty string.
   846   that flatten to the empty string.
   808 
   847 
   809   \begin{lemma}\label{LVposix}
   848   \begin{lemma}\label{LVposix}
   810   @{thm [mode=IfThen] Posix_LV}
   849   @{thm [mode=IfThen] Posix_LV}
   811   \end{lemma}
   850   \end{lemma}
   907 
   946 
   908   \begin{proof}
   947   \begin{proof}
   909   By induction on @{term s} using Lemma~\ref{lemmkeps} and \ref{Posix2}.\qed  
   948   By induction on @{term s} using Lemma~\ref{lemmkeps} and \ref{Posix2}.\qed  
   910   \end{proof}
   949   \end{proof}
   911 
   950 
   912   \noindent In (2) we further know by Theorem~\ref{posixdeterm} that the
   951   \noindent In \textit{(2)} we further know by Theorem~\ref{posixdeterm} that the
   913   value returned by the lexer must be unique.   A simple corollary 
   952   value returned by the lexer must be unique.   A simple corollary 
   914   of our two theorems is:
   953   of our two theorems is:
   915 
   954 
   916   \begin{corollary}\mbox{}\smallskip\\\label{lexercorrectcor}
   955   \begin{corollary}\mbox{}\smallskip\\\label{lexercorrectcor}
   917   \begin{tabular}{ll}
   956   \begin{tabular}{ll}
   940   as the maximal elements.  An extended version of \cite{Sulzmann2014}
   979   as the maximal elements.  An extended version of \cite{Sulzmann2014}
   941   is available at the website of its first author; this includes more
   980   is available at the website of its first author; this includes more
    942   details of their proofs, which are evidently not yet in final
    981   details of their proofs, which are evidently not yet in final
    943   form. Unfortunately, we were not able to verify claims that their
    982   form. Unfortunately, we were not able to verify claims that their
   944   ordering has properties such as being transitive or having maximal
   983   ordering has properties such as being transitive or having maximal
   945   elements.
   984   elements. 
   946  
   985  
   947   Okui and Suzuki \cite{OkuiSuzuki2010,OkuiSuzukiTech} described
   986   Okui and Suzuki \cite{OkuiSuzuki2010,OkuiSuzukiTech} described
   948   another ordering of values, which they use to establish the
   987   another ordering of values, which they use to establish the
   949   correctness of their automata-based algorithm for POSIX matching.
   988   correctness of their automata-based algorithm for POSIX matching.
   950   Their ordering resembles some aspects of the one given by Sulzmann
   989   Their ordering resembles some aspects of the one given by Sulzmann
   951   and Lu, but is quite different. To begin with, Okui and Suzuki
   990   and Lu, but overall is quite different. To begin with, Okui and
   952   identify POSIX values as minimal, rather than maximal, elements in
   991   Suzuki identify POSIX values as minimal, rather than maximal,
   953   their ordering. A more substantial difference is that the ordering
   992   elements in their ordering. A more substantial difference is that
   954   by Okui and Suzuki uses \emph{positions} in order to identify and
   993   the ordering by Okui and Suzuki uses \emph{positions} in order to
   955   compare subvalues. Positions are lists of natural numbers. This
   994   identify and compare subvalues. Positions are lists of natural
   956   allows them to quite naturally formalise the Longest Match and
   995   numbers. This allows them to quite naturally formalise the Longest
   957   Priority rules of the informal POSIX standard.  Consider for example
   996   Match and Priority rules of the informal POSIX standard.  Consider
   958   the value @{term v}
   997   for example the value @{term v}
   959 
   998 
   960   \begin{center}
   999   \begin{center}
   961   @{term "v == Stars [Seq (Char x) (Char y), Char z]"}
  1000   @{term "v == Stars [Seq (Char x) (Char y), Char z]"}
   962   \end{center}
  1001   \end{center}
   963 
  1002 
   964   \noindent
  1003   \noindent
   965   At position @{text "[0,1]"} of this value is the
  1004   At position @{text "[0,1]"} of this value is the
   966   subvalue @{text "Char y"} and at position @{text "[1]"} the
  1005   subvalue @{text "Char y"} and at position @{text "[1]"} the
   967   subvalue @{term "Char z"}.  At the `root' position, or empty list
  1006   subvalue @{term "Char z"}.  At the `root' position, or empty list
   968   @{term "[]"}, is the whole value @{term v}. The positions @{text
  1007   @{term "[]"}, is the whole value @{term v}. Positions such as @{text
   969   "[0,1,0]"} and @{text "[2]"}, for example, are outside of @{text
  1008   "[0,1,0]"} or @{text "[2]"} are outside of @{text
   970   v}. If it exists, the subvalue of @{term v} at a position @{text
  1009   v}. If it exists, the subvalue of @{term v} at a position @{text
   971   p}, written @{term "at v p"}, can be recursively defined by
  1010   p}, written @{term "at v p"}, can be recursively defined by
   972   
  1011   
   973   \begin{center}
  1012   \begin{center}
   974   \begin{tabular}{r@ {\hspace{0mm}}lcl}
  1013   \begin{tabular}{r@ {\hspace{0mm}}lcl}
   985   \end{tabular} 
  1024   \end{tabular} 
   986   \end{center}
  1025   \end{center}
   987 
  1026 
   988   \noindent In the last clause we use Isabelle's notation @{term "vs ! n"} for the
  1027   \noindent In the last clause we use Isabelle's notation @{term "vs ! n"} for the
   989   @{text n}th element in a list.  The set of positions inside a value @{text v},
  1028   @{text n}th element in a list.  The set of positions inside a value @{text v},
   990   written @{term "Pos v"}, is given by the clauses
  1029   written @{term "Pos v"}, is given by 
   991 
  1030 
   992   \begin{center}
  1031   \begin{center}
   993   \begin{tabular}{lcl}
  1032   \begin{tabular}{lcl}
   994   @{thm (lhs) Pos.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) Pos.simps(1)}\\
  1033   @{thm (lhs) Pos.simps(1)} & @{text "\<equiv>"} & @{thm (rhs) Pos.simps(1)}\\
   995   @{thm (lhs) Pos.simps(2)} & @{text "\<equiv>"} & @{thm (rhs) Pos.simps(2)}\\
  1034   @{thm (lhs) Pos.simps(2)} & @{text "\<equiv>"} & @{thm (rhs) Pos.simps(2)}\\
  1001   @{thm (lhs) Pos_stars} & @{text "\<equiv>"} & @{thm (rhs) Pos_stars}\\
  1040   @{thm (lhs) Pos_stars} & @{text "\<equiv>"} & @{thm (rhs) Pos_stars}\\
  1002   \end{tabular}
  1041   \end{tabular}
  1003   \end{center}
  1042   \end{center}
  1004 
  1043 
  1005   \noindent 
  1044   \noindent 
  1006   whereby @{text len} stands for the length of a list. Clearly
  1045   whereby @{text len} in the last clause stands for the length of a list. Clearly
  1007   for every position inside a value there exists a subvalue at that position.
  1046   for every position inside a value there exists a subvalue at that position.
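In the Haskell sketch from above (still only an illustration), the subvalue
function and the set of positions can be approximated by the two functions
below, where positions are lists of Ints and the set of positions is
represented as a list:

  -- the subvalue at position p (partial: only defined for positions inside v)
  at :: Val -> [Int] -> Val
  at v          []     = v
  at (Lft v)    (0:ps) = at v ps
  at (Rgt v)    (1:ps) = at v ps
  at (Sq v1 _)  (0:ps) = at v1 ps
  at (Sq _ v2)  (1:ps) = at v2 ps
  at (Stars vs) (n:ps) = at (vs !! n) ps

  -- all positions inside a value
  pos :: Val -> [[Int]]
  pos Empty      = [[]]
  pos (Chr _)    = [[]]
  pos (Lft v)    = [] : [0:p | p <- pos v]
  pos (Rgt v)    = [] : [1:p | p <- pos v]
  pos (Sq v1 v2) = [] : [0:p | p <- pos v1] ++ [1:p | p <- pos v2]
  pos (Stars vs) = [] : [n:p | (n, v) <- zip [0..] vs, p <- pos v]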
  1008  
  1047  
  1009 
  1048 
  1010   To help understanding the ordering of Okui and Suzuki, consider again 
  1049   To help understanding the ordering of Okui and Suzuki, consider again 
  1011   the earlier value
  1050   the earlier value
  1017   @{term "w == Stars [Char x, Char y, Char z]"}  
  1056   @{term "w == Stars [Char x, Char y, Char z]"}  
  1018   \end{tabular}
  1057   \end{tabular}
  1019   \end{center}
  1058   \end{center}
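Under the Haskell sketch (with its assumed constructor names), these two values
and the comparison discussed next can be checked directly:

  vEx, wEx :: Val
  vEx = Stars [Sq (Chr 'x') (Chr 'y'), Chr 'z']
  wEx = Stars [Chr 'x', Chr 'y', Chr 'z']

  -- flat vEx == "xyz" and flat wEx == "xyz", but at position [0]
  -- flat (at vEx [0]) == "xy"  whereas  flat (at wEx [0]) == "x"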
  1020 
  1059 
  1021   \noindent Both values match the string @{text "xyz"}, that means if
  1060   \noindent Both values match the string @{text "xyz"}, that means if
  1022   we flatten the values at their respective root position, we obtain
  1061   we flatten these values at their respective root position, we obtain
  1023   @{text "xyz"}. However, at position @{text "[0]"}, @{text v} matches
  1062   @{text "xyz"}. However, at position @{text "[0]"}, @{text v} matches
  1024   @{text xy} whereas @{text w} matches only the shorter @{text x}. So
  1063   @{text xy} whereas @{text w} matches only the shorter @{text x}. So
  1025   according to the Longest Match Rule, we should prefer @{text v},
  1064   according to the Longest Match Rule, we should prefer @{text v},
  1026   rather than @{text w} as POSIX value for string @{text xyz} (and
  1065   rather than @{text w} as POSIX value for string @{text xyz} (and
  1027   corresponding regular expression). In order to
  1066   corresponding regular expression). In order to
  1044 
  1083 
  1045   \begin{center}
  1084   \begin{center}
  1046   @{term "v == Left (Char x)"} \qquad and \qquad @{term "w == Right (Char x)"}
  1085   @{term "v == Left (Char x)"} \qquad and \qquad @{term "w == Right (Char x)"}
  1047   \end{center}
  1086   \end{center}
  1048 
  1087 
  1049   \noindent Both values match @{text x}, but at position @{text "[0]"}
  1088   \noindent Both values match @{text x}. At position @{text "[0]"}
  1050   the norm of @{term v} is @{text 1} (the subvalue matches @{text x}),
  1089   the norm of @{term v} is @{text 1} (the subvalue matches @{text x}),
  1051   but the norm of @{text w} is @{text "-1"} (the position is outside
  1090   but the norm of @{text w} is @{text "-1"} (the position is outside
  1052   @{text w} according to how we defined the `inside' positions of
  1091   @{text w} according to how we defined the `inside' positions of
  1053   @{text Left}- and @{text Right}-values).  Of course at position
  1092   @{text Left}- and @{text Right}-values).  Of course at position
  1054   @{text "[1]"}, the norms @{term "pflat_len v [1]"} and @{term
  1093   @{text "[1]"}, the norms @{term "pflat_len v [1]"} and @{term
  1055   "pflat_len w [1]"} are reversed, but the point is that subvalues
  1094   "pflat_len w [1]"} are reversed, but the point is that subvalues
  1056   will be analysed according to lexicographically ordered
  1095   will be analysed according to lexicographically ordered
  1057   positions. According to this ordering, the position @{text "[0]"}
  1096   positions. According to this ordering, the position @{text "[0]"}
  1058   takes precedence.  The lexicographic ordering of positions, written
  1097   takes precedence over @{text "[1]"} and thus also @{text v} will be 
       
  1098   preferred over @{text w}.  The lexicographic ordering of positions, written
  1059   @{term "DUMMY \<sqsubset>lex DUMMY"}, can be conveniently formalised
  1099   @{term "DUMMY \<sqsubset>lex DUMMY"}, can be conveniently formalised
  1060   by three inference rules
  1100   by three inference rules
  1061 
  1101 
  1062   \begin{center}
  1102   \begin{center}
  1063   \begin{tabular}{ccc}
  1103   \begin{tabular}{ccc}
  1068   \end{tabular}
  1108   \end{tabular}
  1069   \end{center}
  1109   \end{center}
  1070 
  1110 
  1071   With the norm and lexicographic order in place,
  1111   With the norm and lexicographic order in place,
  1072   we can state the key definition of Okui and Suzuki
  1112   we can state the key definition of Okui and Suzuki
  1073   \cite{OkuiSuzuki2010}: a value @{term "v\<^sub>1"} is \emph{smaller} than
  1113   \cite{OkuiSuzuki2010}: a value @{term "v\<^sub>1"} is \emph{smaller at position @{text p}} than
  1074   @{term "v\<^sub>2"} if and only if  $(i)$ the norm at position @{text p} is
  1114   @{term "v\<^sub>2"}, written @{term "v\<^sub>1 \<sqsubset>val p v\<^sub>2"}, 
       
  1115   if and only if  $(i)$ the norm at position @{text p} is
  1075   greater in @{term "v\<^sub>1"} (that is the string @{term "flat (at v\<^sub>1 p)"} is longer 
  1116   greater in @{term "v\<^sub>1"} (that is the string @{term "flat (at v\<^sub>1 p)"} is longer 
   1076   than @{term "flat (at v\<^sub>2 p)"}) and $(ii)$ for all subvalues at 
   1117   than @{term "flat (at v\<^sub>2 p)"}) and $(ii)$ for all subvalues at 
  1077   positions that are inside @{term "v\<^sub>1"} or @{term "v\<^sub>2"} and that are
  1118   positions that are inside @{term "v\<^sub>1"} or @{term "v\<^sub>2"} and that are
  1078   lexicographically smaller than @{text p}, we have the same norm, namely
  1119   lexicographically smaller than @{text p}, we have the same norm, namely
  1079 
  1120 
  1104   \end{center}
  1145   \end{center}
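The ordering just described admits a rough executable reading in the Haskell
sketch; the following is only a sketch based on the informal description above
(the formal Isabelle definition, elided here, is the authoritative one):

  -- lexicographic ordering on positions
  lexLess :: [Int] -> [Int] -> Bool
  lexLess _      []     = False
  lexLess []     (_:_)  = True
  lexLess (m:ps) (n:qs) = m < n || (m == n && lexLess ps qs)

  -- the `norm' of v at position p: length of the flattened subvalue, or -1
  pflatLen :: Val -> [Int] -> Int
  pflatLen v p = if p `elem` pos v then length (flat (at v p)) else -1

  -- v1 is smaller than v2 at position p
  smallerAt :: Val -> Val -> [Int] -> Bool
  smallerAt v1 v2 p =
    pflatLen v1 p > pflatLen v2 p &&
    and [ pflatLen v1 q == pflatLen v2 q
        | q <- pos v1 ++ pos v2, q `lexLess` p ]

  -- v1 is strictly smaller than v2 if it is smaller at some position
  smaller :: Val -> Val -> Bool
  smaller v1 v2 = any (smallerAt v1 v2) (pos v1 ++ pos v2)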
  1105 
  1146 
  1106  While we encountered a number of obstacles for establishing properties like
  1147  While we encountered a number of obstacles for establishing properties like
   1107  transitivity for the ordering of Sulzmann and Lu (which we failed
   1148  transitivity for the ordering of Sulzmann and Lu (which we failed
  1108  to overcome), it is relatively straightforward to establish this
  1149  to overcome), it is relatively straightforward to establish this
  1109  property for the ordering by Okui and Suzuki.
  1150  property for the orderings
       
  1151  @{term "DUMMY :\<sqsubset>val DUMMY"} and @{term "DUMMY :\<sqsubseteq>val DUMMY"}  
       
  1152  by Okui and Suzuki.
  1110 
  1153 
  1111  \begin{lemma}[Transitivity]\label{transitivity}
  1154  \begin{lemma}[Transitivity]\label{transitivity}
  1112  @{thm [mode=IfThen] PosOrd_trans[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2" and ?v3.0="v\<^sub>3"]} 
  1155  @{thm [mode=IfThen] PosOrd_trans[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2" and ?v3.0="v\<^sub>3"]} 
  1113  \end{lemma}
  1156  \end{lemma}
  1114 
  1157 
  1116  and @{text q}, where the values @{text "v\<^sub>1"} and @{text
  1159  and @{text q}, where the values @{text "v\<^sub>1"} and @{text
  1117  "v\<^sub>2"} (respectively @{text "v\<^sub>2"} and @{text
  1160  "v\<^sub>2"} (respectively @{text "v\<^sub>2"} and @{text
  1118  "v\<^sub>3"}) are `distinct'.  Since @{text
  1161  "v\<^sub>3"}) are `distinct'.  Since @{text
  1119  "\<prec>\<^bsub>lex\<^esub>"} is trichotomous, we need to consider
  1162  "\<prec>\<^bsub>lex\<^esub>"} is trichotomous, we need to consider
  1120  three cases, namely @{term "p = q"}, @{term "p \<sqsubset>lex q"} and
  1163  three cases, namely @{term "p = q"}, @{term "p \<sqsubset>lex q"} and
  1121  @{term "q \<sqsubset>lex p"}. Let us look at the first case.
  1164  @{term "q \<sqsubset>lex p"}. Let us look at the first case.  Clearly
  1122  Clearly @{term "pflat_len v\<^sub>2 p < pflat_len v\<^sub>1 p"}
  1165  @{term "pflat_len v\<^sub>2 p < pflat_len v\<^sub>1 p"} and @{term
  1123  and @{term "pflat_len v\<^sub>3 p < pflat_len v\<^sub>2 p"}
  1166  "pflat_len v\<^sub>3 p < pflat_len v\<^sub>2 p"} imply @{term
  1124  imply @{term "pflat_len v\<^sub>3 p < pflat_len v\<^sub>1 p"}.
  1167  "pflat_len v\<^sub>3 p < pflat_len v\<^sub>1 p"}.  It remains to show
  1125  It remains to show for a @{term "p' \<in> Pos v\<^sub>1 \<union> Pos v\<^sub>3"}
  1168  that for a @{term "p' \<in> Pos v\<^sub>1 \<union> Pos v\<^sub>3"}
  1126  with @{term "p' \<sqsubset>lex p"} that  
   1169  with @{term "p' \<sqsubset>lex p"}, @{term "pflat_len v\<^sub>1
  1127  @{term "pflat_len v\<^sub>1 p' = pflat_len v\<^sub>3 p'"} holds.
  1170  p' = pflat_len v\<^sub>3 p'"} holds.  Suppose @{term "p' \<in> Pos
  1128  Suppose @{term "p' \<in> Pos v\<^sub>1"}, then we can infer from the 
  1171  v\<^sub>1"}, then we can infer from the first assumption that @{term
  1129  first assumption that @{term "pflat_len v\<^sub>1 p' = pflat_len v\<^sub>2 p'"}.
  1172  "pflat_len v\<^sub>1 p' = pflat_len v\<^sub>2 p'"}.  But this means
  1130  But this means that @{term "p'"} must be in  @{term "Pos v\<^sub>2"} too.
  1173  that @{term "p'"} must be in @{term "Pos v\<^sub>2"} too (the norm
  1131  Hence we can use the second assumption and infer  @{term "pflat_len v\<^sub>2 p' = pflat_len v\<^sub>3 p'"}, which concludes
  1174  cannot be @{text "-1"} given @{term "p' \<in> Pos v\<^sub>1"}).  
  1132  this case with @{term "v\<^sub>1 :\<sqsubset>val v\<^sub>3"}. 
  1175  Hence we can use the second assumption and
  1133  The reasoning in the other cases is similar.\qed
  1176  infer @{term "pflat_len v\<^sub>2 p' = pflat_len v\<^sub>3 p'"},
       
  1177  which concludes this case with @{term "v\<^sub>1 :\<sqsubset>val
       
  1178  v\<^sub>3"}.  The reasoning in the other cases is similar.\qed
  1134  \end{proof}
  1179  \end{proof}
  1135 
  1180 
  1136  \noindent It is straightforward to show that @{text "\<prec>"} and
  1181  \noindent 
  1137  $\preccurlyeq$ are partial orders.  Okui and Suzuki also show that it
  1182  The proof for $\preccurlyeq$ is similar and omitted.
  1138  is a linear order for lexical values \cite{OkuiSuzuki2010} of a given
  1183  It is also straightforward to show that @{text "\<prec>"} and
  1139  regular expression and given string, but we have not done this. It is
  1184  $\preccurlyeq$ are partial orders.  Okui and Suzuki furthermore show that they
       
  1185  are linear orderings for lexical values \cite{OkuiSuzuki2010} of a given
       
  1186  regular expression and given string, but we have not formalised this in Isabelle. It is
  1140  not essential for our results. What we are going to show below is
  1187  not essential for our results. What we are going to show below is
  1141  that for a given @{text r} and @{text s}, the ordering has a unique
  1188  that for a given @{text r} and @{text s}, the orderings have a unique
  1142  minimal element on the set @{term "LV r s"}, which is the POSIX value
  1189  minimal element on the set @{term "LV r s"}, which is the POSIX value
  1143  we defined in the previous section.
  1190  we defined in the previous section. We start with two properties that
  1144 
  1191  show how the length of a flattened value relates to the @{text "\<prec>"}-ordering.
  1145 
  1192 
  1146  Lemma 1
  1193  \begin{proposition}\mbox{}\smallskip\\\label{ordlen}
  1147 
  1194  \begin{tabular}{@ {}ll}
  1148  @{thm [mode=IfThen] PosOrd_shorterE[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]}
  1195  (1) &
  1149 
  1196  @{thm [mode=IfThen] PosOrd_shorterE[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]}\\
  1150  but in the other direction only
  1197  (2) &
  1151 
       
  1152  @{thm [mode=IfThen] PosOrd_shorterI[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} 
  1198  @{thm [mode=IfThen] PosOrd_shorterI[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} 
  1153 
  1199  \end{tabular} 
       
  1200  \end{proposition}
  1154  
  1201  
  1155 
  1202  \noindent Both properties follow from the definition of the ordering. Note that
  1156   Next we establish how Okui and Suzuki's ordering relates to our
   1203  \textit{(2)} entails that if the underlying string of a value, say @{term "v\<^sub>2"}, is
       
   1204  a strict prefix of the flattened string of another value, say @{term "v\<^sub>1"}, then
       
  1205  @{term "v\<^sub>1"} must be smaller than @{term "v\<^sub>2"}. For our proofs it
       
  1206  will be useful to have the following properties---in each case the underlying strings 
       
  1207  of the compared values are the same: 
       
  1208 
       
  1209   \begin{proposition}\mbox{}\smallskip\\\label{ordintros}
       
  1210   \begin{tabular}{ll}
       
  1211   \textit{(1)} & 
       
  1212   @{thm [mode=IfThen] PosOrd_Left_Right[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]}\\
       
  1213   \textit{(2)} & If
       
  1214   @{thm (prem 1) PosOrd_Left_eq[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} \;then\;
       
  1215   @{thm (lhs) PosOrd_Left_eq[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} \;iff\;
       
  1216   @{thm (rhs) PosOrd_Left_eq[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]}\\
       
  1217   \textit{(3)} & If
       
  1218   @{thm (prem 1) PosOrd_Right_eq[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} \;then\;
       
  1219   @{thm (lhs) PosOrd_Right_eq[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]} \;iff\;
       
  1220   @{thm (rhs) PosOrd_Right_eq[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2"]}\\
       
  1221   \textit{(4)} & If
       
  1222   @{thm (prem 1) PosOrd_Seq_eq[where ?v2.0="v\<^sub>2" and ?w2.0="w\<^sub>2"]} \;then\;
       
  1223   @{thm (lhs) PosOrd_Seq_eq[where ?v2.0="v\<^sub>2" and ?w2.0="w\<^sub>2"]} \;iff\;
       
  1224   @{thm (rhs) PosOrd_Seq_eq[where ?v2.0="v\<^sub>2" and ?w2.0="w\<^sub>2"]}\\
       
  1225   \textit{(5)} & If
       
  1226   @{thm (prem 2) PosOrd_SeqI1[simplified, where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2" and
       
  1227                                     ?w1.0="w\<^sub>1" and ?w2.0="w\<^sub>2"]} \;and\;
       
  1228   @{thm (prem 1) PosOrd_SeqI1[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2" and
       
  1229                                     ?w1.0="w\<^sub>1" and ?w2.0="w\<^sub>2"]} \;then\;
       
  1230   @{thm (concl) PosOrd_SeqI1[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2" and
       
  1231                                    ?w1.0="w\<^sub>1" and ?w2.0="w\<^sub>2"]}\\
       
  1232   \textit{(6)} & If
       
  1233   @{thm (prem 1) PosOrd_Stars_append_eq[where ?vs1.0="vs\<^sub>1" and ?vs2.0="vs\<^sub>2"]} \;then\;
       
  1234   @{thm (lhs) PosOrd_Stars_append_eq[where ?vs1.0="vs\<^sub>1" and ?vs2.0="vs\<^sub>2"]} \;iff\;
       
  1235   @{thm (rhs) PosOrd_Stars_append_eq[where ?vs1.0="vs\<^sub>1" and ?vs2.0="vs\<^sub>2"]}\\  
       
  1236   
       
  1237   \textit{(7)} & If
       
  1238   @{thm (prem 2) PosOrd_StarsI[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2" and
       
  1239                             ?vs1.0="vs\<^sub>1" and ?vs2.0="vs\<^sub>2"]} \;and\;
       
  1240   @{thm (prem 1) PosOrd_StarsI[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2" and
       
  1241                             ?vs1.0="vs\<^sub>1" and ?vs2.0="vs\<^sub>2"]} \;then\;
       
  1242    @{thm (concl) PosOrd_StarsI[where ?v1.0="v\<^sub>1" and ?v2.0="v\<^sub>2" and
       
  1243                             ?vs1.0="vs\<^sub>1" and ?vs2.0="vs\<^sub>2"]}\\
       
  1244   \end{tabular} 
       
  1245   \end{proposition}
       
  1246 
       
  1247   \noindent One might prefer that statements \textit{(4)} and \textit{(5)} 
       
  1248   (respectively \textit{(6)} and \textit{(7)})
       
  1249   are combined into a single \textit{iff}-statement (like the ones for @{text
       
  1250   Left} and @{text Right}). Unfortunately this cannot be done easily: such
       
  1251   a single statement would require an additional assumption about the
       
  1252   two values @{term "Seq v\<^sub>1 v\<^sub>2"} and @{term "Seq w\<^sub>1 w\<^sub>2"}
       
  1253   being inhabited by the same regular expression. The
       
  1254   complexity of the proofs involved seems to not justify such a
       
  1255   `cleaner' single statement. The statements given are just the properties that
       
  1256   allow us to establish our theorems. The proofs for Proposition~\ref{ordintros}
       
  1257   are routine.
       
  1258  
       
  1259 
       
  1260   Next we establish how Okui and Suzuki's orderings relate to our
  1157   definition of POSIX values.  Given a @{text POSIX} value @{text "v\<^sub>1"}
  1261   definition of POSIX values.  Given a @{text POSIX} value @{text "v\<^sub>1"}
   1158   for @{text r} and @{text s}, any other lexical value @{text
   1262   for @{text r} and @{text s}, any other lexical value @{text
   1159   "v\<^sub>2"} in @{term "LV r s"} is greater or equal to @{text
   1263   "v\<^sub>2"} in @{term "LV r s"} is greater or equal to @{text
  1160   "v\<^sub>1"}, namely:
  1264   "v\<^sub>1"}, namely:
  1161 
  1265 
  1177 
  1281 
  1178 
  1282 
  1179   \begin{itemize} 
  1283   \begin{itemize} 
  1180 
  1284 
  1181   \item[$\bullet$] Case @{text "P+L"} with @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2)
  1285   \item[$\bullet$] Case @{text "P+L"} with @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2)
  1182   \<rightarrow> (Left w\<^sub>1)"}: In this case @{term "v\<^sub>1 =
  1286   \<rightarrow> (Left w\<^sub>1)"}: In this case the value 
  1183   Left w\<^sub>1"} and the value @{term "v\<^sub>2"} is either of the
  1287   @{term "v\<^sub>2"} is either of the
  1184   form @{term "Left w\<^sub>2"} or @{term "Right w\<^sub>2"}. In the
  1288   form @{term "Left w\<^sub>2"} or @{term "Right w\<^sub>2"}. In the
  1185   latter case we can immediately conclude with @{term "v\<^sub>1
  1289   latter case we can immediately conclude with \mbox{@{term "v\<^sub>1
  1186   :\<sqsubseteq>val v\<^sub>2"} since a @{text Left}-value with the
  1290   :\<sqsubseteq>val v\<^sub>2"}} since a @{text Left}-value with the
  1187   same underlying string @{text s} is always smaller than a
  1291   same underlying string @{text s} is always smaller than a
  1188   @{text Right}-value.  In the former case we have @{term "w\<^sub>2
  1292   @{text Right}-value by Proposition~\ref{ordintros}\textit{(1)}.  
       
  1293   In the former case we have @{term "w\<^sub>2
  1189   \<in> LV r\<^sub>1 s"} and can use the induction hypothesis to infer
  1294   \<in> LV r\<^sub>1 s"} and can use the induction hypothesis to infer
  1190   @{term "w\<^sub>1 :\<sqsubseteq>val w\<^sub>2"}. Because @{term
  1295   @{term "w\<^sub>1 :\<sqsubseteq>val w\<^sub>2"}. Because @{term
  1191   "w\<^sub>1"} and @{term "w\<^sub>2"} have the same underlying string
  1296   "w\<^sub>1"} and @{term "w\<^sub>2"} have the same underlying string
  1192   @{text s}, we can conclude with @{term "Left w\<^sub>1
  1297   @{text s}, we can conclude with @{term "Left w\<^sub>1
  1193   :\<sqsubseteq>val Left w\<^sub>2"} by Prop ???.\smallskip
  1298   :\<sqsubseteq>val Left w\<^sub>2"} using
       
  1299   Proposition~\ref{ordintros}\textit{(2)}.\smallskip
  1194 
  1300 
  1195   \item[$\bullet$] Case @{text "P+R"} with @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2)
  1301   \item[$\bullet$] Case @{text "P+R"} with @{term "s \<in> (ALT r\<^sub>1 r\<^sub>2)
  1196   \<rightarrow> (Right w\<^sub>1)"}: This case is similar to the previous
  1302   \<rightarrow> (Right w\<^sub>1)"}: This case is similar to the previous
  1197   case, except that we additionally know @{term "s \<notin> L
  1303   case, except that we additionally know @{term "s \<notin> L
  1198   r\<^sub>1"}. This is needed when @{term "v\<^sub>2"} is of the form
  1304   r\<^sub>1"}. This is needed when @{term "v\<^sub>2"} is of the form
  1199   @{term "Left w\<^sub>2"}. Since \mbox{@{term "flat v\<^sub>2 = flat
  1305   \mbox{@{term "Left w\<^sub>2"}}. Since \mbox{@{term "flat v\<^sub>2 = flat
  1200   w\<^sub>2"} @{text "= s"}} and @{term "\<Turnstile> w\<^sub>2 :
  1306   w\<^sub>2"} @{text "= s"}} and @{term "\<Turnstile> w\<^sub>2 :
  1201   r\<^sub>1"}, we can derive a contradiction with \mbox{@{term "s \<notin> L
  1307   r\<^sub>1"}, we can derive a contradiction with \mbox{@{term "s \<notin> L
  1202   r\<^sub>1"}} using
  1308   r\<^sub>1"}} using
  1203   Proposition~\ref{inhabs}. So also in this case \mbox{@{term "v\<^sub>1
  1309   Proposition~\ref{inhabs}. So also in this case \mbox{@{term "v\<^sub>1
  1204   :\<sqsubseteq>val v\<^sub>2"}}.\smallskip
  1310   :\<sqsubseteq>val v\<^sub>2"}}.\smallskip
  1205 
  1311 
  1206   \item[$\bullet$] Case @{text "PS"} with @{term "(s\<^sub>1 @ s\<^sub>2) \<in> (SEQ
  1312   \item[$\bullet$] Case @{text "PS"} with @{term "(s\<^sub>1 @
  1207   r\<^sub>1 r\<^sub>2) \<rightarrow> (Seq w\<^sub>1 w\<^sub>2)"}: We
  1313   s\<^sub>2) \<in> (SEQ r\<^sub>1 r\<^sub>2) \<rightarrow> (Seq
  1208   can assume @{term "v\<^sub>2 = Seq (u\<^sub>1) (u\<^sub>2)"} with
  1314   w\<^sub>1 w\<^sub>2)"}: We can assume @{term "v\<^sub>2 = Seq
  1209   @{term "\<Turnstile> u\<^sub>1 : r\<^sub>1"} and \mbox{@{term
  1315   (u\<^sub>1) (u\<^sub>2)"} with @{term "\<Turnstile> u\<^sub>1 :
  1210   "\<Turnstile> u\<^sub>2 : r\<^sub>2"}}. We have @{term "s\<^sub>1 @
  1316   r\<^sub>1"} and \mbox{@{term "\<Turnstile> u\<^sub>2 :
  1211   s\<^sub>2 = (flat u\<^sub>1) @ (flat u\<^sub>2)"}.  By the
  1317   r\<^sub>2"}}. We have @{term "s\<^sub>1 @ s\<^sub>2 = (flat
  1212   side-condition of the @{text PS}-rule we know that either @{term
  1318   u\<^sub>1) @ (flat u\<^sub>2)"}.  By the side-condition of the
  1213   "s\<^sub>1 = flat u\<^sub>1"} or that @{term "flat u\<^sub>1"} is a
  1319   @{text PS}-rule we know that either @{term "s\<^sub>1 = flat
  1214   strict prefix ??? of @{term "s\<^sub>1"}. In the latter case we can
  1320   u\<^sub>1"} or that @{term "flat u\<^sub>1"} is a strict prefix of
  1215   infer @{term "w\<^sub>1 :\<sqsubset>val u\<^sub>1"} by ???  and from
  1321   @{term "s\<^sub>1"}. In the latter case we can infer @{term
  1216   this @{term "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"} by ???.  In the
  1322   "w\<^sub>1 :\<sqsubset>val u\<^sub>1"} by
  1217   former case we know @{term "u\<^sub>1 \<in> LV r\<^sub>1 s\<^sub>1"}
  1323   Proposition~\ref{ordlen}\textit{(2)} and from this @{term "v\<^sub>1
  1218   and @{term "u\<^sub>2 \<in> LV r\<^sub>2 s\<^sub>2"}. With this we
  1324   :\<sqsubseteq>val v\<^sub>2"} by Proposition~\ref{ordintros}\textit{(5)}
  1219   can use the induction hypotheses to infer @{term "w\<^sub>1
  1325   (as noted above @{term "v\<^sub>1"} and @{term "v\<^sub>2"} must have the
  1220   :\<sqsubseteq>val u\<^sub>1"} and @{term "w\<^sub>2
  1326   same underlying string).
  1221   :\<sqsubseteq>val u\<^sub>2"}. By ??? we can again infer @{term
  1327   In the former case we know
  1222   "v\<^sub>1 :\<sqsubseteq>val v\<^sub>2"}.
  1328   @{term "u\<^sub>1 \<in> LV r\<^sub>1 s\<^sub>1"} and @{term
       
  1329   "u\<^sub>2 \<in> LV r\<^sub>2 s\<^sub>2"}. With this we can use the
       
  1330   induction hypotheses to infer @{term "w\<^sub>1 :\<sqsubseteq>val
       
  1331   u\<^sub>1"} and @{term "w\<^sub>2 :\<sqsubseteq>val u\<^sub>2"}. By
       
  1332   Proposition~\ref{ordintros}\textit{(4,5)} we can again infer 
       
  1333   @{term "v\<^sub>1 :\<sqsubseteq>val
       
  1334   v\<^sub>2"}.
  1223 
  1335 
  1224   \end{itemize}
  1336   \end{itemize}
  1225 
  1337 
  1226   \noindent The case for @{text "P\<star>"} is similar to the @{text PS}-case and omitted.\qed
  1338   \noindent The case for @{text "P\<star>"} is similar to the @{text PS}-case and omitted.\qed
  1227   \end{proof}
  1339   \end{proof}
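
  \noindent To see the theorem at work on a small example, consider the
  regular expression $a + a\cdot\mathbf{1}$ and the string $a$. The only
  lexical values in this case are $\textit{Left}\,(\textit{Char}\,a)$ and
  $\textit{Right}\,(\textit{Seq}\,(\textit{Char}\,a)\,\textit{Empty})$;
  the rule @{text "P+L"} makes the former the @{text POSIX} value, and it
  is indeed smaller than the latter, since a @{text Left}-value is always
  preferred over a @{text Right}-value with the same underlying string
  (Proposition~\ref{ordintros}\textit{(1)}).
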
  1228 
  1340 
  1229   \noindent This theorem shows that our @{text POSIX} value for a
  1341   \noindent This theorem shows that our @{text POSIX} value for a
  1230   regular expression @{text r} and string @{term s} is in fact a
  1342   regular expression @{text r} and string @{term s} is in fact a
  1231   minimal element of the values in @{text "LV r s"}. By ??? we also
  1343   minimal element of the values in @{text "LV r s"}. By
  1232   know that any value in @{text "LV s' r"}, with @{term "s'"} being a
  1344   Proposition~\ref{ordlen}\textit{(2)} we also know that any value in
  1233   prefix, cannot be smaller than @{text "v\<^sub>1"}. The next theorem
  1345   @{text "LV r s'"}, with @{term "s'"} being a strict prefix of @{term s}, cannot be
  1234   shows the opposite---namely any minimal element in @{term "LV r s"}
  1346   smaller than @{text "v\<^sub>1"}. The next theorem shows the
  1235   must be a @{text POSIX} value. For this it helps that we proved in
  1347   opposite---namely any minimal element in @{term "LV r s"} must be a
  1236   the previous section that whenever a string @{term "s \<in> L r"} then 
  1348   @{text POSIX} value. This can be established by induction on @{text
  1237   a corresponding @{text POSIX} value exists and is a lexical value, 
  1349   r}, but the proof can be drastically simplified by using the fact
  1238   see Theorem ??? and Lemma ???. 
  1350   from the previous section about the existence of a @{text POSIX} value
       
  1351   whenever a string @{term "s \<in> L r"}.
       
  1352 
  1239 
  1353 
  1240   \begin{theorem}
  1354   \begin{theorem}
  1241   @{thm [mode=IfThen] PosOrd_Posix[where ?v1.0="v\<^sub>1"]} 
  1355   @{thm [mode=IfThen] PosOrd_Posix[where ?v1.0="v\<^sub>1"]} 
  1242   \end{theorem}
  1356   \end{theorem}
  1243 
  1357 
  1244   \begin{proof} 
  1358   \begin{proof} 
  1245   If @{thm (prem 1) PosOrd_Posix[where ?v1.0="v\<^sub>1"]} then 
  1359   If @{thm (prem 1) PosOrd_Posix[where ?v1.0="v\<^sub>1"]} then 
  1246   @{term "s \<in> L r"} by Proposition~\ref{inhabs}. Hence by Theorem~\ref{lexercorrect}(2) 
  1360   @{term "s \<in> L r"} by Proposition~\ref{inhabs}. Hence by Theorem~\ref{lexercorrect}(2) 
  1247   there exists a
  1361   there exists a
  1248   @{text POSIX} value @{term "v\<^sub>P"} with @{term "s \<in> r \<rightarrow> v\<^sub>P"}
  1362   @{text POSIX} value @{term "v\<^sub>P"} with @{term "s \<in> r \<rightarrow> v\<^sub>P"}
  1249   and by Lemma~\ref{LVposix} we also have @{term "v\<^sub>P \<in> LV r s"}.
  1363   and by Lemma~\ref{LVposix} we also have \mbox{@{term "v\<^sub>P \<in> LV r s"}}.
  1250   By Theorem~\ref{orderone} we therefore have 
  1364   By Theorem~\ref{orderone} we therefore have 
  1251   @{term "v\<^sub>P :\<sqsubseteq>val v\<^sub>1"}. If @{term "v\<^sub>P = v\<^sub>1"} then
  1365   @{term "v\<^sub>P :\<sqsubseteq>val v\<^sub>1"}. If @{term "v\<^sub>P = v\<^sub>1"} then
  1252   we are done. Otherwise we have @{term "v\<^sub>P :\<sqsubset>val v\<^sub>1"} which 
  1366   we are done. Otherwise we have @{term "v\<^sub>P :\<sqsubset>val v\<^sub>1"}, which 
  1253   however contradicts the second assumption and we are done too.\qed
  1367   however contradicts the second assumption about @{term "v\<^sub>1"} being the smallest
       
  1368   element in @{term "LV r s"}. So we are done in this case too.\qed
  1254   \end{proof}
  1369   \end{proof}
       
  1370 
       
  1371   \noindent
       
  1372   From this we can also show 
       
  1373   that if @{term "LV r s"} is non-empty (or equivalently @{term "s \<in> L r"}) then 
       
  1374   it has a unique minimal element:
  1255 
  1375 
  1256   \begin{corollary}
  1376   \begin{corollary}
  1257   @{thm [mode=IfThen] Least_existence1}
  1377   @{thm [mode=IfThen] Least_existence1}
  1258   \end{corollary}
  1378   \end{corollary}
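
  \noindent
  For small regular expressions and strings the set @{term "LV r s"} can
  also be enumerated directly, which is convenient for experimenting with
  concrete instances of the results above. The following self-contained
  Haskell sketch (again with our own names) generates exactly the values
  that inhabit a regular expression and flatten to a given string, with
  the restriction to non-empty iterations in the star case:

    data Rexp = Zero | One | Ch Char | Alt Rexp Rexp | Sq Rexp Rexp | Star Rexp
      deriving Show

    data Val = Empty | Chr Char | Lft Val | Rgt Val | Sqv Val Val | Stars [Val]
      deriving (Eq, Show)

    flat :: Val -> String
    flat Empty      = ""
    flat (Chr c)    = [c]
    flat (Lft v)    = flat v
    flat (Rgt v)    = flat v
    flat (Sqv v w)  = flat v ++ flat w
    flat (Stars vs) = concatMap flat vs

    -- all ways of splitting a string into a prefix and a suffix
    splits :: String -> [(String, String)]
    splits s = [splitAt n s | n <- [0 .. length s]]

    -- all lexical values for r whose flattening is exactly s
    lvs :: Rexp -> String -> [Val]
    lvs Zero        _ = []
    lvs One         s = [Empty | null s]
    lvs (Ch c)      s = [Chr c | s == [c]]
    lvs (Alt r1 r2) s = map Lft (lvs r1 s) ++ map Rgt (lvs r2 s)
    lvs (Sq r1 r2)  s = [Sqv v1 v2 | (s1, s2) <- splits s,
                                     v1 <- lvs r1 s1, v2 <- lvs r2 s2]
    lvs (Star r) s
      | null s    = [Stars []]
      | otherwise = [Stars (v : vs) | (s1, s2) <- splits s, not (null s1),
                                      v <- lvs r s1, Stars vs <- lvs (Star r) s2]

    -- example:  lvs (Alt (Ch 'a') (Sq (Ch 'a') One)) "a"
    -- evaluates to  [Lft (Chr 'a'), Rgt (Sqv (Chr 'a') Empty)]

  Termination of the star case is ensured by requiring a non-empty prefix
  for every iteration, mirroring the restriction on star values in the
  definition of lexical values.
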
  1259 
  1379 
  1260   \noindent To sum up, we have shown that minimal elements according
  1380 
  1261   to the ordering by Okui and Suzuki are exactly the @{text POSIX}
  1381 
  1262   values we defined inductively in Section~\ref{posixsec} 
  1382   \noindent To sum up, we have shown that the (unique) minimal elements 
  1263 
  1383   of the ordering by Okui and Suzuki are exactly the @{text POSIX}
  1264 
  1384   values we defined inductively in Section~\ref{posixsec}. This provides
  1265    IS THE minimal element unique? We have not shown totality.
  1385   an independent confirmation that our ternary relation formalises the
       
  1386   informal POSIX rules. 
       
  1387 
  1266 *}
  1388 *}
  1267 
  1389 
  1268 section {* Optimisations *}
  1390 section {* Optimisations *}
  1269 
  1391 
  1270 text {*
  1392 text {*
  1434 
  1556 
  1435 text {*
  1557 text {*
  1436 
  1558 
  1437   A strong point in favour of
  1559   A strong point in favour of
  1438   Sulzmann and Lu's algorithm is that it can be extended in various
  1560   Sulzmann and Lu's algorithm is that it can be extended in various
  1439   ways.
  1561   ways.  If we are interested in tokenising a string, then we need to not just
  1440 
       
  1441   If we are interested in tokenising a string, then we need to not just
       
  1442   split up the string into tokens, but also ``classify'' the tokens (for
  1562   split up the string into tokens, but also ``classify'' the tokens (for
  1443   example whether a token is a keyword or an identifier). This can be done with
  1563   example whether a token is a keyword or an identifier). This can be done with
  1444   only minor modifications to the algorithm by introducing \emph{record
  1564   only minor modifications to the algorithm by introducing \emph{record
  1445   regular expressions} and \emph{record values} (for example
  1565   regular expressions} and \emph{record values} (for example
  1446   \cite{Sulzmann2014b}):
  1566   \cite{Sulzmann2014b}):
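
  As a rough illustration of the idea---in Haskell rather than Isabelle,
  with our own constructor names, and without any claim that the clauses
  coincide with the formal definitions---labels can be threaded through
  the derivative-based lexer as follows:

    data Rexp = Zero | One | Ch Char
              | Alt Rexp Rexp | Sq Rexp Rexp | Star Rexp
              | Recd String Rexp                -- a labelled sub-expression
      deriving Show

    data Val = Empty | Chr Char
             | Lft Val | Rgt Val | Sqv Val Val | Stars [Val]
             | Rec String Val                   -- a labelled sub-value
      deriving Show

    nullable :: Rexp -> Bool
    nullable Zero        = False
    nullable One         = True
    nullable (Ch _)      = False
    nullable (Alt r1 r2) = nullable r1 || nullable r2
    nullable (Sq r1 r2)  = nullable r1 && nullable r2
    nullable (Star _)    = True
    nullable (Recd _ r)  = nullable r

    der :: Char -> Rexp -> Rexp
    der _ Zero        = Zero
    der _ One         = Zero
    der c (Ch d)      = if c == d then One else Zero
    der c (Alt r1 r2) = Alt (der c r1) (der c r2)
    der c (Sq r1 r2)
      | nullable r1   = Alt (Sq (der c r1) r2) (der c r2)
      | otherwise     = Sq (der c r1) r2
    der c (Star r)    = Sq (der c r) (Star r)
    der c (Recd l r)  = Recd l (der c r)        -- labels are carried along

    mkeps :: Rexp -> Val
    mkeps One         = Empty
    mkeps (Alt r1 r2) = if nullable r1 then Lft (mkeps r1) else Rgt (mkeps r2)
    mkeps (Sq r1 r2)  = Sqv (mkeps r1) (mkeps r2)
    mkeps (Star _)    = Stars []
    mkeps (Recd l r)  = Rec l (mkeps r)
    mkeps _           = error "mkeps: not nullable"

    inj :: Rexp -> Char -> Val -> Val
    inj (Ch d)      _ Empty              = Chr d
    inj (Alt r1 _)  c (Lft v)            = Lft (inj r1 c v)
    inj (Alt _ r2)  c (Rgt v)            = Rgt (inj r2 c v)
    inj (Sq r1 _)   c (Sqv v1 v2)        = Sqv (inj r1 c v1) v2
    inj (Sq r1 _)   c (Lft (Sqv v1 v2))  = Sqv (inj r1 c v1) v2
    inj (Sq r1 r2)  c (Rgt v)            = Sqv (mkeps r1) (inj r2 c v)
    inj (Star r)    c (Sqv v (Stars vs)) = Stars (inj r c v : vs)
    inj (Recd l r)  c (Rec _ v)          = Rec l (inj r c v)
    inj _           _ _                  = error "inj: ill-formed value"

    lexer :: Rexp -> String -> Maybe Val
    lexer r []       = if nullable r then Just (mkeps r) else Nothing
    lexer r (c : cs) = fmap (inj r c) (lexer (der c r) cs)

    -- A tiny tokeniser: the keyword "if" or the one-letter identifier "a",
    -- iterated.  Running
    --   lexer (Star (Alt (Recd "key" (Sq (Ch 'i') (Ch 'f')))
    --                    (Recd "id"  (Ch 'a'))))  "ifa"
    -- yields a Stars-value whose first iteration carries the label "key"
    -- (matching "if") and whose second the label "id" (matching "a").

  The labels do not change the language of a regular expression; they are
  merely attached to the resulting values, so that the tokens and their
  classifications can be read off the final value.
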