thys/Paper/Paper.thy
changeset 112 698967eceaf1
parent 111 289728193164
child 113 90fe1a1d7d0e
equal deleted inserted replaced
111:289728193164 112:698967eceaf1
    84 experience of doing our proofs has been that this mechanical checking was
    84 experience of doing our proofs has been that this mechanical checking was
    85 absolutely essential: this subject area has hidden snares. This was also
    85 absolutely essential: this subject area has hidden snares. This was also
    86 noted by Kuklewitz \cite{Kuklewicz} who found that nearly all POSIX matching
    86 noted by Kuklewitz \cite{Kuklewicz} who found that nearly all POSIX matching
    87 implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.
    87 implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.
    88 
    88 
    89 If a regular expression matches a string, then in general there are more
    89 If a regular expression matches a string, then in general there is more
    90 than one way of how the string is matched. There are two commonly used
    90 than one way of how the string is matched. There are two commonly used
    91 disambiguation strategies to generate a unique answer: one is called GREEDY
    91 disambiguation strategies to generate a unique answer: one is called GREEDY
    92 matching \cite{Frisch2004} and the other is POSIX
    92 matching \cite{Frisch2004} and the other is POSIX
    93 matching~\cite{Kuklewicz,Sulzmann2014}. For example consider the string
    93 matching~\cite{Kuklewicz,Sulzmann2014}. For example consider the string
    94 @{term xy} and the regular expression \mbox{@{term "STAR (ALT (ALT x y) xy)"}}.
    94 @{term xy} and the regular expression \mbox{@{term "STAR (ALT (ALT x y)
    95 Either the string can be matched in two `iterations' by the single
    95 xy)"}}. Either the string can be matched in two `iterations' by the single
    96 letter-regular expressions @{term x} and @{term y}, or directly in one
    96 letter-regular expressions @{term x} and @{term y}, or directly in one
    97 iteration by @{term xy}. The first case corresponds to GREEDY matching,
    97 iteration by @{term xy}. The first case corresponds to GREEDY matching,
    98 which first matches with the left-most symbol and only matches the next
    98 which first matches with the left-most symbol and only matches the next
    99 symbol in case of a mismatch (this is greedy in the sense of preferring
    99 symbol in case of a mismatch (this is greedy in the sense of preferring
   100 instant gratification to delayed repletion). The second case is POSIX
   100 instant gratification to delayed repletion). The second case is POSIX
   101 matching, which prefers the longest match.
   101 matching, which prefers the longest match.
   102 
   102 
   103 In the context of lexing, where an input string needs to be separated into a
   103 In the context of lexing, where an input string needs to be split up into a
   104 sequence of tokens, POSIX is the more natural disambiguation strategy for
   104 sequence of tokens, POSIX is the more natural disambiguation strategy for
   105 what programmers consider basic syntactic building blocks in their programs.
   105 what programmers consider basic syntactic building blocks in their programs.
   106 These building blocks are often specified by some regular expressions, say @{text
   106 These building blocks are often specified by some regular expressions, say
   107 "r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognising
   107 @{text "r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognising keywords and
   108 keywords and identifiers, respectively. There are two underlying rules
   108 identifiers, respectively. There are two underlying (informal) rules behind
   109 behind tokenising a string in a POSIX fashion:
   109 tokenising a string in a POSIX fashion:
   110 
   110 
   111 \begin{itemize} 
   111 \begin{itemize} 
   112 \item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):}
   112 \item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):}
   113 
   113 
   114 The longest initial substring matched by any regular expression is taken as
   114 The longest initial substring matched by any regular expression is taken as
   192   @{thm (lhs) L.simps(6)} & $\dn$ & @{thm (rhs) L.simps(6)}\\
   192   @{thm (lhs) L.simps(6)} & $\dn$ & @{thm (rhs) L.simps(6)}\\
   193   \end{tabular}
   193   \end{tabular}
   194   \end{center}
   194   \end{center}
   195   
   195   
   196   \noindent In the fourth clause we use @{term "DUMMY ;; DUMMY"} for the
   196   \noindent In the fourth clause we use @{term "DUMMY ;; DUMMY"} for the
   197   concatenation of two languages. We use the star-notation for regular
   197   concatenation of two languages (it is also list-append for strings). We
   198   expressions and sets of strings (in the last clause). The star on sets is
   198   use the star-notation for regular expressions and also for languages (in
   199   defined inductively as usual by two clauses for the empty string being in
   199   the last clause). The star for languages is defined inductively as usual
   200   the star of a language and is @{term "s\<^sub>1"} is in a language and
   200   by two clauses for the empty string being in the star of a language and if
   201   @{term "s\<^sub>2"} and in the star of this language then also @{term
   201   @{term "s\<^sub>1"} is in a language and @{term "s\<^sub>2"} in the star of this
   202   "s\<^sub>1 @ s\<^sub>2"} is in the star of this language.
   202   language, then also @{term "s\<^sub>1 @ s\<^sub>2"} is in the star of this language.
   203   
   203   
   204   \emph{Semantic derivatives} of sets of strings are defined as
   204   \emph{Semantic derivatives} of sets of strings are defined as
   205 
   205 
   206   \begin{center}
   206   \begin{center}
   207   \begin{tabular}{lcl}
   207   \begin{tabular}{lcl}
   208   @{thm (lhs) Der_def} & $\dn$ & @{thm (rhs) Der_def}\\
   208   @{thm (lhs) Der_def} & $\dn$ & @{thm (rhs) Der_def}\\
   209   \end{tabular}
   209   \end{tabular}
   210   \end{center}
   210   \end{center}
   211   
   211   
   212   \noindent where the second definitions lifts the notion of semantic
   212   
   213   derivatives from characters to strings.  
       
   214 
   213 
   215   \noindent
   214   \noindent
   216   The nullable function
   215   The nullable function
   217 
   216 
   218   \begin{center}
   217   \begin{center}
   228 
   227 
   229   \noindent
   228   \noindent
   230   The derivative function for characters and strings
   229   The derivative function for characters and strings
   231 
   230 
   232   \begin{center}
   231   \begin{center}
   233   \begin{tabular}{lcp{7.5cm}}
   232   \begin{tabular}{lcp{8cm}}
   234   @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\
   233   @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\
   235   @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\
   234   @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\
   236   @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\
   235   @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\
   237   @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\
   236   @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\
   238   @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}\\
   237   @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}\\
   286 +_2$, $R tl\circ$ and $R tl*$ (as they are now called). By contrast,
   285 +_2$, $R tl\circ$ and $R tl*$ (as they are now called). By contrast,
   287 \cite{Sulzmann2014} defines a relation between values and argues that there is a
   286 \cite{Sulzmann2014} defines a relation between values and argues that there is a
   288 maximum value, as given by the derivative-based algorithm yet to be spelt
   287 maximum value, as given by the derivative-based algorithm yet to be spelt
   289 out. The relation we define is ternary, relating strings, values and regular
   288 out. The relation we define is ternary, relating strings, values and regular
   290 expressions.
   289 expressions.
       
   290 
       
   291 Our Posix relation @{term "s \<in> r \<rightarrow> v"}
       
   292 
       
   293   \begin{center}
       
   294   \begin{tabular}{c}
       
   295   @{thm[mode=Axiom] PMatch.intros(1)} \qquad
       
   296   @{thm[mode=Axiom] PMatch.intros(2)}\medskip\\
       
   297   @{thm[mode=Rule] PMatch.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\qquad
       
   298   @{thm[mode=Rule] PMatch.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\medskip\\
       
   299   \multicolumn{1}{p{5cm}}{@{thm[mode=Rule] PMatch.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}\medskip\\
       
   300   @{thm[mode=Rule] PMatch.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}\medskip\\
       
   301   @{thm[mode=Axiom] PMatch.intros(7)}\medskip\\
       
   302   \end{tabular}
       
   303   \end{center}
   291 
   304 
   292 *}
   305 *}
   293 
   306 
   294 section {* The Argument by Sulzmmann and Lu *}
   307 section {* The Argument by Sulzmmann and Lu *}
   295 
   308 
   384   This relation for \emph{non-problematic} is written @{term "\<Turnstile> v : r"}.
   397   This relation for \emph{non-problematic} is written @{term "\<Turnstile> v : r"}.
   385   \bigskip
   398   \bigskip
   386 
   399 
   387 
   400 
   388   \noindent
   401   \noindent
   389   Our Posix relation @{term "s \<in> r \<rightarrow> v"}
   402   
   390 
       
   391   \begin{center}
       
   392   \begin{tabular}{c}
       
   393   @{thm[mode=Axiom] PMatch.intros(1)} \qquad
       
   394   @{thm[mode=Axiom] PMatch.intros(2)}\medskip\\
       
   395   @{thm[mode=Rule] PMatch.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\qquad
       
   396   @{thm[mode=Rule] PMatch.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\medskip\\
       
   397   \multicolumn{1}{p{5cm}}{@{thm[mode=Rule] PMatch.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}\medskip\\
       
   398   @{thm[mode=Rule] PMatch.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}\medskip\\
       
   399   @{thm[mode=Axiom] PMatch.intros(7)}\medskip\\
       
   400   \end{tabular}
       
   401   \end{center}
       
   402 
   403 
   403   \noindent
   404   \noindent
   404   Our version of Sulzmann's ordering relation
   405   Our version of Sulzmann's ordering relation
   405 
   406 
   406   \begin{center}
   407   \begin{center}