thys/Paper/Paper.thy
changeset 119 71e26f43c896
parent 118 79efc0bcfc96
child 120 d74bfa11802c
equal deleted inserted replaced
118:79efc0bcfc96 119:71e26f43c896
   137 \item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):}
   137 \item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):}
   138 
   138 
   139 The longest initial substring matched by any regular expression is taken as
   139 The longest initial substring matched by any regular expression is taken as
   140 next token.\smallskip
   140 next token.\smallskip
   141 
   141 
   142 \item[$\bullet$] \underline{Rule Priority:}
   142 \item[$\bullet$] \underline{Priority Rule:}
   143 
   143 
   144 For a particular longest initial substring, the first regular expression
   144 For a particular longest initial substring, the first regular expression
   145 that can match determines the token.
   145 that can match determines the token.
   146 \end{itemize}
   146 \end{itemize}
   147  
   147  
   366   \end{center}
   366   \end{center}
   367 
   367 
   368   \noindent Note that no values are associated with the regular expression
   368   \noindent Note that no values are associated with the regular expression
   369   @{term ZERO}, and that the only value associated with the regular
   369   @{term ZERO}, and that the only value associated with the regular
   370   expression @{term ONE} is @{term Void}, pronounced (if one must) as {\em
   370   expression @{term ONE} is @{term Void}, pronounced (if one must) as {\em
   371   ``Void''}. It is routine to stablish how values ``inhabiting'' a regular
   371   ``Void''}. It is routine to establish how values ``inhabiting'' a regular
   372   expression correspond to the language of a regular expression, namely
   372   expression correspond to the language of a regular expression, namely
   373 
   373 
   374   \begin{proposition}
   374   \begin{proposition}
   375   @{thm L_flat_Prf}
   375   @{thm L_flat_Prf}
   376   \end{proposition}
   376   \end{proposition}
   524                      & & \phantom{$|$} @{term "None"}  @{text "\<Rightarrow>"} @{term None}\\
   524                      & & \phantom{$|$} @{term "None"}  @{text "\<Rightarrow>"} @{term None}\\
   525                      & & $|$ @{term "Some v"} @{text "\<Rightarrow>"} @{term "Some (injval r c v)"}                          
   525                      & & $|$ @{term "Some v"} @{text "\<Rightarrow>"} @{term "Some (injval r c v)"}                          
   526   \end{tabular}
   526   \end{tabular}
   527   \end{center}
   527   \end{center}
   528 
   528 
   529 
   529   \noindent If the regular expression does not match, @{const None} is
   530 
   530   returned. If the regular expression does match the string, then @{const
   531   NOT DONE YET 
   531   Some} value is returned. Again the virtues of this algorithm is that it
   532 
   532   can be implemented with ease in a functional programming language and also
   533   Therefore there are, for example, three
   533   in Isabelle/HOL. In the remaining part of this section we prove that
   534   cases for sequence regular expressions (for all possible shapes of the
   534   this algorithm is correct.
   535   value).
   535 
   536  
   536   The well-known idea of POSIX matching is informally defined by the longest
   537   Again the virtues of this algorithm is that it can be
   537   match and priority rule; as correctly argued in \cite{Sulzmann2014}, this
   538   implemented with ease in a functional programming language and
   538   needs formal specification. 
   539   also in Isabelle/HOL.
   539 
   540 
   540   We use a simple inductive definition to specify this notion, incorporating
   541 The well-known idea of POSIX lexing is informally defined in (for example)
   541   the POSIX-specific choices into the side-conditions for the rules $R tl
   542 \cite{posix}; as correctly argued in \cite{Sulzmann2014}, this needs formal
   542   +_2$, $R tl\circ$ and $R tl*$ (as they are now called). By contrast,
   543 specification. The rough idea is that, in contrast to the so-called GREEDY
   543   \cite{Sulzmann2014} defines a relation between values and argues that there is a
   544 algorithm, POSIX lexing chooses to match more deeply and using left choices
   544   maximum value, as given by the derivative-based algorithm yet to be spelt
   545 rather than a right choices. For example, note that to match the string 
   545   out. The relation we define is ternary, relating strings, values and regular
   546 @{term "[a, b]"} with the regular expression $(a + \mts)\circ (b+ab)$ the matching
   546   expressions.
   547 will return $( Void, Right(ab))$ rather than $(Left\ a, Left\ b)$. [The
       
   548 regular expression $ab$ is short for $(Lit\ a) \circ (Lit\ b)$.] Similarly,
       
   549 to match {\em ``a''} with $(a+a)$ the leftmost $a$ will be chosen.
       
   550 
       
   551 We use a simple inductive definition to specify this notion, incorporating
       
   552 the POSIX-specific choices into the side-conditions for the rules $R tl
       
   553 +_2$, $R tl\circ$ and $R tl*$ (as they are now called). By contrast,
       
   554 \cite{Sulzmann2014} defines a relation between values and argues that there is a
       
   555 maximum value, as given by the derivative-based algorithm yet to be spelt
       
   556 out. The relation we define is ternary, relating strings, values and regular
       
   557 expressions.
       
   558 
   547 
   559 Our Posix relation @{term "s \<in> r \<rightarrow> v"}
   548 Our Posix relation @{term "s \<in> r \<rightarrow> v"}
   560 
   549 
   561   \begin{center}
   550   \begin{center}
   562   \begin{tabular}{c}
   551   \begin{tabular}{c}