diff -r 79efc0bcfc96 -r 71e26f43c896 thys/Paper/Paper.thy --- a/thys/Paper/Paper.thy Sun Mar 06 18:40:43 2016 +0000 +++ b/thys/Paper/Paper.thy Sun Mar 06 20:00:47 2016 +0000 @@ -139,7 +139,7 @@ The longest initial substring matched by any regular expression is taken as next token.\smallskip -\item[$\bullet$] \underline{Rule Priority:} +\item[$\bullet$] \underline{Priority Rule:} For a particular longest initial substring, the first regular expression that can match determines the token. @@ -368,7 +368,7 @@ \noindent Note that no values are associated with the regular expression @{term ZERO}, and that the only value associated with the regular expression @{term ONE} is @{term Void}, pronounced (if one must) as {\em - ``Void''}. It is routine to stablish how values ``inhabiting'' a regular + ``Void''}. It is routine to establish how values ``inhabiting'' a regular expression correspond to the language of a regular expression, namely \begin{proposition} @@ -526,35 +526,24 @@ \end{tabular} \end{center} - - - NOT DONE YET - - Therefore there are, for example, three - cases for sequence regular expressions (for all possible shapes of the - value). - - Again the virtues of this algorithm is that it can be - implemented with ease in a functional programming language and - also in Isabelle/HOL. + \noindent If the regular expression does not match, @{const None} is + returned. If the regular expression does match the string, then @{const + Some} value is returned. Again the virtues of this algorithm is that it + can be implemented with ease in a functional programming language and also + in Isabelle/HOL. In the remaining part of this section we prove that + this algorithm is correct. -The well-known idea of POSIX lexing is informally defined in (for example) -\cite{posix}; as correctly argued in \cite{Sulzmann2014}, this needs formal -specification. The rough idea is that, in contrast to the so-called GREEDY -algorithm, POSIX lexing chooses to match more deeply and using left choices -rather than a right choices. For example, note that to match the string -@{term "[a, b]"} with the regular expression $(a + \mts)\circ (b+ab)$ the matching -will return $( Void, Right(ab))$ rather than $(Left\ a, Left\ b)$. [The -regular expression $ab$ is short for $(Lit\ a) \circ (Lit\ b)$.] Similarly, -to match {\em ``a''} with $(a+a)$ the leftmost $a$ will be chosen. + The well-known idea of POSIX matching is informally defined by the longest + match and priority rule; as correctly argued in \cite{Sulzmann2014}, this + needs formal specification. -We use a simple inductive definition to specify this notion, incorporating -the POSIX-specific choices into the side-conditions for the rules $R tl -+_2$, $R tl\circ$ and $R tl*$ (as they are now called). By contrast, -\cite{Sulzmann2014} defines a relation between values and argues that there is a -maximum value, as given by the derivative-based algorithm yet to be spelt -out. The relation we define is ternary, relating strings, values and regular -expressions. + We use a simple inductive definition to specify this notion, incorporating + the POSIX-specific choices into the side-conditions for the rules $R tl + +_2$, $R tl\circ$ and $R tl*$ (as they are now called). By contrast, + \cite{Sulzmann2014} defines a relation between values and argues that there is a + maximum value, as given by the derivative-based algorithm yet to be spelt + out. The relation we define is ternary, relating strings, values and regular + expressions. Our Posix relation @{term "s \ r \ v"}