# HG changeset patch # User Christian Urban # Date 1456993359 0 # Node ID 698967eceaf152efee5b152f276e7a9e6e135d0d # Parent 28972819316481854d6fa93761ede3205fe9fcdc updated diff -r 289728193164 -r 698967eceaf1 thys/Paper/Paper.thy --- a/thys/Paper/Paper.thy Wed Mar 02 14:35:50 2016 +0000 +++ b/thys/Paper/Paper.thy Thu Mar 03 08:22:39 2016 +0000 @@ -86,13 +86,13 @@ noted by Kuklewitz \cite{Kuklewicz} who found that nearly all POSIX matching implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}. -If a regular expression matches a string, then in general there are more +If a regular expression matches a string, then in general there is more than one way of how the string is matched. There are two commonly used disambiguation strategies to generate a unique answer: one is called GREEDY matching \cite{Frisch2004} and the other is POSIX matching~\cite{Kuklewicz,Sulzmann2014}. For example consider the string -@{term xy} and the regular expression \mbox{@{term "STAR (ALT (ALT x y) xy)"}}. -Either the string can be matched in two `iterations' by the single +@{term xy} and the regular expression \mbox{@{term "STAR (ALT (ALT x y) +xy)"}}. Either the string can be matched in two `iterations' by the single letter-regular expressions @{term x} and @{term y}, or directly in one iteration by @{term xy}. The first case corresponds to GREEDY matching, which first matches with the left-most symbol and only matches the next @@ -100,13 +100,13 @@ instant gratification to delayed repletion). The second case is POSIX matching, which prefers the longest match. -In the context of lexing, where an input string needs to be separated into a +In the context of lexing, where an input string needs to be split up into a sequence of tokens, POSIX is the more natural disambiguation strategy for what programmers consider basic syntactic building blocks in their programs. -These building blocks are often specified by some regular expressions, say @{text -"r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognising -keywords and identifiers, respectively. There are two underlying rules -behind tokenising a string in a POSIX fashion: +These building blocks are often specified by some regular expressions, say +@{text "r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognising keywords and +identifiers, respectively. There are two underlying (informal) rules behind +tokenising a string in a POSIX fashion: \begin{itemize} \item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):} @@ -194,12 +194,12 @@ \end{center} \noindent In the fourth clause we use @{term "DUMMY ;; DUMMY"} for the - concatenation of two languages. We use the star-notation for regular - expressions and sets of strings (in the last clause). The star on sets is - defined inductively as usual by two clauses for the empty string being in - the star of a language and is @{term "s\<^sub>1"} is in a language and - @{term "s\<^sub>2"} and in the star of this language then also @{term - "s\<^sub>1 @ s\<^sub>2"} is in the star of this language. + concatenation of two languages (it is also list-append for strings). We + use the star-notation for regular expressions and also for languages (in + the last clause). The star for languages is defined inductively as usual + by two clauses for the empty string being in the star of a language and if + @{term "s\<^sub>1"} is in a language and @{term "s\<^sub>2"} in the star of this + language, then also @{term "s\<^sub>1 @ s\<^sub>2"} is in the star of this language. \emph{Semantic derivatives} of sets of strings are defined as @@ -209,8 +209,7 @@ \end{tabular} \end{center} - \noindent where the second definitions lifts the notion of semantic - derivatives from characters to strings. + \noindent The nullable function @@ -230,7 +229,7 @@ The derivative function for characters and strings \begin{center} - \begin{tabular}{lcp{7.5cm}} + \begin{tabular}{lcp{8cm}} @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\ @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\ @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\ @@ -289,6 +288,20 @@ out. The relation we define is ternary, relating strings, values and regular expressions. +Our Posix relation @{term "s \ r \ v"} + + \begin{center} + \begin{tabular}{c} + @{thm[mode=Axiom] PMatch.intros(1)} \qquad + @{thm[mode=Axiom] PMatch.intros(2)}\medskip\\ + @{thm[mode=Rule] PMatch.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\qquad + @{thm[mode=Rule] PMatch.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\medskip\\ + \multicolumn{1}{p{5cm}}{@{thm[mode=Rule] PMatch.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}\medskip\\ + @{thm[mode=Rule] PMatch.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}\medskip\\ + @{thm[mode=Axiom] PMatch.intros(7)}\medskip\\ + \end{tabular} + \end{center} + *} section {* The Argument by Sulzmmann and Lu *} @@ -386,19 +399,7 @@ \noindent - Our Posix relation @{term "s \ r \ v"} - - \begin{center} - \begin{tabular}{c} - @{thm[mode=Axiom] PMatch.intros(1)} \qquad - @{thm[mode=Axiom] PMatch.intros(2)}\medskip\\ - @{thm[mode=Rule] PMatch.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\qquad - @{thm[mode=Rule] PMatch.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\medskip\\ - \multicolumn{1}{p{5cm}}{@{thm[mode=Rule] PMatch.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}\medskip\\ - @{thm[mode=Rule] PMatch.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}\medskip\\ - @{thm[mode=Axiom] PMatch.intros(7)}\medskip\\ - \end{tabular} - \end{center} + \noindent Our version of Sulzmann's ordering relation diff -r 289728193164 -r 698967eceaf1 thys/ReStar.thy --- a/thys/ReStar.thy Wed Mar 02 14:35:50 2016 +0000 +++ b/thys/ReStar.thy Thu Mar 03 08:22:39 2016 +0000 @@ -29,7 +29,7 @@ definition Der :: "char \ string set \ string set" where - "Der c A \ {s. [c] @ s \ A}" + "Der c A \ {s. c # s \ A}" lemma Der_null [simp]: shows "Der c {} = {}" diff -r 289728193164 -r 698967eceaf1 thys/paper.pdf Binary file thys/paper.pdf has changed