--- a/thys/Paper/Paper.thy Wed Mar 02 14:35:50 2016 +0000
+++ b/thys/Paper/Paper.thy Thu Mar 03 08:22:39 2016 +0000
@@ -86,13 +86,13 @@
noted by Kuklewitz \cite{Kuklewicz} who found that nearly all POSIX matching
implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.
-If a regular expression matches a string, then in general there are more
+If a regular expression matches a string, then in general there is more
than one way of how the string is matched. There are two commonly used
disambiguation strategies to generate a unique answer: one is called GREEDY
matching \cite{Frisch2004} and the other is POSIX
matching~\cite{Kuklewicz,Sulzmann2014}. For example consider the string
-@{term xy} and the regular expression \mbox{@{term "STAR (ALT (ALT x y) xy)"}}.
-Either the string can be matched in two `iterations' by the single
+@{term xy} and the regular expression \mbox{@{term "STAR (ALT (ALT x y)
+xy)"}}. Either the string can be matched in two `iterations' by the single
letter-regular expressions @{term x} and @{term y}, or directly in one
iteration by @{term xy}. The first case corresponds to GREEDY matching,
which first matches with the left-most symbol and only matches the next
@@ -100,13 +100,13 @@
instant gratification to delayed repletion). The second case is POSIX
matching, which prefers the longest match.
-In the context of lexing, where an input string needs to be separated into a
+In the context of lexing, where an input string needs to be split up into a
sequence of tokens, POSIX is the more natural disambiguation strategy for
what programmers consider basic syntactic building blocks in their programs.
-These building blocks are often specified by some regular expressions, say @{text
-"r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognising
-keywords and identifiers, respectively. There are two underlying rules
-behind tokenising a string in a POSIX fashion:
+These building blocks are often specified by some regular expressions, say
+@{text "r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognising keywords and
+identifiers, respectively. There are two underlying (informal) rules behind
+tokenising a string in a POSIX fashion:
\begin{itemize}
\item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):}
@@ -194,12 +194,12 @@
\end{center}
\noindent In the fourth clause we use @{term "DUMMY ;; DUMMY"} for the
- concatenation of two languages. We use the star-notation for regular
- expressions and sets of strings (in the last clause). The star on sets is
- defined inductively as usual by two clauses for the empty string being in
- the star of a language and is @{term "s\<^sub>1"} is in a language and
- @{term "s\<^sub>2"} and in the star of this language then also @{term
- "s\<^sub>1 @ s\<^sub>2"} is in the star of this language.
+ concatenation of two languages (it is also list-append for strings). We
+ use the star-notation for regular expressions and also for languages (in
+ the last clause). The star for languages is defined inductively as usual
+ by two clauses for the empty string being in the star of a language and if
+ @{term "s\<^sub>1"} is in a language and @{term "s\<^sub>2"} in the star of this
+ language, then also @{term "s\<^sub>1 @ s\<^sub>2"} is in the star of this language.
\emph{Semantic derivatives} of sets of strings are defined as
@@ -209,8 +209,7 @@
\end{tabular}
\end{center}
- \noindent where the second definitions lifts the notion of semantic
- derivatives from characters to strings.
+
\noindent
The nullable function
@@ -230,7 +229,7 @@
The derivative function for characters and strings
\begin{center}
- \begin{tabular}{lcp{7.5cm}}
+ \begin{tabular}{lcp{8cm}}
@{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\
@{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\
@{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\
@@ -289,6 +288,20 @@
out. The relation we define is ternary, relating strings, values and regular
expressions.
+Our Posix relation @{term "s \<in> r \<rightarrow> v"}
+
+ \begin{center}
+ \begin{tabular}{c}
+ @{thm[mode=Axiom] PMatch.intros(1)} \qquad
+ @{thm[mode=Axiom] PMatch.intros(2)}\medskip\\
+ @{thm[mode=Rule] PMatch.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\qquad
+ @{thm[mode=Rule] PMatch.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\medskip\\
+ \multicolumn{1}{p{5cm}}{@{thm[mode=Rule] PMatch.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}\medskip\\
+ @{thm[mode=Rule] PMatch.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}\medskip\\
+ @{thm[mode=Axiom] PMatch.intros(7)}\medskip\\
+ \end{tabular}
+ \end{center}
+
*}
section {* The Argument by Sulzmmann and Lu *}
@@ -386,19 +399,7 @@
\noindent
- Our Posix relation @{term "s \<in> r \<rightarrow> v"}
-
- \begin{center}
- \begin{tabular}{c}
- @{thm[mode=Axiom] PMatch.intros(1)} \qquad
- @{thm[mode=Axiom] PMatch.intros(2)}\medskip\\
- @{thm[mode=Rule] PMatch.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\qquad
- @{thm[mode=Rule] PMatch.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\medskip\\
- \multicolumn{1}{p{5cm}}{@{thm[mode=Rule] PMatch.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}\medskip\\
- @{thm[mode=Rule] PMatch.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}\medskip\\
- @{thm[mode=Axiom] PMatch.intros(7)}\medskip\\
- \end{tabular}
- \end{center}
+
\noindent
Our version of Sulzmann's ordering relation