thys/Paper/Paper.thy
changeset 112 698967eceaf1
parent 111 289728193164
child 113 90fe1a1d7d0e
--- a/thys/Paper/Paper.thy	Wed Mar 02 14:35:50 2016 +0000
+++ b/thys/Paper/Paper.thy	Thu Mar 03 08:22:39 2016 +0000
@@ -86,13 +86,13 @@
 noted by Kuklewitz \cite{Kuklewicz} who found that nearly all POSIX matching
 implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.
 
-If a regular expression matches a string, then in general there are more
+If a regular expression matches a string, then in general there is more
 than one way of how the string is matched. There are two commonly used
 disambiguation strategies to generate a unique answer: one is called GREEDY
 matching \cite{Frisch2004} and the other is POSIX
 matching~\cite{Kuklewicz,Sulzmann2014}. For example consider the string
-@{term xy} and the regular expression \mbox{@{term "STAR (ALT (ALT x y) xy)"}}.
-Either the string can be matched in two `iterations' by the single
+@{term xy} and the regular expression \mbox{@{term "STAR (ALT (ALT x y)
+xy)"}}. Either the string can be matched in two `iterations' by the single
 letter-regular expressions @{term x} and @{term y}, or directly in one
 iteration by @{term xy}. The first case corresponds to GREEDY matching,
 which first matches with the left-most symbol and only matches the next
@@ -100,13 +100,13 @@
 instant gratification to delayed repletion). The second case is POSIX
 matching, which prefers the longest match.
 
-In the context of lexing, where an input string needs to be separated into a
+In the context of lexing, where an input string needs to be split up into a
 sequence of tokens, POSIX is the more natural disambiguation strategy for
 what programmers consider basic syntactic building blocks in their programs.
-These building blocks are often specified by some regular expressions, say @{text
-"r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognising
-keywords and identifiers, respectively. There are two underlying rules
-behind tokenising a string in a POSIX fashion:
+These building blocks are often specified by some regular expressions, say
+@{text "r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognising keywords and
+identifiers, respectively. There are two underlying (informal) rules behind
+tokenising a string in a POSIX fashion:
 
 \begin{itemize} 
 \item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):}
@@ -194,12 +194,12 @@
   \end{center}
   
   \noindent In the fourth clause we use @{term "DUMMY ;; DUMMY"} for the
-  concatenation of two languages. We use the star-notation for regular
-  expressions and sets of strings (in the last clause). The star on sets is
-  defined inductively as usual by two clauses for the empty string being in
-  the star of a language and is @{term "s\<^sub>1"} is in a language and
-  @{term "s\<^sub>2"} and in the star of this language then also @{term
-  "s\<^sub>1 @ s\<^sub>2"} is in the star of this language.
+  concatenation of two languages (it is also list-append for strings). We
+  use the star-notation for regular expressions and also for languages (in
+  the last clause). The star for languages is defined inductively as usual
+  by two clauses for the empty string being in the star of a language and if
+  @{term "s\<^sub>1"} is in a language and @{term "s\<^sub>2"} in the star of this
+  language, then also @{term "s\<^sub>1 @ s\<^sub>2"} is in the star of this language.
   
   \emph{Semantic derivatives} of sets of strings are defined as
 
@@ -209,8 +209,7 @@
   \end{tabular}
   \end{center}
   
-  \noindent where the second definitions lifts the notion of semantic
-  derivatives from characters to strings.  
+  
 
   \noindent
   The nullable function
@@ -230,7 +229,7 @@
   The derivative function for characters and strings
 
   \begin{center}
-  \begin{tabular}{lcp{7.5cm}}
+  \begin{tabular}{lcp{8cm}}
   @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\
   @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\
   @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\
@@ -289,6 +288,20 @@
 out. The relation we define is ternary, relating strings, values and regular
 expressions.
 
+Our Posix relation @{term "s \<in> r \<rightarrow> v"}
+
+  \begin{center}
+  \begin{tabular}{c}
+  @{thm[mode=Axiom] PMatch.intros(1)} \qquad
+  @{thm[mode=Axiom] PMatch.intros(2)}\medskip\\
+  @{thm[mode=Rule] PMatch.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\qquad
+  @{thm[mode=Rule] PMatch.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\medskip\\
+  \multicolumn{1}{p{5cm}}{@{thm[mode=Rule] PMatch.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}\medskip\\
+  @{thm[mode=Rule] PMatch.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}\medskip\\
+  @{thm[mode=Axiom] PMatch.intros(7)}\medskip\\
+  \end{tabular}
+  \end{center}
+
 *}
 
 section {* The Argument by Sulzmmann and Lu *}
@@ -386,19 +399,7 @@
 
 
   \noindent
-  Our Posix relation @{term "s \<in> r \<rightarrow> v"}
-
-  \begin{center}
-  \begin{tabular}{c}
-  @{thm[mode=Axiom] PMatch.intros(1)} \qquad
-  @{thm[mode=Axiom] PMatch.intros(2)}\medskip\\
-  @{thm[mode=Rule] PMatch.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\qquad
-  @{thm[mode=Rule] PMatch.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\medskip\\
-  \multicolumn{1}{p{5cm}}{@{thm[mode=Rule] PMatch.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}\medskip\\
-  @{thm[mode=Rule] PMatch.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}\medskip\\
-  @{thm[mode=Axiom] PMatch.intros(7)}\medskip\\
-  \end{tabular}
-  \end{center}
+  
 
   \noindent
   Our version of Sulzmann's ordering relation