lexing: comparison thys/Paper/Paper.thy


-:289728193164
+:698967eceaf1
 experience of doing our proofs has been that this mechanical checking was
 absolutely essential: this subject area has hidden snares. This was also
 noted by Kuklewicz \cite{Kuklewicz} who found that nearly all POSIX matching
 implementations are ``buggy'' \cite[Page 203]{Sulzmann2014}.
-If a regular expression matches a string, then in general there are more
+If a regular expression matches a string, then in general there is more
 than one way in which the string is matched. There are two commonly used
 disambiguation strategies to generate a unique answer: one is called GREEDY
 matching \cite{Frisch2004} and the other is POSIX
 matching~\cite{Kuklewicz,Sulzmann2014}. For example, consider the string
-@{term xy} and the regular expression \mbox{@{term "STAR (ALT (ALT x y) xy)"}}.
+@{term xy} and the regular expression \mbox{@{term "STAR (ALT (ALT x y)
-Either the string can be matched in two `iterations' by the single
+xy)"}}. Either the string can be matched in two `iterations' by the single
 letter-regular expressions @{term x} and @{term y}, or directly in one
 iteration by @{term xy}. The first case corresponds to GREEDY matching,
 which first matches with the left-most symbol and only matches the next
 symbol in case of a mismatch (this is greedy in the sense of preferring
 instant gratification to delayed repletion). The second case is POSIX
 matching, which prefers the longest match.
-In the context of lexing, where an input string needs to be separated into a
+In the context of lexing, where an input string needs to be split up into a
 sequence of tokens, POSIX is the more natural disambiguation strategy for
 what programmers consider basic syntactic building blocks in their programs.
-These building blocks are often specified by some regular expressions, say @{text
+These building blocks are often specified by some regular expressions, say
-"r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognising
+@{text "r\<^bsub>key\<^esub>"} and @{text "r\<^bsub>id\<^esub>"} for recognising keywords and
-keywords and identifiers, respectively. There are two underlying rules
+identifiers, respectively. There are two underlying (informal) rules behind
-behind tokenising a string in a POSIX fashion:
+tokenising a string in a POSIX fashion:
 \begin{itemize}
 \item[$\bullet$] \underline{The Longest Match Rule (or ``maximal munch rule''):}
 The longest initial substring matched by any regular expression is taken as
 @{thm (lhs) L.simps(6)} & $\dn$ & @{thm (rhs) L.simps(6)}\\
 \end{tabular}
 \end{center}
 \noindent In the fourth clause we use @{term "DUMMY ;; DUMMY"} for the
-concatenation of two languages. We use the star-notation for regular
+concatenation of two languages (it is also list-append for strings). We
-expressions and sets of strings (in the last clause). The star on sets is
+use the star-notation for regular expressions and also for languages (in
-defined inductively as usual by two clauses for the empty string being in
+the last clause). The star for languages is defined inductively as usual
-the star of a language and is @{term "s\<^sub>1"} is in a language and
+by two clauses for the empty string being in the star of a language and if
-@{term "s\<^sub>2"} and in the star of this language then also @{term
+@{term "s\<^sub>1"} is in a language and @{term "s\<^sub>2"} in the star of this
-"s\<^sub>1 @ s\<^sub>2"} is in the star of this language.
+language, then also @{term "s\<^sub>1 @ s\<^sub>2"} is in the star of this language.
 \emph{Semantic derivatives} of sets of strings are defined as
 \begin{center}
 \begin{tabular}{lcl}
 @{thm (lhs) Der_def} & $\dn$ & @{thm (rhs) Der_def}\\
 \end{tabular}
 \end{center}
-\noindent where the second definitions lifts the notion of semantic
-derivatives from characters to strings.
 \noindent
 The nullable function
 \begin{center}
 \noindent
 The derivative function for characters and strings
 \begin{center}
-\begin{tabular}{lcp{7.5cm}}
+\begin{tabular}{lcp{8cm}}
 @{thm (lhs) der.simps(1)} & $\dn$ & @{thm (rhs) der.simps(1)}\\
 @{thm (lhs) der.simps(2)} & $\dn$ & @{thm (rhs) der.simps(2)}\\
 @{thm (lhs) der.simps(3)} & $\dn$ & @{thm (rhs) der.simps(3)}\\
 @{thm (lhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(4)[of c "r\<^sub>1" "r\<^sub>2"]}\\
 @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}\\
 +_2$, $R tl\circ$ and $R tl*$ (as they are now called). By contrast,
 \cite{Sulzmann2014} defines a relation between values and argues that there is a
 maximum value, as given by the derivative-based algorithm yet to be spelt
 out. The relation we define is ternary, relating strings, values and regular
 expressions.
+Our Posix relation @{term "s \<in> r \<rightarrow> v"}
+\begin{center}
+\begin{tabular}{c}
+@{thm[mode=Axiom] PMatch.intros(1)} \qquad
+@{thm[mode=Axiom] PMatch.intros(2)}\medskip\\
+@{thm[mode=Rule] PMatch.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\qquad
+@{thm[mode=Rule] PMatch.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\medskip\\
+\multicolumn{1}{p{5cm}}{@{thm[mode=Rule] PMatch.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}\medskip\\
+@{thm[mode=Rule] PMatch.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}\medskip\\
+@{thm[mode=Axiom] PMatch.intros(7)}\medskip\\
+\end{tabular}
+\end{center}
 *}
 section {* The Argument by Sulzmann and Lu *}
 This relation for \emph{non-problematic} is written @{term "\<Turnstile> v : r"}.
 \bigskip
 \noindent
-Our Posix relation @{term "s \<in> r \<rightarrow> v"}
-\begin{center}
-\begin{tabular}{c}
-@{thm[mode=Axiom] PMatch.intros(1)} \qquad
-@{thm[mode=Axiom] PMatch.intros(2)}\medskip\\
-@{thm[mode=Rule] PMatch.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\qquad
-@{thm[mode=Rule] PMatch.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\medskip\\
-\multicolumn{1}{p{5cm}}{@{thm[mode=Rule] PMatch.intros(5)[of "s\<^sub>1" "r\<^sub>1" "v\<^sub>1" "s\<^sub>2" "r\<^sub>2" "v\<^sub>2"]}}\medskip\\
-@{thm[mode=Rule] PMatch.intros(6)[of "s\<^sub>1" "r" "v" "s\<^sub>2" "vs"]}\medskip\\
-@{thm[mode=Axiom] PMatch.intros(7)}\medskip\\
-\end{tabular}
-\end{center}
 \noindent
 Our version of Sulzmann's ordering relation
 \begin{center}

changeset 112	698967eceaf1
parent 111	289728193164
child 113	90fe1a1d7d0e
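
The nullable and derivative functions that this changeset touches can be illustrated outside Isabelle. The following is a minimal sketch in Python, not the formalisation from Paper.thy: the class names (Zero, One, Char, Alt, Seq, Star), the helper `matches`, and the value `example` are assumptions made for illustration. It only decides whether a string is matched; it does not compute POSIX values.

```python
from dataclasses import dataclass


class Rexp:
    """Base class for regular expressions (names are illustrative)."""


@dataclass(frozen=True)
class Zero(Rexp):      # matches no string
    pass


@dataclass(frozen=True)
class One(Rexp):       # matches only the empty string
    pass


@dataclass(frozen=True)
class Char(Rexp):      # matches the single character c
    c: str


@dataclass(frozen=True)
class Alt(Rexp):       # alternative: r1 or r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Seq(Rexp):       # sequence: r1 followed by r2
    r1: Rexp
    r2: Rexp


@dataclass(frozen=True)
class Star(Rexp):      # zero or more iterations of r
    r: Rexp


def nullable(r):
    """Return True if r can match the empty string."""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero and Char


def der(c, r):
    """Derivative of r with respect to the character c."""
    if isinstance(r, Char):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        left = Seq(der(c, r.r1), r.r2)
        return Alt(left, der(c, r.r2)) if nullable(r.r1) else left
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()  # Zero and One


def matches(r, s):
    """Take derivatives character by character, then test nullability."""
    for c in s:
        r = der(c, r)
    return nullable(r)


# The regular expression (x + y + xy)* from the introduction.
example = Star(Alt(Alt(Char("x"), Char("y")), Seq(Char("x"), Char("y"))))
```

For instance, `matches(example, "xy")` succeeds regardless of whether the string is read as two iterations (x, then y) or as one iteration (xy); choosing between those two readings is exactly what the GREEDY and POSIX strategies disambiguate, and this sketch does not make that choice.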