lexing: comparison thys/Paper/Paper.thy


-:fee5641c5994
+:fa0d8aa5d7de
 that associates values to regular expressions:
 \begin{center}
 \begin{tabular}{c}
 @{thm[mode=Axiom] Prf.intros(4)} \qquad
-@{thm[mode=Axiom] Prf.intros(5)[of "c"]}\medskip\\
+@{thm[mode=Axiom] Prf.intros(5)[of "c"]}\smallskip\\
 @{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]} \qquad
-@{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>1" "r\<^sub>2"]}\medskip\\
+@{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>1" "r\<^sub>2"]}\smallskip\\
-@{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\medskip\\
+@{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\smallskip\\
 @{thm[mode=Axiom] Prf.intros(6)[of "r"]} \qquad
-@{thm[mode=Rule] Prf.intros(7)[of "v" "r" "vs"]}\medskip\\
+@{thm[mode=Rule] Prf.intros(7)[of "v" "r" "vs"]}
 \end{tabular}
 \end{center}
 \noindent Note that no values are associated with the regular expression
 @{term ZERO}, and that the only value associated with the regular
 "[a,b,c]"}. We first build the three derivatives (according to @{term a},
 @{term b} and @{term c}). We then use @{const nullable} to find out
 whether the resulting derivative regular expression @{term "r\<^sub>4"}
 can match the empty string. If yes, we call the function @{const mkeps}
 that produces a value @{term "v\<^sub>4"} for how @{term "r\<^sub>4"} can
-match the empty string (taking into account the POSIX rules in case
+match the empty string (taking into account the POSIX constraints in case
 there are several ways). This function is defined by the clauses:
 \begin{figure}[t]
 \begin{center}
 \begin{tikzpicture}[scale=2,node distance=1.3cm,
 \draw (r4) node[anchor=north west] {\;\raisebox{-8mm}{@{term "mkeps"}}};
 \end{tikzpicture}
 \end{center}
 \caption{The two phases of the algorithm by Sulzmann \& Lu \cite{Sulzmann2014},
 matching the string @{term "[a,b,c]"}. The first phase (the arrows from
-left to right) is \Brz's matcher building succesive derivatives. If the
+left to right) is \Brz's matcher building successive derivatives. If the
 last regular expression is @{term nullable}, then the functions of the
 second phase are called (the top-down and right-to-left arrows): first
 @{term mkeps} calculates a value witnessing
 how the empty string has been recognised by @{term "r\<^sub>4"}. After
-that the function @{term inj} `injects back' the characters of the string into
+that the function @{term inj} ``injects back'' the characters of the string into
 the values.
 \label{Sulz}}
 \end{figure}
 \begin{center}
 The most interesting idea from Sulzmann and Lu \cite{Sulzmann2014} is
 the construction of a value for how @{term "r\<^sub>1"} can match the
 string @{term "[a,b,c]"} from the value how the last derivative, @{term
 "r\<^sub>4"} in Fig~\ref{Sulz}, can match the empty string. Sulzmann and
 Lu achieve this by stepwise ``injecting back'' the characters into the
-values thus inverting the operation of building derivatives on the level
+values thus inverting the operation of building derivatives, but on the level
 of values. The corresponding function, called @{term inj}, takes three
 arguments, a regular expression, a character and a value. For example in
 the first (or right-most) @{term inj}-step in Fig~\ref{Sulz} the arguments are the regular
 expression @{term "r\<^sub>3"}, the character @{term c} from the last
 derivative step and @{term "v\<^sub>4"}, which is the value corresponding
 to the derivative regular expression @{term "r\<^sub>4"}. The result is
 the new value @{term "v\<^sub>3"}. The final result of the algorithm is
-the value @{term "v\<^sub>1"} corresponding to the input regular
+the value @{term "v\<^sub>1"}. The @{term inj} function is defined by recursion on regular
-expression. The @{term inj} function is by recursion on the regular
 expressions and by analysing the shape of values (corresponding to
 the derivative regular expressions).
+%
 \begin{center}
 \begin{tabular}{l@ {\hspace{5mm}}lcl}
 (1) & @{thm (lhs) injval.simps(1)} & $\dn$ & @{thm (rhs) injval.simps(1)}\\
 (2) & @{thm (lhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]} & $\dn$ &
 @{thm (rhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]}\\
 \begin{center}
 @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} $\dn$ @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]}
 \end{center}
-\noindent Consider first the else-branch where the derivative is @{term
+\noindent Consider first the @{text "else"}-branch where the derivative is @{term
 "SEQ (der c r\<^sub>1) r\<^sub>2"}. The corresponding value must therefore
 be of the form @{term "Seq v\<^sub>1 v\<^sub>2"}, which matches the left-hand
-side in clause (4) of @{term inj}. In the if-branch the derivative is an
+side in clause~(4) of @{term inj}. In the @{text "if"}-branch the derivative is an
 alternative, namely @{term "ALT (SEQ (der c r\<^sub>1) r\<^sub>2) (der c
 r\<^sub>2)"}. This means we either have to consider a @{text Left}- or
 @{text Right}-value. In case of the @{text Left}-value we know further it
 must be a value for a sequence regular expression. Therefore the pattern
 we match in clause (5) is @{term "Left (Seq v\<^sub>1 v\<^sub>2)"},
 regular expression @{text "r\<^sub>1"} does not ``contribute'' to
 matching the string, that means it only matches the empty string, we need to
 call @{const mkeps} in order to construct a value for how @{term "r\<^sub>1"}
 can match this empty string. A similar argument applies for why we can
 expect in the left-hand side of clause (7) that the value is of the form
-@{term "Seq v (Stars vs)"}---the derivative of a star is @{term "SEQ r
+@{term "Seq v (Stars vs)"}---the derivative of a star is @{term "SEQ (der c r)
 (STAR r)"}. Finally, the reason for why we can ignore the second argument
 in clause (1) of @{term inj} is that it will only ever be called in cases
 where @{term "c=d"}, but the usual linearity restrictions in patterns do
 not allow us to build this constraint explicitly into our function
 definition.\footnote{Sulzmann and Lu state this clause as @{thm (lhs)
 but our deviation is harmless.}
 The idea of the @{term inj}-function to ``inject'' a character, say
 @{term c}, into a value can be made precise by the first part of the
 following lemma, which shows that the underlying string of an injected
-value has a prepend character @{term c}; the second part shows that the
+value has a prepended character @{term c}; the second part shows that the
 underlying string of an @{const mkeps}-value is always the empty string
 (given the regular expression is nullable since otherwise @{text mkeps}
 might not be defined).
 \begin{lemma}\mbox{}\smallskip\\\label{Prf_injval_flat}
 \end{center}
 \noindent If the regular expression does not match the string, @{const None} is
 returned, indicating an error is raised. If the regular expression \emph{does}
 match the string, then @{const Some} value is returned. One important
-virtue of this algorithm is that it can be implemented with ease in a
+virtue of this algorithm is that it can be implemented with ease in any
 functional programming language and also in Isabelle/HOL. In the remaining
 part of this section we prove that this algorithm is correct.
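
To make the two phases concrete, the following is a minimal executable sketch of the algorithm in Python rather than Isabelle/HOL. The encoding of regular expressions and values as tagged tuples, and the helper name `flat`, are assumptions of this sketch. The clause numbers in the comments are only those mentioned in the text; the remaining cases of `inj` are inferred from the shape of the derivatives and may be numbered differently in the paper.

```python
# Regular expressions and values are tagged tuples (an assumption of this
# sketch, not the Isabelle datatypes themselves).
ZERO, ONE = ("ZERO",), ("ONE",)
def CHAR(c): return ("CHAR", c)
def ALT(r1, r2): return ("ALT", r1, r2)
def SEQ(r1, r2): return ("SEQ", r1, r2)
def STAR(r): return ("STAR", r)

def nullable(r):
    """Can r match the empty string?"""
    tag = r[0]
    if tag in ("ONE", "STAR"): return True
    if tag in ("ZERO", "CHAR"): return False
    if tag == "ALT": return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # SEQ

def der(c, r):
    """Brzozowski derivative of r with respect to the character c."""
    tag = r[0]
    if tag in ("ZERO", "ONE"): return ZERO
    if tag == "CHAR": return ONE if r[1] == c else ZERO
    if tag == "ALT": return ALT(der(c, r[1]), der(c, r[2]))
    if tag == "SEQ":
        if nullable(r[1]):  # the if-branch discussed in the text
            return ALT(SEQ(der(c, r[1]), r[2]), der(c, r[2]))
        return SEQ(der(c, r[1]), r[2])  # the else-branch
    return SEQ(der(c, r[1]), r)  # STAR

def mkeps(r):
    """A value for how a nullable r matches the empty string."""
    tag = r[0]
    if tag == "ONE": return ("Void",)
    if tag == "SEQ": return ("Seq", mkeps(r[1]), mkeps(r[2]))
    if tag == "ALT":  # prefer the left alternative when it is nullable
        return ("Left", mkeps(r[1])) if nullable(r[1]) else ("Right", mkeps(r[2]))
    if tag == "STAR": return ("Stars", [])
    raise ValueError("mkeps is only defined for nullable regular expressions")

def inj(r, c, v):
    """Inject c back into v, a value for der(c, r), giving a value for r."""
    tag, vtag = r[0], v[0]
    if tag == "CHAR": return ("Char", r[1])  # clause (1): c is ignored
    if tag == "ALT" and vtag == "Left":      # clause (2)
        return ("Left", inj(r[1], c, v[1]))
    if tag == "ALT" and vtag == "Right":
        return ("Right", inj(r[2], c, v[1]))
    if tag == "SEQ" and vtag == "Seq":       # clause (4)
        return ("Seq", inj(r[1], c, v[1]), v[2])
    if tag == "SEQ" and vtag == "Left":      # clause (5): v = Left (Seq v1 v2)
        return ("Seq", inj(r[1], c, v[1][1]), v[1][2])
    if tag == "SEQ" and vtag == "Right":     # r1 contributes only the empty string
        return ("Seq", mkeps(r[1]), inj(r[2], c, v[1]))
    if tag == "STAR" and vtag == "Seq":      # clause (7): v = Seq v (Stars vs)
        return ("Stars", [inj(r[1], c, v[1])] + v[2][1])
    raise ValueError("value does not fit the derivative of the regular expression")

def lexer(r, s):
    """Return a value for how r matches s, or None if r does not match s."""
    if s == "":
        return mkeps(r) if nullable(r) else None
    v = lexer(der(s[0], r), s[1:])
    return None if v is None else inj(r, s[0], v)

def flat(v):
    """The underlying string of a value."""
    tag = v[0]
    if tag == "Void": return ""
    if tag == "Char": return v[1]
    if tag in ("Left", "Right"): return flat(v[1])
    if tag == "Seq": return flat(v[1]) + flat(v[2])
    return "".join(flat(u) for u in v[1])  # Stars

# (a + ab)(b + 1) against "ab": the first component takes the longer "ab".
r = SEQ(ALT(CHAR("a"), SEQ(CHAR("a"), CHAR("b"))), ALT(CHAR("b"), ONE))
v = lexer(r, "ab")
```

In this example `v` is a `Seq` value whose first component is the `Right` alternative covering the whole string `ab`, and `flat(v)` gives back the input string, in line with the lemma about the underlying string of injected values.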
 The well-known idea of POSIX matching is informally defined by the longest
-match and priority rule; as correctly argued in \cite{Sulzmann2014}, this
+match and priority rule (see Introduction); as correctly argued in \cite{Sulzmann2014}, this
 needs formal specification. Sulzmann and Lu define a \emph{dominance}
 relation\footnote{Sulzmann and Lu call it an ordering relation, but
 without giving evidence that it is transitive.} between values and argue
 that there is a maximum value, as given by the derivative-based algorithm.
 In contrast, we shall introduce a simple inductive definition that
 "PS"}). The first two premises state that @{term "v\<^sub>1"} and @{term "v\<^sub>2"}
 are the POSIX values for @{term "(s\<^sub>1, r\<^sub>1)"} and @{term "(s\<^sub>2, r\<^sub>2)"}
 respectively. Consider now the third premise and note that the POSIX value
 of this rule should match the string @{term "s\<^sub>1 @ s\<^sub>2"}. According to the
 longest match rule, we want that @{term "s\<^sub>1"} is the longest initial
-split of @{term "s\<^sub>1 @ s\<^sub>2"} such that @{term "s\<^sub>2"} is still recognised
+split of \mbox{@{term "s\<^sub>1 @ s\<^sub>2"}} such that @{term "s\<^sub>2"} is still recognised
 by @{term "r\<^sub>2"}. Let us assume, contrary to the third premise, that there
 \emph{exist} an @{term "s\<^sub>3"} and @{term "s\<^sub>4"} such that @{term "s\<^sub>2"}
-can be split up into a non-empty string @{term "s\<^sub>3"} and possibly empty
+can be split up into a non-empty string @{term "s\<^sub>3"} and a possibly empty
 string @{term "s\<^sub>4"}. Moreover the longer string @{term "s\<^sub>1 @ s\<^sub>3"} can be
 matched by @{text "r\<^sub>1"} and the shorter @{term "s\<^sub>4"} can still be
-matched by @{term "r\<^sub>2"}. In this case @{term "s\<^sub>1"} would not be the
+matched by @{term "r\<^sub>2"}. In this case @{term "s\<^sub>1"} would \emph{not} be the
-longest initial split of @{term "s\<^sub>1 @ s\<^sub>2"} and therefore @{term "Seq v\<^sub>1
+longest initial split of \mbox{@{term "s\<^sub>1 @ s\<^sub>2"}} and therefore @{term "Seq v\<^sub>1
 v\<^sub>2"} cannot be a POSIX value for @{term "(s\<^sub>1 @ s\<^sub>2, SEQ r\<^sub>1 r\<^sub>2)"}.
 The main point is that this side-condition ensures the longest
 match rule is satisfied.
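
The side-condition can also be read as a brute-force check over all ways of moving a non-empty prefix of the second string to the first. The sketch below is only an illustration in Python: the function name `longest_split_ok` is hypothetical, and Python's `re.fullmatch` is used purely as a stand-in oracle for language membership, not as part of the formalisation.

```python
import re

def longest_split_ok(s1, s2, r1, r2):
    """True iff there are no s3, s4 with s3 non-empty and s3 + s4 == s2
    such that r1 matches s1 + s3 and r2 matches s4."""
    for i in range(1, len(s2) + 1):
        s3, s4 = s2[:i], s2[i:]
        if re.fullmatch(r1, s1 + s3) and re.fullmatch(r2, s4):
            return False  # a longer initial split exists
    return True

# For (a + ab)(b + 1) and the string "ab":
short_split = longest_split_ok("a", "b", "a|ab", "b?")   # split a / b
long_split = longest_split_ok("ab", "", "a|ab", "b?")    # split ab / (empty)
```

Here `short_split` is false, because the first component could instead consume `ab` and leave the empty string for the second, while `long_split` is true, so only the second split satisfies the premise.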
 A similar condition is imposed on the POSIX value in the @{text

changeset 136	fa0d8aa5d7de
parent 135	fee5641c5994
child 137	4178b7e71809