366 that associates values to regular expressions: |
366 that associates values to regular expressions: |
367 |
367 |
368 \begin{center} |
368 \begin{center} |
369 \begin{tabular}{c} |
369 \begin{tabular}{c} |
370 @{thm[mode=Axiom] Prf.intros(4)} \qquad |
370 @{thm[mode=Axiom] Prf.intros(4)} \qquad |
371 @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\medskip\\ |
371 @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\smallskip\\ |
372 @{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]} \qquad |
372 @{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]} \qquad |
373 @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>1" "r\<^sub>2"]}\medskip\\ |
373 @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>1" "r\<^sub>2"]}\smallskip\\ |
374 @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\medskip\\ |
374 @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\smallskip\\ |
375 @{thm[mode=Axiom] Prf.intros(6)[of "r"]} \qquad |
375 @{thm[mode=Axiom] Prf.intros(6)[of "r"]} \qquad |
376 @{thm[mode=Rule] Prf.intros(7)[of "v" "r" "vs"]}\medskip\\ |
376 @{thm[mode=Rule] Prf.intros(7)[of "v" "r" "vs"]} |
377 \end{tabular} |
377 \end{tabular} |
378 \end{center} |
378 \end{center} |
379 |
379 |
380 \noindent Note that no values are associated with the regular expression |
380 \noindent Note that no values are associated with the regular expression |
381 @{term ZERO}, and that the only value associated with the regular |
381 @{term ZERO}, and that the only value associated with the regular |
400 "[a,b,c]"}. We first build the three derivatives (according to @{term a}, |
400 "[a,b,c]"}. We first build the three derivatives (according to @{term a}, |
401 @{term b} and @{term c}). We then use @{const nullable} to find out |
401 @{term b} and @{term c}). We then use @{const nullable} to find out |
402 whether the resulting derivative regular expression @{term "r\<^sub>4"} |
402 whether the resulting derivative regular expression @{term "r\<^sub>4"} |
403 can match the empty string. If yes, we call the function @{const mkeps} |
403 can match the empty string. If yes, we call the function @{const mkeps} |
404 that produces a value @{term "v\<^sub>4"} for how @{term "r\<^sub>4"} can |
404 that produces a value @{term "v\<^sub>4"} for how @{term "r\<^sub>4"} can |
405 match the empty string (taking into account the POSIX rules in case |
405 match the empty string (taking into account the POSIX constraints in case |
406 there are several ways). This functions is defined by the clauses: |
406 there are several ways). This functions is defined by the clauses: |
407 |
407 |
408 \begin{figure}[t] |
408 \begin{figure}[t] |
409 \begin{center} |
409 \begin{center} |
410 \begin{tikzpicture}[scale=2,node distance=1.3cm, |
410 \begin{tikzpicture}[scale=2,node distance=1.3cm, |
428 \draw (r4) node[anchor=north west] {\;\raisebox{-8mm}{@{term "mkeps"}}}; |
428 \draw (r4) node[anchor=north west] {\;\raisebox{-8mm}{@{term "mkeps"}}}; |
429 \end{tikzpicture} |
429 \end{tikzpicture} |
430 \end{center} |
430 \end{center} |
431 \caption{The two phases of the algorithm by Sulzmann \& Lu \cite{Sulzmann2014}, |
431 \caption{The two phases of the algorithm by Sulzmann \& Lu \cite{Sulzmann2014}, |
432 matching the string @{term "[a,b,c]"}. The first phase (the arrows from |
432 matching the string @{term "[a,b,c]"}. The first phase (the arrows from |
433 left to right) is \Brz's matcher building succesive derivatives. If the |
433 left to right) is \Brz's matcher building successive derivatives. If the |
434 last regular expression is @{term nullable}, then the functions of the |
434 last regular expression is @{term nullable}, then the functions of the |
435 second phase are called (the top-down and right-to-left arrows): first |
435 second phase are called (the top-down and right-to-left arrows): first |
436 @{term mkeps} calculates a value witnessing |
436 @{term mkeps} calculates a value witnessing |
437 how the empty string has been recognised by @{term "r\<^sub>4"}. After |
437 how the empty string has been recognised by @{term "r\<^sub>4"}. After |
438 that the function @{term inj} `injects back' the characters of the string into |
438 that the function @{term inj} ``injects back'' the characters of the string into |
439 the values. |
439 the values. |
440 \label{Sulz}} |
440 \label{Sulz}} |
441 \end{figure} |
441 \end{figure} |
442 |
442 |
443 \begin{center} |
443 \begin{center} |
463 The most interesting idea from Sulzmann and Lu \cite{Sulzmann2014} is |
463 The most interesting idea from Sulzmann and Lu \cite{Sulzmann2014} is |
464 the construction of a value for how @{term "r\<^sub>1"} can match the |
464 the construction of a value for how @{term "r\<^sub>1"} can match the |
465 string @{term "[a,b,c]"} from the value how the last derivative, @{term |
465 string @{term "[a,b,c]"} from the value how the last derivative, @{term |
466 "r\<^sub>4"} in Fig~\ref{Sulz}, can match the empty string. Sulzmann and |
466 "r\<^sub>4"} in Fig~\ref{Sulz}, can match the empty string. Sulzmann and |
467 Lu achieve this by stepwise ``injecting back'' the characters into the |
467 Lu achieve this by stepwise ``injecting back'' the characters into the |
468 values thus inverting the operation of building derivatives on the level |
468 values thus inverting the operation of building derivatives, but on the level |
469 of values. The corresponding function, called @{term inj}, takes three |
469 of values. The corresponding function, called @{term inj}, takes three |
470 arguments, a regular expression, a character and a value. For example in |
470 arguments, a regular expression, a character and a value. For example in |
471 the first (or right-most) @{term inj}-step in Fig~\ref{Sulz} the regular |
471 the first (or right-most) @{term inj}-step in Fig~\ref{Sulz} the regular |
472 expression @{term "r\<^sub>3"}, the character @{term c} from the last |
472 expression @{term "r\<^sub>3"}, the character @{term c} from the last |
473 derivative step and @{term "v\<^sub>4"}, which is the value corresponding |
473 derivative step and @{term "v\<^sub>4"}, which is the value corresponding |
474 to the derivative regular expression @{term "r\<^sub>4"}. The result is |
474 to the derivative regular expression @{term "r\<^sub>4"}. The result is |
475 the new value @{term "v\<^sub>3"}. The final result of the algorithm is |
475 the new value @{term "v\<^sub>3"}. The final result of the algorithm is |
476 the value @{term "v\<^sub>1"} corresponding to the input regular |
476 the value @{term "v\<^sub>1"}. The @{term inj} function is defined by recursion on regular |
477 expression. The @{term inj} function is by recursion on the regular |
|
478 expressions and by analysing the shape of values (corresponding to |
477 expressions and by analysing the shape of values (corresponding to |
479 the derivative regular expressions). |
478 the derivative regular expressions). |
480 |
479 % |
481 \begin{center} |
480 \begin{center} |
482 \begin{tabular}{l@ {\hspace{5mm}}lcl} |
481 \begin{tabular}{l@ {\hspace{5mm}}lcl} |
483 (1) & @{thm (lhs) injval.simps(1)} & $\dn$ & @{thm (rhs) injval.simps(1)}\\ |
482 (1) & @{thm (lhs) injval.simps(1)} & $\dn$ & @{thm (rhs) injval.simps(1)}\\ |
484 (2) & @{thm (lhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]} & $\dn$ & |
483 (2) & @{thm (lhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]} & $\dn$ & |
485 @{thm (rhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]}\\ |
484 @{thm (rhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]}\\ |
505 |
504 |
506 \begin{center} |
505 \begin{center} |
507 @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} $\dn$ @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} |
506 @{thm (lhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} $\dn$ @{thm (rhs) der.simps(5)[of c "r\<^sub>1" "r\<^sub>2"]} |
508 \end{center} |
507 \end{center} |
509 |
508 |
510 \noindent Consider first the else-branch where the derivative is @{term |
509 \noindent Consider first the @{text "else"}-branch where the derivative is @{term |
511 "SEQ (der c r\<^sub>1) r\<^sub>2"}. The corresponding value must therefore |
510 "SEQ (der c r\<^sub>1) r\<^sub>2"}. The corresponding value must therefore |
512 be the form @{term "Seq v\<^sub>1 v\<^sub>2"}, which matches the left-hand |
511 be the form @{term "Seq v\<^sub>1 v\<^sub>2"}, which matches the left-hand |
513 side in clause (4) of @{term inj}. In the if-branch the derivative is an |
512 side in clause~(4) of @{term inj}. In the @{text "if"}-branch the derivative is an |
514 alternative, namely @{term "ALT (SEQ (der c r\<^sub>1) r\<^sub>2) (der c |
513 alternative, namely @{term "ALT (SEQ (der c r\<^sub>1) r\<^sub>2) (der c |
515 r\<^sub>2)"}. This means we either have to consider a @{text Left}- or |
514 r\<^sub>2)"}. This means we either have to consider a @{text Left}- or |
516 @{text Right}-value. In case of the @{text Left}-value we know further it |
515 @{text Right}-value. In case of the @{text Left}-value we know further it |
517 must be a value for a sequence regular expression. Therefore the pattern |
516 must be a value for a sequence regular expression. Therefore the pattern |
518 we match in the clause (5) is @{term "Left (Seq v\<^sub>1 v\<^sub>2)"}, |
517 we match in the clause (5) is @{term "Left (Seq v\<^sub>1 v\<^sub>2)"}, |
521 regular expression @{text "r\<^sub>1"} does not ``contribute'' to |
520 regular expression @{text "r\<^sub>1"} does not ``contribute'' to |
522 matching the string, that means it only matches the empty string, we need to |
521 matching the string, that means it only matches the empty string, we need to |
523 call @{const mkeps} in order to construct a value for how @{term "r\<^sub>1"} |
522 call @{const mkeps} in order to construct a value for how @{term "r\<^sub>1"} |
524 can match this empty string. A similar argument applies for why we can |
523 can match this empty string. A similar argument applies for why we can |
525 expect in the left-hand side of clause (7) that the value is of the form |
524 expect in the left-hand side of clause (7) that the value is of the form |
526 @{term "Seq v (Stars vs)"}---the derivative of a star is @{term "SEQ r |
525 @{term "Seq v (Stars vs)"}---the derivative of a star is @{term "SEQ (der c r) |
527 (STAR r)"}. Finally, the reason for why we can ignore the second argument |
526 (STAR r)"}. Finally, the reason for why we can ignore the second argument |
528 in clause (1) of @{term inj} is that it will only ever be called in cases |
527 in clause (1) of @{term inj} is that it will only ever be called in cases |
529 where @{term "c=d"}, but the usual linearity restrictions in patterns do |
528 where @{term "c=d"}, but the usual linearity restrictions in patterns do |
530 not allow us to build this constraint explicitly into our function |
529 not allow us to build this constraint explicitly into our function |
531 definition.\footnote{Sulzmann and Lu state this clause as @{thm (lhs) |
530 definition.\footnote{Sulzmann and Lu state this clause as @{thm (lhs) |
533 but our deviation is harmless.} |
532 but our deviation is harmless.} |
534 |
533 |
535 The idea of the @{term inj}-function to ``inject'' a character, say |
534 The idea of the @{term inj}-function to ``inject'' a character, say |
536 @{term c}, into a value can be made precise by the first part of the |
535 @{term c}, into a value can be made precise by the first part of the |
537 following lemma, which shows that the underlying string of an injected |
536 following lemma, which shows that the underlying string of an injected |
538 value has a prepend character @{term c}; the second part shows that the |
537 value has a prepended character @{term c}; the second part shows that the |
539 underlying string of an @{const mkeps}-value is always the empty string |
538 underlying string of an @{const mkeps}-value is always the empty string |
540 (given the regular expression is nullable since otherwise @{text mkeps} |
539 (given the regular expression is nullable since otherwise @{text mkeps} |
541 might not be defined). |
540 might not be defined). |
542 |
541 |
543 \begin{lemma}\mbox{}\smallskip\\\label{Prf_injval_flat} |
542 \begin{lemma}\mbox{}\smallskip\\\label{Prf_injval_flat} |
567 \end{center} |
566 \end{center} |
568 |
567 |
569 \noindent If the regular expression does not match the string, @{const None} is |
568 \noindent If the regular expression does not match the string, @{const None} is |
570 returned, indicating an error is raised. If the regular expression \emph{does} |
569 returned, indicating an error is raised. If the regular expression \emph{does} |
571 match the string, then @{const Some} value is returned. One important |
570 match the string, then @{const Some} value is returned. One important |
572 virtue of this algorithm is that it can be implemented with ease in a |
571 virtue of this algorithm is that it can be implemented with ease in any |
573 functional programming language and also in Isabelle/HOL. In the remaining |
572 functional programming language and also in Isabelle/HOL. In the remaining |
574 part of this section we prove that this algorithm is correct. |
573 part of this section we prove that this algorithm is correct. |
575 |
574 |
576 The well-known idea of POSIX matching is informally defined by the longest |
575 The well-known idea of POSIX matching is informally defined by the longest |
577 match and priority rule; as correctly argued in \cite{Sulzmann2014}, this |
576 match and priority rule (see Introduction); as correctly argued in \cite{Sulzmann2014}, this |
578 needs formal specification. Sulzmann and Lu define a \emph{dominance} |
577 needs formal specification. Sulzmann and Lu define a \emph{dominance} |
579 relation\footnote{Sulzmann and Lu call it an ordering relation, but |
578 relation\footnote{Sulzmann and Lu call it an ordering relation, but |
580 without giving evidence that it is transitive.} between values and argue |
579 without giving evidence that it is transitive.} between values and argue |
581 that there is a maximum value, as given by the derivative-based algorithm. |
580 that there is a maximum value, as given by the derivative-based algorithm. |
582 In contrast, we shall introduce a simple inductive definition that |
581 In contrast, we shall introduce a simple inductive definition that |
621 "PS"}). The first two premises state that @{term "v\<^sub>1"} and @{term "v\<^sub>2"} |
620 "PS"}). The first two premises state that @{term "v\<^sub>1"} and @{term "v\<^sub>2"} |
622 are the POSIX values for @{term "(s\<^sub>1, r\<^sub>1)"} and @{term "(s\<^sub>2, r\<^sub>2)"} |
621 are the POSIX values for @{term "(s\<^sub>1, r\<^sub>1)"} and @{term "(s\<^sub>2, r\<^sub>2)"} |
623 respectively. Consider now the third premise and note that the POSIX value |
622 respectively. Consider now the third premise and note that the POSIX value |
624 of this rule should match the string @{term "s\<^sub>1 @ s\<^sub>2"}. According to the |
623 of this rule should match the string @{term "s\<^sub>1 @ s\<^sub>2"}. According to the |
625 longest match rule, we want that the @{term "s\<^sub>1"} is the longest initial |
624 longest match rule, we want that the @{term "s\<^sub>1"} is the longest initial |
626 split of @{term "s\<^sub>1 @ s\<^sub>2"} such that @{term "s\<^sub>2"} is still recognised |
625 split of \mbox{@{term "s\<^sub>1 @ s\<^sub>2"}} such that @{term "s\<^sub>2"} is still recognised |
627 by @{term "r\<^sub>2"}. Let us assume, contrary to the third premise, that there |
626 by @{term "r\<^sub>2"}. Let us assume, contrary to the third premise, that there |
628 \emph{exist} an @{term "s\<^sub>3"} and @{term "s\<^sub>4"} such that @{term "s\<^sub>2"} |
627 \emph{exist} an @{term "s\<^sub>3"} and @{term "s\<^sub>4"} such that @{term "s\<^sub>2"} |
629 can be split up into a non-empty string @{term "s\<^sub>3"} and possibly empty |
628 can be split up into a non-empty string @{term "s\<^sub>3"} and a possibly empty |
630 string @{term "s\<^sub>4"}. Moreover the longer string @{term "s\<^sub>1 @ s\<^sub>3"} can be |
629 string @{term "s\<^sub>4"}. Moreover the longer string @{term "s\<^sub>1 @ s\<^sub>3"} can be |
631 matched by @{text "r\<^sub>1"} and the shorter @{term "s\<^sub>4"} can still be |
630 matched by @{text "r\<^sub>1"} and the shorter @{term "s\<^sub>4"} can still be |
632 matched by @{term "r\<^sub>2"}. In this case @{term "s\<^sub>1"} would not be the |
631 matched by @{term "r\<^sub>2"}. In this case @{term "s\<^sub>1"} would \emph{not} be the |
633 longest initial split of @{term "s\<^sub>1 @ s\<^sub>2"} and therefore @{term "Seq v\<^sub>1 |
632 longest initial split of \mbox{@{term "s\<^sub>1 @ s\<^sub>2"}} and therefore @{term "Seq v\<^sub>1 |
634 v\<^sub>2"} cannot be a POSIX value for @{term "(s\<^sub>1 @ s\<^sub>2, SEQ r\<^sub>1 r\<^sub>2)"}. |
633 v\<^sub>2"} cannot be a POSIX value for @{term "(s\<^sub>1 @ s\<^sub>2, SEQ r\<^sub>1 r\<^sub>2)"}. |
635 The main point is that this side-condition ensures the longest |
634 The main point is that this side-condition ensures the longest |
636 match rule is satisfied. |
635 match rule is satisfied. |
637 |
636 |
638 A similar condition is imposed on the POSIX value in the @{text |
637 A similar condition is imposed on the POSIX value in the @{text |