173 ``aggressive'' simplification method that keeps the size of the |
173 ``aggressive'' simplification method that keeps the size of the |
174 derivatives finitely bounded no matter what the length of the string is. |
174 derivatives finitely bounded no matter what the length of the string is. |
175 They make some informal claims about the correctness and linear behaviour |
175 They make some informal claims about the correctness and linear behaviour |
176 of this version, but do not provide any supporting proof arguments, not |
176 of this version, but do not provide any supporting proof arguments, not |
177 even ``pencil-and-paper'' arguments. They write about their bitcoded |
177 even ``pencil-and-paper'' arguments. They write about their bitcoded |
178 \emph{incremental parsing method} (that is the algorithm to be formalised |
178 \emph{incremental parsing method} (that is the algorithm to be fixed and formalised |
179 in this paper): |
179 in this paper): |
180 % |
180 % |
181 \begin{quote}\it |
181 \begin{quote}\it |
182 ``Correctness Claim: We further claim that the incremental parsing |
182 ``Correctness Claim: We further claim that the incremental parsing |
183 method [..] in combination with the simplification steps [..] |
183 method [..] in combination with the simplification steps [..] |
184 yields POSIX parse trees. We have tested this claim |
184 yields POSIX parse trees. We have tested this claim |
185 extensively [..] but yet |
185 extensively [..] but yet |
186 have to work out all proof details.'' \cite[Page 14]{Sulzmann2014} |
186 have to work out all proof details.'' \cite[Page 14]{Sulzmann2014} |
187 \end{quote} |
187 \end{quote} |
188 |
188 |
189 \noindent{}\textbf{Contributions:} We have implemented in Isabelle/HOL |
189 \noindent{}\textbf{Contributions:} We fill this gap by implementing in Isabelle/HOL |
190 the derivative-based lexing algorithm of Sulzmann and Lu |
190 our version of the derivative-based lexing algorithm of Sulzmann and Lu |
191 \cite{Sulzmann2014} where regular expressions are annotated with |
191 \cite{Sulzmann2014} where regular expressions are annotated with |
192 bitsequences. We define the crucial simplification function as a |
192 bitsequences. We define the crucial simplification function as a |
193 recursive function, without the need of a fixpoint operation. One objective of |
193 recursive function, without the need of a fixpoint operation. One objective of |
194 the simplification function is to remove duplicates of regular |
194 the simplification function is to remove duplicates of regular |
195 expressions. For this Sulzmann and Lu use in their paper the standard |
195 expressions. For this Sulzmann and Lu use in their paper the standard |
198 reason is that in the bitcoded setting, each copy generally has a |
198 reason is that in the bitcoded setting, each copy generally has a |
199 different bitcode annotation---so @{text nub} would never ``fire''. |
199 different bitcode annotation---so @{text nub} would never ``fire''. |
200 Inspired by Scala's library for lists, we shall instead use a @{text |
200 Inspired by Scala's library for lists, we shall instead use a @{text |
201 distinctWith} function that finds duplicates under an ``erasing'' function |
201 distinctWith} function that finds duplicates under an ``erasing'' function |
202 which deletes bitcodes before comparing regular expressions. |
202 which deletes bitcodes before comparing regular expressions. |
203 We shall also introduce our \emph{own} argument and definitions for |
203 We shall also introduce our \emph{own} arguments and definitions for |
204 establishing the correctness of the bitcoded algorithm when |
204 establishing the correctness of the bitcoded algorithm when |
205 simplifications are included. Finally we |
205 simplifications are included. Finally we |
206 establish that the size of derivatives can be finitely bounded.\medskip%\footnote{ In this paper, we shall first briefly introduce the basic notions |
206 establish that the size of derivatives can be finitely bounded.\medskip%\footnote{ In this paper, we shall first briefly introduce the basic notions |
207 %of regular expressions and describe the definition |
207 %of regular expressions and describe the definition |
208 %of POSIX lexing from our earlier work \cite{AusafDyckhoffUrban2016}. This serves |
208 %of POSIX lexing from our earlier work \cite{AusafDyckhoffUrban2016}. This serves |
328 \end{tabular} |
328 \end{tabular} |
329 \end{center} |
329 \end{center} |
330 |
330 |
331 \noindent |
331 \noindent |
332 We can extend this definition to give derivatives w.r.t.~strings, |
332 We can extend this definition to give derivatives w.r.t.~strings, |
333 namely @{thm (lhs) ders.simps(1)}$\dn$@{thm (rhs) ders.simps(1)} |
333 namely as @{thm (lhs) ders.simps(1)}$\dn$@{thm (rhs) ders.simps(1)} |
334 and @{thm (lhs) ders.simps(2)}$\dn$@{thm (rhs) ders.simps(2)}. |
334 and @{thm (lhs) ders.simps(2)}$\dn$@{thm (rhs) ders.simps(2)}. |
335 Using @{text nullable} and the derivative operation, we can |
335 Using @{text nullable} and the derivative operation, we can |
336 define a simple regular expression matcher, namely |
336 define a simple regular expression matcher, namely |
337 $@{text "match s r"} \;\dn\; @{term nullable}(r\backslash s)$. |
337 $@{text "match s r"} \dn @{term nullable}(r\backslash s)$. |
338 This is essentially Brzozowski's algorithm from 1964. Its |
338 This is essentially Brzozowski's algorithm from 1964. Its |
339 main virtue is that the algorithm can be easily implemented as a |
339 main virtue is that the algorithm can be easily implemented as a |
340 functional program (either in a functional programming language or in |
340 functional program (either in a functional programming language or in |
341 a theorem prover). The correctness of @{text match} amounts to |
341 a theorem prover). The correctness of @{text match} amounts to |
342 establishing the property:%\footnote{It is a fun exercise to formally prove this property in a theorem prover.} |
342 establishing the property:%\footnote{It is a fun exercise to formally prove this property in a theorem prover.} |
382 Sulzmann and Lu also define inductively an |
382 Sulzmann and Lu also define inductively an |
383 inhabitation relation that associates values to regular expressions. Our |
383 inhabitation relation that associates values to regular expressions. Our |
384 version of this relation is defined the following six rules for the values: |
384 version of this relation is defined the following six rules for the values: |
385 % |
385 % |
386 \begin{center} |
386 \begin{center} |
387 \begin{tabular}{@ {}c@ {}} |
387 \begin{tabular}{@ {}l@ {}} |
388 @{thm[mode=Axiom] Prf.intros(4)}\qquad |
388 @{thm[mode=Axiom] Prf.intros(4)}\quad |
|
389 @{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]}\quad |
|
390 @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>2" "r\<^sub>1"]}\quad |
|
391 @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\medskip\\ |
389 @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\qquad |
392 @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\qquad |
390 @{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]}\qquad |
|
391 @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>2" "r\<^sub>1"]}\qquad |
|
392 @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\medskip\\ |
|
393 @{thm[mode=Rule] Prf.intros(6)[of "vs" "r"]}\qquad |
393 @{thm[mode=Rule] Prf.intros(6)[of "vs" "r"]}\qquad |
394 $\mprset{flushleft}\inferrule{ |
394 $\mprset{flushleft}\inferrule{ |
395 @{thm (prem 1) Prf.intros(7)[of "vs\<^sub>1" "r" "vs\<^sub>2" "n"]}\\\\ |
395 @{thm (prem 1) Prf.intros(7)[of "vs\<^sub>1" "r" "vs\<^sub>2" "n"]}\\\\ |
396 @{thm (prem 2) Prf.intros(7)[of "vs\<^sub>1" "r" "vs\<^sub>2" "n"]}\quad |
396 @{thm (prem 2) Prf.intros(7)[of "vs\<^sub>1" "r" "vs\<^sub>2" "n"]}\\\\ |
397 @{thm (prem 3) Prf.intros(7)[of "vs\<^sub>1" "r" "vs\<^sub>2" "n"]} |
397 @{thm (prem 3) Prf.intros(7)[of "vs\<^sub>1" "r" "vs\<^sub>2" "n"]} |
398 } |
398 } |
399 {@{thm (concl) Prf.intros(7)[of "vs\<^sub>1" "r" "vs\<^sub>2" "n"]}} |
399 {@{thm (concl) Prf.intros(7)[of "vs\<^sub>1" "r" "vs\<^sub>2" "n"]}} |
400 $ |
400 $ |
401 \end{tabular} |
401 \end{tabular} |
486 @{thm (lhs) mkeps.simps(1)} & $\dn$ & @{thm (rhs) mkeps.simps(1)}\\ |
486 @{thm (lhs) mkeps.simps(1)} & $\dn$ & @{thm (rhs) mkeps.simps(1)}\\ |
487 @{thm (lhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]}\\ |
487 @{thm (lhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(2)[of "r\<^sub>1" "r\<^sub>2"]}\\ |
488 @{thm (lhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]}\\ |
488 @{thm (lhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]} & $\dn$ & @{thm (rhs) mkeps.simps(3)[of "r\<^sub>1" "r\<^sub>2"]}\\ |
489 @{thm (lhs) mkeps.simps(4)} & $\dn$ & @{thm (rhs) mkeps.simps(4)}\\ |
489 @{thm (lhs) mkeps.simps(4)} & $\dn$ & @{thm (rhs) mkeps.simps(4)}\\ |
490 @{thm (lhs) mkeps.simps(5)} & $\dn$ & @{thm (rhs) mkeps.simps(5)}\\ |
490 @{thm (lhs) mkeps.simps(5)} & $\dn$ & @{thm (rhs) mkeps.simps(5)}\\ |
491 \end{tabular}\medskip\\ |
491 \end{tabular} |
492 |
492 \medskip\\ |
493 \begin{tabular}{@ {}cc@ {}} |
493 |
|
494 %!\begin{tabular}{@ {}cc@ {}} |
494 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}@ {}} |
495 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}@ {}} |
495 @{thm (lhs) injval.simps(1)} & $\dn$ & @{thm (rhs) injval.simps(1)}\\ |
496 @{thm (lhs) injval.simps(1)} & $\dn$ & @{thm (rhs) injval.simps(1)}\\ |
496 @{thm (lhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]} & $\dn$ & |
497 @{thm (lhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]} & $\dn$ & |
497 @{thm (rhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]}\\ |
498 @{thm (rhs) injval.simps(2)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1"]}\\ |
498 @{thm (lhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ & |
499 @{thm (lhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ & |
499 @{thm (rhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\ |
500 @{thm (rhs) injval.simps(3)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\ |
|
501 %! |
|
502 @{thm (lhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ |
|
503 & @{thm (rhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\ |
|
504 @{thm (lhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ |
|
505 & @{thm (rhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\ |
|
506 @{thm (lhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$ & |
|
507 @{thm (rhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}\\ |
|
508 %! |
500 @{thm (lhs) injval.simps(7)[of "r" "c" "v" "vs"]} & $\dn$ |
509 @{thm (lhs) injval.simps(7)[of "r" "c" "v" "vs"]} & $\dn$ |
501 & @{thm (rhs) injval.simps(7)[of "r" "c" "v" "vs"]}\\ |
510 & @{thm (rhs) injval.simps(7)[of "r" "c" "v" "vs"]}\\ |
502 @{thm (lhs) injval.simps(8)[of "r" "n" "c" "v" "vs"]} & $\dn$ |
511 @{thm (lhs) injval.simps(8)[of "r" "n" "c" "v" "vs"]} & $\dn$ |
503 & @{thm (rhs) injval.simps(8)[of "r" "n" "c" "v" "vs"]} |
512 & @{thm (rhs) injval.simps(8)[of "r" "n" "c" "v" "vs"]} |
504 \end{tabular} |
513 \end{tabular} |
505 & |
514 %!& |
506 \begin{tabular}{@ {\hspace{-3mm}}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}@ {}} |
515 %!\begin{tabular}{@ {\hspace{-3mm}}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}@ {}} |
507 @{thm (lhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ |
516 |
508 & @{thm (rhs) injval.simps(4)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\ |
517 %!\end{tabular} |
509 @{thm (lhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]} & $\dn$ |
518 %!\end{tabular} |
510 & @{thm (rhs) injval.simps(5)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>1" "v\<^sub>2"]}\\ |
519 \end{tabular}%!\smallskip |
511 @{thm (lhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]} & $\dn$\\ |
|
512 \multicolumn{3}{@ {}r@ {}}{@{thm (rhs) injval.simps(6)[of "r\<^sub>1" "r\<^sub>2" "c" "v\<^sub>2"]}}\\ \mbox{}\\ \mbox{} |
|
513 \end{tabular} |
|
514 \end{tabular} |
|
515 \end{tabular}\smallskip |
|
516 \end{center} |
520 \end{center} |
517 |
521 |
518 \noindent |
522 \noindent |
519 The function @{text mkeps} is run when the last derivative is nullable, that is |
523 The function @{text mkeps} is run when the last derivative is nullable, that is |
520 the string to be matched is in the language of the regular expression. It generates |
524 the string to be matched is in the language of the regular expression. It generates |
649 |
653 |
650 In the second part of their paper \cite{Sulzmann2014}, |
654 In the second part of their paper \cite{Sulzmann2014}, |
651 Sulzmann and Lu describe another algorithm that also generates POSIX |
655 Sulzmann and Lu describe another algorithm that also generates POSIX |
652 values but dispenses with the second phase where characters are |
656 values but dispenses with the second phase where characters are |
653 injected ``back'' into values. For this they annotate bitcodes to |
657 injected ``back'' into values. For this they annotate bitcodes to |
654 regular expressions, which we define in Isabelle/HOL as the datatype |
658 regular expressions, which we define in Isabelle/HOL as the datatype\medskip |
655 |
659 |
656 \begin{center} |
660 %!\begin{center} |
657 @{term breg} $\;::=\;$ @{term "AZERO"} |
661 \noindent |
658 $\;\mid\;$ @{term "AONE bs"} |
662 \begin{minipage}{1.01\textwidth} |
659 $\;\mid\;$ @{term "ACHAR bs c"} |
663 \,@{term breg} $\,::=\,$ @{term "AZERO"} |
660 $\;\mid\;$ @{term "AALTs bs rs"} |
664 $\,\mid$ @{term "AONE bs"} |
661 $\;\mid\;$ @{term "ASEQ bs r\<^sub>1 r\<^sub>2"} |
665 $\,\mid$ @{term "ACHAR bs c"} |
662 $\;\mid\;$ @{term "ASTAR bs r"} |
666 $\,\mid$ @{term "AALTs bs rs"} |
663 $\;\mid\;$ @{term "ANTIMES bs r n"} |
667 $\,\mid$ @{term "ASEQ bs r\<^sub>1 r\<^sub>2"} |
664 \end{center} |
668 $\,\mid$ @{term "ASTAR bs r"} |
|
669 $\,\mid$ \mbox{@{term "ANTIMES bs r n"}}\hfill\mbox{} |
|
670 \end{minipage}\medskip |
|
671 %!\end{center} |
665 |
672 |
666 \noindent where @{text bs} stands for bitsequences; @{text r}, |
673 \noindent where @{text bs} stands for bitsequences; @{text r}, |
667 @{text "r\<^sub>1"} and @{text "r\<^sub>2"} for bitcoded regular |
674 @{text "r\<^sub>1"} and @{text "r\<^sub>2"} for bitcoded regular |
668 expressions; and @{text rs} for lists of bitcoded regular |
675 expressions; and @{text rs} for lists of bitcoded regular |
669 expressions. The binary alternative @{text "ALT bs r\<^sub>1 r\<^sub>2"} |
676 expressions. The binary alternative @{text "ALT bs r\<^sub>1 r\<^sub>2"} |
702 @{text S} followed by some bitsequence. Similarly, we use @{text Z} to indicate |
709 @{text S} followed by some bitsequence. Similarly, we use @{text Z} to indicate |
703 if there is still a value coming in the list of @{text Stars}, whereas @{text S} |
710 if there is still a value coming in the list of @{text Stars}, whereas @{text S} |
704 indicates the end of the list. The lossiness makes the process of |
711 indicates the end of the list. The lossiness makes the process of |
705 decoding a bit more involved, but the point is that if we have a |
712 decoding a bit more involved, but the point is that if we have a |
706 regular expression \emph{and} a bitsequence of a corresponding value, |
713 regular expression \emph{and} a bitsequence of a corresponding value, |
707 then we can always decode the value accurately (see Fig.~\ref{decode}). |
714 then we can always decode the value accurately~(see Fig.~\ref{decode}). |
708 The function \textit{decode} checks whether all of the bitsequence is |
715 The function \textit{decode} checks whether all of the bitsequence is |
709 consumed and returns the corresponding value as @{term "Some v"}; otherwise |
716 consumed and returns the corresponding value as @{term "Some v"}; otherwise |
710 it fails with @{text "None"}. We can establish that for a value $v$ |
717 it fails with @{text "None"}. We can establish that for a value $v$ |
711 inhabited by a regular expression $r$, the decoding of its |
718 inhabited by a regular expression $r$, the decoding of its |
712 bitsequence never fails (see also \cite{NielsenHenglein2011}). |
719 bitsequence never fails (see also \cite{NielsenHenglein2011}). |
713 |
720 |
714 %The decoding can be |
721 %The decoding can be |
715 %defined by using two functions called $\textit{decode}'$ and |
722 %defined by using two functions called $\textit{decode}'$ and |
716 %\textit{decode}: |
723 %\textit{decode}: |
717 |
724 |
718 \begin{figure} |
725 \begin{figure}[t] |
719 \begin{center} |
726 \begin{center} |
720 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}} |
727 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}} |
721 $\textit{decode}'\,bs\,(\ONE)$ & $\;\dn\;$ & $(\Empty, bs)$\\ |
728 $\textit{decode}'\,bs\,(\ONE)$ & $\;\dn\;$ & $(\Empty, bs)$\\ |
722 $\textit{decode}'\,bs\,(c)$ & $\;\dn\;$ & $(\Char\,c, bs)$\\ |
729 $\textit{decode}'\,bs\,(c)$ & $\;\dn\;$ & $(\Char\,c, bs)$\\ |
723 $\textit{decode}'\,(\Z\!::\!bs)\;(r_1 + r_2)$ & $\dn$ & |
730 $\textit{decode}'\,(\Z\!::\!bs)\;(r_1 + r_2)$ & $\dn$ & |
733 $\textit{decode}'\,(\S\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\ |
740 $\textit{decode}'\,(\S\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\ |
734 $\textit{decode}'\,(\Z\!::\!bs)\,(r^*)$ & $\dn$ & |
741 $\textit{decode}'\,(\Z\!::\!bs)\,(r^*)$ & $\dn$ & |
735 $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\ |
742 $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\ |
736 & & $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$ |
743 & & $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$ |
737 \hspace{2mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\\ |
744 \hspace{2mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\\ |
738 $\textit{decode}'\,bs\,(r^{\{n\}})$ & $\dn$ & $\textit{decode}'\,bs\,r^*$\smallskip\medskip\\ |
745 $\textit{decode}'\,bs\,(r^{\{n\}})$ & $\dn$ & $\textit{decode}'\,bs\,r^*$\smallskip\smallskip\\ |
739 |
746 |
740 $\textit{decode}\,bs\,r$ & $\dn$ & |
747 $\textit{decode}\,bs\,r$ & $\dn$ & |
741 $\textit{let}\,(v, bs') = \textit{decode}'\,bs\,r\;\textit{in}$\\ |
748 $\textit{let}\,(v, bs') = \textit{decode}'\,bs\,r\;\textit{in}$\\ |
742 & & $\;\;\;\,\textit{if}\;bs' = []\;\textit{then}\;\textit{Some}\,v\;\textit{else}\;\textit{None}$\\[-4mm] |
749 & & $\;\;\;\,\textit{if}\;bs' = []\;\textit{then}\;\textit{Some}\,v\;\textit{else}\;\textit{None}$\\[-5mm] |
743 \end{tabular} |
750 \end{tabular} |
744 \end{center} |
751 \end{center} |
745 \caption{Two functions, called $\textit{decode}'$ and \textit{decode}, for decoding a value from a bitsequence with the help of a regular expression.\\[-5mm]}\label{decode} |
752 \caption{Two functions, called $\textit{decode}'$ and \textit{decode}, for decoding a value from a bitsequence with the help of a regular expression.\\[-5mm]}\label{decode} |
746 \end{figure} |
753 \end{figure} |
747 |
754 |
761 |
768 |
762 Sulzmann and Lu define the function \emph{internalise} |
769 Sulzmann and Lu define the function \emph{internalise} |
763 in order to transform (standard) regular expressions into annotated |
770 in order to transform (standard) regular expressions into annotated |
764 regular expressions. We write this operation as $r^\uparrow$. |
771 regular expressions. We write this operation as $r^\uparrow$. |
765 This internalisation uses the following |
772 This internalisation uses the following |
766 \emph{fuse} function. |
773 \emph{fuse} function.\medskip |
|
774 |
767 % |
775 % |
768 \begin{center} |
776 %!\begin{center} |
769 \begin{tabular}{@ {}cc@ {}} |
777 \noindent\begin{minipage}{\textwidth} |
|
778 \begin{tabular}{@ {}c@ {\hspace{3mm}}c@ {}} |
770 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
779 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
771 $\textit{fuse}\,bs\,(\textit{ZERO})$ & $\dn$ & $\textit{ZERO}$\\ |
780 $\textit{fuse}\,bs\,(\textit{ZERO})$ & $\dn$ & $\textit{ZERO}$\\ |
772 $\textit{fuse}\,bs\,(\textit{ONE}\,bs')$ & $\dn$ & |
781 $\textit{fuse}\,bs\,(\textit{ONE}\,bs')$ & $\dn$ & |
773 $\textit{ONE}\,(bs\,@\,bs')$\\ |
782 $\textit{ONE}\,(bs\,@\,bs')$\\ |
774 $\textit{fuse}\,bs\,(\textit{CHAR}\,bs'\,c)$ & $\dn$ & |
783 $\textit{fuse}\,bs\,(\textit{CHAR}\,bs'\,c)$ & $\dn$ & |
784 $\textit{STAR}\,(bs\,@\,bs')\,r$\\ |
793 $\textit{STAR}\,(bs\,@\,bs')\,r$\\ |
785 $\textit{fuse}\,bs\,(\textit{NT}\,bs'\,r\,n)$ & $\dn$ & |
794 $\textit{fuse}\,bs\,(\textit{NT}\,bs'\,r\,n)$ & $\dn$ & |
786 $\textit{NT}\,(bs\,@\,bs')\,r\,n$ |
795 $\textit{NT}\,(bs\,@\,bs')\,r\,n$ |
787 \end{tabular} |
796 \end{tabular} |
788 \end{tabular} |
797 \end{tabular} |
789 \end{center} |
798 \end{minipage}\medskip |
|
799 %!\end{center} |
790 |
800 |
791 \noindent |
801 \noindent |
792 A regular expression can then be \emph{internalised} into a bitcoded |
802 A regular expression can then be \emph{internalised} into a bitcoded |
793 regular expression as follows: |
803 regular expression as follows: |
794 % |
804 % |
|
805 %!\begin{center} |
|
806 %!\begin{tabular}{@ {}c@ {\hspace{2mm}}c@ {\hspace{2mm}}c@ {}} |
|
807 %!\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
|
808 %!$(\ZERO)^\uparrow$ & $\dn$ & $\textit{ZERO}$\\ |
|
809 %!$(\ONE)^\uparrow$ & $\dn$ & $\textit{ONE}\,[]$ |
|
810 %!\end{tabular} |
|
811 %!& |
|
812 %!\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
|
813 %!$(c)^\uparrow$ & $\dn$ & $\textit{CHAR}\,[]\,c$\\ |
|
814 %!$(r^*)^\uparrow$ & $\dn$ & $\textit{STAR}\;[]\,r^\uparrow$\\ |
|
815 %!$(r^{\{n\}})^\uparrow$ & $\dn$ & |
|
816 %! $\textit{NT}\;[]\,r^\uparrow\,n$ |
|
817 %!\end{tabular} |
|
818 %!& |
|
819 %!\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
|
820 %!$(r_1 + r_2)^\uparrow$ & $\dn$ & |
|
821 %! $\textit{ALT}\;[]\,(\textit{fuse}\,[\Z]\,r_1^\uparrow)\, |
|
822 %! (\textit{fuse}\,[\S]\,r_2^\uparrow)$\\ |
|
823 %!$(r_1\cdot r_2)^\uparrow$ & $\dn$ & |
|
824 %! $\textit{SEQ}\;[]\,r_1^\uparrow\,r_2^\uparrow$ |
|
825 %!\end{tabular} |
|
826 %!\end{tabular} |
|
827 %!\end{center} |
|
828 %! |
795 \begin{center} |
829 \begin{center} |
796 \begin{tabular}{@ {}ccc@ {}} |
830 \begin{tabular}{@ {}c@ {\hspace{7mm}}c@ {}} |
797 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
831 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
798 $(\ZERO)^\uparrow$ & $\dn$ & $\textit{ZERO}$\\ |
832 $(\ZERO)^\uparrow$ & $\dn$ & $\textit{ZERO}$\\ |
799 $(\ONE)^\uparrow$ & $\dn$ & $\textit{ONE}\,[]$ |
833 $(\ONE)^\uparrow$ & $\dn$ & $\textit{ONE}\,[]$\\ |
800 \end{tabular} |
|
801 & |
|
802 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
|
803 $(c)^\uparrow$ & $\dn$ & $\textit{CHAR}\,[]\,c$\\ |
834 $(c)^\uparrow$ & $\dn$ & $\textit{CHAR}\,[]\,c$\\ |
804 $(r^*)^\uparrow$ & $\dn$ & $\textit{STAR}\;[]\,r^\uparrow$\\ |
835 $(r^*)^\uparrow$ & $\dn$ & $\textit{STAR}\;[]\,r^\uparrow$\\ |
805 $(r^{\{n\}})^\uparrow$ & $\dn$ & |
|
806 $\textit{NT}\;[]\,r^\uparrow\,n$ |
|
807 \end{tabular} |
836 \end{tabular} |
808 & |
837 & |
809 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
838 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
810 $(r_1 + r_2)^\uparrow$ & $\dn$ & |
839 $(r_1 + r_2)^\uparrow$ & $\dn$ & |
811 $\textit{ALT}\;[]\,(\textit{fuse}\,[\Z]\,r_1^\uparrow)\, |
840 $\textit{ALT}\;[]\,(\textit{fuse}\,[\Z]\,r_1^\uparrow)\, |
812 (\textit{fuse}\,[\S]\,r_2^\uparrow)$\\ |
841 (\textit{fuse}\,[\S]\,r_2^\uparrow)$\\ |
813 $(r_1\cdot r_2)^\uparrow$ & $\dn$ & |
842 $(r_1\cdot r_2)^\uparrow$ & $\dn$ & |
814 $\textit{SEQ}\;[]\,r_1^\uparrow\,r_2^\uparrow$ |
843 $\textit{SEQ}\;[]\,r_1^\uparrow\,r_2^\uparrow$\\ |
|
844 $(r^{\{n\}})^\uparrow$ & $\dn$ & |
|
845 $\textit{NT}\;[]\,r^\uparrow\,n$ |
815 \end{tabular} |
846 \end{tabular} |
816 \end{tabular} |
847 \end{tabular} |
817 \end{center} |
848 \end{center} |
818 |
849 |
819 \noindent |
850 \noindent |
824 the functions \textit{bnullable} and \textit{bmkeps}(\textit{s}), which are the |
855 the functions \textit{bnullable} and \textit{bmkeps}(\textit{s}), which are the |
825 ``lifted'' versions of \textit{nullable} and \textit{mkeps} acting on |
856 ``lifted'' versions of \textit{nullable} and \textit{mkeps} acting on |
826 bitcoded regular expressions. |
857 bitcoded regular expressions. |
827 % |
858 % |
828 \begin{center} |
859 \begin{center} |
829 \begin{tabular}{@ {\hspace{-1mm}}c@ {\hspace{0mm}}c@ {}} |
860 \begin{tabular}{@ {\hspace{-1mm}}c@ {\hspace{6mm}}c@ {}} |
830 \begin{tabular}{@ {}l@ {\hspace{0.5mm}}c@ {\hspace{1mm}}l} |
861 \begin{tabular}{@ {}l@ {\hspace{0.5mm}}c@ {\hspace{1mm}}l} |
831 $\textit{bnullable}\,(\textit{ZERO})$ & $\dn$ & $\textit{False}$\\ |
862 $\textit{bnullable}\,(\textit{ZERO})$ & $\dn$ & $\textit{False}$\\ |
832 $\textit{bnullable}\,(\textit{ONE}\,bs)$ & $\dn$ & $\textit{True}$\\ |
863 $\textit{bnullable}\,(\textit{ONE}\,bs)$ & $\dn$ & $\textit{True}$\\ |
833 $\textit{bnullable}\,(\textit{CHAR}\,bs\,c)$ & $\dn$ & $\textit{False}$\\ |
864 $\textit{bnullable}\,(\textit{CHAR}\,bs\,c)$ & $\dn$ & $\textit{False}$\\ |
834 $\textit{bnullable}\,(\textit{ALTs}\,bs\,\rs)$ & $\dn$ & |
865 $\textit{bnullable}\,(\textit{ALTs}\,bs\,\rs)$ & $\dn$ &\\ |
835 $\exists\, r \in \rs. \,\textit{bnullable}\,r$\\ |
866 \multicolumn{3}{l}{$\quad\exists\, r \in \rs. \,\textit{bnullable}\,r$}\\ |
836 $\textit{bnullable}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ & |
867 $\textit{bnullable}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ &\\ |
837 $\textit{bnullable}\,r_1\wedge \textit{bnullable}\,r_2$\\ |
868 \multicolumn{3}{l}{$\quad\textit{bnullable}\,r_1\wedge \textit{bnullable}\,r_2$}\\ |
838 $\textit{bnullable}\,(\textit{STAR}\,bs\,r)$ & $\dn$ & |
869 $\textit{bnullable}\,(\textit{STAR}\,bs\,r)$ & $\dn$ & |
839 $\textit{True}$\\ |
870 $\textit{True}$\\ |
840 $\textit{bnullable}\,(\textit{NT}\,bs\,r\,n)$ & $\dn$ &\\ |
871 $\textit{bnullable}\,(\textit{NT}\,bs\,r\,n)$ & $\dn$ &\\ |
841 \multicolumn{3}{l}{$\textit{if}\;n = 0\;\textit{then}\;\textit{True}\; |
872 \multicolumn{3}{l}{$\quad\textit{if}\;n = 0\;\textit{then}\;\textit{True}\; |
842 \textit{else}\;\textit{bnullable}\,r$}\\ |
873 \textit{else}\;\textit{bnullable}\,r$}\\ |
843 \end{tabular} |
874 \end{tabular} |
844 & |
875 & |
845 \begin{tabular}{@ {}l@ {\hspace{0.5mm}}c@ {\hspace{1mm}}l@ {}} |
876 \begin{tabular}{@ {}l@ {\hspace{0.5mm}}c@ {\hspace{1mm}}l@ {}} |
846 $\textit{bmkeps}\,(\textit{ONE}\,bs)$ & $\dn$ & $bs$\\ |
877 $\textit{bmkeps}\,(\textit{ONE}\,bs)$ & $\dn$ & $bs$\\ |
847 $\textit{bmkeps}\,(\textit{ALTs}\,bs\,\rs)$ & $\dn$ & |
878 $\textit{bmkeps}\,(\textit{ALTs}\,bs\,\rs)$ & $\dn$ & |
848 $bs\,@\,\textit{bmkepss}\,\rs$\\ |
879 $bs\,@\,\textit{bmkepss}\,\rs$\\ |
849 $\textit{bmkeps}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ &\\ |
880 $\textit{bmkeps}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ &\\ |
850 \multicolumn{3}{l}{$bs \,@\,\textit{bmkeps}\,r_1\,@\, \textit{bmkeps}\,r_2$}\\ |
881 \multicolumn{3}{l}{$\quad{}bs \,@\,\textit{bmkeps}\,r_1\,@\, \textit{bmkeps}\,r_2$}\\ |
851 $\textit{bmkeps}\,(\textit{STAR}\,bs\,r)$ & $\dn$ & |
882 $\textit{bmkeps}\,(\textit{STAR}\,bs\,r)$ & $\dn$ & |
852 $bs \,@\, [\S]$\\ |
883 $bs \,@\, [\S]$\\ |
853 $\textit{bmkeps}\,(\textit{NT}\,bs\,r\,n)$ & $\dn$ &\\ |
884 $\textit{bmkeps}\,(\textit{NT}\,bs\,r\,n)$ & $\dn$ &\\ |
854 \multicolumn{3}{l@ {}}{$\textit{if}\;n=0\;\textit{then}\;bs \,@\, [\S]$}\\ |
885 \multicolumn{3}{l@ {}}{$\quad\textit{if}\;n=0\;\textit{then}\;bs \,@\, [\S]$}\\ |
855 \multicolumn{3}{l@ {}}{$\textit{else}\,bs \,@\, [\Z] \,@\, \textit{bmkeps}\,r\,@\, |
886 \multicolumn{3}{l@ {}}{$\quad\textit{else}\,bs \,@\, [\Z] \,@\, \textit{bmkeps}\,r\,@\, |
856 \textit{bmkeps}\,(@{term "ANTIMES [] r (n - 1)"})$}\\ |
887 \textit{bmkeps}\,(@{term "ANTIMES [] r (n - 1)"})$}\\ |
857 $\textit{bmkepss}\,(r\!::\!\rs)$ & $\dn$ &\\ |
888 $\textit{bmkepss}\,(r\!::\!\rs)$ & $\dn$ &\\ |
858 \multicolumn{3}{l}{$\textit{if}\;\textit{bnullable}\,r\,\textit{then}\;\textit{bmkeps}\,r\; |
889 \multicolumn{3}{l}{$\quad\textit{if}\;\textit{bnullable}\,r\,\textit{then}\;\textit{bmkeps}\,r\; |
859 \textit{else}\;\textit{bmkepss}\,\rs$} |
890 \textit{else}\;\textit{bmkepss}\,\rs$} |
860 \end{tabular} |
891 \end{tabular} |
861 \end{tabular} |
892 \end{tabular} |
862 \end{center} |
893 \end{center} |
863 |
894 |
956 is the same as if we first erase the bitcoded regular expression and |
988 is the same as if we first erase the bitcoded regular expression and |
957 then perform the ``standard'' derivative operation. |
989 then perform the ``standard'' derivative operation. |
958 |
990 |
959 \begin{lemma}\label{bnullable}%\mbox{}\smallskip\\ |
991 \begin{lemma}\label{bnullable}%\mbox{}\smallskip\\ |
960 \textit{(1)} $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\ |
992 \textit{(1)} $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\ |
961 \mbox{\hspace{22mm}}\textit{(2)} $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\ |
993 \mbox{\hspace{17mm}}\textit{(2)} $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\ |
962 \mbox{\hspace{22mm}}\textit{(3)} $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$ |
994 \mbox{\hspace{17mm}}\textit{(3)} $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$ |
963 %\begin{tabular}{ll} |
995 %\begin{tabular}{ll} |
964 %\textit{(1)} & $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\ |
996 %\textit{(1)} & $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\ |
965 %\textit{(2)} & $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\ |
997 %\textit{(2)} & $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\ |
966 %\textit{(3)} & $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$. |
998 %\textit{(3)} & $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$. |
967 %\end{tabular} |
999 %\end{tabular} |
1196 empty or a singleton list. We take care of those cases in the |
1231 empty or a singleton list. We take care of those cases in the |
1197 @{text "bsimpALTs"} function; similarly we define a helper function that simplifies |
1232 @{text "bsimpALTs"} function; similarly we define a helper function that simplifies |
1198 sequences according to the usual rules about @{text ZERO}s and @{text ONE}s: |
1233 sequences according to the usual rules about @{text ZERO}s and @{text ONE}s: |
1199 |
1234 |
1200 \begin{center} |
1235 \begin{center} |
1201 \begin{tabular}{c@ {\hspace{5mm}}c} |
1236 \begin{tabular}{@ {}c@ {\hspace{4mm}}c@ {}} |
1202 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
1237 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}} |
1203 @{text "bsimpALTs bs []"} & $\dn$ & @{text "ZERO"}\\ |
1238 @{text "bsimpALTs bs []"} & $\dn$ & @{text "ZERO"}\\ |
1204 @{text "bsimpALTs bs [r]"} & $\dn$ & @{text "fuse bs r"}\\ |
1239 @{text "bsimpALTs bs [r]"} & $\dn$ & @{text "fuse bs r"}\\ |
1205 @{text "bsimpALTs bs rs"} & $\dn$ & @{text "ALTs bs rs"}\\ |
1240 @{text "bsimpALTs bs rs"} & $\dn$ & @{text "ALTs bs rs"}\\ |
1206 \mbox{}\\ |
1241 \mbox{}\\ |
1207 \end{tabular} |
1242 \end{tabular} |
1208 & |
1243 & |
1209 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} |
1244 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}} |
1210 @{text "bsimpSEQ bs _ ZERO"} & $\dn$ & @{text "ZERO"}\\ |
1245 @{text "bsimpSEQ bs _ ZERO"} & $\dn$ & @{text "ZERO"}\\ |
1211 @{text "bsimpSEQ bs ZERO _"} & $\dn$ & @{text "ZERO"}\\ |
1246 @{text "bsimpSEQ bs ZERO _"} & $\dn$ & @{text "ZERO"}\\ |
1212 @{text "bsimpSEQ bs\<^sub>1 (ONE bs\<^sub>2) r\<^sub>2"} |
1247 @{text "bsimpSEQ bs\<^sub>1 (ONE bs\<^sub>2) r\<^sub>2"} |
1213 & $\dn$ & @{text "fuse (bs\<^sub>1 @ bs\<^sub>2) r\<^sub>2"}\\ |
1248 & $\dn$ & @{text "fuse (bs\<^sub>1 @ bs\<^sub>2) r\<^sub>2"}\\ |
1214 @{text "bsimpSEQ bs r\<^sub>1 r\<^sub>2"} & $\dn$ & @{text "SEQ bs r\<^sub>1 r\<^sub>2"} |
1249 @{text "bsimpSEQ bs r\<^sub>1 r\<^sub>2"} & $\dn$ & @{text "SEQ bs r\<^sub>1 r\<^sub>2"} |
1441 Note that the bound $N$ is a bound for \emph{all} strings, no matter how long they are. |
1477 Note that the bound $N$ is a bound for \emph{all} strings, no matter how long they are. |
1442 We establish this bound by induction on $r$. The base cases for @{term AZERO}, |
1478 We establish this bound by induction on $r$. The base cases for @{term AZERO}, |
1443 @{term "AONE bs"} and @{term "ACHAR bs c"} are straightforward. The interesting case is |
1479 @{term "AONE bs"} and @{term "ACHAR bs c"} are straightforward. The interesting case is |
1444 for sequences of the form @{term "ASEQ bs r\<^sub>1 r\<^sub>2"}. In this case our induction |
1480 for sequences of the form @{term "ASEQ bs r\<^sub>1 r\<^sub>2"}. In this case our induction |
1445 hypotheses state $\exists N_1. \forall s. \; \llbracket{}@{term "bders_simp r\<^sub>1 s"}\rrbracket \leq N_1$ and |
1481 hypotheses state $\exists N_1. \forall s. \; \llbracket{}@{term "bders_simp r\<^sub>1 s"}\rrbracket \leq N_1$ and |
1446 $\exists N_2. \forall s. \; \llbracket@{term "bders_simp r\<^sub>2 s"}\rrbracket \leq N_2$. We can reason as follows |
1482 $\exists N_2. \forall s. \; \llbracket@{term "bders_simp r\<^sub>2 s"}\rrbracket \leq N_2$. We can reason as follows\medskip |
1447 % |
1483 % |
1448 \begin{center} |
1484 %!\begin{center} |
|
1485 |
|
1486 \noindent\begin{minipage}{\textwidth} |
1449 \begin{tabular}{lcll} |
1487 \begin{tabular}{lcll} |
1450 & & $ \llbracket@{term "bders_simp (ASEQ bs r\<^sub>1 r\<^sub>2) s"}\rrbracket$\\ |
1488 & & $ \llbracket@{term "bders_simp (ASEQ bs r\<^sub>1 r\<^sub>2) s"}\rrbracket$\\ |
1451 & $ = $ & $\llbracket bsimp\,(\textit{ALTs}\;bs\;((@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}) :: |
1489 & $ = $ & $\llbracket bsimp\,(\textit{ALTs}\;bs\;((@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}) :: |
1452 [@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suffix}(@{text r}_1, s)]))\rrbracket $ & (1) \\ |
1490 [@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suf}(@{text r}_1, s)]))\rrbracket $ & (1) \\ |
1453 & $\leq$ & |
1491 & $\leq$ & |
1454 $\llbracket\textit{distinctWith}\,((@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}) :: |
1492 $\llbracket\textit{distinctWith}\,((@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}) :: |
1455 [@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suffix}(@{text r}_1, s)])\,\approx\;@{term "{}"}\rrbracket + 1 $ & (2) \\ |
1493 [@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suf}(@{text r}_1, s)])\,\approx\;@{term "{}"}\rrbracket + 1 $ & (2) \\ |
1456 & $\leq$ & $\llbracket@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}\rrbracket + |
1494 & $\leq$ & $\llbracket@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}\rrbracket + |
1457 \llbracket\textit{distinctWith}\,[@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suffix}(@{text r}_1, s)]\,\approx\;@{term "{}"}\rrbracket + 1 $ & (3) \\ |
1495 \llbracket\textit{distinctWith}\,[@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suf}(@{text r}_1, s)]\,\approx\;@{term "{}"}\rrbracket + 1 $ & (3) \\ |
1458 & $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 + |
1496 & $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 + |
1459 \llbracket\textit{distinctWith}\,[@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suffix}(@{text r}_1, s)]\,\approx\;@{term "{}"}\rrbracket$ & (4)\\ |
1497 \llbracket\textit{distinctWith}\,[@{term "bders_simp r\<^sub>2 s'"} \;|\; s' \in \textit{Suf}(@{text r}_1, s)]\,\approx\;@{term "{}"}\rrbracket$ & (4)\\ |
1460 & $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 + l_{N_{2}} * N_{2}$ & (5) |
1498 & $\leq$ & $N_1 + \llbracket r_2\rrbracket + 2 + l_{N_{2}} * N_{2}$ & (5) |
1461 \end{tabular} |
1499 \end{tabular} |
1462 \end{center} |
1500 \end{minipage}\medskip |
|
1501 %!\end{center} |
1463 |
1502 |
1464 % tell Chengsong about Indian paper of closed forms of derivatives |
1503 % tell Chengsong about Indian paper of closed forms of derivatives |
1465 |
1504 |
1466 \noindent |
1505 \noindent |
1467 where in (1) the $\textit{Suffix}(@{text "r"}_1, s)$ are all the suffixes of $s$ where @{term "bders_simp r\<^sub>1 s'"} is nullable ($s'$ being a suffix of $s$). |
1506 where in (1) the $\textit{Suf}(@{text "r"}_1, s)$ are all the suffixes of $s$ where @{term "bders_simp r\<^sub>1 s'"} is nullable ($s'$ being a suffix of $s$). |
1468 In (3) we know that $\llbracket@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}\rrbracket$ is |
1507 In (3) we know that $\llbracket@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}\rrbracket$ is |
1469 bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller |
1508 bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller |
1470 than $N_2$. The list length after @{text distinctWith} is bounded by a number, which we call $l_{N_2}$. It stands |
1509 than $N_2$. The list length after @{text distinctWith} is bounded by a number, which we call $l_{N_2}$. It stands |
1471 for the number of distinct regular expressions smaller than $N_2$ (there can only be finitely many of them). |
1510 for the number of distinct regular expressions smaller than $N_2$ (there can only be finitely many of them). |
1472 We reason similarly for @{text STAR}.\smallskip |
1511 We reason similarly for @{text STAR}.\smallskip |
1500 \] |
1539 \] |
1501 |
1540 |
1502 \noindent |
1541 \noindent |
1503 where the @{text "Left"}-alternatives get priority. However, this violates |
1542 where the @{text "Left"}-alternatives get priority. However, this violates |
1504 the POSIX rules and we have not been able to |
1543 the POSIX rules and we have not been able to |
1505 reconcile this problem. Therefore we leave better bounds for future work.\\[-6.5mm] |
1544 reconcile this problem. Therefore we leave better bounds for future work.%!\\[-6.5mm] |
|
1545 |
|
1546 Note that while Antimirov was able to give a bound on the \emph{size} |
|
1547 of his partial derivatives~\cite{Antimirov95}, Brzozowski gave a bound |
|
1548 on the \emph{number} of derivatives, provided they are quotient via |
|
1549 ACI rules \cite{Brzozowski1964}. Brozozowski's result is crucial when one |
|
1550 uses derivatives for obtaining an automaton (it essentially bounds |
|
1551 the number of states). However, this result does \emph{not} |
|
1552 transfer to our setting where we are interested in the \emph{size} of the |
|
1553 derivatives. For example it is not true for our derivatives that the |
|
1554 set of of derivatives $r \backslash s$ for a given $r$ and all strings |
|
1555 $s$ is finite. This is because for our annotated regular expressions |
|
1556 the bitcode annotation is dependent on the number of iterations that |
|
1557 are needed for STAR-regular expressions. Since we want to do lexing |
|
1558 by calculating (as fast as possible) derivatives, the bound on the size |
|
1559 of the derivatives is important, not the number of derivatives. |
|
1560 |
1506 |
1561 |
1507 *} |
1562 *} |
1508 |
1563 |
1509 |
1564 |
1510 section {* Conclusion *} |
1565 section {* Conclusion *} |