258 |
258 |
259 In our work here we also add to the usual ``basic'' regular |
259 In our work here we also add to the usual ``basic'' regular |
260 expressions the \emph{bounded} regular expression @{term "NTIMES r |
260 expressions the \emph{bounded} regular expression @{term "NTIMES r |
261 n"} where the @{term n} specifies that @{term r} should match |
261 n"} where the @{term n} specifies that @{term r} should match |
262 exactly @{term n}-times (it is not included in Sulzmann and Lu's original work). For brevity we omit the other bounded |
262 exactly @{term n}-times (it is not included in Sulzmann and Lu's original work). For brevity we omit the other bounded |
263 regular expressions @{text "r"}$^{\{..n\}}$, @{text "r"}$^{\{n..\}}$ |
263 regular expressions @{text "r"}$^{\{..\textit{n}\}}$, |
264 and @{text "r"}$^{\{n..m\}}$ which specify intervals for how many |
264 @{text "r"}$^{\{\textit{n}..\}}$ |
|
265 and @{text "r"}$^{\{\textit{n}..\textit{m}\}}$ which specify intervals for how many |
265 times @{text r} should match. The results presented in this paper |
266 times @{text r} should match. The results presented in this paper |
266 extend straightforwardly to them too. The importance of the bounded |
267 extend straightforwardly to them too. The importance of the bounded |
267 regular expressions is that they are often used in practical |
268 regular expressions is that they are often used in practical |
268 applications, such as Snort (a system for detecting network |
269 applications, such as Snort (a system for detecting network |
269 intrusions) and also in XML Schema definitions. According to Bj\"{o}rklund et |
270 intrusions) and also in XML Schema definitions. According to Bj\"{o}rklund et |
270 al~\cite{BjorklundMartensTimm2015}, bounded regular expressions |
271 al~\cite{BjorklundMartensTimm2015}, bounded regular expressions |
271 occur frequently in the latter and can have counters of up to |
272 occur frequently in the latter and can have counters of up to |
272 ten million. The problem is that tools based on the classic notion |
273 ten million. The problem is that tools based on the classic notion |
273 of automata need to expand @{text "r"}$^{\{n\}}$ into @{text n} |
274 of automata need to expand @{text "r"}$^{\{\textit{n}\}}$ into @{text n} |
274 connected copies of the automaton for @{text r}. This leads to very |
275 connected copies of the automaton for @{text r}. This leads to very |
275 inefficient matching algorithms or algorithms that consume large |
276 inefficient matching algorithms or algorithms that consume large |
276 amounts of memory. A classic example is the regular expression |
277 amounts of memory. A classic example is the regular expression |
277 \mbox{@{term "SEQ (SEQ (STAR (ALT a b)) a) (NTIMES (ALT a b) n)"}} |
278 \mbox{@{term "SEQ (SEQ (STAR (ALT a b)) a) (NTIMES (ALT a b) n)"}} |
278 where the minimal DFA requires at least $2^{n + 1}$ states (see |
279 where the minimal DFA requires at least $2^{n + 1}$ states (see |
395 inhabitation relation that associates values to regular expressions. Our |
396 inhabitation relation that associates values to regular expressions. Our |
396 version of this relation is defined by the following six rules: |
397 version of this relation is defined by the following six rules: |
397 % |
398 % |
398 \begin{center} |
399 \begin{center} |
399 \begin{tabular}{@ {}l@ {}} |
400 \begin{tabular}{@ {}l@ {}} |
400 @{thm[mode=Axiom] Prf.intros(4)}\quad |
401 @{thm[mode=Axiom] Prf.intros(4)}\qquad |
401 @{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]}\quad |
402 @{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]}\qquad |
402 @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>2" "r\<^sub>1"]}\quad |
403 @{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>2" "r\<^sub>1"]}\qquad |
403 @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\medskip\\ |
404 @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\medskip\\ |
404 @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\qquad |
405 @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\qquad |
405 @{thm[mode=Rule] Prf.intros(6)[of "vs" "r"]}\qquad |
406 @{thm[mode=Rule] Prf.intros(6)[of "vs" "r"]}\qquad |
406 $\mprset{flushleft}\inferrule{ |
407 $\mprset{flushleft}\inferrule{ |
407 @{thm (prem 1) Prf.intros(7)[of "vs\<^sub>1" "r" "vs\<^sub>2" "n"]}\\\\ |
408 @{thm (prem 1) Prf.intros(7)[of "vs\<^sub>1" "r" "vs\<^sub>2" "n"]}\\\\ |
540 a value for how the last derivative can match the empty string. In case |
541 a value for how the last derivative can match the empty string. In case |
541 of @{term "NTIMES r n"} we use the function @{term replicate} in order to generate |
542 of @{term "NTIMES r n"} we use the function @{term replicate} in order to generate |
542 a list of exactly @{term n} copies, which is the length of the list we expect in this |
543 a list of exactly @{term n} copies, which is the length of the list we expect in this |
543 case. The injection function\footnote{While the character argument @{text c} is not |
544 case. The injection function\footnote{While the character argument @{text c} is not |
544 strictly necessary in the @{text inj}-function for the fragment of regular expressions we |
545 strictly necessary in the @{text inj}-function for the fragment of regular expressions we |
545 use in this paper, it is necessary for extended regular expressions, for example for the range regular expression of the form @{text "[a-z]"}. |
546 use in this paper, it is necessary for extended regular expressions. For example for the range regular expression of the form @{text "[a-z]"}. |
546 We therefore keep this argument from the original formulation of @{text inj} by Sulzmann and Lu.} |
547 We therefore keep this argument from the original formulation of @{text inj} by Sulzmann and Lu.} |
547 then calculates the corresponding value for each intermediate derivative until |
548 then calculates the corresponding value for each intermediate derivative until |
548 a value for the original regular expression is generated. |
549 a value for the original regular expression is generated. |
549 Graphically the algorithm by |
550 Graphically the algorithm by |
550 Sulzmann and Lu can be illustrated by the following picture %in Figure~\ref{Sulz} |
551 Sulzmann and Lu can be illustrated by the following picture %in Figure~\ref{Sulz} |
1567 @{term "Seq (Left (Char a)) (Left (Char b))"} |
1568 @{term "Seq (Left (Char a)) (Left (Char b))"} |
1568 where the @{text "Left"}-alternatives get priority. However, this violates |
1569 where the @{text "Left"}-alternatives get priority. However, this violates |
1569 the POSIX rules and we have not been able to |
1570 the POSIX rules and we have not been able to |
1570 reconcile this problem. Therefore we leave better bounds for future work.%!\\[-6.5mm] |
1571 reconcile this problem. Therefore we leave better bounds for future work.%!\\[-6.5mm] |
1571 |
1572 |
1572 Note that while Antimirov was able to give a bound on the \emph{size} |
1573 Note also that while Antimirov was able to give a bound on the \emph{size} |
1573 of his partial derivatives~\cite{Antimirov95}, Brzozowski gave a bound |
1574 of his partial derivatives~\cite{Antimirov95}, Brzozowski gave a bound |
1574 on the \emph{number} of derivatives, provided they are quotient via |
1575 on the \emph{number} of derivatives, provided they are quotient via |
1575 ACI rules \cite{Brzozowski1964}. Brzozowski's result is crucial when one |
1576 ACI rules \cite{Brzozowski1964}. Brzozowski's result is crucial when one |
1576 uses his derivatives for obtaining a DFA (it essentially bounds |
1577 uses his derivatives for obtaining a DFA (it essentially bounds |
1577 the number of states). However, this result does \emph{not} |
1578 the number of states). However, this result does \emph{not} |