lexing: comparison thys3/Paper.thy


-:9580bae0500d
+:9f984ff20020
 In our work here we also add to the usual ``basic'' regular
 expressions the \emph{bounded} regular expression @{term "NTIMES r
 n"} where the @{term n} specifies that @{term r} should match
 exactly @{term n}-times (it is not included in Sulzmann and Lu's original work). For brevity we omit the other bounded
-regular expressions @{text "r"}$^{\{..n\}}$, @{text "r"}$^{\{n..\}}$
+regular expressions @{text "r"}$^{\{..\textit{n}\}}$,
-and @{text "r"}$^{\{n..m\}}$ which specify intervals for how many
+@{text "r"}$^{\{\textit{n}..\}}$
+and @{text "r"}$^{\{\textit{n}..\textit{m}\}}$ which specify intervals for how many
 times @{text r} should match. The results presented in this paper
 extend straightforwardly to them too. The importance of the bounded
 regular expressions is that they are often used in practical
 applications, such as Snort (a system for detecting network
 intrusions) and also in XML Schema definitions. According to Bj\"{o}rklund et
 al.~\cite{BjorklundMartensTimm2015}, bounded regular expressions
 occur frequently in the latter and can have counters of up to
 ten million.  The problem is that tools based on the classic notion
-of automata need to expand @{text "r"}$^{\{n\}}$ into @{text n}
+of automata need to expand @{text "r"}$^{\{\textit{n}\}}$ into @{text n}
 connected copies of the automaton for @{text r}. This leads to very
 inefficient matching algorithms or algorithms that consume large
 amounts of memory.  A classic example is the regular expression
 \mbox{@{term "SEQ (SEQ (STAR (ALT a b)) a) (NTIMES (ALT a b) n)"}}
 where the minimal DFA requires at least $2^{n + 1}$ states (see
 inhabitation relation that associates values to regular expressions. Our
 version of this relation is defined by the following six rules:
 %
 \begin{center}
 \begin{tabular}{@ {}l@ {}}
-@{thm[mode=Axiom] Prf.intros(4)}\quad
+@{thm[mode=Axiom] Prf.intros(4)}\qquad
-@{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]}\quad
+@{thm[mode=Rule] Prf.intros(2)[of "v\<^sub>1" "r\<^sub>1" "r\<^sub>2"]}\qquad
-@{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>2" "r\<^sub>1"]}\quad
+@{thm[mode=Rule] Prf.intros(3)[of "v\<^sub>2" "r\<^sub>2" "r\<^sub>1"]}\qquad
 @{thm[mode=Rule] Prf.intros(1)[of "v\<^sub>1" "r\<^sub>1" "v\<^sub>2" "r\<^sub>2"]}\medskip\\
 @{thm[mode=Axiom] Prf.intros(5)[of "c"]}\qquad
 @{thm[mode=Rule] Prf.intros(6)[of "vs" "r"]}\qquad
 $\mprset{flushleft}\inferrule{
 @{thm (prem 1) Prf.intros(7)[of "vs\<^sub>1" "r"  "vs\<^sub>2" "n"]}\\\\
 a value for how the last derivative can match the empty string. In case
 of @{term "NTIMES r n"} we use the function @{term replicate} in order to generate
 a list of exactly @{term n} copies, which is the length of the list we expect in this
 case.  The injection function\footnote{While the character argument @{text c} is not
 strictly necessary in the @{text inj}-function for the fragment of regular expressions we
-use in this paper, it is necessary for extended regular expressions, for example for the range regular expression of the form @{text "[a-z]"}.
+use in this paper, it is necessary for extended regular expressions. For example, it is needed for the range regular expression of the form @{text "[a-z]"}.
 We therefore keep this argument from the original formulation of @{text inj} by Sulzmann and Lu.}
 then calculates the corresponding value for each intermediate derivative until
 a value for the original regular expression is generated.
 Graphically the algorithm by
 Sulzmann and Lu can be illustrated by the following picture %in Figure~\ref{Sulz}
 @{term "Seq (Left (Char a)) (Left (Char b))"}
 where the @{text "Left"}-alternatives get priority. However, this violates
 the POSIX rules and we have not been able to
 reconcile this problem. Therefore we leave better bounds for future work.%!\\[-6.5mm]
-Note that while Antimirov was able to give a bound on the \emph{size}
+Note also that while Antimirov was able to give a bound on the \emph{size}
 of his partial derivatives~\cite{Antimirov95}, Brzozowski gave a bound
 on the \emph{number} of derivatives, provided they are quotiented via
 ACI rules \cite{Brzozowski1964}. Brzozowski's result is crucial when one
 uses his derivatives for obtaining a DFA (it essentially bounds
 the number of states). However, this result does \emph{not}

changeset 644	9f984ff20020
parent 643	9580bae0500d