lexing: comparison thys3/Paper.thy

-:2c9a3aba8ebc
+:a5f666410101
 and @{text "r"}$^{\{n..m\}}$ which specify intervals for how many
 times @{text r} should match. The results presented in this paper
 extend straightforwardly to them too. The importance of the bounded
 regular expressions is that they are often used in practical
 applications, such as Snort (a system for detecting network
-intrusion) and also in XML Schema definitions. According to Bj\"{o}rklund et
+intrusions) and also in XML Schema definitions. According to Bj\"{o}rklund et
 al.~\cite{BjorklundMartensTimm2015}, bounded regular expressions
 occur frequently in the latter and can have counters of up to
 ten million.  The problem is that tools based on the classic notion
 of automata need to expand @{text "r"}$^{\{n\}}$ into @{text n}
 connected copies of the automaton for @{text r}. This leads to very
 %\textit{decode}:
 \begin{figure}
 \begin{center}
 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}}
-\multicolumn{3}{@ {}l}{$\textit{decode}'\,bs\,(\ONE)$ $\;\dn\;$ $(\Empty, bs)$ \quad\qquad
+$\textit{decode}'\,bs\,(\ONE)$ & $\;\dn\;$ & $(\Empty, bs)$\\
-$\textit{decode}'\,bs\,(c)$ $\;\dn\;$ $(\Char\,c, bs)$}\\
+$\textit{decode}'\,bs\,(c)$ & $\;\dn\;$ & $(\Char\,c, bs)$\\
 $\textit{decode}'\,(\Z\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
 $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_1\;\textit{in}\;
 (\Left\,v, bs_1)$\\
 $\textit{decode}'\,(\S\!::\!bs)\;(r_1 + r_2)$ & $\dn$ &
 $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r_2\;\textit{in}\;
 $\textit{decode}'\,(\S\!::\!bs)\,(r^*)$ & $\dn$ & $(\Stars\,[], bs)$\\
 $\textit{decode}'\,(\Z\!::\!bs)\,(r^*)$ & $\dn$ &
 $\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\
 & &   $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^*$
 \hspace{2mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\\
-$\textit{decode}'\,(\S\!::\!bs)\,(r^{\{n\}})$ & $\dn$ & $(\Stars\,[], bs)$\\
+$\textit{decode}'\,bs\,(r^{\{n\}})$ & $\dn$ & $\textit{decode}'\,bs\,r^*$\smallskip\medskip\\
-$\textit{decode}'\,(\Z\!::\!bs)\,(r^{\{n\}})$ & $\dn$ &
-$\textit{let}\,(v, bs_1) = \textit{decode}'\,bs\,r\;\textit{in}$\\
+$\textit{decode}\,bs\,r$ & $\dn$ &
-& &   $\textit{let}\,(\Stars\,vs, bs_2) = \textit{decode}'\,bs_1\,r^{\{n - 1\}}$
+$\textit{let}\,(v, bs') = \textit{decode}'\,bs\,r\;\textit{in}$\\
-\hspace{2mm}$\textit{in}\;(\Stars\,v\!::\!vs, bs_2)$\medskip\\
+& & $\;\;\;\,\textit{if}\;bs' = []\;\textit{then}\;\textit{Some}\,v\;\textit{else}\;\textit{None}$\\[-4mm]
-\multicolumn{3}{@ {}l}{$\textit{decode}\,bs\,r$ $\dn$
-$\textit{let}\,(v, bs') = \textit{decode}'\,bs\,r\;\textit{in}$
-$\;\,\textit{if}\;bs' = []\;\textit{then}\;\textit{Some}\,v\;
-\textit{else}\;\textit{None}$}\\[-4mm]
 \end{tabular}
 \end{center}
 \caption{Two functions, called $\textit{decode}'$ and \textit{decode}, for decoding a value from a bitsequence with the help of a regular expression.\\[-5mm]}\label{decode}
 \end{figure}
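The clauses in the figure can be read as a small recursive program. The following Python sketch is illustrative only: the tuple encodings of regular expressions and values, and the names `decode_prime` and `decode`, are assumptions made for this example and are not the Isabelle definitions. The sequence clause is not part of the excerpt and is filled in with the usual definition, which is also an assumption. Raising an exception on an ill-fitting bitsequence is a choice of the sketch.

```python
# Illustrative sketch; the tuple encodings and names below are assumptions,
# not the paper's Isabelle definitions.
Z, S = "Z", "S"  # the two bits used in bitcodes


def decode_prime(bs, r):
    """Decode one value for regex r from the front of bs.

    Returns a pair (value, remaining bits).
    """
    tag = r[0]
    if tag == "ONE":
        return ("Empty",), bs
    if tag == "CHAR":
        return ("Char", r[1]), bs
    if tag == "ALT" and bs:
        # Z selects the left alternative, S the right one.
        if bs[0] == Z:
            v, rest = decode_prime(bs[1:], r[1])
            return ("Left", v), rest
        if bs[0] == S:
            v, rest = decode_prime(bs[1:], r[2])
            return ("Right", v), rest
    if tag == "SEQ":
        # Assumed standard clause; the sequence case is not in the excerpt.
        v1, bs1 = decode_prime(bs, r[1])
        v2, bs2 = decode_prime(bs1, r[2])
        return ("Seq", v1, v2), bs2
    if tag == "STAR" and bs:
        if bs[0] == S:  # S ends the iteration
            return ("Stars", []), bs[1:]
        if bs[0] == Z:  # Z announces one more iteration
            v, bs1 = decode_prime(bs[1:], r[1])
            (_, vs), bs2 = decode_prime(bs1, r)
            return ("Stars", [v] + vs), bs2
    if tag == "NTIMES":
        # Bounded repetition r^{n} is decoded exactly like r^*.
        return decode_prime(bs, ("STAR", r[1]))
    raise ValueError("bitsequence does not fit the regular expression")


def decode(bs, r):
    """Return the decoded value if all bits are consumed, otherwise None."""
    v, rest = decode_prime(list(bs), r)
    return v if rest == [] else None


# (a + b)^* with the bits for "first a, then b, then stop"
r = ("STAR", ("ALT", ("CHAR", "a"), ("CHAR", "b")))
print(decode([Z, Z, Z, S, S], r))
```

In this example the first `Z` opens an iteration and the second `Z` picks the left alternative, so the decoded value is a `Stars` list holding `Left (Char a)` followed by `Right (Char b)`. Dropping one bit, as in `[Z, Z, S, S]`, leaves an unconsumed `S`, and `decode` returns `None`.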
 We can then prove the correctness of \textit{blexer}---it indeed
 produces the same result as \textit{lexer}.
 \begin{theorem}\label{thmone}
-$\textit{lexer}\,r\,s = \textit{blexer}\,r\,s$
+$\textit{blexer}\,r\,s = \textit{lexer}\,r\,s$
 \end{theorem}
 \noindent This establishes that the bitcoded algorithm \emph{without}
 expressions is that they can be easily modified such that simplification does not
 interfere with the value constructions. For example, we can ``flatten'', or
 de-nest, or spill out, @{text ALTs} as follows:
 %
 \[
-@{term "ALTs bs\<^sub>1 ((ALTs bs\<^sub>2 rs\<^sub>2) # rs\<^sub>1)"}
+@{term "ALTs bs\<^sub>1 (((ALTs bs\<^sub>2 rs\<^sub>2)) # rs\<^sub>1)"}
 \quad\xrightarrow{bsimp}\quad
 @{term "ALTs bs\<^sub>1 ((map (fuse bs\<^sub>2) rs\<^sub>2) @ rs\<^sub>1)"}
 \]
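The rewrite step displayed above can be sketched in Python. This is a minimal illustration under stated assumptions: annotated regular expressions are encoded as tuples whose second component is the bit annotation, and `fuse` is assumed to prepend bits to that annotation, which is how the rule uses it. The function name `flatten_head` is hypothetical and covers only the displayed case, a nested `ALTs` in head position. It is not the full simplification function.

```python
# Illustrative sketch; encodings and names are assumptions for this example.
# Annotated regexes: ("ZERO",), ("ONE", bs), ("CHAR", bs, c),
#                    ("ALTs", bs, rs), ("SEQ", bs, r1, r2), ("STAR", bs, r)
Z, S = "Z", "S"


def fuse(bs, r):
    """Prepend the bits bs to the top-level annotation of r (assumed behaviour)."""
    if r[0] == "ZERO":
        return r  # ZERO carries no annotation
    return (r[0], bs + r[1]) + r[2:]


def flatten_head(r):
    """Apply the displayed rule once: spill a nested ALTs at the head of the list."""
    if r[0] == "ALTs" and r[2] and r[2][0][0] == "ALTs":
        bs1, (inner, *rs1) = r[1], r[2]
        _, bs2, rs2 = inner
        # The inner annotation bs2 is pushed onto every inner alternative,
        # so no bitcode information is lost when the nesting disappears.
        return ("ALTs", bs1, [fuse(bs2, x) for x in rs2] + rs1)
    return r


nested = ("ALTs", [Z], [("ALTs", [S], [("CHAR", [], "a"), ("ONE", [Z])]),
                        ("CHAR", [], "b")])
print(flatten_head(nested))
```

The point of fusing is visible in the result: the inner annotation `[S]` reappears at the front of each former inner alternative, while the outer annotation and the remaining alternatives are unchanged.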
 \noindent
 %\end{proof}
 \noindent
 With these lemmas in place we can finally establish that @{term "blexer_simp"} and @{term "blexer"}
 generate the same value, and using Theorem~\ref{thmone} from the previous section that this value
-is indeed the POSIX value.
+is indeed the POSIX value as generated by \textit{lexer}.
 \begin{theorem}
-@{thm[mode=IfThen] main_blexer_simp}
+@{thm[mode=IfThen] main_blexer_simp[symmetric]} \; (@{text "= lexer r s"}\; by Thm.~\ref{thmone})
 \end{theorem}
 %\begin{proof}
 %By unfolding the definitions and using Lemmas~\ref{lemtwo} and \ref{lemthree}.
 %\end{proof}
 obscure, examples.
 %We found that from an implementation
 %point-of-view it is really important to have the formal proofs of
 %the corresponding properties at hand.
-We can of course only make a claim about the correctness and the sizes of the
+With the results reported here, we can of course only make a claim about the correctness
+of the algorithm and the sizes of the
 derivatives, not about the efficiency or runtime of our version of
 Sulzmann and Lu's algorithm. But we found that size is an important
 first indicator of efficiency: clearly if the derivatives can
 grow to arbitrarily big sizes and the algorithm needs to traverse
 the derivatives possibly several times, then the algorithm will be

changeset 599	a5f666410101
parent 578	e71a6e2aca2d
child 615	8881a09a06fd