lexing: comparison thys3/Paper.thy


-:ab626b60ee64
+:6a100d32314c
 than one string). But even when fixing a string from the language of the
 regular expression, there is generally more than one way in which the
 regular expression can match this string. POSIX lexing is about
 identifying the unique value for a given regular expression and a
 string that satisfies the informal POSIX rules (see
-\cite{POSIX,Kuklewicz,OkuiSuzuki2010,Sulzmann2014,Vansummeren2006}).\footnote{POSIX
+\cite{POSIX,Kuklewicz,OkuiSuzuki2010,Sulzmann2014,Vansummeren2006}).
-	lexing acquired its name from the fact that the corresponding
+%\footnote{POSIX
-	rules were described as part of the POSIX specification for
+%	lexing acquired its name from the fact that the corresponding
-	Unix-like operating systems \cite{POSIX}.} Sometimes these
+%	rules were described as part of the POSIX specification for
+%	Unix-like operating systems \cite{POSIX}.}
+Sometimes these
 informal rules are called the \emph{maximal munch rule} and \emph{rule priority}.
 One contribution of our earlier paper is to give a convenient
 specification for what POSIX values are (the inductive rules are shown in
 Figure~\ref{POSIXrules}).
 \begin{figure}[t]
-\begin{center}
+\begin{center}\small%
 \begin{tabular}{@ {}c@ {}}
+\\[-9mm]
 @{thm[mode=Axiom] Posix.intros(1)}\<open>P\<close>@{term "ONE"} \quad
 @{thm[mode=Axiom] Posix.intros(2)}\<open>P\<close>@{term "c"}\quad
 @{thm[mode=Rule] Posix.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\<open>P+L\<close>\quad
 @{thm[mode=Rule] Posix.intros(4)[of "s" "r\<^sub>2" "v" "r\<^sub>1"]}\<open>P+R\<close>\medskip\\
 $\mprset{flushleft}
 \end{center}
 %
 \noindent
 The picture shows the steps required when a
 regular expression, say @{text "r\<^sub>1"}, matches the string @{term
-"[a,b,c]"}. The first lexing algorithm by Sulzmann and Lu can be defined as:\\[-8mm]
+"[a,b,c]"}. The first lexing algorithm by Sulzmann and Lu can be defined as:%\\[-8mm]
 %  \begin{figure}[t]
 %\begin{center}
 %\begin{tikzpicture}[scale=1,node distance=1cm,
 %                    every node/.style={minimum size=6mm}]
 \end{proposition}
 \noindent
 With this in place we were able to prove:
+\begin{proposition}\mbox{}\label{lexercorrect}
-\begin{proposition}\mbox{}\smallskip\\\label{lexercorrect}
+\textrm{(1)} @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}.\\
-\begin{tabular}{ll}
+\mbox{\hspace{29mm}}\textrm{(2)}\; @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}.
-(1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\
+%
-(2) & @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}\\
+% \smallskip\\
-\end{tabular}
+%\begin{tabular}{ll}
+%(1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\
+%(2) & @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}\\
+%\end{tabular}
 \end{proposition}
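To make the shape of this algorithm concrete, the following is a minimal Python sketch of the lexer of Sulzmann and Lu. It is neither our Isabelle formalisation nor our Scala implementation. It assumes the standard definitions of \textit{nullable}, the derivative, \textit{mkeps} and \textit{inj} for the basic regular expressions (they are not reproduced in this excerpt), and every Python name in it (e.g.\ \texttt{SeqV} for sequence values) is hypothetical.

```python
# Hypothetical Python sketch of the Sulzmann-Lu lexer (not the Isabelle code).
from dataclasses import dataclass

# Regular expressions
@dataclass(frozen=True)
class Zero: pass
@dataclass(frozen=True)
class One: pass
@dataclass(frozen=True)
class Ch: c: str
@dataclass(frozen=True)
class Alt: r1: object; r2: object
@dataclass(frozen=True)
class Seq: r1: object; r2: object
@dataclass(frozen=True)
class Star: r: object

# Values, recording how a regular expression matched a string
@dataclass(frozen=True)
class Empty: pass
@dataclass(frozen=True)
class Char: c: str
@dataclass(frozen=True)
class Left: v: object
@dataclass(frozen=True)
class Right: v: object
@dataclass(frozen=True)
class SeqV: v1: object; v2: object
@dataclass(frozen=True)
class Stars: vs: tuple

def nullable(r):
    if isinstance(r, (One, Star)): return True
    if isinstance(r, Alt): return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq): return nullable(r.r1) and nullable(r.r2)
    return False  # Zero, Ch

def der(c, r):
    if isinstance(r, Ch): return One() if r.c == c else Zero()
    if isinstance(r, Alt): return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        d = Seq(der(c, r.r1), r.r2)
        return Alt(d, der(c, r.r2)) if nullable(r.r1) else d
    if isinstance(r, Star): return Seq(der(c, r.r), r)
    return Zero()  # Zero, One

def mkeps(r):
    # value for the empty string; prefers the left alternative
    if isinstance(r, One): return Empty()
    if isinstance(r, Alt):
        return Left(mkeps(r.r1)) if nullable(r.r1) else Right(mkeps(r.r2))
    if isinstance(r, Seq): return SeqV(mkeps(r.r1), mkeps(r.r2))
    if isinstance(r, Star): return Stars(())
    raise ValueError("mkeps: expression is not nullable")

def inj(r, c, v):
    # turn a value for der(c, r) into a value for r by re-inserting c
    if isinstance(r, Ch): return Char(c)
    if isinstance(r, Alt):
        return Left(inj(r.r1, c, v.v)) if isinstance(v, Left) else Right(inj(r.r2, c, v.v))
    if isinstance(r, Seq):
        if isinstance(v, SeqV): return SeqV(inj(r.r1, c, v.v1), v.v2)
        if isinstance(v, Left): return SeqV(inj(r.r1, c, v.v.v1), v.v.v2)
        return SeqV(mkeps(r.r1), inj(r.r2, c, v.v))  # v is a Right value
    if isinstance(r, Star): return Stars((inj(r.r, c, v.v1),) + v.v2.vs)
    raise ValueError("inj: no matching case")

def lexer(r, s):
    # None if s is not matched by r, otherwise the value built by inj
    if not s:
        return mkeps(r) if nullable(r) else None
    v = lexer(der(s[0], r), s[1:])
    return None if v is None else inj(r, s[0], v)
```

On the regular expression $a^\star\cdot a^\star$ and the string $a$, for example, the sketch returns the value in which the first star consumes the $a$ and the second star matches the empty string, as the maximal munch rule requires.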
 \noindent
 In fact we have shown that, in the success case, the generated POSIX value $v$ is
 unique, and in the failure case that there is no POSIX value $v$ that satisfies
 relate to the ``standard'' operations on regular expressions. For
 example, if we build a bitcoded derivative and erase the result, we obtain
 the same as if we first erase the bitcoded regular expression and
 then perform the ``standard'' derivative operation.
-\begin{lemma}\label{bnullable}\mbox{}\smallskip\\
+\begin{lemma}\label{bnullable}%\mbox{}\smallskip\\
-\begin{tabular}{ll}
+\textit{(1)} $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\
-\textit{(1)} & $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\
+\mbox{\hspace{22mm}}\textit{(2)} $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
-\textit{(2)} & $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
+\mbox{\hspace{22mm}}\textit{(3)} $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$.
-\textit{(3)} & $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$.
+%\begin{tabular}{ll}
-\end{tabular}
+%\textit{(1)} & $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\
+%\textit{(2)} & $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
+%\textit{(3)} & $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$.
+%\end{tabular}
 \end{lemma}
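Properties (1) and (2) can also be checked experimentally. The following hypothetical Python sketch assumes the usual bitcoded operations (\textit{fuse}, an internalisation function \texttt{intern}, \textit{bnullable}, \textit{bmkeps} and the bitcoded derivative), with bits written \texttt{'Z'} and \texttt{'S'}; the exact choice of bits for alternatives and stars is an assumption of this sketch. It is not our Isabelle code, and it omits \textit{retrieve}, so property (3) is not covered.

```python
# Hypothetical Python sketch; bits are the strings 'Z' and 'S'.
from dataclasses import dataclass, replace
from functools import reduce

@dataclass(frozen=True)
class Zero: pass
@dataclass(frozen=True)
class One: pass
@dataclass(frozen=True)
class Ch: c: str
@dataclass(frozen=True)
class Alt: r1: object; r2: object
@dataclass(frozen=True)
class Seq: r1: object; r2: object
@dataclass(frozen=True)
class Star: r: object
# Annotated (bitcoded) regular expressions; bs is a tuple of bits
@dataclass(frozen=True)
class AZero: pass
@dataclass(frozen=True)
class AOne: bs: tuple
@dataclass(frozen=True)
class AChar: bs: tuple; c: str
@dataclass(frozen=True)
class AAlts: bs: tuple; rs: tuple
@dataclass(frozen=True)
class ASeq: bs: tuple; r1: object; r2: object
@dataclass(frozen=True)
class AStar: bs: tuple; r: object

def nullable(r):
    if isinstance(r, (One, Star)): return True
    if isinstance(r, Alt): return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq): return nullable(r.r1) and nullable(r.r2)
    return False  # Zero, Ch

def der(c, r):
    if isinstance(r, Ch): return One() if r.c == c else Zero()
    if isinstance(r, Alt): return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        d = Seq(der(c, r.r1), r.r2)
        return Alt(d, der(c, r.r2)) if nullable(r.r1) else d
    if isinstance(r, Star): return Seq(der(c, r.r), r)
    return Zero()  # Zero, One

def erase(a):
    # forget all bits; a list of alternatives becomes nested binary Alts
    if isinstance(a, AOne): return One()
    if isinstance(a, AChar): return Ch(a.c)
    if isinstance(a, AAlts):
        if not a.rs: return Zero()
        if len(a.rs) == 1: return erase(a.rs[0])
        return Alt(erase(a.rs[0]), erase(AAlts(a.bs, a.rs[1:])))
    if isinstance(a, ASeq): return Seq(erase(a.r1), erase(a.r2))
    if isinstance(a, AStar): return Star(erase(a.r))
    return Zero()  # AZero

def fuse(bs, a):  # prepend bits to the top-level annotation
    return a if isinstance(a, AZero) else replace(a, bs=bs + a.bs)

def intern(r):  # 'Z' marks the left, 'S' the right alternative
    if isinstance(r, One): return AOne(())
    if isinstance(r, Ch): return AChar((), r.c)
    if isinstance(r, Alt):
        return AAlts((), (fuse(('Z',), intern(r.r1)), fuse(('S',), intern(r.r2))))
    if isinstance(r, Seq): return ASeq((), intern(r.r1), intern(r.r2))
    if isinstance(r, Star): return AStar((), intern(r.r))
    return AZero()

def bnullable(a):
    if isinstance(a, (AOne, AStar)): return True
    if isinstance(a, AAlts): return any(bnullable(x) for x in a.rs)
    if isinstance(a, ASeq): return bnullable(a.r1) and bnullable(a.r2)
    return False

def bmkeps(a):  # bits for the empty string; leftmost nullable alternative wins
    if isinstance(a, AOne): return a.bs
    if isinstance(a, AAlts): return a.bs + bmkeps(next(x for x in a.rs if bnullable(x)))
    if isinstance(a, ASeq): return a.bs + bmkeps(a.r1) + bmkeps(a.r2)
    if isinstance(a, AStar): return a.bs + ('S',)  # 'S' ends the iterations
    raise ValueError("bmkeps: expression is not nullable")

def bder(c, a):
    if isinstance(a, AChar): return AOne(a.bs) if a.c == c else AZero()
    if isinstance(a, AAlts): return AAlts(a.bs, tuple(bder(c, x) for x in a.rs))
    if isinstance(a, ASeq):
        if bnullable(a.r1):
            return AAlts(a.bs, (ASeq((), bder(c, a.r1), a.r2),
                                fuse(bmkeps(a.r1), bder(c, a.r2))))
        return ASeq(a.bs, bder(c, a.r1), a.r2)
    if isinstance(a, AStar):  # 'Z' marks one more iteration
        return ASeq(a.bs, fuse(('Z',), bder(c, a.r)), AStar((), a.r))
    return AZero()  # AZero, AOne

def ders(r, s): return reduce(lambda acc, c: der(c, acc), s, r)
def bders(a, s): return reduce(lambda acc, c: bder(c, acc), s, a)
```

Evaluating both sides of (1) and (2), say for \texttt{intern} applied to $(a + aa)^\star$ and a few strings, gives a quick sanity check of such definitions, though of course no substitute for the proof.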
 %\begin{proof}
 %  All properties are by induction on annotated regular expressions.
 %  %There are no interesting cases.
 @{text flts} and analyses lists of regular expressions coming from alternatives.
 It is defined as follows:
 \begin{center}
 \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
-@{thm (lhs) flts.simps(1)} & $\dn$ & @{thm (rhs) flts.simps(1)}\\
+\multicolumn{3}{@ {}c}{@{thm (lhs) flts.simps(1)} $\dn$ @{thm (rhs) flts.simps(1)} \qquad\qquad\qquad\qquad
-@{thm (lhs) flts.simps(2)} & $\dn$ & @{thm (rhs) flts.simps(2)}\\
+@{thm (lhs) flts.simps(2)} $\dn$ @{thm (rhs) flts.simps(2)}}\\
 @{thm (lhs) flts.simps(3)[of "bs'" "rs'"]} & $\dn$ & @{thm (rhs) flts.simps(3)[of "bs'" "rs'"]}\\
 \end{tabular}
 \end{center}
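The effect of \textit{flts} can be illustrated with a minimal Python sketch. It assumes that the clauses not shown above leave any other regular expression in place and that \textit{fuse} prepends bits to the top-level annotation; the Python class names are hypothetical, and only the constructors needed for the example are included.

```python
# Hypothetical sketch of flts on bitcoded alternatives (bits 'Z'/'S').
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AZero: pass
@dataclass(frozen=True)
class AOne: bs: tuple
@dataclass(frozen=True)
class AChar: bs: tuple; c: str
@dataclass(frozen=True)
class AAlts: bs: tuple; rs: tuple

def fuse(bs, r):
    """Prepend the bits bs to the top-level annotation of r (AZero has none)."""
    return r if isinstance(r, AZero) else replace(r, bs=bs + r.bs)

def flts(rs):
    """Drop AZero entries and splice in the children of nested AAlts."""
    out = []
    for r in rs:
        if isinstance(r, AZero):
            continue                                 # zeros disappear
        elif isinstance(r, AAlts):
            out.extend(fuse(r.bs, x) for x in r.rs)  # outer bits move inward
        else:
            out.append(r)                            # anything else stays
    return out

rs = [AZero(), AAlts(('Z',), (AChar((), 'a'), AOne(('S',)))), AChar(('S',), 'b')]
flat = flts(rs)  # [AChar(('Z',), 'a'), AOne(('Z', 'S')), AChar(('S',), 'b')]
```

The sketch, like the definition, flattens only one level of nesting.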
 \noindent
 we shall show next.
 \begin{figure}[t]
 \begin{center}
 \begin{tabular}{@ {\hspace{-8mm}}c@ {}}
+\\[-7mm]
 @{thm[mode=Axiom] bs1[of _ "r\<^sub>2"]}$S\ZERO{}_l$\quad
 @{thm[mode=Axiom] bs2[of _ "r\<^sub>1"]}$S\ZERO{}_r$\quad
 @{thm[mode=Axiom] bs3[of "bs\<^sub>1" "bs\<^sub>2"]}$S\ONE$\\
 @{thm[mode=Rule] bs4[of "r\<^sub>1" "r\<^sub>2" _ "r\<^sub>3"]}SL\qquad
 @{thm[mode=Rule] bs5[of "r\<^sub>3" "r\<^sub>4" _ "r\<^sub>1"]}SR\\
 where in (1) the $\textit{Suffix}(@{text "r"}_1, s)$ are all the suffixes of $s$ for which @{term "bders_simp r\<^sub>1 s'"} is nullable ($s'$ being a suffix of $s$).
 In (3) we know that  $\llbracket@{term "ASEQ [] (bders_simp r\<^sub>1 s) r\<^sub>2"}\rrbracket$ is
 bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller
 than $N_2$. The list length after @{text distinctWith} is bounded by a number, which we call $l_{N_2}$. It stands
 for the number of distinct regular expressions smaller than $N_2$ (there can only be finitely many of them).
-We reason similarly for @{text STAR}.\medskip
+We reason similarly for @{text STAR}.\smallskip
-\noindent
 Clearly, in this finiteness argument (Step (5)) we give a very loose bound that is
 far from the actual bound we can expect. We can do better than this, but doing so does not improve
 the finiteness property we are proving. If we are interested in a polynomial bound,
 we would hope to obtain a bound as tight as the one for the partial
 derivatives introduced by Antimirov \cite{Antimirov95}. After all, the idea with
 to introduce our own definitions and proof ideas in order to establish the
 correctness.  Our interest in the second algorithm
 lies in the fact that, by using bitcoded regular expressions and an aggressive
 simplification method, there is a chance that the derivatives
 can be kept universally small  (we established in this paper that
-they can be kept finitely bounded for any string). This is important if one is after
+they can be kept finitely bounded for any string).
-an efficient POSIX lexing algorithm based on derivatives.
+%This is important if one is after
+%an efficient POSIX lexing algorithm based on derivatives.
 Having proved the correctness of the POSIX lexing algorithm, what
 lessons have we learned? Well, we feel this is a very good example
 where formal proofs give further insight into the matter at
 hand. For example, it is very hard to see a problem with @{text nub}
 vs @{text distinctWith} with only experimental data: one would still
 see the correct result but find that simplification does not simplify in well-chosen, but not
-obscure, examples. We found that from an implementation
+obscure, examples.
-point-of-view it is really important to have the formal proofs of
+%We found that from an implementation
-the corresponding properties at hand.
+%point-of-view it is really important to have the formal proofs of
+%the corresponding properties at hand.
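The difference between \textit{nub} and \textit{distinctWith} can be seen on a toy list of bitcoded alternatives. In the following hypothetical Python sketch, \textit{nub} removes only syntactically equal copies, whereas \textit{distinctWith}, instantiated with equality modulo bits, also removes alternatives that differ only in their bitcodes; in both cases the leftmost copy is kept. The helper \texttt{eq\_mod\_bits} and the class names are assumptions made for this illustration only.

```python
# Hypothetical sketch: nub vs. distinctWith on bitcoded alternatives.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AOne: bs: tuple
@dataclass(frozen=True)
class AChar: bs: tuple; c: str

def nub(xs):
    """Drop later copies that are syntactically equal (bits included)."""
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    return out

def distinct_with(xs, eq, acc=()):
    """Keep x unless eq(x, y) holds for some y in acc; kept elements join acc."""
    seen, out = list(acc), []
    for x in xs:
        if not any(eq(x, y) for y in seen):
            out.append(x)
            seen.append(x)
    return out

def eq_mod_bits(r1, r2):
    # equality after clearing the bits; sufficient for these flat constructors
    return replace(r1, bs=()) == replace(r2, bs=())

rs = [AChar(('Z',), 'a'), AChar(('S',), 'a'), AOne(()), AChar(('Z',), 'a')]
print(len(nub(rs)))                          # 3
print(len(distinct_with(rs, eq_mod_bits)))   # 2
```

With \textit{nub} the copy of $a$ carrying different bits survives; as noted above, this does not change the lexing result, but it keeps the derivatives larger than necessary.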
 We have also developed a
 healthy suspicion when experimental data is used to back up
 efficiency claims. For example, Sulzmann and Lu write about their
 equivalent of @{term blexer_simp} \textit{``...we can incrementally compute
 The contribution of this paper is to make sure that
 derivatives do not grow arbitrarily big (universally). In the example \mbox{@{text "(a + aa)\<^sup>*"}},
 \emph{all} derivatives have a size of 17 or less. The result is that
 lexing a string of, say, 50\,000 a's with the regular expression \mbox{@{text "(a + aa)\<^sup>*"}} takes approximately
 10 seconds with our Scala implementation
-of the presented algorithm.
+of the presented algorithm. Our Isabelle code including the results from Sec.~5 is available from \url{https://github.com/urbanchr/posix}.
-\smallskip
+%\\[-10mm]
-\noindent
-Our Isabelle code including the results from Sec.~5 is available from \url{https://github.com/urbanchr/posix}.\\[-10mm]
 %%\bibliographystyle{plain}
 \bibliography{root}

changeset 499	6a100d32314c
parent 498	ab626b60ee64
child 502	1ab693d6342f