--- a/thys3/Paper.thy Mon May 02 00:23:39 2022 +0100
+++ b/thys3/Paper.thy Mon May 02 00:24:01 2022 +0100
@@ -351,18 +351,21 @@
regular expression can match this string. POSIX lexing is about
identifying, for a given regular expression and string, the unique
value that satisfies the informal POSIX rules (see
- \cite{POSIX,Kuklewicz,OkuiSuzuki2010,Sulzmann2014,Vansummeren2006}).\footnote{POSIX
- lexing acquired its name from the fact that the corresponding
- rules were described as part of the POSIX specification for
- Unix-like operating systems \cite{POSIX}.} Sometimes these
+ \cite{POSIX,Kuklewicz,OkuiSuzuki2010,Sulzmann2014,Vansummeren2006}).
+ %\footnote{POSIX
+ % lexing acquired its name from the fact that the corresponding
+ % rules were described as part of the POSIX specification for
+ % Unix-like operating systems \cite{POSIX}.}
+ Sometimes these
 informal rules are called the \emph{maximal munch rule} and \emph{rule priority}:
 the former requires that the longest possible initial match wins; the latter that,
 among matches of equal length, the earlier rule (or alternative) takes precedence.
One contribution of our earlier paper is to give a convenient
specification for what POSIX values are (the inductive rules are shown in
Figure~\ref{POSIXrules}).
\begin{figure}[t]
- \begin{center}
+ \begin{center}\small%
\begin{tabular}{@ {}c@ {}}
+ \\[-9mm]
@{thm[mode=Axiom] Posix.intros(1)}\<open>P\<close>@{term "ONE"} \quad
@{thm[mode=Axiom] Posix.intros(2)}\<open>P\<close>@{term "c"}\quad
@{thm[mode=Rule] Posix.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\<open>P+L\<close>\quad
@@ -466,7 +469,7 @@
\noindent
The picture shows the steps required when a
regular expression, say @{text "r\<^sub>1"}, matches the string @{term
- "[a,b,c]"}. The first lexing algorithm by Sulzmann and Lu can be defined as:\\[-8mm]
+ "[a,b,c]"}. The first lexing algorithm by Sulzmann and Lu can be defined as:%\\[-8mm]
% \begin{figure}[t]
%\begin{center}
@@ -531,12 +534,15 @@
\noindent
With this in place we were able to prove:
-
- \begin{proposition}\mbox{}\smallskip\\\label{lexercorrect}
- \begin{tabular}{ll}
- (1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\
- (2) & @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}\\
- \end{tabular}
+ \begin{proposition}\mbox{}\label{lexercorrect}
+ \textrm{(1)} @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}.\\
+ \mbox{\hspace{29mm}}\textrm{(2)}\; @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}.
+ %
+ % \smallskip\\
+ %\begin{tabular}{ll}
+ %(1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\
+ %(2) & @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}\\
+ %\end{tabular}
\end{proposition}
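+
+ \noindent
+ To make the statement concrete: the lexer whose correctness the
+ proposition describes can be sketched in Scala as follows (a minimal
+ sketch, assuming the paper's @{text nullable}, @{text der}, @{text mkeps}
+ and @{text inj} are available as Scala functions over assumed datatypes
+ @{text Rexp} and @{text Val}):
+
+ \begin{verbatim}
+ // lexer in the style of Sulzmann and Lu (sketch): take derivatives
+ // while consuming the string, then rebuild the value via inj
+ def lexer(r: Rexp, s: List[Char]): Option[Val] = s match {
+   case Nil     => if (nullable(r)) Some(mkeps(r)) else None
+   case c :: cs => lexer(der(c, r), cs).map(v => inj(r, c, v))
+ }
+ \end{verbatim}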
\noindent
@@ -843,12 +849,15 @@
is the same as if we first erase the bitcoded regular expression and
then perform the ``standard'' derivative operation.
-\begin{lemma}\label{bnullable}\mbox{}\smallskip\\
- \begin{tabular}{ll}
-\textit{(1)} & $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\
-\textit{(2)} & $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
-\textit{(3)} & $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$.
-\end{tabular}
+\begin{lemma}\label{bnullable}%\mbox{}\smallskip\\
+\textit{(1)} $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\
+\mbox{\hspace{22mm}}\textit{(2)} $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
+\mbox{\hspace{22mm}}\textit{(3)} $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$
+%\begin{tabular}{ll}
+%\textit{(1)} & $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\
+%\textit{(2)} & $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
+%\textit{(3)} & $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$.
+%\end{tabular}
\end{lemma}
%\begin{proof}
@@ -1065,8 +1074,8 @@
\begin{center}
\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
- @{thm (lhs) flts.simps(1)} & $\dn$ & @{thm (rhs) flts.simps(1)}\\
- @{thm (lhs) flts.simps(2)} & $\dn$ & @{thm (rhs) flts.simps(2)}\\
+ \multicolumn{3}{@ {}c}{@{thm (lhs) flts.simps(1)} $\dn$ @{thm (rhs) flts.simps(1)} \qquad\qquad\qquad\qquad
+ @{thm (lhs) flts.simps(2)} $\dn$ @{thm (rhs) flts.simps(2)}}\\
@{thm (lhs) flts.simps(3)[of "bs'" "rs'"]} & $\dn$ & @{thm (rhs) flts.simps(3)[of "bs'" "rs'"]}\\
\end{tabular}
\end{center}
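+
+ \noindent
+ In Scala these clauses can be sketched as follows (a sketch: the
+ constructor names @{text AZERO} and @{text AALTs} are ours, and the
+ final catch-all clause for all remaining regular expressions is an
+ assumption):
+
+ \begin{verbatim}
+ // flts removes ZEROs and flattens nested alternatives, fusing the
+ // bitsequence bs of an inner alternative onto its children
+ def flts(rs: List[ARexp]): List[ARexp] = rs match {
+   case Nil                    => Nil
+   case AZERO :: rest          => flts(rest)
+   case AALTs(bs, rs1) :: rest => rs1.map(fuse(bs, _)) ::: flts(rest)
+   case r :: rest              => r :: flts(rest)
+ }
+ \end{verbatim}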
@@ -1277,6 +1286,7 @@
\begin{figure}[t]
\begin{center}
\begin{tabular}{@ {\hspace{-8mm}}c@ {}}
+ \\[-7mm]
@{thm[mode=Axiom] bs1[of _ "r\<^sub>2"]}$S\ZERO{}_l$\quad
@{thm[mode=Axiom] bs2[of _ "r\<^sub>1"]}$S\ZERO{}_r$\quad
@{thm[mode=Axiom] bs3[of "bs\<^sub>1" "bs\<^sub>2"]}$S\ONE$\\
@@ -1348,9 +1358,9 @@
bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller
than $N_2$. The list length after @{text distinctWith} is bounded by a number $l_{N_2}$, which stands
for the number of distinct regular expressions of size smaller than $N_2$ (there can only be finitely many of them).
-We reason similarly for @{text STAR}.\medskip
+We reason similarly for @{text STAR}.\smallskip
-\noindent
+
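+For @{text SEQ}, for instance, the two observations above combine into a rough
+overall bound for the simplified derivatives (a sketch of ours, writing
+$\backslash$ for the simplified derivative and eliding the constant overhead
+of the alternative constructor):
+%
+\[
+\llbracket (r_1\cdot r_2)\backslash s\rrbracket \;\le\; N_1 + \llbracket r_2\rrbracket + 1 + l_{N_2}\cdot N_2
+\]
+%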
Clearly, the bound we give in this finiteness argument (Step (5)) is very loose and
far from the actual bound we can expect. We could do better, but a tighter bound
would not strengthen the finiteness property we are proving. If we are interested in a polynomial bound,
@@ -1384,8 +1394,9 @@
lies in the fact that by using bitcoded regular expressions and an aggressive
simplification method there is a chance that the derivatives
can be kept universally small (we established in this paper that
- they can be kept finitely bounded for any string). This is important if one is after
- an efficient POSIX lexing algorithm based on derivatives.
+ they can be kept finitely bounded for any string).
+ %This is important if one is after
+ %an efficient POSIX lexing algorithm based on derivatives.
Having proved the correctness of the POSIX lexing algorithm, what
lessons have we learned? Well, we feel this is a very good example
@@ -1393,9 +1404,10 @@
hand. For example, it is very hard to see the problem with @{text nub}
vs @{text distinctWith} from experimental data alone---one would still
see the correct result, but find that simplification fails to simplify in well-chosen, but not
- obscure, examples. We found that from an implementation
- point-of-view it is really important to have the formal proofs of
- the corresponding properties at hand.
+ obscure, examples.
+ %We found that from an implementation
+ %point-of-view it is really important to have the formal proofs of
+ %the corresponding properties at hand.
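+ To see why this is so hard to observe experimentally, here is a minimal
+ Scala sketch of @{text distinctWith} (names are ours; @{text eq} is meant
+ to be an equivalence that ignores bitcodes). @{text nub} corresponds to
+ instantiating @{text eq} with structural equality, which leaves behind
+ copies that differ only in their bitcodes:
+
+ \begin{verbatim}
+ // keep the first representative of each eq-class, dropping
+ // later duplicates modulo the supplied equivalence eq
+ def distinctWith[A](xs: List[A], eq: (A, A) => Boolean,
+                     acc: List[A] = Nil): List[A] = xs match {
+   case Nil => Nil
+   case x :: rest =>
+     if (acc.exists(eq(x, _))) distinctWith(rest, eq, acc)
+     else x :: distinctWith(rest, eq, x :: acc)
+ }
+ \end{verbatim}
+
+ \noindent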
We have also developed a
healthy suspicion when experimental data is used to back up
@@ -1430,11 +1442,8 @@
\emph{all} derivatives have a size of 17 or less. The result is that
lexing a string of, say, 50\,000 a's with the regular expression \mbox{@{text "(a + aa)\<^sup>*"}} takes approximately
10 seconds with our Scala implementation
- of the presented algorithm.
- \smallskip
-
- \noindent
- Our Isabelle code including the results from Sec.~5 is available from \url{https://github.com/urbanchr/posix}.\\[-10mm]
+ of the presented algorithm. Our Isabelle code, including the results from Sec.~5, is available from \url{https://github.com/urbanchr/posix}.
+ %\\[-10mm]
%%\bibliographystyle{plain}