# HG changeset patch # User Chengsong # Date 1651447441 -3600 # Node ID a6d72af04096368d5d876dac7a44734c4f350793 # Parent 4d9eecfc936a8c4c0c8663196aa3d72113384bde# Parent 6a100d32314cfdf61ee7e182a37a1388dec096eb hahah diff -r 4d9eecfc936a -r a6d72af04096 thys3/Paper.thy --- a/thys3/Paper.thy Mon May 02 00:23:39 2022 +0100 +++ b/thys3/Paper.thy Mon May 02 00:24:01 2022 +0100 @@ -351,18 +351,21 @@ regular expression can match this string. POSIX lexing is about identifying the unique value for a given regular expression and a string that satisfies the informal POSIX rules (see - \cite{POSIX,Kuklewicz,OkuiSuzuki2010,Sulzmann2014,Vansummeren2006}).\footnote{POSIX - lexing acquired its name from the fact that the corresponding - rules were described as part of the POSIX specification for - Unix-like operating systems \cite{POSIX}.} Sometimes these + \cite{POSIX,Kuklewicz,OkuiSuzuki2010,Sulzmann2014,Vansummeren2006}). + %\footnote{POSIX + % lexing acquired its name from the fact that the corresponding + % rules were described as part of the POSIX specification for + % Unix-like operating systems \cite{POSIX}.} + Sometimes these informal rules are called \emph{maximal munch rule} and \emph{rule priority}. One contribution of our earlier paper is to give a convenient specification for what POSIX values are (the inductive rules are shown in Figure~\ref{POSIXrules}). \begin{figure}[t] - \begin{center} + \begin{center}\small% \begin{tabular}{@ {}c@ {}} + \\[-9mm] @{thm[mode=Axiom] Posix.intros(1)}\P\@{term "ONE"} \quad @{thm[mode=Axiom] Posix.intros(2)}\P\@{term "c"}\quad @{thm[mode=Rule] Posix.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\P+L\\quad @@ -466,7 +469,7 @@ \noindent The picture shows the steps required when a regular expression, say @{text "r\<^sub>1"}, matches the string @{term - "[a,b,c]"}. The first lexing algorithm by Sulzmann and Lu can be defined as:\\[-8mm] + "[a,b,c]"}. The first lexing algorithm by Sulzmann and Lu can be defined as:%\\[-8mm] % \begin{figure}[t] %\begin{center} @@ -531,12 +534,15 @@ \noindent With this in place we were able to prove: - - \begin{proposition}\mbox{}\smallskip\\\label{lexercorrect} - \begin{tabular}{ll} - (1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\ - (2) & @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}\\ - \end{tabular} + \begin{proposition}\mbox{}\label{lexercorrect} + \textrm{(1)} @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}.\\ + \mbox{\hspace{29mm}}\textrm{(2)}\; @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}. + % + % \smallskip\\ + %\begin{tabular}{ll} + %(1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\ + %(2) & @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}\\ + %\end{tabular} \end{proposition} \noindent @@ -843,12 +849,15 @@ is the same as if we first erase the bitcoded regular expression and then perform the ``standard'' derivative operation. -\begin{lemma}\label{bnullable}\mbox{}\smallskip\\ - \begin{tabular}{ll} -\textit{(1)} & $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\ -\textit{(2)} & $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\ -\textit{(3)} & $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$. -\end{tabular} +\begin{lemma}\label{bnullable}%\mbox{}\smallskip\\ +\textit{(1)} $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\ +\mbox{\hspace{22mm}}\textit{(2)} $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\ +\mbox{\hspace{22mm}}\textit{(3)} $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$ +%\begin{tabular}{ll} +%\textit{(1)} & $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\ +%\textit{(2)} & $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\ +%\textit{(3)} & $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$. +%\end{tabular} \end{lemma} %\begin{proof} @@ -1065,8 +1074,8 @@ \begin{center} \begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l} - @{thm (lhs) flts.simps(1)} & $\dn$ & @{thm (rhs) flts.simps(1)}\\ - @{thm (lhs) flts.simps(2)} & $\dn$ & @{thm (rhs) flts.simps(2)}\\ + \multicolumn{3}{@ {}c}{@{thm (lhs) flts.simps(1)} $\dn$ @{thm (rhs) flts.simps(1)} \qquad\qquad\qquad\qquad + @{thm (lhs) flts.simps(2)} $\dn$ @{thm (rhs) flts.simps(2)}}\\ @{thm (lhs) flts.simps(3)[of "bs'" "rs'"]} & $\dn$ & @{thm (rhs) flts.simps(3)[of "bs'" "rs'"]}\\ \end{tabular} \end{center} @@ -1277,6 +1286,7 @@ \begin{figure}[t] \begin{center} \begin{tabular}{@ {\hspace{-8mm}}c@ {}} + \\[-7mm] @{thm[mode=Axiom] bs1[of _ "r\<^sub>2"]}$S\ZERO{}_l$\quad @{thm[mode=Axiom] bs2[of _ "r\<^sub>1"]}$S\ZERO{}_r$\quad @{thm[mode=Axiom] bs3[of "bs\<^sub>1" "bs\<^sub>2"]}$S\ONE$\\ @@ -1348,9 +1358,9 @@ bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller than $N_2$. The list length after @{text distinctWith} is bounded by a number, which we call $l_{N_2}$. It stands for the number of distinct regular expressions smaller than $N_2$ (there can only be finitely many of them). -We reason similarly for @{text STAR}.\medskip +We reason similarly for @{text STAR}.\smallskip -\noindent + Clearly we give in this finiteness argument (Step (5)) a very loose bound that is far from the actual bound we can expect. We can do better than this, but this does not improve the finiteness property we are proving. If we are interested in a polynomial bound, @@ -1384,8 +1394,9 @@ lies in the fact that by using bitcoded regular expressions and an aggressive simplification method there is a chance that the derivatives can be kept universally small (we established in this paper that - they can be kept finitely bounded for any string). This is important if one is after - an efficient POSIX lexing algorithm based on derivatives. + they can be kept finitely bounded for any string). + %This is important if one is after + %an efficient POSIX lexing algorithm based on derivatives. Having proved the correctness of the POSIX lexing algorithm, which lessons have we learned? Well, we feel this is a very good example @@ -1393,9 +1404,10 @@ hand. For example it is very hard to see a problem with @{text nub} vs @{text distinctWith} with only experimental data---one would still see the correct result but find that simplification does not simplify in well-chosen, but not - obscure, examples. We found that from an implementation - point-of-view it is really important to have the formal proofs of - the corresponding properties at hand. + obscure, examples. + %We found that from an implementation + %point-of-view it is really important to have the formal proofs of + %the corresponding properties at hand. We have also developed a healthy suspicion when experimental data is used to back up @@ -1430,11 +1442,8 @@ \emph{all} derivatives have a size of 17 or less. The result is that lexing a string of, say, 50\,000 a's with the regular expression \mbox{@{text "(a + aa)\<^sup>*"}} takes approximately 10 seconds with our Scala implementation - of the presented algorithm. - \smallskip - - \noindent - Our Isabelle code including the results from Sec.~5 is available from \url{https://github.com/urbanchr/posix}.\\[-10mm] + of the presented algorithm. Our Isabelle code including the results from Sec.~5 is available from \url{https://github.com/urbanchr/posix}. + %\\[-10mm] %%\bibliographystyle{plain} diff -r 4d9eecfc936a -r a6d72af04096 thys3/document/root.tex --- a/thys3/document/root.tex Mon May 02 00:23:39 2022 +0100 +++ b/thys3/document/root.tex Mon May 02 00:24:01 2022 +0100 @@ -35,8 +35,8 @@ \addtolength{\oddsidemargin}{-1.5mm} \addtolength{\evensidemargin}{-1.5mm} -\addtolength{\textwidth}{4mm} -\addtolength{\textheight}{1.5mm} +\addtolength{\textwidth}{3.4mm} +\addtolength{\textheight}{1.4mm} \def\lexer{\mathit{lexer}} \def\mkeps{\mathit{mkeps}}