--- a/thys3/Paper.thy Mon May 02 00:23:39 2022 +0100
+++ b/thys3/Paper.thy Mon May 02 00:24:01 2022 +0100
@@ -351,18 +351,21 @@
regular expression can match this string. POSIX lexing is about
identifying, for a given regular expression and string, the unique
value that satisfies the informal POSIX rules (see
- \cite{POSIX,Kuklewicz,OkuiSuzuki2010,Sulzmann2014,Vansummeren2006}).\footnote{POSIX
- lexing acquired its name from the fact that the corresponding
- rules were described as part of the POSIX specification for
- Unix-like operating systems \cite{POSIX}.} Sometimes these
+ \cite{POSIX,Kuklewicz,OkuiSuzuki2010,Sulzmann2014,Vansummeren2006}).
+ %\footnote{POSIX
+ % lexing acquired its name from the fact that the corresponding
+ % rules were described as part of the POSIX specification for
+ % Unix-like operating systems \cite{POSIX}.}
+ Sometimes these
 informal rules are called the \emph{maximal munch rule} and \emph{rule priority}:
 the former requires that the longest possible initial match wins; the latter that,
 among matches of equal length, the earlier rule (or alternative) takes precedence.
One contribution of our earlier paper is to give a convenient
specification for what POSIX values are (the inductive rules are shown in
Figure~\ref{POSIXrules}).
\begin{figure}[t]
- \begin{center}
+ \begin{center}\small%
\begin{tabular}{@ {}c@ {}}
+ \\[-9mm]
@{thm[mode=Axiom] Posix.intros(1)}\<open>P\<close>@{term "ONE"} \quad
@{thm[mode=Axiom] Posix.intros(2)}\<open>P\<close>@{term "c"}\quad
@{thm[mode=Rule] Posix.intros(3)[of "s" "r\<^sub>1" "v" "r\<^sub>2"]}\<open>P+L\<close>\quad
@@ -466,7 +469,7 @@
\noindent
The picture shows the steps required when a
regular expression, say @{text "r\<^sub>1"}, matches the string @{term
- "[a,b,c]"}. The first lexing algorithm by Sulzmann and Lu can be defined as:\\[-8mm]
+ "[a,b,c]"}. The first lexing algorithm by Sulzmann and Lu can be defined as:%\\[-8mm]
% \begin{figure}[t]
%\begin{center}
@@ -531,12 +534,15 @@
\noindent
With this in place we were able to prove:
-
- \begin{proposition}\mbox{}\smallskip\\\label{lexercorrect}
- \begin{tabular}{ll}
- (1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\
- (2) & @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}\\
- \end{tabular}
+ \begin{proposition}\mbox{}\label{lexercorrect}
+ \textrm{(1)} @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}.\\
+ \mbox{\hspace{29mm}}\textrm{(2)}\; @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}.
+ %
+ % \smallskip\\
+ %\begin{tabular}{ll}
+ %(1) & @{thm (lhs) lexer_correct_None} if and only if @{thm (rhs) lexer_correct_None}\\
+ %(2) & @{thm (lhs) lexer_correct_Some} if and only if @{thm (rhs) lexer_correct_Some}\\
+ %\end{tabular}
\end{proposition}
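+
+ \noindent
+ To make the statement concrete: the lexer whose correctness the
+ proposition describes can be sketched in Scala as follows (a minimal
+ sketch, assuming the paper's @{text nullable}, @{text der}, @{text mkeps}
+ and @{text inj} are available as Scala functions over assumed datatypes
+ @{text Rexp} and @{text Val}):
+
+ \begin{verbatim}
+ // lexer in the style of Sulzmann and Lu (sketch): take derivatives
+ // while consuming the string, then rebuild the value via inj
+ def lexer(r: Rexp, s: List[Char]): Option[Val] = s match {
+   case Nil     => if (nullable(r)) Some(mkeps(r)) else None
+   case c :: cs => lexer(der(c, r), cs).map(v => inj(r, c, v))
+ }
+ \end{verbatim}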
\noindent
@@ -843,12 +849,15 @@
is the same as if we first erase the bitcoded regular expression and
then perform the ``standard'' derivative operation.
-\begin{lemma}\label{bnullable}\mbox{}\smallskip\\
- \begin{tabular}{ll}
-\textit{(1)} & $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\
-\textit{(2)} & $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
-\textit{(3)} & $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$.
-\end{tabular}
+\begin{lemma}\label{bnullable}%\mbox{}\smallskip\\
+\textit{(1)} $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\
+\mbox{\hspace{22mm}}\textit{(2)} $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
+\mbox{\hspace{22mm}}\textit{(3)} $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$
+%\begin{tabular}{ll}
+%\textit{(1)} & $(r\backslash s)^\downarrow = (r^\downarrow)\backslash s$\\
+%\textit{(2)} & $\textit{bnullable}(r)$ iff $\textit{nullable}(r^\downarrow)$\\
+%\textit{(3)} & $\textit{bmkeps}(r) = \textit{retrieve}\,r\,(\textit{mkeps}\,(r^\downarrow))$ provided $\textit{nullable}(r^\downarrow)$.
+%\end{tabular}
\end{lemma}
%\begin{proof}
@@ -1065,8 +1074,8 @@
\begin{center}
\begin{tabular}{l@ {\hspace{1mm}}c@ {\hspace{1mm}}l}
- @{thm (lhs) flts.simps(1)} & $\dn$ & @{thm (rhs) flts.simps(1)}\\
- @{thm (lhs) flts.simps(2)} & $\dn$ & @{thm (rhs) flts.simps(2)}\\
+ \multicolumn{3}{@ {}c}{@{thm (lhs) flts.simps(1)} $\dn$ @{thm (rhs) flts.simps(1)} \qquad\qquad\qquad\qquad
+ @{thm (lhs) flts.simps(2)} $\dn$ @{thm (rhs) flts.simps(2)}}\\
@{thm (lhs) flts.simps(3)[of "bs'" "rs'"]} & $\dn$ & @{thm (rhs) flts.simps(3)[of "bs'" "rs'"]}\\
\end{tabular}
\end{center}
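+
+ \noindent
+ In Scala these clauses can be sketched as follows (a sketch: the
+ constructor names @{text AZERO} and @{text AALTs} are ours, and the
+ final catch-all clause for all remaining regular expressions is an
+ assumption):
+
+ \begin{verbatim}
+ // flts removes ZEROs and flattens nested alternatives, fusing the
+ // bitsequence bs of an inner alternative onto its children
+ def flts(rs: List[ARexp]): List[ARexp] = rs match {
+   case Nil                    => Nil
+   case AZERO :: rest          => flts(rest)
+   case AALTs(bs, rs1) :: rest => rs1.map(fuse(bs, _)) ::: flts(rest)
+   case r :: rest              => r :: flts(rest)
+ }
+ \end{verbatim}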
@@ -1277,6 +1286,7 @@
\begin{figure}[t]
\begin{center}
\begin{tabular}{@ {\hspace{-8mm}}c@ {}}
+ \\[-7mm]
@{thm[mode=Axiom] bs1[of _ "r\<^sub>2"]}$S\ZERO{}_l$\quad
@{thm[mode=Axiom] bs2[of _ "r\<^sub>1"]}$S\ZERO{}_r$\quad
@{thm[mode=Axiom] bs3[of "bs\<^sub>1" "bs\<^sub>2"]}$S\ONE$\\
@@ -1348,9 +1358,9 @@
bounded by $N_1 + \llbracket{}r_2\rrbracket + 1$. In (5) we know the list comprehension contains only regular expressions of size smaller
than $N_2$. The list length after @{text distinctWith} is bounded by a number $l_{N_2}$, which stands
for the number of distinct regular expressions of size smaller than $N_2$ (there can only be finitely many of them).
-We reason similarly for @{text STAR}.\medskip
+We reason similarly for @{text STAR}.\smallskip
-\noindent
+
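+For @{text SEQ}, for instance, the two observations above combine into a rough
+overall bound for the simplified derivatives (a sketch of ours, writing
+$\backslash$ for the simplified derivative and eliding the constant overhead
+of the alternative constructor):
+%
+\[
+\llbracket (r_1\cdot r_2)\backslash s\rrbracket \;\le\; N_1 + \llbracket r_2\rrbracket + 1 + l_{N_2}\cdot N_2
+\]
+%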
Clearly, the bound we give in this finiteness argument (Step (5)) is very loose and
far from the actual bound we can expect. We could do better, but a tighter bound
would not strengthen the finiteness property we are proving. If we are interested in a polynomial bound,
@@ -1384,8 +1394,9 @@
lies in the fact that by using bitcoded regular expressions and an aggressive
simplification method there is a chance that the derivatives
can be kept universally small (we established in this paper that
- they can be kept finitely bounded for any string). This is important if one is after
- an efficient POSIX lexing algorithm based on derivatives.
+ they can be kept finitely bounded for any string).
+ %This is important if one is after
+ %an efficient POSIX lexing algorithm based on derivatives.
Having proved the correctness of the POSIX lexing algorithm, what
lessons have we learned? Well, we feel this is a very good example
@@ -1393,9 +1404,10 @@
hand. For example, it is very hard to see the problem with @{text nub}
vs @{text distinctWith} from experimental data alone---one would still
see the correct result, but find that simplification fails to simplify in well-chosen, but not
- obscure, examples. We found that from an implementation
- point-of-view it is really important to have the formal proofs of
- the corresponding properties at hand.
+ obscure, examples.
+ %We found that from an implementation
+ %point-of-view it is really important to have the formal proofs of
+ %the corresponding properties at hand.
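+ To see why this is so hard to observe experimentally, here is a minimal
+ Scala sketch of @{text distinctWith} (names are ours; @{text eq} is meant
+ to be an equivalence that ignores bitcodes). @{text nub} corresponds to
+ instantiating @{text eq} with structural equality, which leaves behind
+ copies that differ only in their bitcodes:
+
+ \begin{verbatim}
+ // keep the first representative of each eq-class, dropping
+ // later duplicates modulo the supplied equivalence eq
+ def distinctWith[A](xs: List[A], eq: (A, A) => Boolean,
+                     acc: List[A] = Nil): List[A] = xs match {
+   case Nil => Nil
+   case x :: rest =>
+     if (acc.exists(eq(x, _))) distinctWith(rest, eq, acc)
+     else x :: distinctWith(rest, eq, x :: acc)
+ }
+ \end{verbatim}
+
+ \noindent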
We have also developed a
healthy suspicion when experimental data is used to back up
@@ -1430,11 +1442,8 @@
\emph{all} derivatives have a size of 17 or less. The result is that
lexing a string of, say, 50\,000 a's with the regular expression \mbox{@{text "(a + aa)\<^sup>*"}} takes approximately
10 seconds with our Scala implementation
- of the presented algorithm.
- \smallskip
-
- \noindent
- Our Isabelle code including the results from Sec.~5 is available from \url{https://github.com/urbanchr/posix}.\\[-10mm]
+ of the presented algorithm. Our Isabelle code, including the results from Sec.~5, is available from \url{https://github.com/urbanchr/posix}.
+ %\\[-10mm]
%%\bibliographystyle{plain}