diff -r 0cce1aee0fb2 -r 222333d2bdc2 thys2/Paper/Paper.thy --- a/thys2/Paper/Paper.thy Mon Feb 21 23:38:26 2022 +0000 +++ b/thys2/Paper/Paper.thy Wed Mar 02 11:43:41 2022 +0000 @@ -675,7 +675,7 @@ transforms a bitcoded regular expression into a (standard) regular expression by just erasing the annotated bitsequences. We omit the straightforward definition. For defining the algorithm, we also need - the functions \textit{bnullable} and \textit{bmkeps}, which are the + the functions \textit{bnullable} and \textit{bmkeps}(\textit{s}), which are the ``lifted'' versions of \textit{nullable} and \textit{mkeps} acting on bitcoded regular expressions. % @@ -695,14 +695,16 @@ & \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}} $\textit{bmkeps}\,(\textit{ONE}\,bs)$ & $\dn$ & $bs$\\ - $\textit{bmkeps}\,(\textit{ALTs}\,bs\,r\!::\!\rs)$ & $\dn$ & - $\textit{if}\;\textit{bnullable}\,r$\\ - & &$\textit{then}\;bs\,@\,\textit{bmkeps}\,r$\\ - & &$\textit{else}\;bs\,@\,\textit{bmkeps}\,\rs$\\ + $\textit{bmkeps}\,(\textit{ALTs}\,bs\,\rs)$ & $\dn$ & + $bs\,@\,\textit{bmkepss}\,\rs$\\ $\textit{bmkeps}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ &\\ \multicolumn{3}{r}{$bs \,@\,\textit{bmkeps}\,r_1\,@\, \textit{bmkeps}\,r_2$}\\ $\textit{bmkeps}\,(\textit{STAR}\,bs\,r)$ & $\dn$ & - $bs \,@\, [\S]$ + $bs \,@\, [\S]$\\ + $\textit{bmkepss}\,(r\!::\!\rs)$ & $\dn$ & + $\textit{if}\;\textit{bnullable}\,r$\\ + & &$\textit{then}\;\textit{bmkeps}\,r$\\ + & &$\textit{else}\;\textit{bmkepss}\,\rs$ \end{tabular} \end{tabular} \end{center} @@ -1040,7 +1042,7 @@ \noindent where we scan the list from left to right (because we have to remove later copies). In @{text distinctBy}, @{text f} is a function and @{text acc} is an accumulator for regular - expressions---essentially a set of regular expression that we have already seen + expressions---essentially a set of regular expressions that we have already seen while scanning the list. Therefore we delete an element, say @{text x}, from the list provided @{text "f x"} is already in the accumulator; otherwise we keep @{text x} and scan the rest of the list but @@ -1182,12 +1184,12 @@ simplified regular expressions in small steps (unlike the @{text bsimp}-function which does the same in a big step), and show that each of the small steps preserves the bitcodes that lead to the final POSIX value. - The rewrite system is organised such that $\leadsto$ is for bitcode regular + The rewrite system is organised such that $\leadsto$ is for bitcoded regular expressions and $\stackrel{s}{\leadsto}$ for lists of bitcoded regular expressions. The former essentially implements the simplifications of @{text "bsimpSEQ"} and @{text flts}; while the latter implements the simplifications in @{text "bsimpALTs"}. We can show that any bitcoded - regular expression reduces in one or more steps to the simplified + regular expression reduces in zero or more steps to the simplified regular expression generated by @{text bsimp}: \begin{lemma}\label{lemone} @@ -1317,9 +1319,9 @@ text {* -In this section let us sketch our argument on why the size of the simplified +In this section let us sketch our argument for why the size of the simplified derivatives with the aggressive simplification function is finite. Suppose -we have a size functions for bitcoded regular expressions, written +we have a size function for bitcoded regular expressions, written @{text "|r|"}, which counts the number of nodes if we regard $r$ as a tree (we omit the precise definition). For this we show that for every $r$ there exists a bound $N$ @@ -1418,7 +1420,7 @@ \cite[Page 14]{Sulzmann2014}. Given the growth of the derivatives in some cases even after aggressive simplification, this - is a hard to believe fact. A similar claim of about a theoretical runtime + is a hard to believe fact. A similar claim about a theoretical runtime of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates POSIX matches and is based on derivatives \cite{verbatim}. In this case derivatives are not simplified. Clearly our result of having finite