673 \noindent |
673 \noindent |
674 There is also an \emph{erase}-function, written $r^\downarrow$, which |
674 There is also an \emph{erase}-function, written $r^\downarrow$, which |
675 transforms a bitcoded regular expression into a (standard) regular |
675 transforms a bitcoded regular expression into a (standard) regular |
676 expression by just erasing the annotated bitsequences. We omit the |
676 expression by just erasing the annotated bitsequences. We omit the |
677 straightforward definition. For defining the algorithm, we also need |
677 straightforward definition. For defining the algorithm, we also need |
678 the functions \textit{bnullable} and \textit{bmkeps}, which are the |
678 the functions \textit{bnullable} and \textit{bmkeps}(\textit{s}), which are the |
679 ``lifted'' versions of \textit{nullable} and \textit{mkeps} acting on |
679 ``lifted'' versions of \textit{nullable} and \textit{mkeps} acting on |
680 bitcoded regular expressions. |
680 bitcoded regular expressions. |
681 % |
681 % |
682 \begin{center} |
682 \begin{center} |
683 \begin{tabular}{@ {}c@ {}c@ {}} |
683 \begin{tabular}{@ {}c@ {}c@ {}} |
693 $\textit{true}$ |
693 $\textit{true}$ |
694 \end{tabular} |
694 \end{tabular} |
695 & |
695 & |
696 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}} |
696 \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}} |
697 $\textit{bmkeps}\,(\textit{ONE}\,bs)$ & $\dn$ & $bs$\\ |
697 $\textit{bmkeps}\,(\textit{ONE}\,bs)$ & $\dn$ & $bs$\\ |
698 $\textit{bmkeps}\,(\textit{ALTs}\,bs\,r\!::\!\rs)$ & $\dn$ & |
698 $\textit{bmkeps}\,(\textit{ALTs}\,bs\,\rs)$ & $\dn$ & |
699 $\textit{if}\;\textit{bnullable}\,r$\\ |
699 $bs\,@\,\textit{bmkepss}\,\rs$\\ |
700 & &$\textit{then}\;bs\,@\,\textit{bmkeps}\,r$\\ |
|
701 & &$\textit{else}\;bs\,@\,\textit{bmkeps}\,\rs$\\ |
|
702 $\textit{bmkeps}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ &\\ |
700 $\textit{bmkeps}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ &\\ |
703 \multicolumn{3}{r}{$bs \,@\,\textit{bmkeps}\,r_1\,@\, \textit{bmkeps}\,r_2$}\\ |
701 \multicolumn{3}{r}{$bs \,@\,\textit{bmkeps}\,r_1\,@\, \textit{bmkeps}\,r_2$}\\ |
704 $\textit{bmkeps}\,(\textit{STAR}\,bs\,r)$ & $\dn$ & |
702 $\textit{bmkeps}\,(\textit{STAR}\,bs\,r)$ & $\dn$ & |
705 $bs \,@\, [\S]$ |
703 $bs \,@\, [\S]$\\ |
|
704 $\textit{bmkepss}\,(r\!::\!\rs)$ & $\dn$ & |
|
705 $\textit{if}\;\textit{bnullable}\,r$\\ |
|
706 & &$\textit{then}\;\textit{bmkeps}\,r$\\ |
|
707 & &$\textit{else}\;\textit{bmkepss}\,\rs$ |
706 \end{tabular} |
708 \end{tabular} |
707 \end{tabular} |
709 \end{tabular} |
708 \end{center} |
710 \end{center} |
709 |
711 |
710 |
712 |
1038 \end{center} |
1040 \end{center} |
1039 |
1041 |
1040 \noindent where we scan the list from left to right (because we |
1042 \noindent where we scan the list from left to right (because we |
1041 have to remove later copies). In @{text distinctBy}, @{text f} is a |
1043 have to remove later copies). In @{text distinctBy}, @{text f} is a |
1042 function and @{text acc} is an accumulator for regular |
1044 function and @{text acc} is an accumulator for regular |
1043 expressions---essentially a set of regular expression that we have already seen |
1045 expressions---essentially a set of regular expressions that we have already seen |
1044 while scanning the list. Therefore we delete an element, say @{text x}, |
1046 while scanning the list. Therefore we delete an element, say @{text x}, |
1045 from the list provided @{text "f x"} is already in the accumulator; |
1047 from the list provided @{text "f x"} is already in the accumulator; |
1046 otherwise we keep @{text x} and scan the rest of the list but |
1048 otherwise we keep @{text x} and scan the rest of the list but |
1047 add @{text "f x"} as another ``seen'' element to @{text acc}. We will use |
1049 add @{text "f x"} as another ``seen'' element to @{text acc}. We will use |
1048 @{term distinctBy} where @{text f} is the erase functions, @{term "erase (DUMMY)"}, |
1050 @{term distinctBy} where @{text f} is the erase functions, @{term "erase (DUMMY)"}, |
1180 We found it more helpful to introduce the rewriting systems shown in |
1182 We found it more helpful to introduce the rewriting systems shown in |
1181 Figure~\ref{SimpRewrites}. The idea is to generate |
1183 Figure~\ref{SimpRewrites}. The idea is to generate |
1182 simplified regular expressions in small steps (unlike the @{text bsimp}-function which |
1184 simplified regular expressions in small steps (unlike the @{text bsimp}-function which |
1183 does the same in a big step), and show that each of |
1185 does the same in a big step), and show that each of |
1184 the small steps preserves the bitcodes that lead to the final POSIX value. |
1186 the small steps preserves the bitcodes that lead to the final POSIX value. |
1185 The rewrite system is organised such that $\leadsto$ is for bitcode regular |
1187 The rewrite system is organised such that $\leadsto$ is for bitcoded regular |
1186 expressions and $\stackrel{s}{\leadsto}$ for lists of bitcoded regular |
1188 expressions and $\stackrel{s}{\leadsto}$ for lists of bitcoded regular |
1187 expressions. The former essentially implements the simplifications of |
1189 expressions. The former essentially implements the simplifications of |
1188 @{text "bsimpSEQ"} and @{text flts}; while the latter implements the |
1190 @{text "bsimpSEQ"} and @{text flts}; while the latter implements the |
1189 simplifications in @{text "bsimpALTs"}. We can show that any bitcoded |
1191 simplifications in @{text "bsimpALTs"}. We can show that any bitcoded |
1190 regular expression reduces in one or more steps to the simplified |
1192 regular expression reduces in zero or more steps to the simplified |
1191 regular expression generated by @{text bsimp}: |
1193 regular expression generated by @{text bsimp}: |
1192 |
1194 |
1193 \begin{lemma}\label{lemone} |
1195 \begin{lemma}\label{lemone} |
1194 @{thm[mode=IfThen] rewrites_to_bsimp} |
1196 @{thm[mode=IfThen] rewrites_to_bsimp} |
1195 \end{lemma} |
1197 \end{lemma} |
1315 |
1317 |
1316 section {* Finiteness of Derivatives *} |
1318 section {* Finiteness of Derivatives *} |
1317 |
1319 |
1318 text {* |
1320 text {* |
1319 |
1321 |
1320 In this section let us sketch our argument on why the size of the simplified |
1322 In this section let us sketch our argument for why the size of the simplified |
1321 derivatives with the aggressive simplification function is finite. Suppose |
1323 derivatives with the aggressive simplification function is finite. Suppose |
1322 we have a size functions for bitcoded regular expressions, written |
1324 we have a size function for bitcoded regular expressions, written |
1323 @{text "|r|"}, which counts the number of nodes if we regard $r$ as a tree |
1325 @{text "|r|"}, which counts the number of nodes if we regard $r$ as a tree |
1324 (we omit the precise definition). For this we show that for every $r$ |
1326 (we omit the precise definition). For this we show that for every $r$ |
1325 there exists a bound $N$ |
1327 there exists a bound $N$ |
1326 such that |
1328 such that |
1327 |
1329 |
1416 equivalent of @{term blexer_simp} ``...we can incrementally compute |
1418 equivalent of @{term blexer_simp} ``...we can incrementally compute |
1417 bitcoded parse trees in linear time in the size of the input'' |
1419 bitcoded parse trees in linear time in the size of the input'' |
1418 \cite[Page 14]{Sulzmann2014}. |
1420 \cite[Page 14]{Sulzmann2014}. |
1419 Given the growth of the |
1421 Given the growth of the |
1420 derivatives in some cases even after aggressive simplification, this |
1422 derivatives in some cases even after aggressive simplification, this |
1421 is a hard to believe fact. A similar claim of about a theoretical runtime |
1423 is a hard to believe fact. A similar claim about a theoretical runtime |
1422 of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates POSIX matches and is based on |
1424 of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates POSIX matches and is based on |
1423 derivatives \cite{verbatim}. In this case derivatives are not simplified. |
1425 derivatives \cite{verbatim}. In this case derivatives are not simplified. |
1424 Clearly our result of having finite |
1426 Clearly our result of having finite |
1425 derivatives is rather weak in this context but we think such effeciency claims |
1427 derivatives is rather weak in this context but we think such effeciency claims |
1426 require further scrutiny.\medskip |
1428 require further scrutiny.\medskip |