thys2/Paper/Paper.thy
changeset 436 222333d2bdc2
parent 426 5b77220fdf01
child 458 30c91ea7095b
equal deleted inserted replaced
434:0cce1aee0fb2 436:222333d2bdc2
   673   \noindent
   673   \noindent
   674   There is also an \emph{erase}-function, written $r^\downarrow$, which
   674   There is also an \emph{erase}-function, written $r^\downarrow$, which
   675   transforms a bitcoded regular expression into a (standard) regular
   675   transforms a bitcoded regular expression into a (standard) regular
   676   expression by just erasing the annotated bitsequences. We omit the
   676   expression by just erasing the annotated bitsequences. We omit the
   677   straightforward definition. For defining the algorithm, we also need
   677   straightforward definition. For defining the algorithm, we also need
   678   the functions \textit{bnullable} and \textit{bmkeps}, which are the
   678   the functions \textit{bnullable} and \textit{bmkeps}(\textit{s}), which are the
   679   ``lifted'' versions of \textit{nullable} and \textit{mkeps} acting on
   679   ``lifted'' versions of \textit{nullable} and \textit{mkeps} acting on
   680   bitcoded regular expressions.
   680   bitcoded regular expressions.
   681   %
   681   %
   682   \begin{center}
   682   \begin{center}
   683   \begin{tabular}{@ {}c@ {}c@ {}}
   683   \begin{tabular}{@ {}c@ {}c@ {}}
   693      $\textit{true}$
   693      $\textit{true}$
   694   \end{tabular}
   694   \end{tabular}
   695   &
   695   &
   696   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}}
   696   \begin{tabular}{@ {}l@ {\hspace{1mm}}c@ {\hspace{1mm}}l@ {}}
   697   $\textit{bmkeps}\,(\textit{ONE}\,bs)$ & $\dn$ & $bs$\\
   697   $\textit{bmkeps}\,(\textit{ONE}\,bs)$ & $\dn$ & $bs$\\
   698   $\textit{bmkeps}\,(\textit{ALTs}\,bs\,r\!::\!\rs)$ & $\dn$ &
   698   $\textit{bmkeps}\,(\textit{ALTs}\,bs\,\rs)$ & $\dn$ &
   699      $\textit{if}\;\textit{bnullable}\,r$\\
   699   $bs\,@\,\textit{bmkepss}\,\rs$\\
   700   & &$\textit{then}\;bs\,@\,\textit{bmkeps}\,r$\\
       
   701   & &$\textit{else}\;bs\,@\,\textit{bmkeps}\,\rs$\\
       
   702   $\textit{bmkeps}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ &\\
   700   $\textit{bmkeps}\,(\textit{SEQ}\,bs\,r_1\,r_2)$ & $\dn$ &\\
   703   \multicolumn{3}{r}{$bs \,@\,\textit{bmkeps}\,r_1\,@\, \textit{bmkeps}\,r_2$}\\
   701   \multicolumn{3}{r}{$bs \,@\,\textit{bmkeps}\,r_1\,@\, \textit{bmkeps}\,r_2$}\\
   704   $\textit{bmkeps}\,(\textit{STAR}\,bs\,r)$ & $\dn$ &
   702   $\textit{bmkeps}\,(\textit{STAR}\,bs\,r)$ & $\dn$ &
   705      $bs \,@\, [\S]$
   703      $bs \,@\, [\S]$\\
       
   704   $\textit{bmkepss}\,(r\!::\!\rs)$ & $\dn$ &
       
   705      $\textit{if}\;\textit{bnullable}\,r$\\
       
   706   & &$\textit{then}\;\textit{bmkeps}\,r$\\
       
   707   & &$\textit{else}\;\textit{bmkepss}\,\rs$
   706   \end{tabular}
   708   \end{tabular}
   707   \end{tabular}
   709   \end{tabular}
   708   \end{center}    
   710   \end{center}    
   709  
   711  
   710 
   712 
  1038      \end{center}
  1040      \end{center}
  1039 
  1041 
  1040      \noindent where we scan the list from left to right (because we
  1042      \noindent where we scan the list from left to right (because we
  1041      have to remove later copies). In @{text distinctBy}, @{text f} is a
  1043      have to remove later copies). In @{text distinctBy}, @{text f} is a
  1042      function and @{text acc} is an accumulator for regular
  1044      function and @{text acc} is an accumulator for regular
  1043      expressions---essentially a set of regular expression that we have already seen
  1045      expressions---essentially a set of regular expressions that we have already seen
  1044      while scanning the list. Therefore we delete an element, say @{text x},
  1046      while scanning the list. Therefore we delete an element, say @{text x},
  1045      from the list provided @{text "f x"} is already in the accumulator;
  1047      from the list provided @{text "f x"} is already in the accumulator;
  1046      otherwise we keep @{text x} and scan the rest of the list but 
  1048      otherwise we keep @{text x} and scan the rest of the list but 
  1047      add @{text "f x"} as another ``seen'' element to @{text acc}. We will use
  1049      add @{text "f x"} as another ``seen'' element to @{text acc}. We will use
  1048      @{term distinctBy} where @{text f} is the erase functions, @{term "erase (DUMMY)"},
  1050      @{term distinctBy} where @{text f} is the erase functions, @{term "erase (DUMMY)"},
  1180      We found it more helpful to introduce the rewriting systems shown in
  1182      We found it more helpful to introduce the rewriting systems shown in
  1181      Figure~\ref{SimpRewrites}. The idea is to generate 
  1183      Figure~\ref{SimpRewrites}. The idea is to generate 
  1182      simplified regular expressions in small steps (unlike the @{text bsimp}-function which
  1184      simplified regular expressions in small steps (unlike the @{text bsimp}-function which
  1183      does the same in a big step), and show that each of
  1185      does the same in a big step), and show that each of
  1184      the small steps preserves the bitcodes that lead to the final POSIX value.
  1186      the small steps preserves the bitcodes that lead to the final POSIX value.
  1185      The rewrite system is organised such that $\leadsto$ is for bitcode regular
  1187      The rewrite system is organised such that $\leadsto$ is for bitcoded regular
  1186      expressions and $\stackrel{s}{\leadsto}$ for lists of bitcoded regular
  1188      expressions and $\stackrel{s}{\leadsto}$ for lists of bitcoded regular
  1187      expressions. The former essentially implements the simplifications of
  1189      expressions. The former essentially implements the simplifications of
  1188      @{text "bsimpSEQ"} and @{text flts}; while the latter implements the
  1190      @{text "bsimpSEQ"} and @{text flts}; while the latter implements the
  1189      simplifications in @{text "bsimpALTs"}. We can show that any bitcoded
  1191      simplifications in @{text "bsimpALTs"}. We can show that any bitcoded
  1190      regular expression reduces in one or more steps to the simplified
  1192      regular expression reduces in zero or more steps to the simplified
  1191      regular expression generated by @{text bsimp}:
  1193      regular expression generated by @{text bsimp}:
  1192 
  1194 
  1193      \begin{lemma}\label{lemone}
  1195      \begin{lemma}\label{lemone}
  1194      @{thm[mode=IfThen] rewrites_to_bsimp}
  1196      @{thm[mode=IfThen] rewrites_to_bsimp}
  1195      \end{lemma}
  1197      \end{lemma}
  1315 
  1317 
  1316 section {* Finiteness of Derivatives *}
  1318 section {* Finiteness of Derivatives *}
  1317 
  1319 
  1318 text {*
  1320 text {*
  1319 
  1321 
  1320 In this section let us sketch our argument on why the size of the simplified
  1322 In this section let us sketch our argument for why the size of the simplified
  1321 derivatives with the aggressive simplification function is finite. Suppose
  1323 derivatives with the aggressive simplification function is finite. Suppose
  1322 we have a size functions for bitcoded regular expressions, written
  1324 we have a size function for bitcoded regular expressions, written
  1323 @{text "|r|"}, which counts the number of nodes if we regard $r$ as a tree
  1325 @{text "|r|"}, which counts the number of nodes if we regard $r$ as a tree
  1324 (we omit the precise definition). For this we show that for every $r$
  1326 (we omit the precise definition). For this we show that for every $r$
  1325 there exists a bound $N$
  1327 there exists a bound $N$
  1326 such that 
  1328 such that 
  1327 
  1329 
  1416    equivalent of @{term blexer_simp} ``...we can incrementally compute
  1418    equivalent of @{term blexer_simp} ``...we can incrementally compute
  1417    bitcoded parse trees in linear time in the size of the input''
  1419    bitcoded parse trees in linear time in the size of the input''
  1418    \cite[Page 14]{Sulzmann2014}. 
  1420    \cite[Page 14]{Sulzmann2014}. 
  1419    Given the growth of the
  1421    Given the growth of the
  1420    derivatives in some cases even after aggressive simplification, this
  1422    derivatives in some cases even after aggressive simplification, this
  1421    is a hard to believe fact. A similar claim of about a theoretical runtime
  1423    is a hard to believe fact. A similar claim about a theoretical runtime
  1422    of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates POSIX matches and is based on
  1424    of @{text "O(n\<^sup>2)"} is made for the Verbatim lexer, which calculates POSIX matches and is based on
  1423    derivatives \cite{verbatim}. In this case derivatives are not simplified.
  1425    derivatives \cite{verbatim}. In this case derivatives are not simplified.
  1424    Clearly our result of having finite
  1426    Clearly our result of having finite
  1425    derivatives is rather weak in this context but we think such effeciency claims
  1427    derivatives is rather weak in this context but we think such effeciency claims
  1426    require further scrutiny.\medskip
  1428    require further scrutiny.\medskip