ninems/ninems.tex
changeset 45 60cb82639691
parent 44 4d674a971852
child 46 9b48724ec609
equal deleted inserted replaced
44:4d674a971852 45:60cb82639691
     1  \documentclass[a4paper,UKenglish]{lipics}
     1 \documentclass[a4paper,UKenglish]{lipics}
     2 \usepackage{graphic}
     2 \usepackage{graphic}
     3 \usepackage{data}
     3 \usepackage{data}
     4 \usepackage{tikz-cd}
     4 \usepackage{tikz-cd}
     5 \usepackage{algorithm}
     5 \usepackage{algorithm}
     6 \usepackage{amsmath}
     6 \usepackage{amsmath}
   464 \begin{enumerate}
   464 \begin{enumerate}
   465     \item[1)] just $a$ or
   465     \item[1)] just $a$ or
   466     \item[2)] string $ab$ or 
   466     \item[2)] string $ab$ or 
   467     \item[3)] string $abc$.
   467     \item[3)] string $abc$.
   468 \end{enumerate}
   468 \end{enumerate}
   469 In order to differentiate between these choices, we just need to remember their positions--$a$ is on the left, $ab$ is in the middle , and $abc$ is on the right. Which one of these alternatives is chosen later does not affect their relative position because our algorithm does not change this order. There is no need to traverse this information twice. This leads to a new approach of lexing-- if we store the information for parse trees  in the corresponding regular expression pieces, update this information when we do derivative operation on them, and collect the information when fininshed with derivatives and calling $mkeps$ for deiciding which branch is POSIX, we can generate the parse tree in one pass, instead of doing an n-step backward transformation.This leads to Sulzmann and Lu's novel idea of using bit-codes on derivatives.
   469 In order to differentiate between these choices, we just need to remember their positions--$a$ is on the left, $ab$ is in the middle , and $abc$ is on the right. Which one of these alternatives is chosen later does not affect their relative position because our algorithm does not change this order. There is no need to traverse this information twice. This leads to a new approach of lexing-- if we store the information for parse trees  in the corresponding regular expression pieces, update this information when we do derivative operation on them, and collect the information when finished with derivatives and calling $mkeps$ for deciding which branch is POSIX, we can generate the parse tree in one pass, instead of doing an n-step backward transformation.This leads to Sulzmann and Lu's novel idea of using bit-codes on derivatives.
   470 
   470 
   471 In the next section, we shall focus on the bit-coded algorithm and the natural
   471 In the next section, we shall focus on the bit-coded algorithm and the natural
   472 process of simplification of regular expressions using bit-codes, which is needed in
   472 process of simplification of regular expressions using bit-codes, which is needed in
   473 order to obtain \emph{fast} versions of the Brzozowski's, and Sulzmann
   473 order to obtain \emph{fast} versions of the Brzozowski's, and Sulzmann
   474 and Lu's algorithms.  This is where the PhD-project hopes to advance
   474 and Lu's algorithms.  This is where the PhD-project hopes to advance
   475 the state-of-the-art.
   475 the state-of-the-art.
   476 
   476 
   477 
   477 
   478 \section{Simplification of Regular Expressions}
   478 \section{Simplification of Regular Expressions}
   479 Using bit-codes to guide  parsing is not a new idea.
   479 Using bit-codes to guide  parsing is not a new idea.
   480 It was applied to context free grammars and then adapted by Henglein and Nielson for efficient regular expression parsing \cite{nielson11bcre}. Sulzmann and Lu took a step further by intergrating bitcodes into derivatives.
   480 It was applied to context free grammars and then adapted by Henglein and Nielson for efficient regular expression parsing \cite{nielson11bcre}. Sulzmann and Lu took a step further by integrating bitcodes into derivatives.
   481 
   481 
   482 The argument for complicating the data structures from basic regular expressions to those with bitcodes
   482 The argument for complicating the data structures from basic regular expressions to those with bitcodes
   483 is that we can introduce simplification without making the algorithm crash or impossible to reason about.
   483 is that we can introduce simplification without making the algorithm crash or impossible to reason about.
   484 The reason why we need simplification is due to the shortcoming of a naive algorithm using Brzozowski's definition only. 
   484 The reason why we need simplification is due to the shortcoming of a naive algorithm using Brzozowski's definition only. 
   485 The main drawback of building successive derivatives according to
   485 The main drawback of building successive derivatives according to
   569 \end{tabular}    
   569 \end{tabular}    
   570 \end{center}    
   570 \end{center}    
   571 %\end{definition}
   571 %\end{definition}
   572 
   572 
   573 
   573 
   574 Sulzmann and Lu's integrated the bitcodes into annotated regular expressions by attaching them to the head of every substructure of a regularexpression\emph{annotated regular expressions}~\cite{Sulzmann2014}. They are
   574 Sulzmann and Lu's integrated the bitcodes into annotated regular expressions by attaching them to the head of every substructure of a regular expression\emph{annotated regular expressions}~\cite{Sulzmann2014}. They are
   575 defined by the following grammar:
   575 defined by the following grammar:
   576 
   576 
   577 \begin{center}
   577 \begin{center}
   578 \begin{tabular}{lcl}
   578 \begin{tabular}{lcl}
   579   $\textit{a}$ & $::=$  & $\textit{ZERO}$\\
   579   $\textit{a}$ & $::=$  & $\textit{ZERO}$\\
   672      $bs \,@\, [\S]$
   672      $bs \,@\, [\S]$
   673 \end{tabular}    
   673 \end{tabular}    
   674 \end{center}    
   674 \end{center}    
   675 %\end{definition}
   675 %\end{definition}
   676 This function completes the parse tree information by 
   676 This function completes the parse tree information by 
   677 travelling along the path on the regular epxression that corresponds to a POSIX value snd collect all the bits, and
   677 travelling along the path on the regular expression that corresponds to a POSIX value snd collect all the bits, and
   678 using S to indicate the end of star iterations. If we take the bitsproduced by $bmkeps$ and decode it, 
   678 using S to indicate the end of star iterations. If we take the bitsproduced by $bmkeps$ and decode it, 
   679 we get the parse tree we need, the working flow looks like this:\\
   679 we get the parse tree we need, the working flow looks like this:\\
   680 \begin{center}
   680 \begin{center}
   681 \begin{tabular}{lcl}
   681 \begin{tabular}{lcl}
   682   $\textit{blexer}\;r\,s$ & $\dn$ &
   682   $\textit{blexer}\;r\,s$ & $\dn$ &