s
authorChengsong
Wed, 03 Jul 2019 22:06:47 +0100
changeset 42 2a469388c989
parent 41 a1f90febbc7f
child 43 52a3cec0a5c7
s
ninems/ninems.tex
--- a/ninems/ninems.tex	Wed Jul 03 20:46:03 2019 +0100
+++ b/ninems/ninems.tex	Wed Jul 03 22:06:47 2019 +0100
@@ -5,6 +5,7 @@
 \usepackage{algorithm}
 \usepackage{amsmath}
 \usepackage[noend]{algpseudocode}
+\usepackage{enumitem}
 % \documentclass{article}
 %\usepackage[utf8]{inputenc}
 %\usepackage[english]{babel}
@@ -457,15 +458,25 @@
 $Left(Left(Seq(Right(Right(Right(Seq(Empty, Seq(Empty, Empty)))))), Stars []))$\\
 tells us how about the empty string matches the final regular expression after doing all the derivatives: among the regular expressions $(0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* + (0+0+0  + 1 + 0) \cdot r*) +((0+1+0  + 0 + 0) \cdot r*+(0+0+0  + 1 + 0) \cdot r* )$ we choose the left most nullable one, which is composed of a sequence of a nested alternative and a folded star that iterates 0 times. In that nested alternative we take the rightmost alternative.
 
-Using the value $v_3$, the character c, and the regular expression $r_2$, we can recover how $r_2$ matched the string $[c]$ : we inject $c$ back to $v_3$, and get $v_2 = Left(Seq(Right(Right(Right(Seq(Empty, Seq(Empty, c)))))), Stars [])$, which tells us how $r_2$ matched $c$.
-We omit the details of injection function, which is provided by Sulzmann and Lu's paper \cite{Sulzmann2014}. Rather, we shall focus next on the
-process of simplification of regular expressions, which is needed in
+Using the value $v_3$, the character c, and the regular expression $r_2$, we can recover how $r_2$ matched the string $[c]$ : we inject $c$ back to $v_3$, and get \\ $v_2 = Left(Seq(Right(Right(Right(Seq(Empty, Seq(Empty, c)))))), Stars [])$, which tells us how $r_2$ matched $c$. After this we inject back the character $b$, and get\\ $v_1 = Seq(Right(Right(Right(Seq(Empty, Seq(b, c)))))), Stars [])$ for how $r_1= (1+0+1 \cdot b + 0 + 1 \cdot b \cdot c) \cdot r*$ matched  the string $bc$ before it split into 2 pieces. Finally, after injecting character a back to $v_1$, we get  the parse tree $v_0= Stars [Right(Right(Right(Seq(a, Seq(b, c)))))]$ for how r matched $abc$.
+We omit the details of injection function, which is provided by Sulzmann and Lu's paper \cite{Sulzmann2014}. 
+Readers might have noticed that the parse tree information as actually already available when doing derivatives. For example, immediately after the operation $\backslash a$ we know that if we want to match a string that starts with a, we can either take the initial match to be 
+\begin{enumerate}
+    \item[1)] just $a$ or
+    \item[2)] string $ab$ or 
+    \item[3)] string $abc$.
+\end{enumerate}
+In order to differentiate between these choices, we just need to remember their positions--$a$ is on the left, $ab$ is in the middle , and $abc$ is on the right. Which one of these alternatives is chosen later does not affect their relative position because our algorithm does not change this order. There is no need to traverse this information twice. This leads to a new approach of lexing-- if we store the information for parse trees  in the corresponding regular expression pieces, update this information when we do derivative operation on them, and collect the information when fininshed with derivatives and calling $mkeps$ for deiciding which branch is POSIX, we can generate the parse tree in one pass, instead of doing an n-step backward transformation.This leads to Sulzmann and Lu's novel idea of using bit-codes on derivatives.
+
+In the next section, we shall focus on the bit-coded algorithm and the natural
+process of simplification of regular expressions using bit-codes, which is needed in
 order to obtain \emph{fast} versions of the Brzozowski's, and Sulzmann
 and Lu's algorithms.  This is where the PhD-project hopes to advance
 the state-of-the-art.
 
 
 \section{Simplification of Regular Expressions}
+Using bit-codes to guide  parsing is not a new idea.
 
 The main drawback of building successive derivatives according to
 Brzozowski's definition is that they can grow very quickly in size.