cst_tests: comparison ninems/ninems.tex

equal deleted inserted replaced

-:a1f90febbc7f
+:2a469388c989
 \usepackage{data}
 \usepackage{tikz-cd}
 \usepackage{algorithm}
 \usepackage{amsmath}
 \usepackage[noend]{algpseudocode}
+\usepackage{enumitem}
 % \documentclass{article}
 %\usepackage[utf8]{inputenc}
 %\usepackage[english]{babel}
 %\usepackage{listings}
 % \usepackage{amsthm}
 This corresponds to the leftmost term $((0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* $ in $r_3$. Note that its leftmost location allows $mkeps$ to choose  it as the first candidate that meets the requirement of being $nullable$. This location is naturally generated by the splitting clause\\ $(r_1 \cdot r_2)\backslash c  (when \, r_1 \, nullable)) \, = (r_1\backslash c) \cdot r_2 \,+\, r_2\backslash c.  \\$. By this clause, we put
 $r_1 \backslash c \cdot r_2 $ at the front and $r_2 \backslash c$ at the back. This allows $mkeps$ to always pick up among two matches the one with a longer prefix. The value \\
 $Left(Left(Seq(Right(Right(Right(Seq(Empty, Seq(Empty, Empty)))))), Stars []))$\\
 tells us how about the empty string matches the final regular expression after doing all the derivatives: among the regular expressions $(0+0+0 + 0 + 1 \cdot 1 \cdot 1) \cdot r^* + (0+0+0  + 1 + 0) \cdot r*) +((0+1+0  + 0 + 0) \cdot r*+(0+0+0  + 1 + 0) \cdot r* )$ we choose the left most nullable one, which is composed of a sequence of a nested alternative and a folded star that iterates 0 times. In that nested alternative we take the rightmost alternative.
-Using the value $v_3$, the character c, and the regular expression $r_2$, we can recover how $r_2$ matched the string $[c]$ : we inject $c$ back to $v_3$, and get $v_2 = Left(Seq(Right(Right(Right(Seq(Empty, Seq(Empty, c)))))), Stars [])$, which tells us how $r_2$ matched $c$.
+Using the value $v_3$, the character c, and the regular expression $r_2$, we can recover how $r_2$ matched the string $[c]$ : we inject $c$ back to $v_3$, and get \\ $v_2 = Left(Seq(Right(Right(Right(Seq(Empty, Seq(Empty, c)))))), Stars [])$, which tells us how $r_2$ matched $c$. After this we inject back the character $b$, and get\\ $v_1 = Seq(Right(Right(Right(Seq(Empty, Seq(b, c)))))), Stars [])$ for how $r_1= (1+0+1 \cdot b + 0 + 1 \cdot b \cdot c) \cdot r*$ matched  the string $bc$ before it split into 2 pieces. Finally, after injecting character a back to $v_1$, we get  the parse tree $v_0= Stars [Right(Right(Right(Seq(a, Seq(b, c)))))]$ for how r matched $abc$.
-We omit the details of injection function, which is provided by Sulzmann and Lu's paper \cite{Sulzmann2014}. Rather, we shall focus next on the
+We omit the details of injection function, which is provided by Sulzmann and Lu's paper \cite{Sulzmann2014}.
-process of simplification of regular expressions, which is needed in
+Readers might have noticed that the parse tree information as actually already available when doing derivatives. For example, immediately after the operation $\backslash a$ we know that if we want to match a string that starts with a, we can either take the initial match to be
+\begin{enumerate}
+\item[1)] just $a$ or
+\item[2)] string $ab$ or
+\item[3)] string $abc$.
+\end{enumerate}
+In order to differentiate between these choices, we just need to remember their positions--$a$ is on the left, $ab$ is in the middle , and $abc$ is on the right. Which one of these alternatives is chosen later does not affect their relative position because our algorithm does not change this order. There is no need to traverse this information twice. This leads to a new approach of lexing-- if we store the information for parse trees  in the corresponding regular expression pieces, update this information when we do derivative operation on them, and collect the information when fininshed with derivatives and calling $mkeps$ for deiciding which branch is POSIX, we can generate the parse tree in one pass, instead of doing an n-step backward transformation.This leads to Sulzmann and Lu's novel idea of using bit-codes on derivatives.
+In the next section, we shall focus on the bit-coded algorithm and the natural
+process of simplification of regular expressions using bit-codes, which is needed in
 order to obtain \emph{fast} versions of the Brzozowski's, and Sulzmann
 and Lu's algorithms.  This is where the PhD-project hopes to advance
 the state-of-the-art.
 \section{Simplification of Regular Expressions}
+Using bit-codes to guide  parsing is not a new idea.
 The main drawback of building successive derivatives according to
 Brzozowski's definition is that they can grow very quickly in size.
 This is mainly due to the fact that the derivative operation generates
 often ``useless'' $\ZERO$s and $\ONE$s in derivatives.  As a result,

changeset 42	2a469388c989
parent 41	a1f90febbc7f
child 43	52a3cec0a5c7