updated
authorChristian Urban <urbanc@in.tum.de>
Fri, 17 Jan 2020 23:53:08 +0000
changeset 108 0a0c551bb368
parent 107 b1e365afa29c
child 109 79f347cb8b4d
updated
etnms/etnms.tex
--- a/etnms/etnms.tex	Thu Jan 16 22:34:23 2020 +0000
+++ b/etnms/etnms.tex	Fri Jan 17 23:53:08 2020 +0000
@@ -102,10 +102,11 @@
 \end{abstract}
 
 \section{Recapitulation of Concepts From the Last Report}
-\subsection{The Algorithm by Brzozowski based on Derivatives of Regular
-Expressions}
+
+\subsection*{Regular Expressions and Derivatives}
 
 Suppose (basic) regular expressions are given by the following grammar:
+
 \[			r ::=   \ZERO \mid  \ONE
 			 \mid  c  
 			 \mid  r_1 \cdot r_2
@@ -114,9 +115,10 @@
 \]
 
 \noindent
+The ingenious contribution of Brzozowski is the notion of \emph{derivatives} of
+regular expressions, written~$\_ \backslash \_$. It uses the auxiliary notion of
+$\nullable$ defined below.
 
-The ingenious contribution by Brzozowski is the notion of
-\emph{derivatives} of regular expressions.  
 \begin{center}
 		\begin{tabular}{lcl}
 			$\nullable(\ZERO)$     & $\dn$ & $\mathit{false}$ \\  
@@ -154,9 +156,8 @@
 \end{center}
 
 \noindent
-
-
-Now we can generalise the derivative operation to strings like this:
+We can generalise the derivative operation shown above for single characters
+to strings as follows:
 
 \begin{center}
 \begin{tabular}{lcl}
@@ -166,13 +167,16 @@
 \end{center}
 
 \noindent
-and then define as  regular-expression matching algorithm: 
+and then define Brzozowski's  regular-expression matching algorithm as:
+
 \[
 match\;s\;r \;\dn\; nullable(r\backslash s)
 \]
 
 \noindent
-This algorithm looks graphically as follows:
+Assuming the a string is givane as a sequence of characters, say $c_0c_1..c_n$, 
+this algorithm presented graphically is as follows:
+
 \begin{equation}\label{graph:*}
 \begin{tikzcd}
 r_0 \arrow[r, "\backslash c_0"]  & r_1 \arrow[r, "\backslash c_1"] & r_2 \arrow[r, dashed]  & r_n  \arrow[r,"\textit{nullable}?"] & \;\textrm{YES}/\textrm{NO}
@@ -187,7 +191,7 @@
 an $s = c_0...c_{n-1}$ and an $r_0$, it generates YES if and only if $s \in L(r_0)$).
 
  
-\subsection{Values and the Algorithm by Sulzmann and Lu}
+\subsection*{Values and the Lexing Algorithm by Sulzmann and Lu}
 
 One limitation of Brzozowski's algorithm is that it only produces a
 YES/NO answer for whether a string is being matched by a regular
@@ -225,7 +229,6 @@
 \end{center}
 
 \noindent
-
 The contribution of Sulzmann and Lu is an extension of Brzozowski's
 algorithm by a second phase (the first phase being building successive
 derivatives---see \eqref{graph:*}). In this second phase, a POSIX value 
@@ -302,7 +305,7 @@
 expressions and values. 
 
 
-\subsection{Simplification of Regular Expressions}
+\subsection*{Simplification of Regular Expressions}
 
 The main drawback of building successive derivatives according
 to Brzozowski's definition is that they can grow very quickly in size.
@@ -331,16 +334,16 @@
 stay below this bound, we would need more aggressive simplifications.
 Essentially we need to delete useless $\ZERO$s and $\ONE$s, as well as
 deleting duplicates whenever possible. For example, the parentheses in
-$(a+b) \cdot c + bc$ can be opened up to get $a\cdot c +  b \cdot c + b
+$(a+b) \cdot c + b\cdot c$ can be opened up to get $a\cdot c + b \cdot c + b
 \cdot c$, and then simplified to just $a \cdot c + b \cdot c$. Another
 example is simplifying $(a^*+a) + (a^*+ \ONE) + (a +\ONE)$ to just
-$a^*+a+\ONE$. Adding these more aggressive simplification rules helps us
+$a^*+a+\ONE$. Adding these more aggressive simplification rules help us
 to achieve the same size bound as that of the partial derivatives. 
 
 In order to implement the idea of ``spilling out alternatives'' and to
-make them compatible with the $\text{inj}$-mechanism, we use
-\emph{bitcodes}. Bits and bitcodes (lists of bits) are just:
-
+make them compatible with the $\textit{inj}$-mechanism, we use
+\emph{bitcodes}. They were first introduced by Sulzmann and Lu.
+Here bits and bitcodes (lists of bits) are defined as:
 
 \begin{center}
 		$b ::=   S \mid  Z \qquad
@@ -574,6 +577,9 @@
 operation from characters to strings (just like the derivatives for un-annotated
 regular expressions).
 
+
+\subsection*{Our Simplification Rules}
+
 The main point of the bitcodes and annotated regular expressions is that
 we can apply rather aggressive (in terms of size) simplification rules
 in order to keep derivatives small. We have developed such