lexing: comparison ChengsongTanPhdThesis/Chapters/Finite.tex

equal deleted inserted replaced

-:988e92a70704
+:b2d0de6aee18
 \end{axis}
 \end{tikzpicture}\\
 \multicolumn{3}{c}{Graphs: size change of 3 randomly generated regular expressions $w.r.t.$ input string length.}
 \end{tabular}
 \end{center}
 \noindent
 Most of the regex's sizes seem to stay within a polynomial bound $w.r.t$ the
 original size.
-This suggests a link towrads "partial derivatives"
+We will discuss improvements to this bound in the next chapter.
-introduced by Antimirov \cite{Antimirov95}.
-\section{Antimirov's partial derivatives}
-The idea behind Antimirov's partial derivatives
-is to do derivatives in a similar way as suggested by Brzozowski,
-but maintain a set of regular expressions instead of a single one:
-%TODO: antimirov proposition 3.1, needs completion
-\begin{center}
-\begin{tabular}{ccc}
-$\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\
-$\partial_x(\ONE)$ & $=$ & $\phi$
-\end{tabular}
-\end{center}
-Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together
-using the alternatives constructor, Antimirov cleverly chose to put them into
-a set instead.  This breaks the terms in a derivative regular expression up,
-allowing us to understand what are the "atomic" components of it.
-For example, To compute what regular expression $x^*(xx + y)^*$'s
-derivative against $x$ is made of, one can do a partial derivative
-of it and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
-from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
-To get all the "atomic" components of a regular expression's possible derivatives,
-there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes
-whatever character is available at the head of the string inside the language of a
-regular expression, and gives back the character and the derivative regular expression
-as a pair (which he called "monomial"):
-\begin{center}
-\begin{tabular}{ccc}
-$\lf(\ONE)$ & $=$ & $\phi$\\
-$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
-$\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
-$\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\
-\end{tabular}
-\end{center}
-%TODO: completion
-There is a slight difference in the last three clauses compared
-with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular
-expression $r$ with every element inside $\textit{rset}$ to create a set of
-sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates
-on a set of monomials (which Antimirov called "linear form") and a regular
-expression, and returns a linear form:
-\begin{center}
-\begin{tabular}{ccc}
-$l \bigodot (\ZERO)$ & $=$ & $\phi$\\
-$l \bigodot (\ONE)$ & $=$ & $l$\\
-$\phi \bigodot t$ & $=$ & $\phi$\\
-$\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
-$\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
-$\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
-$\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
-$\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\
-\end{tabular}
-\end{center}
-%TODO: completion
-Some degree of simplification is applied when doing $\bigodot$, for example,
-$l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
-and  $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and
-$\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$,
-and so on.
-With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regex $r$ with
-an iterative procedure:
-\begin{center}
-\begin{tabular}{llll}
-$\textit{while}$ & $(\Delta_i \neq \phi)$  &                &          \\
-		       &  $\Delta_{i+1}$           &        $ =$ & $\lf(\Delta_i) - \PD_i$ \\
-		       &  $\PD_{i+1}$              &         $ =$ & $\Delta_{i+1} \cup \PD_i$ \\
-$\partial_{UNIV}(r)$ & $=$ & $\PD$ &
-\end{tabular}
-\end{center}
-$(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
-However, if we introduce them in our
-setting we would lose the POSIX property of our calculated values.
-A simple example for this would be the regex $(a + a\cdot b)\cdot(b\cdot c + c)$.
-If we split this regex up into $a\cdot(b\cdot c + c) + a\cdot b \cdot (b\cdot c + c)$, the lexer
-would give back $\Left(\Seq(\Char(a), \Left(\Char(b \cdot c))))$ instead of
-what we want: $\Seq(\Right(ab), \Right(c))$. Unless we can store the structural information
-in all the places where a transformation of the form $(r_1 + r_2)\cdot r \rightarrow r_1 \cdot r + r_2 \cdot r$
-occurs, and apply them in the right order once we get
-a result of the "aggressively simplified" regex, it would be impossible to still get a $\POSIX$ value.
-This is unlike the simplification we had before, where the rewriting rules
-such  as $\ONE \cdot r \rightsquigarrow r$, under which our lexer will give the same value.
-We will discuss better
-bounds in the last section of this chapter.\\[-6.5mm]
 %----------------------------------------------------------------------------------------
 %	SECTION ??

changeset 591	b2d0de6aee18
parent 590	988e92a70704
child 593	83fab852d72d