ChengsongTanPhdThesis/Chapters/Finite.tex
changeset 591 b2d0de6aee18
parent 590 988e92a70704
child 593 83fab852d72d
equal deleted inserted replaced
590:988e92a70704 591:b2d0de6aee18
  2005 \end{axis}
  2005 \end{axis}
  2006 \end{tikzpicture}\\
  2006 \end{tikzpicture}\\
  2007 \multicolumn{3}{c}{Graphs: size change of 3 randomly generated regular expressions $w.r.t.$ input string length.}
  2007 \multicolumn{3}{c}{Graphs: size change of 3 randomly generated regular expressions $w.r.t.$ input string length.}
  2008 \end{tabular}    
  2008 \end{tabular}    
  2009 \end{center}  
  2009 \end{center}  
  2010 
       
  2011 
       
  2012 
       
  2013 
       
  2014 
       
  2015 \noindent
  2010 \noindent
  2016 Most of the regex's sizes seem to stay within a polynomial bound $w.r.t$ the 
  2011 Most of the regex's sizes seem to stay within a polynomial bound $w.r.t$ the 
  2017 original size.
  2012 original size.
  2018 This suggests a link towrads "partial derivatives"
  2013 We will discuss improvements to this bound in the next chapter.
  2019  introduced by Antimirov \cite{Antimirov95}.
       
  2020  
       
  2021  \section{Antimirov's partial derivatives}
       
  2022  The idea behind Antimirov's partial derivatives
       
  2023 is to do derivatives in a similar way as suggested by Brzozowski, 
       
  2024 but maintain a set of regular expressions instead of a single one:
       
  2025 
       
  2026 %TODO: antimirov proposition 3.1, needs completion
       
  2027  \begin{center}  
       
  2028  \begin{tabular}{ccc}
       
  2029  $\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\
       
  2030 $\partial_x(\ONE)$ & $=$ & $\phi$
       
  2031 \end{tabular}
       
  2032 \end{center}
       
  2033 
       
  2034 Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together
       
  2035 using the alternatives constructor, Antimirov cleverly chose to put them into
       
  2036 a set instead.  This breaks the terms in a derivative regular expression up, 
       
  2037 allowing us to understand what are the "atomic" components of it.
       
  2038 For example, To compute what regular expression $x^*(xx + y)^*$'s 
       
  2039 derivative against $x$ is made of, one can do a partial derivative
       
  2040 of it and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
       
  2041 from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
       
  2042 To get all the "atomic" components of a regular expression's possible derivatives,
       
  2043 there is a procedure Antimirov called $\textit{lf}$, short for "linear forms", that takes
       
  2044 whatever character is available at the head of the string inside the language of a
       
  2045 regular expression, and gives back the character and the derivative regular expression
       
  2046 as a pair (which he called "monomial"):
       
  2047  \begin{center}  
       
  2048  \begin{tabular}{ccc}
       
  2049  $\lf(\ONE)$ & $=$ & $\phi$\\
       
  2050 $\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
       
  2051  $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
       
  2052  $\lf(r^*)$ & $=$ & $\lf(r) \bigodot \lf(r^*)$\\
       
  2053 \end{tabular}
       
  2054 \end{center}
       
  2055 %TODO: completion
       
  2056 
       
  2057 There is a slight difference in the last three clauses compared
       
  2058 with $\partial$: instead of a dot operator $ \textit{rset} \cdot r$ that attaches the regular 
       
  2059 expression $r$ with every element inside $\textit{rset}$ to create a set of 
       
  2060 sequence derivatives, it uses the "circle dot" operator $\bigodot$ which operates 
       
  2061 on a set of monomials (which Antimirov called "linear form") and a regular 
       
  2062 expression, and returns a linear form:
       
  2063  \begin{center}  
       
  2064  \begin{tabular}{ccc}
       
  2065  $l \bigodot (\ZERO)$ & $=$ & $\phi$\\
       
  2066  $l \bigodot (\ONE)$ & $=$ & $l$\\
       
  2067  $\phi \bigodot t$ & $=$ & $\phi$\\
       
  2068  $\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
       
  2069  $\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
       
  2070   $\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
       
  2071  $\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
       
  2072  $\lf(r^*)$ & $=$ & $\lf(r) \cdot \lf(r^*)$\\
       
  2073 \end{tabular}
       
  2074 \end{center}
       
  2075 %TODO: completion
       
  2076 
       
  2077  Some degree of simplification is applied when doing $\bigodot$, for example,
       
  2078  $l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
       
  2079  and  $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and
       
  2080   $\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot x \rightsquigarrow \ZERO$,
       
  2081   and so on.
       
  2082   
       
  2083   With the function $\lf$ one can compute all possible partial derivatives $\partial_{UNIV}(r)$ of a regex $r$ with 
       
  2084   an iterative procedure:
       
  2085    \begin{center}  
       
  2086  \begin{tabular}{llll}
       
  2087 $\textit{while}$ & $(\Delta_i \neq \phi)$  &                &          \\
       
  2088  		       &  $\Delta_{i+1}$           &        $ =$ & $\lf(\Delta_i) - \PD_i$ \\
       
  2089 		       &  $\PD_{i+1}$              &         $ =$ & $\Delta_{i+1} \cup \PD_i$ \\
       
  2090 $\partial_{UNIV}(r)$ & $=$ & $\PD$ &		     
       
  2091 \end{tabular}
       
  2092 \end{center}
       
  2093 
       
  2094 
       
  2095  $(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3)$,
       
  2096 
       
  2097 
       
  2098 However, if we introduce them in our
       
  2099 setting we would lose the POSIX property of our calculated values. 
       
  2100 A simple example for this would be the regex $(a + a\cdot b)\cdot(b\cdot c + c)$.
       
  2101 If we split this regex up into $a\cdot(b\cdot c + c) + a\cdot b \cdot (b\cdot c + c)$, the lexer 
       
  2102 would give back $\Left(\Seq(\Char(a), \Left(\Char(b \cdot c))))$ instead of
       
  2103 what we want: $\Seq(\Right(ab), \Right(c))$. Unless we can store the structural information
       
  2104 in all the places where a transformation of the form $(r_1 + r_2)\cdot r \rightarrow r_1 \cdot r + r_2 \cdot r$
       
  2105 occurs, and apply them in the right order once we get 
       
  2106 a result of the "aggressively simplified" regex, it would be impossible to still get a $\POSIX$ value.
       
  2107 This is unlike the simplification we had before, where the rewriting rules 
       
  2108 such  as $\ONE \cdot r \rightsquigarrow r$, under which our lexer will give the same value.
       
  2109 We will discuss better
       
  2110 bounds in the last section of this chapter.\\[-6.5mm]
       
  2111 
       
  2112 
  2014 
  2113 
  2015 
  2114 
  2016 
  2115 %----------------------------------------------------------------------------------------
  2017 %----------------------------------------------------------------------------------------
  2116 %	SECTION ??
  2018 %	SECTION ??