diff -r 988e92a70704 -r b2d0de6aee18 ChengsongTanPhdThesis/Chapters/Cubic.tex
--- a/ChengsongTanPhdThesis/Chapters/Cubic.tex	Wed Aug 31 23:57:42 2022 +0100
+++ b/ChengsongTanPhdThesis/Chapters/Cubic.tex	Thu Sep 01 23:47:37 2022 +0100
@@ -59,239 +59,152 @@
 \section{A Stronger Version of Simplification}
 %TODO: search for isabelle proofs of algorithms that check equivalence
 In our bitcoded lexing algorithm, (sub)terms represent (sub)matches.
-For example, the regular expression $aa \cdot a^*+ a \cdot a^*$ contains two terms,
-which expresses two possibilities it will match future input.
-It will either match 2 $a$'s, then 0 or more $a$'s, in other words, at least 2 more $a$'s
-\begin{figure}\label{string_3parts1}
+For example, the regular expression
+\[
+	aa \cdot a^* + a \cdot a^* + aa \cdot a^*
+\]
+contains three terms,
+expressing three possibilities for how it will match future input.
+The first and the third terms are identical, which means we can eliminate
+the latter as we know it will not be picked up by $\bmkeps$.
+In $\bsimps$, the $\distinctBy$ function takes care of this.
+The criterion $\distinctBy$ uses for removing a duplicate
+$a_2$ in the list
+\begin{center}
+	$rs_a@[a_1]@rs_b@[a_2]@rs_c$
+\end{center}
+is that
 \begin{center}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
-    \node [rectangle split, rectangle split horizontal, rectangle split parts=3]
-        {Consumed Input
-         \nodepart{two} Expects $aa$
-         \nodepart{three} Then expects $a^*$};
-
+	$\rerase{a_1} = \rerase{a_2}$.
+\end{center}
+It can be characterised as the $LD$
+rewrite rule in \ref{rrewriteRules}.\\
+The problem, however, is that identical components
+in two slightly different regular expressions cannot be removed:
+\begin{figure}[H]
+\[
+	(a+b+d) \cdot r_1 + (a+c+e) \cdot r_1 \stackrel{?}{\rightsquigarrow} (a+b+d) \cdot r_1 + (c+e) \cdot r_1
+\]
+\caption{Desired simplification, but not done in $\blexersimp$}\label{partialDedup}
+\end{figure}
+\noindent
+Such a simplification cannot simply be omitted,
+as without it the size could blow up even with our $\simp$ function:
+\begin{figure}[H]
+\centering
+\begin{tikzpicture}
+\begin{axis}[
+    %xlabel={$n$},
+    myplotstyle,
+    xlabel={input length},
+    ylabel={size},
+    ]
+\addplot[blue,mark=*, mark options={fill=white}] table {bsimpExponential.data};
+\end{axis}
 \end{tikzpicture}
-\end{center}
+\caption{Size of the derivatives computed by $\blexersimp$ when matching
+	$\protect((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$
+	with strings
+	of the form $\protect\underbrace{aa..a}_{n}$.}\label{blexerExp}
+\end{figure}
+\noindent
+We would like to apply a rewriting like the following at some stage
+\begin{figure}[H]
+\[
+	(a+b+d) \cdot r_1  \longrightarrow a \cdot r_1 + b \cdot r_1 + d \cdot r_1
+\]
+\caption{A rewrite rule that would enable the simplification in Figure \ref{partialDedup}}\label{desiredSimp}
 \end{figure}
 \noindent
-Or it will match at least 1 more $a$:
-\begin{figure}\label{string_3parts2}
-\begin{center}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
-    \node [rectangle split, rectangle split horizontal, rectangle split parts=3]
-        {Consumed
-         \nodepart{two} Expects $a$
-         \nodepart{three} Then expects $a^*$};
-
-\end{tikzpicture}
-\end{center}
-\end{figure}
-If these two possibilities are identical, we can eliminate
-the second one as we know the second one corresponds to
-a match that is not $\POSIX$.
-
-
-If two identical regexes
-happen to be grouped into different sequences, one cannot use
-the $a + a \rightsquigarrow a$ rule to eliminate them, even if they
-are "parallel" to each other:
-\[
-(a+b) \cdot r_1 + (a+c) \cdot r_2 \rightsquigarrow (a+b) \cdot r_1 + (c) \cdot r_2
-\]
-and that's because they are followed by possibly
-different "suffixes" $r_1$ and $r_2$, and if
-they do turn out to be different then doing
-\[
-(a+b) \cdot r_1 + (a+c) \cdot r_2 \rightsquigarrow (a+b) \cdot r_1 + (c) \cdot r_2
-\]
-might cause a possible match where the string is in $L(a \cdot r_2)$ to be lost.
-
-Here is an example for this.
-Assume we have the derivative regex
-\[( r_1 + r_2 + r_3)\cdot r+
-( r_1 + r_5 + r_6)\cdot( r_1 + r_2 + r_3)^*
-\]
-
-occurring in an intermediate step in our bit-coded lexing algorithm.
+in our $\simp$ function,
+so that it makes the simplification in \ref{partialDedup} possible.
+For example,
+could a function that ``spills out'' the alternatives inside a sequence be used?
+%TODO: simp' that spills things
-In this expression, there will be 6 "parallel" terms if we break up regexes
-of shape $(a+b)\cdot c$ using the distributivity law (into $ac + bc$).
-\begin{align}
-(\nonumber \\
-    r_1 + & \label{term:1} \\
-    r_2 + & \label{term:2} \\
-    r_3 & \label{term:3} \\
-& )\cdot r\; + \; (\nonumber \\
-    r_1 + & \label{term:4} \\
-    r_5 + & \label{term:5} \\
-    r_6 \label{term:6}\\
-& )\cdot r\nonumber
-\end{align}
-
-They have been labelled, and each label denotes
-one term, for example, \ref{term:1} denotes
-\[
-	r_1 \cdot r
-\]
-\noindent
-and \ref{term:3} denotes
-\[
-	r_3\cdot r.
-\]
-From a lexer's point of view, \ref{term:4} will never
-be picked up in later phases of matching because there
-is a term \ref{term:1} giving identical matching information.
-The first term \ref{term:1} will match a string in $L(r_1 \cdot r)$,
-and so on for term \ref{term:2}, \ref{term:3}, etc.
-
-\mybox{previous input $\ldots$}\mybox{$aaa$ }\mybox{rest of input $\ldots$}
-\begin{center}\label{string_2parts}
-
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
-    \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
-        {$r_{x1}$
-         \nodepart{two} $r_1\cdot r$ };
-%\caption{term 1 \ref{term:1}'s matching configuration}
-\end{tikzpicture}
-
+Unfortunately,
+if we introduce such a spilling rewrite in our
+setting we would lose the POSIX property of our calculated values.
+For example, given the regular expression
+\begin{center}
+	$(a + ab)(bc + c)$
+\end{center}
+and the string
+\begin{center}
+	$abc$,
 \end{center}
-For term 1 \ref{term:1}, whatever was before the current
-input position was fully matched and the regex corresponding
-to it reduced to $\ONE$,
-and in the next input token, it will start on $r_1\cdot r$.
-The resulting value will be something of a similar configuration:
-\begin{center}\label{value_2parts}
-%\caption{term 1 \ref{term:1}'s matching configuration}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
-    \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
-        {$v_{x1}$
-         \nodepart{two} $v_{r_1\cdot r}$ };
-\end{tikzpicture}
+then our algorithm generates the following
+correct POSIX value
+\begin{center}
+	$\Seq \; (\Right \; ab) \; (\Right \; c)$.
 \end{center}
-For term 2 \ref{term:2} we have a similar value partition:
-\begin{center}\label{value_2parts2}
-%\caption{term 1 \ref{term:1}'s matching configuration}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
-    \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
-        {$v_{x2}$
-         \nodepart{two} $v_{r_2\cdot r}$ };
-\end{tikzpicture}
+Essentially it matches the string with the longer Right-alternative
+in the first sequence (and
+then the ``rest'' with the character regular expression $c$ from the second sequence).
+If we add the simplification above, then we obtain the following value
+\begin{center}
+	$\Left \; (\Seq \; a \; (\Left \; bc))$
+\end{center}
+where the $\Left$-alternatives get priority.
+However, this violates the POSIX rules.
+The reason for getting this undesired value
+is that the new rule splits this regular expression up into
+\begin{center}
+	$a\cdot(bc + c) + ab \cdot (bc + c)$,
 \end{center}
-\noindent
-and so on.
-We note that for term 4 \ref{term:4} its result value
-after any position beyond the current one will always
-be the same:
-\begin{center}\label{value_2parts4}
-%\caption{term 1 \ref{term:1}'s matching configuration}
-\begin{tikzpicture}[every node/.append style={draw, rounded corners, inner sep=10pt}]
-    \node [rectangle split, rectangle split horizontal, rectangle split parts=2]
-        {$v_{x4}$
-         \nodepart{two} $v_{r_1\cdot r}$ };
-\end{tikzpicture}
+which becomes a regular expression with a
+totally different structure: the original
+was a sequence, and now it becomes an alternative.
+With an alternative the maximum munch rule no longer works.\\
+A method to reconcile this is to do the
+transformation in \ref{desiredSimp} ``non-invasively'',
+meaning that we traverse the list of regular expressions
+\begin{center}
+	$rs_a@[a]@rs_c$
+\end{center}
+in the alternative
+\begin{center}
+	$\sum ( rs_a@[a]@rs_c)$
 \end{center}
-And $\POSIX$ rules says that we can eliminate at least one of them.
-Our algorithm always puts the regex with the longest initial sub-match at the left of the
-alternatives, so we safely throw away \ref{term:4}.
-The fact that term 1 and 4 are not immediately in the same alternative
-expression does not prevent them from being two duplicate matches at
-the current point of view.
-To implement this idea into an algorithm, we define a "pruning function"
-$\textit{prune}$, that takes away parts of all terms in $r$ that belongs to
-$\textit{acc}$, which acts as an element
-is a stronger version of $\textit{distinctBy}$.
-Here is a concise version of it (in the style of Scala):
-\begin{figure}[H]
-\begin{lstlisting}
-def distinctByPlus(rs: List[ARexp], acc: Set[Rexp] = Set()) :
-List[ARexp] = rs match {
-  case Nil => Nil
-  case r :: rs =>
-    if(acc.contains(erase(r)))
-      distinctByPlus(rs, acc)
-    else
-      prune(r, acc) :: distinctByPlus(rs, prune(r, acc) +: acc)
-}
-
-\end{lstlisting}
-\caption{the distinctByPlus function}
-\end{figure}
-But for the function $\textit{prune}$ there is a difficulty:
-how could one define the $L$ function in a "computable" way,
-so that they generate a (lazy infinite) set of strings in $L(r)$.
-We also need a function that tests whether $L(r_1) \subseteq L(r_2)$
-is true.
-For the moment we cut corners and do not define these two functions
-as they are not used by Antimirov (and will probably not contribute
-to a bound better than Antimirov's cubic bound).
-Rather, we do not try to eliminate in every instance of regular expressions
-that have "language duplicates", but only eliminate the "exact duplicates".
-For this we need to break up terms $(a+b)\cdot c$ into two
-terms $a\cdot c$ and $b\cdot c$ before we add them to the accumulator:
+using a function similar to $\distinctBy$,
+but this time
+we allow a more general list rewrite:
+\begin{mathpar}
+	\inferrule{\vspace{0mm} }{rs_a@[a]@rs_c
+	\stackrel{s}{\rightsquigarrow }
+	rs_a@[\textit{prune} \; a \; rs_a]@rs_c }
+\end{mathpar}
+%L \; a_1' = L \; a_1 \setminus (\cup_{a \in rs_a} L \; a)
+where $\textit{prune} \;a \; acc$ traverses $a$
+without altering the structure of $a$, removing components in $a$
+that have appeared in the accumulator $acc$.
+For example
+\begin{center}
+	$\textit{prune} \;\;\; (r_a+r_f+r_g+r_h)r_d \;\;\; [(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$
+\end{center}
+should be equal to
+\begin{center}
+	$(r_g+r_h)r_d$
+\end{center}
+because $r_gr_d$ and
+$r_hr_d$ are the only terms
+that have not appeared in the accumulator list
+\begin{center}
+	$[(r_a+r_b+r_c)r_d, (r_e+r_f)r_d]$.
+\end{center}
+We implemented the
+function $\textit{prune}$ in Scala,
+and incorporated it into our lexer
+by replacing the $\simp$ function
+with a stronger version called $\bsimpStrong$
+that prunes regular expressions.
+We call this lexer $\blexerStrong$.
+$\blexerStrong$ is able to drastically reduce the
+internal data structure size which could
+trigger exponential behaviours in
+$\blexersimp$.
 \begin{figure}[H]
-\begin{lstlisting}
-def distinctWith(rs: List[ARexp],
-  pruneFunction: (ARexp, Set[Rexp]) => ARexp,
-  acc: Set[Rexp] = Set()) : List[ARexp] =
-  rs match{
-    case Nil => Nil
-    case r :: rs =>
-      if(acc(erase(r)))
-        distinctWith(rs, pruneFunction, acc)
-      else {
-        val pruned_r = pruneFunction(r, acc)
-        pruned_r ::
-        distinctWith(rs,
-          pruneFunction,
-          turnIntoTerms(erase(pruned_r)) ++: acc
-        )
-      }
-  }
-\end{lstlisting}
-\caption{A Stronger Version of $\textit{distinctBy}$}
-\end{figure}
-\noindent
-This process is done by the function $\textit{turnIntoTerms}$:
-\begin{figure}
-\begin{verbatim}
-def turnIntoTerms(r: Rexp): List[Rexp] = r match {
-  case SEQ(r1, r2) => if(isOne(r1))
-                        turnIntoTerms(r2)
-                      else
-                        turnIntoTerms(r1).map(r11 => SEQ(r11, r2))
-  case ALTS(r1, r2) => turnIntoTerms(r1) ::: turnIntoTerms(r2)
-  case ZERO => Nil
-  case _ => r :: Nil
-}
-\end{verbatim}
-\caption{function that breaks up regular expression into "atomic" terms}
-\end{figure}
-
-\noindent
-The pruning function can be defined recursively:
-\begin{figure}
-\begin{verbatim}
-def prune7(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
-  case AALTS(bs, rs) => rs.map(r => prune7(r, acc)).filter(_ != ZERO) match
-  {
-    case Nil => AZERO
-    case r::Nil => fuse(bs, r)
-    case rs1 => AALTS(bs, rs1)
-  }
-  case ASEQ(bs, r1, r2) => prune7(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
-    case AZERO => AZERO
-    case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
-    case r1p => ASEQ(bs, r1p, r2)
-  }
-  case r => if(acc(erase(r))) AZERO else r
-}
-\end{verbatim}
-\caption{pruning function}
-\end{figure}
-
-
-
-\begin{figure}
 \centering
 \begin{tabular}{@{}c@{\hspace{0mm}}c@{\hspace{0mm}}c@{}}
 \begin{tikzpicture}
@@ -316,10 +229,275 @@
 \end{axis}
 \end{tikzpicture}\\
 \multicolumn{2}{l}{Graphs: Runtime for matching $((a^* + (aa)^* + \ldots + (aaaaa)^* )^*)^*$ with strings
-	   of the form $\underbrace{aa..a}_{n}$.}
+	 of the form $\underbrace{aa..a}_{n}$.}
 \end{tabular}
 \caption{aaaaaStarStar}
 \label{fig:aaaaaStarStar}
 \end{figure}
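+\noindent
+To see how the $\textit{prune}$ example above plays out, one can trace the
+computation by hand (this is only an illustration of the Scala definitions
+given below; note that $\distinctWith$ stores the accumulator in its
+``spilled'' form, i.e.\ as the atomic terms
+$[r_ar_d, r_br_d, r_cr_d, r_er_d, r_fr_d]$ produced by $\textit{turnIntoTerms}$):
+\[
+	(r_a+r_f+r_g+r_h)r_d
+	\;\longrightarrow\;
+	(\ZERO+r_f+r_g+r_h)r_d
+	\;\longrightarrow\;
+	(\ZERO+\ZERO+r_g+r_h)r_d
+	\;\longrightarrow\;
+	(r_g+r_h)r_d
+\]
+First the tail $r_d$ is stripped from every accumulator element by
+$\textit{removeSeqTail}$, giving the set $\{r_a, r_b, r_c, r_e, r_f\}$;
+then the components $r_a$ and $r_f$ of the head are replaced by $\ZERO$
+because they occur in this set, and the $\ZERO$s are filtered out again.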
+\begin{figure}[H]
+\begin{lstlisting}
+  def atMostEmpty(r: Rexp) : Boolean = r match {
+    case ZERO => true
+    case ONE => true
+    case STAR(r) => atMostEmpty(r)
+    case SEQ(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
+    case ALTS(r1, r2) => atMostEmpty(r1) && atMostEmpty(r2)
+    case CHAR(_) => false
+  }
+
+  def isOne(r: Rexp) : Boolean = r match {
+    case ONE => true
+    case SEQ(r1, r2) => isOne(r1) && isOne(r2)
+    case ALTS(r1, r2) =>
+      (isOne(r1) || isOne(r2)) && (atMostEmpty(r1) && atMostEmpty(r2))
+    case STAR(r0) => atMostEmpty(r0)
+    case CHAR(c) => false
+    case ZERO => false
+  }
+
+  //if r is of the shape r' ~ tail then returns r', otherwise ZERO
+  def removeSeqTail(r: Rexp, tail: Rexp) : Rexp = r match {
+    case SEQ(r1, r2) =>
+      if(r2 == tail)
+        r1
+      else
+        ZERO
+    case r => ZERO
+  }

+  def prune(r: ARexp, acc: Set[Rexp]) : ARexp = r match{
+    case AALTS(bs, rs) => rs.map(r => prune(r, acc)).filter(_ != AZERO) match
+    {
+      //all components have been removed, meaning this is effectively a duplicate
+      //flats will take care of removing this AZERO
+      case Nil => AZERO
+      case r::Nil => fuse(bs, r)
+      case rs1 => AALTS(bs, rs1)
+    }
+    case ASEQ(bs, r1, r2) =>
+      //remove the r2 in (ra + rb)r2 to identify the duplicate contents of r1
+      prune(r1, acc.map(r => removeSeqTail(r, erase(r2)))) match {
+        //after pruning, returns 0
+        case AZERO => AZERO
+        //after pruning, got r1'.r2, where r1' is equal to 1
+        case r1p if(isOne(erase(r1p))) => fuse(bs ++ mkepsBC(r1p), r2)
+        //assemble the pruned head r1p with r2
+        case r1p => ASEQ(bs, r1p, r2)
+      }
+    //this does the duplicate component removal task
+    case r => if(acc(erase(r))) AZERO else r
+  }
+\end{lstlisting}
+\caption{The function $\textit{prune}$, together with its helper functions}
+\end{figure}
+\noindent
+The benefits of using
+$\textit{prune}$, such as refining the finiteness bound
+to a cubic bound, have not been formalised yet.
+Therefore we choose to use Scala code rather than an Isabelle-style formal
+definition like we did for $\simp$, as the definitions might change
+to suit proof needs.
+In the rest of the chapter we will use this convention consistently.
+\begin{figure}[H]
+\begin{lstlisting}
+  def distinctWith(rs: List[ARexp],
+    pruneFunction: (ARexp, Set[Rexp]) => ARexp,
+    acc: Set[Rexp] = Set()) : List[ARexp] =
+      rs match{
+        case Nil => Nil
+        case r :: rs =>
+          if(acc(erase(r)))
+            distinctWith(rs, pruneFunction, acc)
+          else {
+            val pruned_r = pruneFunction(r, acc)
+            pruned_r ::
+              distinctWith(rs,
+                pruneFunction,
+                turnIntoTerms(erase(pruned_r)) ++: acc
+              )
+          }
+      }
+\end{lstlisting}
+\caption{A Stronger Version of $\textit{distinctBy}$}
+\end{figure}
+\noindent
+The function $\textit{prune}$ is used in $\distinctWith$.
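+For instance, the example from above can be replayed on this code.
+The following is a minimal sketch (the names $\textit{ra}$--$\textit{rh}$ and
+the exact shape of the output are only illustrative; $\textit{internalise}$ is
+the conversion into bitcoded regular expressions from earlier chapters):
+\begin{lstlisting}
+  //illustrative sanity check for prune, not part of the lexer
+  val List(ra, rb, rc, rd, re, rf, rg, rh) =
+    "abcdefgh".toList.map(c => CHAR(c))
+  //accumulator: the spilled terms of (ra+rb+rc)rd and (re+rf)rd
+  val acc: Set[Rexp] = Set(
+    SEQ(ra, rd), SEQ(rb, rd), SEQ(rc, rd),
+    SEQ(re, rd), SEQ(rf, rd))
+  //the term to be pruned
+  val r = internalise(SEQ(ALTS(ra, ALTS(rf, ALTS(rg, rh))), rd))
+  //expected, up to the exact nesting of the alternatives: (rg + rh)rd
+  println(erase(prune(r, acc)))
+\end{lstlisting}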
+$\distinctWith$ is a stronger version of $\distinctBy$,
+which not only removes duplicates as $\distinctBy$ would
+do, but also uses the $\textit{pruneFunction}$
+argument to prune away verbose components in a regular expression.\\
+\begin{figure}[H]
+\begin{lstlisting}
+  //a stronger version of simp
+  def bsimpStrong(r: ARexp): ARexp =
+  {
+    r match {
+      case ASEQ(bs1, r1, r2) => (bsimpStrong(r1), bsimpStrong(r2)) match {
+        //normal clauses same as simp
+        case (AZERO, _) => AZERO
+        case (_, AZERO) => AZERO
+        case (AONE(bs2), r2s) => fuse(bs1 ++ bs2, r2s)
+        //bs2 can be discarded
+        case (r1s, AONE(bs2)) => fuse(bs1, r1s) //assert bs2 == Nil
+        case (r1s, r2s) => ASEQ(bs1, r1s, r2s)
+      }
+      case AALTS(bs1, rs) => {
+        //distinctBy(flat_res, erase)
+        distinctWith(flats(rs.map(bsimpStrong(_))), prune) match {
+          case Nil => AZERO
+          case s :: Nil => fuse(bs1, s)
+          case rs => AALTS(bs1, rs)
+        }
+      }
+      //stars that can be treated as 1
+      case ASTAR(bs, r0) if(atMostEmpty(erase(r0))) => AONE(bs)
+      case r => r
+    }
+  }
+\end{lstlisting}
+\caption{The function $\bsimpStrong$}
+\end{figure}
+\noindent
+$\distinctWith$ is in turn used in $\bsimpStrong$,
+which is applied repeatedly in $\bdersStrongs$:
+\begin{figure}[H]
+\begin{lstlisting}
+  //Conjecture: [| bdersStrong(s, r) |] = O([| r |]^3)
+  def bdersStrong(s: List[Char], r: ARexp) : ARexp = s match {
+    case Nil => r
+    case c::s => bdersStrong(s, bsimpStrong(bder(c, r)))
+  }
+\end{lstlisting}
+\caption{The function $\bdersStrongs$}
+\end{figure}
+\noindent
+We conjecture that the above Scala function $\bdersStrongs$,
+written $\bdersStrong{\_}{\_}$ in infix notation,
+satisfies the following property:
+\begin{conjecture}
+	$\llbracket \bdersStrong{a}{s} \rrbracket = O(\llbracket a \rrbracket^3)$
+\end{conjecture}
+The stronger version of $\blexersimp$'s
+code in Scala looks like:
+\begin{figure}[H]
+\begin{lstlisting}
+  def strongBlexer(r: Rexp, s: String) : Option[Val] = {
+    Try(Some(decode(r, strong_blex_simp(internalise(r), s.toList)))).getOrElse(None)
+  }
+  def strong_blex_simp(r: ARexp, s: List[Char]) : Bits = s match {
+    case Nil => {
+      if (bnullable(r)) {
+        mkepsBC(r)
+      }
+      else
+        throw new Exception("Not matched")
+    }
+    case c::cs => {
+      strong_blex_simp(bsimpStrong(bder(c, r)), cs)
+    }
+  }
+\end{lstlisting}
+\end{figure}
+\noindent
+We would like to preserve the same correctness property that we had
+for $\blexersimp$:
+\begin{conjecture}\label{cubicConjecture}
+	$\blexerStrong \;r \; s = \blexer\; r\;s$
+\end{conjecture}
+\noindent
+The way $\textit{prune}$ and $\textit{turnIntoTerms}$ break up
+regular expressions into ``atomic'' terms suggests a link towards
+``partial derivatives'' introduced by Antimirov \cite{Antimirov95}.
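+Before exploring this link, we remark that Conjecture \ref{cubicConjecture},
+while not yet proved, can at least be tested empirically. A minimal sketch of
+such a test (the generators $\textit{randomRexp}$ and $\textit{randomString}$
+are hypothetical helpers, and $\blexer$ is assumed to have the same Scala
+signature as $\textit{strongBlexer}$):
+\begin{lstlisting}
+  //compare strongBlexer against blexer on random inputs (a test, not a proof)
+  def checkConjecture(trials: Int): Boolean =
+    (1 to trials).forall { _ =>
+      val r = randomRexp()   //assumed random generator of Rexp
+      val s = randomString() //assumed random generator of String
+      strongBlexer(r, s) == blexer(r, s)
+    }
+\end{lstlisting}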
+\section{Antimirov's partial derivatives}
+The idea behind Antimirov's partial derivatives
+is to do derivatives in a similar way as suggested by Brzozowski,
+but maintain a set of regular expressions instead of a single one:
+\begin{center}
+	\begin{tabular}{ccc}
+		$\partial_x(\ZERO)$ & $=$ & $\phi$\\
+		$\partial_x(\ONE)$ & $=$ & $\phi$\\
+		$\partial_x(c)$ & $=$ & $\textit{if} \; c = x \; \textit{then} \; \{\ONE\} \; \textit{else} \; \phi$\\
+		$\partial_x(a+b)$ & $=$ & $\partial_x(a) \cup \partial_x(b)$\\
+		$\partial_x(a \cdot b)$ & $=$ & $\textit{if} \; \textit{nullable}(a) \; \textit{then} \; \partial_x(a) \cdot b \,\cup\, \partial_x(b) \; \textit{else} \; \partial_x(a) \cdot b$\\
+		$\partial_x(a^*)$ & $=$ & $\partial_x(a) \cdot a^*$\\
+	\end{tabular}
+\end{center}
+\noindent
+Rather than joining the calculated derivatives $\partial_x a$ and $\partial_x b$ together
+using the alternatives constructor, Antimirov cleverly chose to put them into
+a set instead. This breaks the terms in a derivative regular expression up,
+allowing us to understand what the ``atomic'' components of it are.
+For example, to compute what the derivative of the regular expression
+$x^*(xx + y)^*$ against $x$ is made of, one can take a partial derivative
+of it and get two singleton sets $\{x^* \cdot (xx + y)^*\}$ and $\{x \cdot (xx + y) ^* \}$
+from $\partial_x(x^*) \cdot (xx + y) ^*$ and $\partial_x((xx + y)^*)$.
+
+To get all the ``atomic'' components of a regular expression's possible derivatives,
+there is a procedure Antimirov called $\textit{lf}$, short for ``linear forms'', that takes
+whatever character is available at the head of the string inside the language of a
+regular expression, and gives back the character and the derivative regular expression
+as a pair (which he called a ``monomial''):
+\begin{center}
+	\begin{tabular}{ccc}
+		$\lf(\ZERO)$ & $=$ & $\phi$\\
+		$\lf(\ONE)$ & $=$ & $\phi$\\
+		$\lf(c)$ & $=$ & $\{(c, \ONE) \}$\\
+		$\lf(a+b)$ & $=$ & $\lf(a) \cup \lf(b)$\\
+		$\lf(a \cdot b)$ & $=$ & $\textit{if} \; \textit{nullable}(a) \; \textit{then} \; \lf(a) \bigodot b \,\cup\, \lf(b) \; \textit{else} \; \lf(a) \bigodot b$\\
+		$\lf(r^*)$ & $=$ & $\lf(r) \bigodot r^*$\\
+	\end{tabular}
+\end{center}
+\noindent
+There is a slight difference in the clauses for sequences and stars compared
+with $\partial$: instead of a dot operator $\textit{rset} \cdot r$ that attaches the regular
+expression $r$ to every element inside $\textit{rset}$ to create a set of
+sequence derivatives, they use the ``circle dot'' operator $\bigodot$ which operates
+on a set of monomials (which Antimirov called a ``linear form'') and a regular
+expression, and returns a linear form:
+\begin{center}
+	\begin{tabular}{ccc}
+		$l \bigodot (\ZERO)$ & $=$ & $\phi$\\
+		$l \bigodot (\ONE)$ & $=$ & $l$\\
+		$\phi \bigodot t$ & $=$ & $\phi$\\
+		$\{ (x, \ZERO) \} \bigodot t$ & $=$ & $\{(x,\ZERO) \}$\\
+		$\{ (x, \ONE) \} \bigodot t$ & $=$ & $\{(x,t) \}$\\
+		$\{ (x, p) \} \bigodot t$ & $=$ & $\{(x,p\cdot t) \}$\\
+	\end{tabular}
+\end{center}
+\noindent
+Some degree of simplification is applied when doing $\bigodot$, for example,
+$l \bigodot (\ZERO) = \phi$ corresponds to $r \cdot \ZERO \rightsquigarrow \ZERO$,
+and $l \bigodot (\ONE) = l$ to $l \cdot \ONE \rightsquigarrow l$, and
+$\{ (x, \ZERO) \} \bigodot t = \{(x,\ZERO) \}$ to $\ZERO \cdot t \rightsquigarrow \ZERO$,
+and so on.
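+To make these definitions concrete, here is a small Scala sketch of $\lf$ and
+$\bigodot$ (our own illustration following Antimirov's definitions, not code
+used by our lexer; it assumes the $\textit{Rexp}$ datatype and
+$\textit{nullable}$ from earlier chapters, and represents a linear form as a
+set of pairs):
+\begin{lstlisting}
+  type Mon = (Char, Rexp) //a monomial: head character and tail
+  type Lin = Set[Mon]     //a linear form: a set of monomials
+
+  //the circle dot: attach t to every monomial, with simplifications
+  def odot(l: Lin, t: Rexp): Lin = t match {
+    case ZERO => Set()
+    case ONE  => l
+    case t => l.map {
+      case (x, ZERO) => (x, ZERO)
+      case (x, ONE)  => (x, t)
+      case (x, p)    => (x, SEQ(p, t))
+    }
+  }
+
+  def lf(r: Rexp): Lin = r match {
+    case ZERO    => Set()
+    case ONE     => Set()
+    case CHAR(c) => Set((c, ONE))
+    case ALTS(r1, r2) => lf(r1) ++ lf(r2)
+    case SEQ(r1, r2)  =>
+      if(nullable(r1)) odot(lf(r1), r2) ++ lf(r2)
+      else odot(lf(r1), r2)
+    case STAR(r0) => odot(lf(r0), STAR(r0))
+  }
+\end{lstlisting}
+\noindent
+Applied to the earlier example $x^*(xx + y)^*$, this function returns exactly
+the two monomials with head $x$ whose tails are $x^* \cdot (xx+y)^*$ and
+$x \cdot (xx+y)^*$.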
+With the function $\lf$ one can compute all possible partial derivatives
+$\partial_{UNIV}(r)$ of a regular expression $r$ with
+an iterative procedure:
+\begin{center}
+	\begin{tabular}{llll}
+		& $\Delta_0$ & $=$ & $\PD_0 = \{r\}$\\
+		$\textit{while}$ & $(\Delta_i \neq \phi)$ & & \\
+		& $\Delta_{i+1}$ & $=$ & $\lf(\Delta_i) - \PD_i$ \\
+		& $\PD_{i+1}$ & $=$ & $\Delta_{i+1} \cup \PD_i$ \\
+		$\partial_{UNIV}(r)$ & $=$ & $\PD_n$ &
+	\end{tabular}
+\end{center}
+\noindent
+where $n$ is the first index at which $\Delta_n = \phi$, and where $\lf$ is
+lifted to a set of regular expressions by taking the tail regular expressions
+of all the monomials of its elements.
+Note that computing partial derivatives in this way effectively applies
+rewrites such as
+\[
+	(r_1 + r_2) \cdot r_3 \longrightarrow (r_1 \cdot r_3) + (r_2 \cdot r_3),
+\]
+that is, precisely the ``spilling out'' of alternatives discussed in the
+previous section.
@@ -345,22 +523,22 @@
 ~r \backslash c = ~ (r \backslash c)
 \]
-The most tricky part of lexing for the $~r$ regex
+The most tricky part of lexing for the $~r$ regular expression
 is creating a value for it.
 For other regular expressions, the value aligns with the
-structure of the regex:
+structure of the regular expression:
 \[
 \vdash \Seq(\Char(a), \Char(b)) : a \cdot b
 \]
-But for the $~r$ regex, $s$ is a member of it if and only if
+But for the $~r$ regular expression, $s$ is a member of it if and only if
 $s$ does not belong to $L(r)$.
 That means when there
-is a match for the not regex, it is not possible to generate how the string $s$ matched
+is a match for the not regular expression, it is not possible to generate a value showing how the string $s$ matched
 with $r$.
 What we can do is preserve the information of how $s$ was not matched by $r$,
 and there are a number of options to do this.
-We could give a partial value when there is a partial match for the regex inside
+We could give a partial value when there is a partial match for the regular expression inside
 the $\mathbf{not}$ construct.
 For example, the string $ab$ is not in the language of $(a\cdot b) \cdot c$.
 A value for it could be