diff -r 0c491eff5b01 -r 14e5ae1fb541 handouts/antimirov.tex
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/handouts/antimirov.tex	Fri Sep 05 16:59:48 2025 +0100
@@ -0,0 +1,477 @@
+\documentclass[12pt]{article}
+\usepackage{../style}
+\usepackage{../langs}
+\usepackage{graphicx}
+
+\newtheorem{thm}{Theorem}
+\newtheorem{lem}[thm]{Lemma}
+\newtheorem{cor}[thm]{Corollary}
+\newenvironment{proof}{\paragraph{Proof:}\it}{\hfill$\square$}
+
+
+\begin{document}
+
+
+\section*{Antimirov's Proof about Pders}
+
+These are some rough notes about the result by Antimirov establishing
+a bound on the number of regular expressions in a partial
+derivative. From this bound on the number of partial derivatives one
+can easily construct an NFA for a regular expression, but one can also
+derive a bound on the size of the partial derivatives. This is what we
+do first.  We start with the following definitions:
+
+\begin{itemize}
+\item $pder\,c\,r$ --- partial derivative according to a character; this can be defined
+  inductively as follows:
+
+  \begin{center}
+    \begin{tabular}{l@ {\hspace{2mm}}c@ {\hspace{2mm}}l}
+  $\textit{pder}\, c\, (\ZERO)$      & $\dn$ & $\emptyset$\\
+  $\textit{pder}\, c\, (\ONE)$         & $\dn$ & $\emptyset$ \\
+  $\textit{pder}\, c\, (d)$                & $\dn$ & if $c = d$ then $\{\ONE\}$ else $\emptyset$\\
+  $\textit{pder}\, c\, (r_1 + r_2)$        & $\dn$ & $\textit{pder}\, c\, r_1 \;\cup\; \textit{pder}\, c\, r_2$\\
+  $\textit{pder}\, c\, (r_1 \cdot r_2)$  & $\dn$  & if $\textit{nullable} (r_1)$\\
+  & & then $\Pi\,(\textit{pder}\,c\,r_1)\,r_2 \;\cup\; \textit{pder}\, c\, r_2$\\ 
+  & & else $\Pi\,(\textit{pder}\, c\, r_1)\,r_2$\\
+  $\textit{pder}\, c\, (r^*)$          & $\dn$ & $\Pi\,(\textit{pder}\,c\,r)\, (r^*)$
+  \end{tabular}
+\end{center}
+  
+\item $pder^+\,c\,\,rs$ --- partial derivatives for a set regular exprssions $rs$
+\item $pders\,s\,r$ --- partial derivative of a regular expression
+  according to a string
+
+\item $Pders\,A\,r \dn \bigcup_{s\in A}. pders\,s\,r$ --- partial
+  derivatives according to a language (a set of strings)
+\item $|rs|$ is the size of a set of regular expressions $rs$, or
+  the number of elements in the
+  set (also known as the cardinality of this set)
+\item $\prod\,rs\;r \dn \{r_1 \cdot r \;|\; r_1 \in rs\}$ --- this is
+  some convenience when writing a set of sequence regular
+  expressions. It essentially ``appends'' the regular expression $r$ to all
+  regular expressions in the set $rs$. As a result one can write
+  the sequence case for partial derivatives (see above) more conveniently 
+  as
+  \[
+    pder\,c\,(r_1\cdot r_2) \dn
+    \begin{cases}
+      \prod\,(pder\,c\,r_1)\,r_2\;\cup\;pder\,c\,r_2 &
+        \!\!\textit{provided}\,r_1\, \textit{is nullable}\\
+      \prod\,(pder\,c\,r_1)\,r_2 & \!\!\textit{otherwise}\\
+    \end{cases}  
+  \]
+\item $\textit{Psuf}\,s$ is the set of all non-empty suffixes of $s$ defined as
+  \[
+  \textit{PSuf}\, s \dn \{v.\;v \not= [] \wedge \exists u. u \,@\, v = s \}  
+\]
+
+So for the string $abc$ the non-empty suffixes are $c$, $bc$ and
+$abc$.  Also we have that
+$\textit{Psuf}\,(s\,@\,[c]) = ((\textit{Psuf}\,s)\,@@\,[c]) \cup
+\{[c]\}$. Here $@@$ means to concatenate $[c]$ to the end of
+all strings in $\textit{Psuf}\,s$; in this equation  we also
+need to add $\{[c]\}$ in order to make the equation to hold.
+\end{itemize}  
+
+\noindent
+To state Antimirov's result we need the following definition of an
+\emph{alphabetic width} of a regular expression defined as follows:
+
+\begin{center}
+\begin{tabular}{lcl}  
+  $awidth(\ZERO)$ & $\dn$ & $0$\\
+  $awidth(\ONE)$ & $\dn$ & $0$\\
+  $awidth(c)$ & $\dn$ & $1$\\
+  $awidth(r_1 + r_2)$     & $\dn$ & $awidth(r_1) + awidth(r_2)$\\
+  $awidth(r_1 \cdot r_2)$ & $\dn$ & $awidth(r_1) + awidth(r_2)$\\
+  $awidth(r^*)$ & $\dn$ & $awidth(r)$\\
+\end{tabular}
+\end{center}
+
+\noindent
+This function counts how many characters are in a regular expression.
+Antimirov's result states
+
+\begin{thm}\label{one}
+$\forall\,A\,r\,.\;\;|Pders\;A\;r| \leq awidth(r) + 1$
+\end{thm}
+
+\noindent
+Note this theorem holds for any set of strings $A$, for example
+for the set of all strings, which I will write as $\textit{UNIV}$, and also
+for the set $\{s\}$ containing only a single string $s$. Therefore a
+simple corollary is 
+
+\begin{cor}
+$\forall\,s\,r\,.\;\;|pders\;s\;r| \leq awidth(r) + 1$
+\end{cor}
+
+\noindent
+This property says that for every string $s$, the number of regular
+expressions in the derivative can never be bigger than
+$awidth(r) + 1$.  Interestingly we do not show Thm~\ref{one} for all
+sets of strings $A$ directly, but rather only for one particular set of
+strings which I call $UNIV_1$. It includes all strings except the empty string
+(remember $UNIV$ contains all strings).\bigskip
+
+\noindent
+Let's try to give below a comprehensible account of Antimirov's proof
+of Thm.~\ref{one}---I distictly remember that Antimirov's paper is
+great, but pretty incomprehensible for the first 20+ times one reads
+that paper.  The proof starts with the following much weaker property
+about the size being finite:
+
+\begin{lem}
+$\forall\,A\,r\,.\;\;(Pders\;A\;r)$ is a finite set.
+\end{lem}
+
+\noindent This lemma is needed because some reasoning steps in
+Thm~\ref{one} only work for finite sets, not infinite sets. But let us
+skip over the proof of this property at first and let us assume we
+know already that the partial derivatives are always finite sets (this for
+example does not hold for unsimplified Brzozowski derivatives which
+can be infinite for some sets of strings).
+
+There are some central lemmas about partial derivatives for $\cdot$ and $\_^*$.
+One is the following
+
+\begin{lem}\label{central}\mbox{}\\
+  \[Pders\,UNIV_1\,(r_1\cdot r_2) \subseteq (\prod (Pders\,UNIV_1\, r_1)\,r_2) \;
+  \cup \; Pders\,UNIV_1\,r_2\]
+\end{lem}  
+
+\begin{proof}
+  \noindent The proof is done via an induction for the following property
+  \[
+  pders\,s\,(r_1\cdot r_2) \subseteq (\prod (pders\,s\, r_1)\,r_2) \;
+  \cup \; Pders\,(\textit{PSuf}\,s)\,r_2
+  \]
+
+  \noindent
+  Note that this property uses $pders$ and $Pders$ together. The proof is done
+  by ``reverse'' induction on $s$, meaning the cases to analyse are the
+  empty string $[]$ and the case where a character is put at the end of the
+  string $s$, namely $s \,@\, [c]$. The case $[]$ is trivial. In the other
+  case we know by IH that
+
+   \[
+  pders\,s\,(r_1\cdot r_2) \subseteq (\prod (pders\,s\, r_1)\,r_2) \;
+  \cup \; Pders\,(\textit{PSuf}\,s)\,r_2
+  \]
+
+  \noindent
+  holds for $s$. Then we have to show it holds for $s\,@\,[c]$
+
+  \begin{center}
+  \begin{tabular}{r@{\hspace{2mm}}ll}
+        & $pders\,(s\,@\,[c])\,(r_1\cdot r_2)$\\
+    $=$ & $pder^+\,c\,(pders\,s\,(r_1\cdot r_2))$\\
+    $\subseteq$ & $pder^+\,c\,(\prod (pders\,s\, r_1)\,r_2 \;
+                  \cup \; Pders\,(\textit{PSuf}\,s)\,r_2)$\\
+    & \hfill{}by IH\\
+    $=$ & $(pder^+\,c\,(\prod (pders\,s\, r_1)\,r_2))\;\cup\;
+          (pder^+\,c\,(Pders\,(\textit{PSuf}\,s)\,r_2))$\\
+    $=$ & $(pder^+\,c\,(\prod (pders\,s\, r_1)\,r_2))\;\cup\;
+          (Pders\,(\textit{PSuf}\,(s\,@\,[c]))\,r_2)$\\
+    $\subseteq$ &
+                  $(pder^+\,c\,(\prod (pders\,s\, r_1)\,r_2))\;\cup\;
+                  (pder\,c\,r_2)\;\cup\;
+                  (Pders\,(\textit{PSuf}\,s\,@@\,[c])\,r_2)$\\
+    $\subseteq$ & $\prod (pder^+\,c\,(pders\,s\, r_1))\,r_2\;\cup\;
+                  (pder\,c\,r_2)\;\cup\;
+                  (Pders\,(\textit{PSuf}\,s\,@@\,[c])\,r_2)$\\
+    $=$ & $(\prod (pders\,(s\,@\,[c])\, r_1)\,r_2)\;\cup\;
+                  (pder\,c\,r_2)\;\cup\;
+    (Pders\,(\textit{PSuf}\,s\,@@\,[c])\,r_2)$\\
+    $\subseteq$ & $(\prod (pders\,(s\,@\,[c])\, r_1)\,r_2)\;\cup\;
+    (Pders\,(\textit{PSuf}\,(s\,@\,[c]))\,r_2)$\\
+  \end{tabular}    
+  \end{center}
+
+  \noindent The lemma now follows because for an $s \in UNIV_1$ it holds that
+
+  \[
+    \prod\,(pders\,s\,r_1)\,r_2 \subseteq \prod (Pders\,UNIV_1\, r_1)\,r_2
+  \]
+
+  \noindent and
+
+  \[
+    Pders\,(\textit{PSuf}\,s)\,r_2 \subseteq Pders\,UNIV_1\,r_2
+  \]
+
+  \noindent The left-hand sides of the inclusions above are
+  euqal to $pders\,s\,(r_1\cdot r_2)$ for a string $s\in UNIV_1$.
+\end{proof}\medskip
+
+\noindent There is a similar lemma for the $^*$-regular expression, namely:
+
+\begin{lem}\label{centraltwo}
+$Pders\,UNIV_1\,(r^*) \subseteq \prod\, (Pders\,UNIV_1\,r)\,(r^*)$
+\end{lem}  
+
+\noindent 
+We omit the proof for the moment (similar to Lem~\ref{central}). We
+also need the following property about the cardinality of $\prod$:
+
+\begin{lem}\label{centralthree}
+  $|\prod\,(Pders\,A\,r_1)\,r_2| \le |Pders\,A\,r_1|$
+\end{lem}  
+
+\noindent
+We only need the $\le$ version, which essentially says there
+are as many sequences $r\cdot r_2$ as are elements in $r$. Now
+for the proof of Thm~\ref{one}: The main induction in
+Antimirov's proof establishes that:\footnote{Remember that it is
+  always the hardest part in an induction proof to find the right
+  property that is strong enough and of the right shape for the
+  induction to go through.}
+
+\begin{lem}\label{two}
+$\forall r.\;\;|Pders\;UNIV_1\;r| \leq awidth(r)$
+\end{lem}
+
+\begin{proof} This is proved by induction on
+  $r$. The interesting cases are $r_1 + r_2$, $r_1\cdot r_2$ and
+  $r^*$. Let us start with the relatively simple case:\medskip
+
+\noindent  
+\textbf{Case $r_1 + r_2$:} By induction hypothesis we know
+
+\begin{center}
+\begin{tabular}{l}
+  $|Pders\;UNIV_1\;r_1| \leq awidth(r_1)$\\
+  $|Pders\;UNIV_1\;r_2| \leq awidth(r_2)$
+\end{tabular}    
+\end{center}  
+
+\noindent
+In this case we can reason as follows
+
+\begin{center}
+\begin{tabular}{r@{\hspace{2mm}}ll}
+  & $|Pders\;UNIV_1\;(r_1+r_2)|$\\
+  $=$ & $|(Pders\;UNIV_1\;r_1) \;\cup\; (Pders\;UNIV_1\;r_2)|$\\
+  $\leq$ & $|(Pders\;UNIV_1\;r_1)| \;+\; |(Pders\;UNIV_1\;r_2)|$ & (*)\\
+  $\leq$ & $awidth(r_1) + awidth(r_2)$\\
+  $\dn$ & $awidth(r_1 + r_2)$
+\end{tabular}    
+\end{center}  
+
+\noindent Note that (*) is a step that only works if one knows that
+$|(Pders\;UNIV_1\;r_1)|$ and so on are finite. The next case is
+a bit more interesting:\medskip
+
+\noindent  
+\textbf{Case $r_1 \cdot r_2$:} We have the same induction
+hypothesis as in the case before. 
+
+\begin{center}
+\begin{tabular}{r@{\hspace{2mm}}ll}
+  & $|Pders\;UNIV_1\;(r_1\cdot r_2)|$\\
+  $\leq$ & $|\prod\,(Pders\;UNIV_1\;r_1)\,r_2\;\cup\; (Pders\;UNIV_1\;r_2)|$
+  & by Lem~\ref{central}\\
+  $\leq$ & $|\prod\,(Pders\;UNIV_1\;r_1)\,r_2| \;+\; |(Pders\;UNIV_1\;r_2)|$\\
+  $\leq$ & $|Pders\;UNIV_1\;r_1| \;+\; |Pders\;UNIV_1\;r_2|$
+  & by Lem~\ref{centralthree} \\
+  $\leq$ & $awidth(r_1) + awidth(r_2)$\\
+  $\dn$ & $awidth(r_1 \cdot r_2)$
+\end{tabular}    
+\end{center} \medskip
+
+\noindent  
+\textbf{Case $r^*$:} Again we have the same induction
+hypothesis as in the cases before.
+
+\begin{center}
+\begin{tabular}{r@{\hspace{2mm}}ll}
+  & $|Pders\;UNIV_1\;(r^*)|$\\
+  $\leq$ & $|\prod\,(Pders\;UNIV_1\;r)\,(r^*)|$
+  & by Lem~\ref{centraltwo}\\
+  $\leq$ & $|Pders\;UNIV_1\;r|$
+  & by Lem~\ref{centralthree} \\
+  $\leq$ & $awidth(r)$\\
+\end{tabular}    
+\end{center}  
+\end{proof}\bigskip  
+
+\noindent
+From this lemma we can derive the next corrollary which extends
+the property to $UNIV$ ($= UNIV_1 \cup \{[]\}$):
+
+\begin{cor}
+$\forall r.\;\;|Pders\;UNIV\;r| \leq awidth(r) + 1$
+\end{cor}
+
+\begin{proof} This can be proved as follows
+\begin{center}
+\begin{tabular}{r@{\hspace{2mm}}ll}
+  & $|Pders\;UNIV\;r|$\\
+  $=$ & $|Pders\;(UNIV_1 \cup \{[]\})\;r|$\\
+  $=$ & $|(Pders\;UNIV_1\,r) \;\cup\;\{r\}|$\\
+  $\leq$ & $|Pders\;UNIV_1\,r| + 1$ & by Lem~\ref{two}\\
+  $\leq$ & $awidth(r) + 1$\\
+\end{tabular}    
+\end{center}  
+\end{proof}
+
+\noindent
+From the last corollary, it is easy to infer Antimirov's
+Thm~\ref{one}, because
+
+\[ Pders\,A\,r \subseteq Pders\,UNIV\,r \]
+
+
+\noindent
+for all sets $A$.\bigskip
+
+\noindent
+While I was earlier a bit dismissive above about the intelligibility
+of Antimirov's paper, you have to admit this proof is a work of
+beauty. It only gives a bound (\textit{awidth}) for the number of
+regular expressions in the de\-rivatives---this is important for
+constructing NFAs.  Maybe it has not been important before, but I have
+never seen a result about the \emph{size} of the partial
+derivatives.\footnote{Update: I have now seen a paper which proves
+  this result as well.}  However, a very crude bound, namely
+$(size(r)^2 + 1) \times (awidth(r) + 1)$, can be easily derived by
+using Antimirov's result. The definition we need for this is a
+function that collects subexpressions from which partial derivatives
+are built:
+
+
+\begin{center}
+\begin{tabular}{lcl}  
+  $subs(\ZERO)$ & $\dn$ & $\{\ZERO\}$\\
+  $subs(\ONE)$ & $\dn$ & $\{\ONE\}$\\
+  $subs(c)$ & $\dn$ & $\{c, \ONE\}$\\
+  $subs(r_1 + r_2)$     & $\dn$ & $\{r_1 + r_2\}\,\cup\,subs(r_1) \,\cup\, subs(r_2)$\\
+  $subs(r_1 \cdot r_2)$ & $\dn$ & $\{r_1 \cdot r_2\}\,\cup (\prod\,subs(r_1)\;r_2)\,\cup \,
+                                  subs(r_1) \,\cup\, subs(r_2)$\\
+  $subs(r^*)$ & $\dn$ & $\{r^*\}\,\cup\,(\prod\,subs(r)\;r^*) \,\cup\, subs(r)$\\
+\end{tabular}
+\end{center}
+
+\noindent
+We can show that
+
+\begin{lem}
+$pders\,s\,r \subseteq subs(r)$
+\end{lem}  
+
+\noindent
+This is a relatively simple induction on $r$. The point is that for every element
+in $subs$, the maximum size is given by
+
+\begin{lem}
+  If $r' \in subs(r)$ then $size(r') \le 1 + size(r)^2$.
+\end{lem}  
+
+\noindent
+Again the proof is a relatively simple induction on $r$. Stringing Antimirov's result
+and the lemma above together gives
+
+\begin{thm}
+$\sum_{r' \in pders\,s\,r}.\;size(r') \le (size(r)^2 + 1) \times (awidth(r) + 1)$
+\end{thm}  
+
+\noindent
+Since $awidth$ is always smaller than the $size$ of a regular expression,
+one can also state the bound as follows:
+
+\[
+\sum_{r' \in pders\,s\,r}.\;size(r') \le (size(r) + 1)^3
+\]  
+
+\noindent
+This, by the way, also holds for $Pders$, namely
+
+\[
+\sum_{r' \in Pders\,A\,r}.\;size(r') \le (size(r) + 1)^3
+\]  
+
+\noindent
+for all $r$ and $A$. If one is interested in the height of the partial derivatives, one can derive:
+
+\[
+\forall\,r' \in pders\,s\,r.\;height(r') \le height(r) + 1
+\]  
+
+
+\noindent
+meaning the height of the partial derivatives is never bigger than
+the height of the original regular expression (+1).
+
+\section*{NFA Construction via Antimirov's Partial Derivatives}
+
+Let's finish these notes with the construction of an NFA for a regular
+expression using partial derivatives.  As usual an automaton is a
+quintuple $(Q, A, \delta, q_0, F)$ where $Q$ is the set of states of
+the automaton, $A$ is the alphabet, $q_0$ is the starting state and
+$F$ are the accepting states.  For DFAs the $\delta$ is a (partial)
+function from state $\times$ character to state. For NFAs it is a
+relation between state $\times$ character $\times$ state. The
+non-determinism can be seen by the following: consider three
+(distinct) states $q_1$, $q_2$ and $q_3$, then the relation can
+include $(q_1, a, q_2)$ and $(q_1, a, q_3)$, which means there is a
+transition between $q_1$ and both $q_2$ and $q_3$ for the character
+$a$.
+
+The Antimirov's NFA for a regular expression $r$ is then
+given by the quintuple
+\[(PD(r), A, \delta_{PD}, r, F)\]
+
+\noindent
+where $PD(r)$ are all the partial
+derivatives according to all strings, that is
+\[
+PD(r) \;\dn\; \textit{Pders}\;\textit{UNIV}\;r
+\]
+
+\noindent
+Because of the previous proof, we know that this set is finite. We
+also see that the states in Antimirov's NFA are ``labelled'' by single
+regular expressions.  The starting state is labelled with the original
+regular expression $r$. The set of accepting states $F$ is all states
+$r'\in F$ where $r'$ is nullable. The relation $\delta_{PD}$ is given by
+\[
+(r_1, c, r_2)
+\]
+
+\noindent
+for every $r_1 \in PD(r)$ and $r_2 \in \textit{pder}\,c\,r$. This is
+in general a ``non-deterministic'' relation because the set of partial
+derivatives often contains more than one element. A nice example of
+an NFA constructed via Antimirov's partial derivatives is given in
+\cite{IlieYu2003} on Page 378.
+
+The difficulty of course in this construction is to find the set of
+partial derivatives according to \emph{all} strings. However, it seem
+a procedure that enumerates strings according to size suffices until
+no new derivative is found. There are various improvements that apply
+clever tricks on how to more efficiently discover this set.
+
+
+\begin{thebibliography}{999}
+
+\bibitem{IlieYu2003}
+  L.~Ilie and S.~Yu,
+  \emph{Reducing NFAs by Invariant Equivalences}.
+  In Theoretical Computer Science, Volume 306(1--3), Pages 373–-390, 2003.\\
+  \url{https://core.ac.uk/download/pdf/82545723.pdf}
+
+\end{thebibliography}
+
+
+
+
+
+
+\end{document}
+
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: t
+%%% End: