afl-material: comparison handouts/ho02.tex

-:f2d7b885b3e3
+:5dc452d7c08e
 \section*{Handout 2 (Regular Expression Matching)}
 This lecture is about implementing a more efficient regular expression
-matcher (the plots on the right)---more efficient than the matchers
+matcher (the plots on the right below)---more efficient than the
-from regular expression libraries in Ruby, Python and Java (the plots
+matchers from regular expression libraries in Ruby, Python and Java
-on the left). The first pair of plots show the running time for the
+(the plots on the left). The first pair of plots shows the running time
-regular expressions $a^?{}^{\{n\}}\cdot a^{\{n\}}$ and strings composed
+for the regular expression $(a^*)^*\cdot b$ and strings composed of
-of $n$ \pcode{a}s. The second pair of plots show the running time
+$n$ \pcode{a}s (meaning this regular expression actually does not
-for the regular expression $(a^*)^*\cdot b$ and also strings composed
+match the strings). The second pair of plots shows the running time for
-of $n$ \pcode{a}s (meaning this regular expression actually does not
+the regular expressions $a^?{}^{\{n\}}\cdot a^{\{n\}}$ and strings
+also composed of $n$ \pcode{a}s (this time the regular expressions
 match the strings).  To see the substantial differences in the left
 and right plots below, note the different scales of the $x$-axes.
+\begin{center}
+Graphs: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$\\
+\begin{tabular}{@{}cc@{}}
+\begin{tikzpicture}
+\begin{axis}[
+xlabel={$n$},
+x label style={at={(1.05,0.0)}},
+ylabel={time in secs},
+enlargelimits=false,
+xtick={0,5,...,30},
+xmax=33,
+ymax=35,
+ytick={0,5,...,30},
+scaled ticks=false,
+axis lines=left,
+width=5cm,
+height=5cm,
+legend entries={Python, Java},
+legend pos=north west,
+legend cell align=left]
+\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
+\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
+\end{axis}
+\end{tikzpicture}
+&
+\begin{tikzpicture}
+\begin{axis}[
+xlabel={$n$},
+x label style={at={(1.1,0.0)}},
+%%xtick={0,1000000,...,5000000},
+ylabel={time in secs},
+enlargelimits=false,
+ymax=35,
+ytick={0,5,...,30},
+axis lines=left,
+%scaled ticks=false,
+width=6.5cm,
+height=5cm,
+legend entries={Our matcher},
+legend pos=north east,
+legend cell align=left]
+%\addplot[green,mark=square*,mark options={fill=white}] table {re2a.data};
+\addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
+\end{axis}
+\end{tikzpicture}
+\end{tabular}
+\end{center}\bigskip
 \begin{center}
 Graphs: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$\\
 \begin{tabular}{@{}cc@{}}
 \begin{tikzpicture}
 \begin{axis}[
 xlabel={$n$},
 x label style={at={(1.1,0.05)}},
 ylabel={\small time in secs},
 enlargelimits=false,
-xtick={0,3000,...,9000},
+xtick={0,2500,...,11000},
-xmax=10000,
+xmax=12000,
 ymax=35,
 ytick={0,5,...,30},
 scaled ticks=false,
 axis lines=left,
 width=6.5cm,
-height=5cm]
+height=5cm,
-\addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
+legend entries={Our matcher},
+legend pos=north east,
+legend cell align=left]
+%\addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
 \addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
 \end{axis}
 \end{tikzpicture}
 \end{tabular}
 \end{center}
+\bigskip
-\begin{center}
-Graphs: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$
+\noindent
-\begin{tabular}{@{}cc@{}}
+In what follows we will use these regular expressions and strings as
-\begin{tikzpicture}
+running examples. There will be several versions (V1, V2, V3,\ldots)
-\begin{axis}[
+of our matcher.\footnote{The corresponding files are
-xlabel={$n$},
+\texttt{re1.scala}, \texttt{re2.scala} and so on. As usual, you can
-x label style={at={(1.05,0.0)}},
+find the code on KEATS.}\bigskip
-ylabel={time in secs},
-enlargelimits=false,
+\noindent
-xtick={0,5,...,30},
-xmax=33,
-ymax=35,
-ytick={0,5,...,30},
-scaled ticks=false,
-axis lines=left,
-width=5cm,
-height=5cm,
-legend entries={Java},
-legend pos=north west,
-legend cell align=left]
-\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
-\end{axis}
-\end{tikzpicture}
-&
-\begin{tikzpicture}
-\begin{axis}[
-xlabel={$n$},
-x label style={at={(1.05,0.0)}},
-ylabel={time in secs},
-enlargelimits=false,
-ymax=35,
-ytick={0,5,...,30},
-axis lines=left,
-scaled ticks=false,
-width=6.5cm,
-height=5cm]
-\addplot[green,mark=square*,mark options={fill=white}] table {re2a.data};
-\addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
-\end{axis}
-\end{tikzpicture}
-\end{tabular}
-\end{center}\medskip
-\noindent
-We will use these regular expressions and strings
-as running examples.
 Having specified in the previous lecture what
 problem our regular expression matcher is supposed to solve,
 namely for any given regular expression $r$ and string $s$
 answer \textit{true} if and only if
 \[
 s \in L(r)
 \]
-\noindent we can look at an algorithm to solve this problem. Clearly
+\noindent we can look for an algorithm to solve this problem. Clearly
 we cannot use the function $L$ directly for this, because in general
 the set of strings $L$ returns is infinite (recall what $L(a^*)$ is).
 In such cases there is no way we can implement an exhaustive test for
 whether a string is a member of this set or not. In contrast our
 matching algorithm will operate on the regular expression $r$ and
 (r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)
 \label{big}
 \end{equation}
 \noindent If we can find an equivalent regular expression that is
-simpler (smaller for example), then this might potentially make our
+simpler (that usually means smaller), then this might potentially make
-matching algorithm run faster. We can look for such a simpler regular
+our matching algorithm run faster. We can look for such a simpler
-expression $r'$ because whether a string $s$ is in $L(r)$ or in
+regular expression $r'$ because whether a string $s$ is in $L(r)$ or
-$L(r')$ with $r\equiv r'$ will always give the same answer. In the
+in $L(r')$ with $r\equiv r'$ will always give the same answer. Yes?
-example above you will see that the regular expression is equivalent
-to just $r_1$. You can verify this by iteratively applying the
+In the example above you will see that the regular expression is
-simplification rules from above:
+equivalent to just $r_1$. You can verify this by iteratively applying
+the simplification rules from above:
 \begin{center}
 \begin{tabular}{ll}
 & $(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot
 (\underline{r_4 \cdot \ZERO})$\smallskip\\
 rule is applied. Our matching algorithm in the next section
 will often generate such ``useless'' $\ONE$s and
 $\ZERO$s, therefore simplifying them away will make the
 algorithm quite a bit faster.
+Finally, here are three equivalences between regular expressions that are
+not so obvious:
+\begin{center}
+\begin{tabular}{rcl}
+$r^*$  & $\equiv$ & $\ONE + r\cdot r^*$\\
+$(r_1 + r_2)^*$  & $\equiv$ & $r_1^* \cdot (r_2\cdot r_1^*)^*$\\
+$(r_1 \cdot r_2)^*$ & $\equiv$ & $\ONE + r_1\cdot (r_2 \cdot r_1)^* \cdot r_2$\\
+\end{tabular}
+\end{center}
+\noindent
+We will not use them in our algorithm, but feel free to convince yourself
+that they hold. As an aside, there has been a lot of research about
+questions like: Can one always decide when two regular expressions are
+equivalent or not? What does an algorithm look like to decide this
+efficiently?
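\noindent
As an illustration of how such equivalences can be checked, here is the
calculation for the first one. It assumes the definitions
$A^0 \dn \{[]\}$, $A^{n+1} \dn A \,@\, A^n$ and
$L(r^*) \dn \bigcup_{n\ge 0} L(r)^n$ for the language of a star; the
fourth step uses that $@$ distributes over (possibly infinite) unions.
\begin{center}
\begin{tabular}{rcl}
$L(r^*)$ & $=$ & $\bigcup_{n\ge 0} L(r)^n$\\
         & $=$ & $L(r)^0 \cup \bigcup_{n\ge 0} L(r)^{n+1}$\\
         & $=$ & $\{[]\} \cup \bigcup_{n\ge 0} L(r) \,@\, L(r)^n$\\
         & $=$ & $\{[]\} \cup L(r) \,@\, \bigcup_{n\ge 0} L(r)^n$\\
         & $=$ & $L(\ONE) \cup L(r \cdot r^*)$\\
         & $=$ & $L(\ONE + r\cdot r^*)$
\end{tabular}
\end{center}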
 \subsection*{The Matching Algorithm}
 The algorithm we will define below consists of two parts. One
 is the function $\textit{nullable}$ which takes a regular expression as
 argument and decides whether it can match the empty string
 The other function of our matching algorithm calculates a
 \emph{derivative} of a regular expression. This is a function
 which takes a regular expression, say $r$, and a
 character, say $c$, as arguments and returns a new regular
-expression. Be careful that the intuition behind this function
+expression. Be mindful that the intuition behind this function
 is not so easy to grasp on first reading. Essentially this
 function solves the following problem: if $r$ can match a
-string of the form $c\!::\!s$, what does the regular
+string of the form $c\!::\!s$, what does a regular
 expression look like that can match just $s$? The definition
 of this function is as follows:
 \begin{center}
 \begin{tabular}{l@ {\hspace{2mm}}c@ {\hspace{2mm}}l}
 $c\!::\!s$, then the first part must be ``matched'' by a
 single copy of $r$. Therefore we recursively call $\textit{der}\,c\,r$
 and ``append'' $r^*$ in order to match the rest of $s$. Does this still
 make sense?
-If all this did not make sense yet, here is another way to rationalise
+If all this did not make sense yet, here is another way to explain the
-the definition of $\textit{der}$ by considering the following operation
+definition of $\textit{der}$ by considering the following operation on
-on sets:
+sets:
 \begin{equation}\label{Der}
 \textit{Der}\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
 \end{equation}
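\noindent
For a \emph{finite} set $A$ the operation $\textit{Der}$ can be computed
directly. The following Scala lines are only a small sketch to make the
definition concrete; the name \texttt{Der} and the representation of
strings as Scala \texttt{String}s (instead of lists of characters) are
our own choices for illustration and do not come from the course files.

```scala
// Der c A = { s | c::s is in A }, computed here for a *finite* set A.
// Assumption (ours): strings are represented as Scala Strings.
def Der(c: Char, A: Set[String]): Set[String] =
  A.collect { case s if s.nonEmpty && s.head == c => s.tail }

// Strings not starting with c are dropped; the others lose their first character.
val exampleSet = Set("ab", "b", "a", "")
println(Der('a', exampleSet))  // a set containing "b" and the empty string
```

\noindent Since the one-character string $a$ is in the example set, the
empty string $[]$ ends up in $\textit{Der}\,a\,A$.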
 $\textit{ders}\, (c\!::\!s)\, r$ & $\dn$ & $\textit{ders}\,s\,(\textit{der}\,c\,r)$ & \\
 \end{tabular}
 \end{center}
 \noindent This function iterates $\textit{der}$ taking one character at
-the time from the original string until it is exhausted.
+a time from the original string until the string is exhausted.
 Having $\textit{ders}$ in place, we can finally define our matching
 algorithm:
 \[
 \textit{matches}\,s\,r \dn \textit{nullable}(\textit{ders}\,s\,r)
 Given the implementation of regular expressions in Scala shown
 in the first lecture and handout, the functions and subfunctions
 for \pcode{matches} are shown in Figure~\ref{scala1}.
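\noindent
Since \texttt{app5.scala} is only included by reference, here is a
self-contained sketch of what such definitions can look like for the
regular expressions $\ZERO$, $\ONE$, $c$, $r_1 + r_2$, $r_1\cdot r_2$
and $r^*$. It is our own rendering of the mathematical definitions, not
a copy of the course file; the constructor names and the argument order
of \texttt{der}, \texttt{ders} and \texttt{matches} (chosen to follow
$\textit{der}\,c\,r$ and $\textit{matches}\,s\,r$) are assumptions.

```scala
// Sketch of the first matcher (V1): nullable, der, ders and matches.
// Our own rendering of the definitions, not a copy of ../progs/app5.scala.
abstract class Rexp
case object ZERO extends Rexp                    // matches no string
case object ONE extends Rexp                     // matches only the empty string
case class CHAR(c: Char) extends Rexp            // matches the one-character string c
case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // r1 + r2
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp  // r1 . r2
case class STAR(r: Rexp) extends Rexp            // r*

// can r match the empty string?
def nullable(r: Rexp): Boolean = r match {
  case ZERO => false
  case ONE => true
  case CHAR(_) => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_) => true
}

// derivative of r with respect to the character c
def der(c: Char, r: Rexp): Rexp = r match {
  case ZERO => ZERO
  case ONE => ZERO
  case CHAR(d) => if (c == d) ONE else ZERO
  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
  case SEQ(r1, r2) =>
    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
    else SEQ(der(c, r1), r2)
  case STAR(r1) => SEQ(der(c, r1), STAR(r1))
}

// iterate der, one character at a time
def ders(s: List[Char], r: Rexp): Rexp = s match {
  case Nil => r
  case c :: cs => ders(cs, der(c, r))
}

// matches s r = nullable(ders s r)
def matches(s: String, r: Rexp): Boolean = nullable(ders(s.toList, r))

println(matches("ab", SEQ(CHAR('a'), CHAR('b'))))  // true
println(matches("a", SEQ(CHAR('a'), CHAR('b'))))   // false
```

\noindent Without any simplification, the derivatives computed by this
version tend to grow as more characters are processed, which is what the
later versions address.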
 \begin{figure}[p]
-\lstinputlisting{../progs/app5.scala}
+\lstinputlisting[numbers=left,linebackgroundcolor=
-\caption{Scala implementation of the \textit{nullable} and
+{\ifodd\value{lstnumber}\color{capri!3}\fi}]
+{../progs/app5.scala}
+\caption{A Scala implementation of the \textit{nullable} and
 derivative functions. These functions are easy to
-implement in functional languages, because their built-in pattern
+implement in functional languages. This is because pattern
 matching and recursion allow us to mimic the mathematical
-definitions very closely.\label{scala1}}
+definitions very closely. Nearly all functional
+programming languages support pattern matching and
+recursion out of the box.\label{scala1}}
 \end{figure}
 %Remember our second example involving the regular expression
 %$(a^*)^* \cdot b$ which could not match strings of $n$ \texttt{a}s.
 %strings up to the length of 6500. After that we receive a
 %StackOverflow exception, but still\ldots
 For running the algorithm with the evil regular expression
 $a^?{}^{\{n\}}\cdot a^{\{n\}}$ from our running examples, we need to implement
-the optional regular expression and the exactly $n$-times
+the optional regular expression and the `exactly $n$-times
-regular expression. This can be done with the translations
+regular expression'. This can be done with the translations
 \lstinputlisting[numbers=none]{../progs/app51.scala}
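\noindent
The translations can be sketched as follows (a hypothetical version in
the spirit of \texttt{app51.scala}, not necessarily identical to it):
$r^?$ becomes $r + \ONE$ and $r^{\{n\}}$ is unfolded into $n$ copies of
$r$ joined by $\cdot$. The helper \texttt{size}, which counts the nodes
of a regular expression, and the name \texttt{EVIL} are our own
additions; they make visible that the size of
$a^{?\{n\}}\cdot a^{\{n\}}$ grows linearly with $n$ before any
derivative is taken.

```scala
// Hypothetical helpers (not necessarily the code in app51.scala):
// a? and r{n} are expressed with the basic constructors only.
abstract class Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp

// r? is translated into r + ONE
def OPT(r: Rexp): Rexp = ALT(r, ONE)

// r{n} is translated into r . r . ... . r (n copies); r{0} is ONE
def NTIMES(r: Rexp, n: Int): Rexp =
  if (n == 0) ONE
  else if (n == 1) r
  else SEQ(r, NTIMES(r, n - 1))

// the evil regular expression a?{n} . a{n}
def EVIL(n: Int): Rexp = SEQ(NTIMES(OPT(CHAR('a')), n), NTIMES(CHAR('a'), n))

// number of nodes in a regular expression
def size(r: Rexp): Int = r match {
  case ZERO | ONE | CHAR(_) => 1
  case ALT(r1, r2) => 1 + size(r1) + size(r2)
  case SEQ(r1, r2) => 1 + size(r1) + size(r2)
  case STAR(r1) => 1 + size(r1)
}

println(size(EVIL(1)))    // 5
println(size(EVIL(100)))  // 599
```

\noindent With this encoding \texttt{EVIL(n)} has $6n - 1$ nodes.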
 \noindent Running the matcher with this example, we find it is
 slightly worse than the matchers in Ruby and Python.
 \addplot[red,mark=triangle*,mark options={fill=white}] table {re1.data};
 \end{axis}
 \end{tikzpicture}
 \end{center}
-\noindent Analysing this failure we notice that for
+\noindent Analysing this failure we notice that for $a^{\{n\}}$, for
-$a^{\{n\}}$ we generate quite big regular expressions:
+example, we generate quite big regular expressions:
 \begin{center}
 \begin{tabular}{rl}
 1: & $a$\\
 2: & $a\cdot a$\\
 \noindent Our algorithm traverses such regular expressions at
 least once every time a derivative is calculated. So having
 large regular expressions will cause problems. This problem
 is aggravated by $a^?$ being represented as $a + \ONE$.
-We can however fix this by having an explicit constructor for
+We can, however, fix this easily by introducing an explicit constructor for
 $r^{\{n\}}$. In Scala we would introduce a constructor like
 \begin{center}
 \code{case class NTIMES(r: Rexp, n: Int) extends Rexp}
 \end{center}
-\noindent With this fix we have a constant ``size'' regular
+\noindent With this fix the regular expression for our running example
-expression for our running example no matter how large $n$ is.
+has a constant size, no matter how large $n$ is (see the
-This means we have to also add cases for \pcode{NTIMES} in the
+\texttt{size} section in the implementations). This means we also have to
-functions $\textit{nullable}$ and $\textit{der}$. Does the change have any
+add cases for \pcode{NTIMES} to the functions $\textit{nullable}$
-effect?
+and $\textit{der}$. Does the change have any effect?
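\noindent
With the explicit constructor, $\textit{nullable}$ and $\textit{der}$
each need one more case. A possible sketch (our own, with \texttt{size},
\texttt{OPT} and \texttt{EVIL} as hypothetical helpers) uses
$\textit{der}\,c\,(r^{\{n\}}) = (\textit{der}\,c\,r)\cdot r^{\{n-1\}}$
for $n > 0$. Unlike the rule for $r_1\cdot r_2$, no
$\textit{nullable}$ test is needed here; one can check by induction on
$n$ that the extra alternative would only add strings that are already
covered.

```scala
// Sketch of version 2 (our own code): r{n} is an explicit constructor.
abstract class Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp
case class NTIMES(r: Rexp, n: Int) extends Rexp

def nullable(r: Rexp): Boolean = r match {
  case ZERO => false
  case ONE => true
  case CHAR(_) => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_) => true
  // r{0} matches only []; otherwise r{n} matches [] iff r does
  case NTIMES(r1, n) => if (n == 0) true else nullable(r1)
}

def der(c: Char, r: Rexp): Rexp = r match {
  case ZERO => ZERO
  case ONE => ZERO
  case CHAR(d) => if (c == d) ONE else ZERO
  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
  case SEQ(r1, r2) =>
    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
    else SEQ(der(c, r1), r2)
  case STAR(r1) => SEQ(der(c, r1), STAR(r1))
  // der c (r{n}) = (der c r) . r{n-1}, without unfolding the copies
  case NTIMES(r1, n) => if (n == 0) ZERO else SEQ(der(c, r1), NTIMES(r1, n - 1))
}

def ders(s: List[Char], r: Rexp): Rexp = s match {
  case Nil => r
  case c :: cs => ders(cs, der(c, r))
}

def matches(s: String, r: Rexp): Boolean = nullable(ders(s.toList, r))

// number of nodes; NTIMES counts as a single node above r
def size(r: Rexp): Int = r match {
  case ZERO | ONE | CHAR(_) => 1
  case ALT(r1, r2) => 1 + size(r1) + size(r2)
  case SEQ(r1, r2) => 1 + size(r1) + size(r2)
  case STAR(r1) => 1 + size(r1)
  case NTIMES(r1, _) => 1 + size(r1)
}

def OPT(r: Rexp): Rexp = ALT(r, ONE)
def EVIL(n: Int): Rexp = SEQ(NTIMES(OPT(CHAR('a')), n), NTIMES(CHAR('a'), n))

println(size(EVIL(10)) == size(EVIL(1000)))  // true: the size no longer depends on n
println(matches("aaaa", EVIL(3)))            // true
```

\noindent In this sketch \texttt{EVIL(n)} always has 7 nodes, whatever
the value of $n$.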
 \begin{center}
 \begin{tikzpicture}
 \begin{axis}[
 title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
 xlabel={$n$},
 x label style={at={(1.01,0.0)}},
 ylabel={time in secs},
 enlargelimits=false,
-xtick={0,100,...,1000},
+xtick={0,200,...,1100},
-xmax=1100,
+xmax=1200,
 ytick={0,5,...,30},
 scaled ticks=false,
 axis lines=left,
 width=10cm,
 height=5cm,
 \addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
 \end{axis}
 \end{tikzpicture}
 \end{center}
-\noindent Now we are talking business! The modified matcher
+\noindent Now we are talking business! Within 25 seconds the modified
-can within 30 seconds handle regular expressions up to
+matcher can handle regular expressions up to $n = 1{,}100$ before a
-$n = 950$ before a StackOverflow is raised. Recall that Python and Ruby
+StackOverflow is raised. Recall that Python and Ruby (and our first
-(and our first version, Scala V1) could only handle $n = 27$ or so in 30
+version, Scala V1) could only handle $n = 27$ or so in 30
-seconds. There is no change for our second example
+seconds. We have not systematically timed this version on the other
-$(a^*)^* \cdot b$---so this is still good.
+example, $(a^*)^* \cdot b$, but it seems to be doing OK with it.
 The moral is that our algorithm is rather sensitive to the
 size of regular expressions it needs to handle. This is of
 course obvious because both $\textit{nullable}$ and $\textit{der}$ frequently
 need to traverse the whole regular expression. There seems,
 however, to be one more issue to address for making the algorithm run faster.
 The derivative function often produces ``useless''
 $\ZERO$s and $\ONE$s. To see this, consider $r = ((a
-\cdot b) + b)^*$ and the following two derivatives
+\cdot b) + b)^*$ and the following three derivatives
 \begin{center}
 \begin{tabular}{l}
 $\textit{der}\,a\,r = ((\ONE \cdot b) + \ZERO) \cdot r$\\
 $\textit{der}\,b\,r = ((\ZERO \cdot b) + \ONE)\cdot r$\\
 $\textit{der}\,c\,r = ((\ZERO \cdot b) + \ZERO)\cdot r$
 \end{tabular}
 \end{center}
 \noindent
-If we simplify them according to the simple rules from the
+If we simplify them according to the simplification rules from the
-beginning, we can replace the right-hand sides by the
+beginning, we can replace the right-hand sides by the smaller
-smaller equivalent regular expressions
+equivalent regular expressions
 \begin{center}
 \begin{tabular}{l}
 $\textit{der}\,a\,r \equiv b \cdot r$\\
 $\textit{der}\,b\,r \equiv r$\\
 $\textit{der}\,c\,r \equiv \ZERO$
 \end{tabular}
 \end{center}
 \noindent I leave it to you to contemplate whether such a
-simplification can have any impact on the correctness of our
+simplification can have any impact on the correctness of our algorithm
-algorithm (will it change any answers?). Figure~\ref{scala2}
+(will it change any answers?). Figure~\ref{scala2} gives a
-gives a simplification function that recursively traverses a
+simplification function that recursively traverses a regular
-regular expression and simplifies it according to the rules
+expression and simplifies it according to the rules given at the
-given at the beginning. There are only rules for $+$, $\cdot$
+beginning. There are only rules for $+$, $\cdot$ and $n$-times (the
-and $n$-times (the latter because we added it in the second
+latter because we added it in the second version of our
-version of our matcher). There is no rule for a star, because
+matcher). There is no simplification rule for a star, because
-empirical data and also a little thought showed that
+empirical data and also a little thought showed that simplifying under
-simplifying under a star is a waste of computation time. The
+a star is a waste of computation time. The simplification function
-simplification function will be called after every derivation.
+will be called after every derivative step. This additional step removes
-This additional step removes all the ``junk'' the derivative
+all the ``junk'' the derivative function introduced. Does this improve
-function introduced. Does this improve the speed? You bet!!
+the speed? You bet!!
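\noindent
Here is a self-contained sketch of such a simplification function and of
the modified \texttt{ders}; it is not a copy of \texttt{app6.scala}. We
assume the simplification rules $r + \ZERO \Rightarrow r$,
$\ZERO + r \Rightarrow r$, $r + r \Rightarrow r$,
$r \cdot \ONE \Rightarrow r$, $\ONE \cdot r \Rightarrow r$,
$r \cdot \ZERO \Rightarrow \ZERO$ and $\ZERO \cdot r \Rightarrow \ZERO$;
the $n$-times case is left out to keep the sketch short, and nothing is
simplified under a star.

```scala
// Sketch of version 3 (our own code): simplify after every derivative step.
abstract class Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp

def nullable(r: Rexp): Boolean = r match {
  case ZERO => false
  case ONE => true
  case CHAR(_) => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_) => true
}

def der(c: Char, r: Rexp): Rexp = r match {
  case ZERO => ZERO
  case ONE => ZERO
  case CHAR(d) => if (c == d) ONE else ZERO
  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
  case SEQ(r1, r2) =>
    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
    else SEQ(der(c, r1), r2)
  case STAR(r1) => SEQ(der(c, r1), STAR(r1))
}

def simp(r: Rexp): Rexp = r match {
  case ALT(r1, r2) => (simp(r1), simp(r2)) match {
    case (ZERO, r2s) => r2s                                    // 0 + r => r
    case (r1s, ZERO) => r1s                                    // r + 0 => r
    case (r1s, r2s) => if (r1s == r2s) r1s else ALT(r1s, r2s)  // r + r => r
  }
  case SEQ(r1, r2) => (simp(r1), simp(r2)) match {
    case (ZERO, _) => ZERO                                     // 0 . r => 0
    case (_, ZERO) => ZERO                                     // r . 0 => 0
    case (ONE, r2s) => r2s                                     // 1 . r => r
    case (r1s, ONE) => r1s                                     // r . 1 => r
    case (r1s, r2s) => SEQ(r1s, r2s)
  }
  case other => other                                          // no rules under a star
}

// ders now simplifies each derivative before taking the next one
def ders(s: List[Char], r: Rexp): Rexp = s match {
  case Nil => r
  case c :: cs => ders(cs, simp(der(c, r)))
}

def matches(s: String, r: Rexp): Boolean = nullable(ders(s.toList, r))

val rEx = STAR(ALT(SEQ(CHAR('a'), CHAR('b')), CHAR('b')))  // ((a . b) + b)*
println(simp(der('a', rEx)) == SEQ(CHAR('b'), rEx))       // true
val evil2 = SEQ(STAR(STAR(CHAR('a'))), CHAR('b'))          // (a*)* . b
println(matches("a" * 1000, evil2))                        // false
```

\noindent With these rules the three derivatives of
$r = ((a\cdot b) + b)^*$ from above simplify to $b\cdot r$, $r$ and
$\ZERO$, respectively. For $(a^*)^*\cdot b$ the rule $r + r \Rightarrow r$
keeps the simplified derivatives from growing with the number of
\texttt{a}s.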
 \begin{figure}[p]
-\lstinputlisting{../progs/app6.scala}
+\lstinputlisting[numbers=left,linebackgroundcolor=
+{\ifodd\value{lstnumber}\color{capri!3}\fi}]
+{../progs/app6.scala}
 \caption{The simplification function and modified
 \texttt{ders}-function; this function now
 calls \texttt{der} first, but then simplifies
 the resulting derivative regular expressions before
 building the next derivative, see
 title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
 xlabel={$n$},
 x label style={at={(1.04,0.0)}},
 ylabel={time in secs},
 enlargelimits=false,
-xtick={0,3000,...,9000},
+xtick={0,2500,...,10000},
-xmax=10000,
+xmax=12000,
 ytick={0,5,...,30},
 ymax=32,
 scaled ticks=false,
 axis lines=left,
 width=9cm,
 \end{axis}
 \end{tikzpicture}
 \end{center}
 \noindent
-To reacap, Python and Ruby needed approximately 30 seconds to match
+To recap, Python and Ruby needed approximately 30 seconds to match a
-a string of 28 \texttt{a}s and the regular expression $a^{?\{n\}} \cdot a^{\{n\}}$.
+string of 28 \texttt{a}s and the regular expression $a^{?\{n\}} \cdot
-We need a third of this time to do the same with strings up to 12,000 \texttt{a}s.
+a^{\{n\}}$.  We need a third of this time to do the same with strings
-Similarly, Java needed 30 seconds to find out the regular expression
+up to 11,000 \texttt{a}s.  Similarly, Java and Python needed 30
-$(a^*)^* \cdot b$ does not match the string of 28 \texttt{a}s. We can do
+seconds to find out that the regular expression $(a^*)^* \cdot b$ does not
-the same in approximately 5 seconds for strings of 6000000 \texttt{a}s:
+match a string of 28 \texttt{a}s. We can do the same
+for strings composed of nearly 6,000,000 \texttt{a}s:
 \begin{center}
 \begin{tikzpicture}
 \begin{axis}[
 title={Graph: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$},
 xlabel={$n$},
-x label style={at={(1.09,0.0)}},
 ylabel={time in secs},
 enlargelimits=false,
-xmax=7700000,
+ymax=35,
 ytick={0,5,...,30},
-ymax=32,
+axis lines=left,
 %scaled ticks=false,
-axis lines=left,
+x label style={at={(1.09,0.0)}},
+%xmax=7700000,
 width=9cm,
 height=5cm,
-legend entries={Scala V2, Scala V3},
+legend entries={Scala V3},
 legend pos=outer north east,
 legend cell align=left]
-\addplot[green,mark=square*,mark options={fill=white}] table {re2a.data};
+%\addplot[green,mark=square*,mark options={fill=white}] table {re2a.data};
 \addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
 \end{axis}
 \end{tikzpicture}
 \end{center}
 \subsection*{Epilogue}
-(23/Aug/2016) I recently found another place where this algorithm can be
+(23/Aug/2016) I recently found another place where this algorithm can
-sped (this idea is not integrated with what is coming next,
+be sped up (this idea is not integrated with what is coming next, but
-but I present it nonetheless). The idea is to define \texttt{ders}
+I present it nonetheless). The idea is not to define \texttt{ders} so
-not such that it iterates the derivative character-by-character, but
+that it iterates the derivative character by character, but in bigger
-in bigger chunks. The resulting code for \texttt{ders2} looks as
+chunks. The resulting code for \texttt{ders2} looks as follows:
-follows:
 \lstinputlisting[numbers=none]{../progs/app52.scala}
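\noindent
One possible way to realise this idea is sketched below; it is our own
illustration and need not coincide with \texttt{app52.scala}. The
hypothetical function \texttt{ders2} looks at the whole remaining string
at once whenever the regular expression is simple enough ($\ZERO$,
$\ONE$ or a single character), distributes over $+$, and otherwise falls
back to taking one simplified derivative. Each case follows from the
definition of $\textit{Ders}$: for example, no non-empty string is in
$L(\ONE)$, and a single character $d$ matches only the string $[d]$.

```scala
// Sketch of a "chunked" ders2 (our own illustration, not ../progs/app52.scala).
abstract class Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp

def nullable(r: Rexp): Boolean = r match {
  case ZERO => false
  case ONE => true
  case CHAR(_) => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_) => true
}

def der(c: Char, r: Rexp): Rexp = r match {
  case ZERO => ZERO
  case ONE => ZERO
  case CHAR(d) => if (c == d) ONE else ZERO
  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
  case SEQ(r1, r2) =>
    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
    else SEQ(der(c, r1), r2)
  case STAR(r1) => SEQ(der(c, r1), STAR(r1))
}

def simp(r: Rexp): Rexp = r match {
  case ALT(r1, r2) => (simp(r1), simp(r2)) match {
    case (ZERO, r2s) => r2s
    case (r1s, ZERO) => r1s
    case (r1s, r2s) => if (r1s == r2s) r1s else ALT(r1s, r2s)
  }
  case SEQ(r1, r2) => (simp(r1), simp(r2)) match {
    case (ZERO, _) => ZERO
    case (_, ZERO) => ZERO
    case (ONE, r2s) => r2s
    case (r1s, ONE) => r1s
    case (r1s, r2s) => SEQ(r1s, r2s)
  }
  case other => other
}

// character-by-character version (V3), kept for comparison
def ders(s: List[Char], r: Rexp): Rexp = s match {
  case Nil => r
  case c :: cs => ders(cs, simp(der(c, r)))
}

// chunked version: simple cases consume the whole remaining string at once
def ders2(s: List[Char], r: Rexp): Rexp = (s, r) match {
  case (Nil, r1) => r1                                      // nothing left to consume
  case (_, ZERO) => ZERO                                    // ZERO matches no string
  case (_, ONE) => ZERO                                     // s is non-empty here
  case (List(c), CHAR(d)) => if (c == d) ONE else ZERO      // exactly one character left
  case (_, CHAR(_)) => ZERO                                 // two or more characters left
  case (_, ALT(r1, r2)) => ALT(ders2(s, r1), ders2(s, r2))  // derivatives distribute over +
  case (c :: cs, r1) => ders2(cs, simp(der(c, r1)))         // fall back to one step
}

def matches(s: String, r: Rexp): Boolean = nullable(ders(s.toList, r))
def matches2(s: String, r: Rexp): Boolean = nullable(ders2(s.toList, r))

println(matches2("ab", ALT(SEQ(CHAR('a'), CHAR('b')), CHAR('c'))))  // true
```

\noindent Both \texttt{matches} and \texttt{matches2} should give the
same answers; only the order in which the string is consumed differs.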
 \noindent
 I have not fully understood why this version is much faster,
 xmax=7100000,
 ytick={0,5,...,30},
 ymax=33,
 %scaled ticks=false,
 axis lines=left,
-width=5.5cm,
+width=5.3cm,
 height=5cm,
 legend entries={Scala V3, Scala V4},
 legend style={at={(0.1,-0.2)},anchor=north}]
 \addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
 \addplot[purple,mark=square*,mark options={fill=white}] table {re4.data};
 title={Graph: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$},
 xlabel={$n$},
 x label style={at={(1.09,0.0)}},
 ylabel={time in secs},
 enlargelimits=false,
-xmax=8100000,
+xmax=8200000,
 ytick={0,5,...,30},
 ymax=33,
 %scaled ticks=false,
 axis lines=left,
-width=5.5cm,
+width=5.3cm,
 height=5cm,
 legend entries={Scala V3, Scala V4},
 legend style={at={(0.1,-0.2)},anchor=north}]
 \addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
 \addplot[purple,mark=square*,mark options={fill=white}] table {re4a.data};
 \section*{Proofs}
 You might not like doing proofs. But they serve a very
 important purpose in Computer Science: How can we be sure that
-our algorithm matches its specification. We can try to test
+our algorithm matches its specification? We can try to test
 the algorithm, but that often overlooks corner cases and an
 exhaustive testing is impossible (since there are infinitely
 many inputs). Proofs allow us to ensure that an algorithm
 really meets its specification.
 & $\mid$ & $r_1 \cdot r_2$      & sequence\\
 & $\mid$ & $r^*$                & star (zero or more)\\
 \end{tabular}
 \end{center}
-\noindent If you want to show a property $P(r)$ for all
+\noindent If you want to show a property $P(r)$ for \emph{all}
 regular expressions $r$, then you have to follow essentially
 the recipe:
 \begin{itemize}
 \item $P$ has to hold for $\ZERO$, $\ONE$ and $c$
 \textit{nullable}(r_1 + r_2) \;\;\text{if and only if}\;\;
 []\in L(r_1 + r_2)
 \label{propalt}
 \end{equation}
-\noindent The difference to the base cases is that in this
+\noindent The difference from the base cases is that in the inductive
-case we can already assume we proved
+cases we can already assume that we have proved $P$ for the components,
+that is, we can assume
 \begin{center}
 \begin{tabular}{l}
 $\textit{nullable}(r_1) \;\;\text{if and only if}\;\; []\in L(r_1)$ and\\
 $\textit{nullable}(r_2) \;\;\text{if and only if}\;\; []\in L(r_2)$\\
 \end{tabular}
 \end{center}
-\noindent These are the induction hypotheses. To check this
+\noindent These are called the induction hypotheses. To check this
 case, we can start from $\textit{nullable}(r_1 + r_2)$, which by
-definition is
+definition of $\textit{nullable}$ is
 \[
 \textit{nullable}(r_1) \vee \textit{nullable}(r_2)
 \]
 \[
 [] \in L(r_1)\cup L(r_2)
 \]
-\noindent but this is by definition of $L$ exactly $[] \in
+\noindent but this is by definition of $L$ exactly $[] \in L(r_1 +
-L(r_1 + r_2)$, which we needed to establish according to
+r_2)$, which is what we needed to establish according to the statement in
 \eqref{propalt}. What we have shown is that starting from
 $\textit{nullable}(r_1 + r_2)$ we have done equivalent transformations
-to end up with $[] \in L(r_1 + r_2)$. Consequently we have
+to end up with $[] \in L(r_1 + r_2)$. Consequently we have established
-established that $P(r_1 + r_2)$ holds.
+that $P(r_1 + r_2)$ holds.
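\noindent
Written as a single chain of equivalences, the argument for the case
$r_1 + r_2$ reads as follows (the reason for each step is given on the
right):
\begin{center}
\begin{tabular}{rcll}
$\textit{nullable}(r_1 + r_2)$ & iff & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$ & by definition of $\textit{nullable}$\\
 & iff & $[] \in L(r_1) \vee [] \in L(r_2)$ & by the induction hypotheses\\
 & iff & $[] \in L(r_1) \cup L(r_2)$ & by definition of $\cup$\\
 & iff & $[] \in L(r_1 + r_2)$ & by definition of $L$
\end{tabular}
\end{center}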
 In order to complete the proof we would now need to look
 at the cases \mbox{$P(r_1\cdot r_2)$} and $P(r^*)$. Again I leave it to you
 to check the details.
-You might have to do induction proofs over strings.
+You might also have to do induction proofs over strings.
 That means you want to establish a property $P(s)$ for all
 strings $s$. For this, remember that strings are lists of
 characters. These lists can be either the empty list or a
 list of the form $c::s$. If you want to perform an induction
 proof for strings you need to consider the cases
 \[
 L(\textit{der}\,c\,r) = \textit{Der}\,c\,(L(r))
 \]
-\noindent holds (this would be of course a property that
+\noindent holds (this would, of course, be another property that needs
-needs to be proved in a side-lemma by induction on $r$).
+to be proved in a side-lemma by induction on $r$). This is a bit
+more challenging, but not impossible.
 To sum up, using reasoning like the one shown above allows us
 to show the correctness of our algorithm. To see this,
 start from the specification
 \begin{equation}
 [] \in \textit{Ders}\,s\,(L(r))
 \label{dersstep}
 \end{equation}
-\noindent But we have shown above in \eqref{dersprop}, that
+\noindent Do you agree? But we have shown above in \eqref{dersprop}
-the $\textit{Ders}$ can be replaced by $L(\textit{ders}\ldots)$. That means
+that the $\textit{Ders}$ can be replaced by
-\eqref{dersstep} is equivalent to
+$L(\textit{ders}\ldots)$. That means \eqref{dersstep} is equivalent to
 \begin{equation}
 [] \in L(\textit{ders}\,s\,r)
 \label{prefinalstep}
 \end{equation}
 \[
 matches\,s\,r\;\;\text{if and only if}\;\;
 s\in L(r)
 \]
-\noindent which is the property we set out to prove:
+\noindent which is the property we set out to prove: our algorithm
-our algorithm meets its specification. To have done
+meets its specification. Doing so requires a few induction
-so, requires a few induction proofs about strings and
+proofs about strings and regular expressions. Following the \emph{induction
-regular expressions. Following the recipes is already a big
+recipes} is already a big step in actually performing these proofs.
-step in performing these proofs.
+In case you are not convinced: proofs have helped me to make sure my code
+is correct and in several instances prevented me from letting embarrassing
+mistakes slip into the `wild'.
 \end{document}

changeset 504	5dc452d7c08e
parent 492	39b7ff2cf1bc
child 510	25580bf89ac0