updated handouts
authorChristian Urban <christian dot urban at kcl dot ac dot uk>
Sun, 28 Sep 2014 18:07:58 +0100
changeset 261 24531cfaa36a
parent 260 65d1ea0e989f
child 262 ee4304bc6350
updated handouts
handouts/ho02.pdf
handouts/ho02.tex
progs/app5.scala
progs/app51.scala
progs/re1.scala
progs/re2.scala
progs/re3.scala
slides/slides02.pdf
slides/slides02.tex
Binary file handouts/ho02.pdf has changed
--- a/handouts/ho02.tex	Sat Sep 27 00:37:02 2014 +0100
+++ b/handouts/ho02.tex	Sun Sep 28 18:07:58 2014 +0100
@@ -1,15 +1,77 @@
 \documentclass{article}
 \usepackage{../style}
 \usepackage{../langs}
-
+\usepackage{../graphics}
+\usepackage{../data}
 
 \begin{document}
 
 \section*{Handout 2}
 
-Having specified what problem our matching algorithm, \pcode{match},
-is supposed to solve, namely for a given regular expression $r$ and
-string $s$ answer \textit{true} if and only if
+This lecture is about implementing a more efficient regular
+expression matcher (the plots on the right)---more efficient
+than the matchers from regular expression libraries in Ruby and
+Python (the plots on the left). These plots show the running 
+time for the evil regular expression $a?^{\{n\}}a^{\{n\}}$.
+
+\begin{center}
+\begin{tabular}{@{}cc@{}}
+\begin{tikzpicture}[y=.072cm, x=.12cm]
+ 	%axis
+	\draw (0,0) -- coordinate (x axis mid) (30,0);
+   \draw (0,0) -- coordinate (y axis mid) (0,30);
+   %ticks
+   \foreach \x in {0,5,...,30}
+     \draw (\x,1pt) -- (\x,-3pt) node[anchor=north] {\x};
+   \foreach \y in {0,5,...,30}
+     \draw (1pt,\y) -- (-3pt,\y) node[anchor=east] {\y}; 
+	%labels      
+	\node[below=0.6cm] at (x axis mid) {number of \texttt{a}s};
+	\node[rotate=90,left=0.9cm] at (y axis mid) {time in secs};
+	%plots
+	\draw[color=blue] plot[mark=*] 
+		file {re-python.data};
+	\draw[color=brown] plot[mark=triangle*] 
+		file {re-ruby.data};	 
+   %legend
+	\begin{scope}[shift={(4,20)}] 
+	\draw[color=blue] (0,0) -- 
+		plot[mark=*] (0.25,0) -- (0.5,0) 
+		node[right]{\small Python};
+	\draw[yshift=-4mm, color=brown] (0,0) -- 
+		plot[mark=triangle*] (0.25,0) -- (0.5,0)
+		node[right]{\small Ruby};
+	\end{scope}	
+\end{tikzpicture} 
+&
+\begin{tikzpicture}[y=.072cm, x=.0004cm]
+ 	%axis
+	\draw (0,0) -- coordinate (x axis mid) (12000,0);
+   \draw (0,0) -- coordinate (y axis mid) (0,30);
+   %ticks
+   \foreach \x in {0,3000,...,12000}
+     	\draw (\x,1pt) -- (\x,-3pt) node[anchor=north] {\x};
+   \foreach \y in {0,5,...,30}
+     	\draw (1pt,\y) -- (-3pt,\y) node[anchor=east] {\y}; 
+	%labels      
+	\node[below=0.6cm] at (x axis mid) {number of \texttt{a}s};
+	\node[rotate=90,left=0.9cm] at (y axis mid) {time in secs};
+
+	%plots
+   \draw[color=green] plot[mark=square*, mark options={fill=white} ] 
+		file {re2b.data};
+	\draw[color=black] plot[mark=square*, mark options={fill=white} ] 
+		file {re3.data};	 
+\end{tikzpicture}
+\end{tabular}
+\end{center}\medskip
+
+
+\noindent Having specified in the previous lecture what
+problem our regular expression matcher, which we will call
+\pcode{matches}, is supposed to solve, namely for any given
+regular expression $r$ and string $s$ answer \textit{true} if
+and only if
 
 \[
 s \in L(r)
@@ -20,95 +82,219 @@
 because in general the set of strings $L$ returns is infinite
 (recall what $L(a^*)$ is). In such cases there is no way we
 can implement an exhaustive test for whether a string is
-member of this set or not. Before we come to the matching 
-algorithm, lets have a closer look at what it means when
-two regular expressions are equivalent.
+member of this set or not. In contrast our matching algorithm
+will mainly operate on the regular expression $r$ and string
+$s$, which are both finite. Before we come to the matching
+algorithm, however, let us have a closer look at what it means
+when two regular expressions are equivalent.
 
 \subsection*{Regular Expression Equivalences}
 
-
-\subsection*{Matching Algorithm}
+We already defined in Handout 1 what it means for two regular
+expressions to be equivalent, namely if their meaning is the
+same language:
 
-The algorithm we will define below consists of two parts. One is the
-function $nullable$ which takes a regular expression as argument and
-decides whether it can match the empty string (this means it returns a
-boolean). This can be easily defined recursively as follows:
+\[
+r_1 \equiv r_2 \;\dn\; L(r_1) = L(r_2)
+\]
+
+\noindent
+It is relatively easy to verify that some concrete equivalences
+hold, for example
 
 \begin{center}
-\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
-$nullable(\varnothing)$      & $\dn$ & $f\!\/alse$\\
-$nullable(\epsilon)$           & $\dn$ &  $true$\\
-$nullable(c)$                    & $\dn$ &  $f\!alse$\\
-$nullable(r_1 + r_2)$       & $\dn$ &  $nullable(r_1) \vee nullable(r_2)$\\ 
-$nullable(r_1 \cdot r_2)$ & $\dn$ &  $nullable(r_1) \wedge nullable(r_2)$\\
-$nullable(r^*)$                & $\dn$ & $true$ \\
+\begin{tabular}{rcl}
+$(a + b)  + c$ & $\equiv$ & $a + (b + c)$\\
+$a + a$        & $\equiv$ & $a$\\
+$a + b$        & $\equiv$ & $b + a$\\
+$(a \cdot b) \cdot c$ & $\equiv$ & $a \cdot (b \cdot c)$\\
+$c \cdot (a + b)$ & $\equiv$ & $(c \cdot a) + (c \cdot b)$\\
 \end{tabular}
 \end{center}
 
 \noindent
-The idea behind this function is that the following property holds:
+but also easy to verify that the following regular expressions
+are \emph{not} equivalent
+
+\begin{center}
+\begin{tabular}{rcl}
+$a \cdot a$ & $\not\equiv$ & $a$\\
+$a + (b \cdot c)$ & $\not\equiv$ & $(a + b) \cdot (a + c)$\\
+\end{tabular}
+\end{center}
+
+\noindent I leave it to you to verify these equivalences and
+non-equivalences. It is also interesting to look at some
+corner cases involving $\epsilon$ and $\varnothing$:
+
+\begin{center}
+\begin{tabular}{rcl}
+$a \cdot \varnothing$ & $\not\equiv$ & $a$\\
+$a + \epsilon$ & $\not\equiv$ & $a$\\
+$\epsilon$ & $\equiv$ & $\varnothing^*$\\
+$\epsilon^*$ & $\equiv$ & $\epsilon$\\
+$\varnothing^*$ & $\not\equiv$ & $\varnothing$\\
+\end{tabular}
+\end{center}
+
+\noindent Again I leave it to you to make sure you agree
+with these equivalences and non-equivalences. 
+
+
+For our matching algorithm however the following six
+equivalences will play an important role:
+
+\begin{center}
+\begin{tabular}{rcl}
+$r + \varnothing$  & $\equiv$ & $r$\\
+$\varnothing + r$  & $\equiv$ & $r$\\
+$r \cdot \epsilon$ & $\equiv$ & $r$\\
+$\epsilon \cdot r$     & $\equiv$ & $r$\\
+$r \cdot \varnothing$ & $\equiv$ & $\varnothing$\\
+$\varnothing \cdot r$ & $\equiv$ & $\varnothing$\\
+$r + r$ & $\equiv$ & $r$
+\end{tabular}
+\end{center}
+
+\noindent which always hold no matter what the regular
+expression $r$ looks like. The first are easy to verify since
+$L(\varnothing)$ is the empty set. The next two are also easy 
+to verify since $L(\epsilon) = \{[]\}$ and appending the empty 
+string to every string of another set, leaves the set 
+unchanged. Be careful to fully comprehend the fifth and 
+sixth equivalence: if you concatenate two sets of strings
+and one is the empty set, then the concatenation will also be
+the empty set. Check the definition of \pcode{_ @ _}.
+The last equivalence is again trivial.
+
+
+What will be important later on is that we can orient these
+equivalences and read them from left to right. In this way we
+can view them as \emph{simplification rules}. Suppose for 
+example the regular expression 
+
+\begin{equation}
+(r_1 + \varnothing) \cdot \epsilon + ((\epsilon + r_2) + r_3) \cdot (r_4 \cdot \varnothing)
+\label{big}
+\end{equation}
+
+\noindent If we can find an equivalent regular expression that
+is simpler (smaller for example), then this might potentially 
+make our matching algorithm is faster. The reason is that 
+whether a string $s$ is in $L(r)$ or in $L(r')$ with $r\equiv r'$
+will always give the same answer. In the example above you 
+will see that the regular expression is equivalent to $r_1$
+if you iteratively apply the simplification rules from above:
+
+\begin{center}
+\begin{tabular}{ll}
+ & $(r_1 + \varnothing) \cdot \epsilon + ((\epsilon + r_2) + r_3) \cdot 
+(\underline{r_4 \cdot \varnothing})$\smallskip\\
+$\equiv$ & $(r_1 + \varnothing) \cdot \epsilon + \underline{((\epsilon + r_2) + r_3) \cdot 
+\varnothing}$\smallskip\\
+$\equiv$ & $\underline{(r_1 + \varnothing) \cdot \epsilon} + \varnothing$\smallskip\\
+$\equiv$ & $(\underline{r_1 + \varnothing}) + \varnothing$\smallskip\\
+$\equiv$ & $\underline{r_1 + \varnothing}$\smallskip\\
+$\equiv$ & $r_1$\
+\end{tabular}
+\end{center}
+
+\noindent In each step I underlined where a simplification
+rule is applied. Our matching algorithm in the next section
+will often generate such ``useless'' $\epsilon$s and
+$\varnothing$s, therefore simplifying them away will make the
+algorithm quite a bit faster.
+
+\subsection*{The Matching Algorithm}
+
+The algorithm we will define below consists of two parts. One
+is the function $nullable$ which takes a regular expression as
+argument and decides whether it can match the empty string
+(this means it returns a boolean in Scala). This can be easily
+defined recursively as follows:
+
+\begin{center}
+\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
+$nullable(\varnothing)$      & $\dn$ & $\textit{false}$\\
+$nullable(\epsilon)$         & $\dn$ & $true$\\
+$nullable(c)$                & $\dn$ & $\textit{false}$\\
+$nullable(r_1 + r_2)$     & $\dn$ &  $nullable(r_1) \vee nullable(r_2)$\\ 
+$nullable(r_1 \cdot r_2)$ & $\dn$ &  $nullable(r_1) \wedge nullable(r_2)$\\
+$nullable(r^*)$              & $\dn$ & $true$ \\
+\end{tabular}
+\end{center}
+
+\noindent The idea behind this function is that the following
+property holds:
 
 \[
-nullable(r) \;\;\text{if and only if}\;\; ""\in L(r)
+nullable(r) \;\;\text{if and only if}\;\; []\in L(r)
 \]
 
-\noindent
-Note on the left-hand side we have a function we can implement; on the
-right we have its specification.
+\noindent Note on the left-hand side we have a function we can
+implement; on the right we have its specification (which we
+cannot implement in a programming language).
 
 The other function of our matching algorithm calculates a
-\emph{derivative} of a regular expression. This is a function which
-will take a regular expression, say $r$, and a character, say $c$, as
-argument and return a new regular expression. Be careful that the
-intuition behind this function is not so easy to grasp on first
-reading. Essentially this function solves the following problem: if
-$r$ can match a string of the form $c\!::\!s$, what does the regular
-expression look like that can match just $s$. The definition of this
-function is as follows:
+\emph{derivative} of a regular expression. This is a function
+which will take a regular expression, say $r$, and a
+character, say $c$, as argument and return a new regular
+expression. Be careful that the intuition behind this function
+is not so easy to grasp on first reading. Essentially this
+function solves the following problem: if $r$ can match a
+string of the form $c\!::\!s$, what does the regular
+expression look like that can match just $s$. The definition
+of this function is as follows:
 
 \begin{center}
-\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
-  $der\, c\, (\varnothing)$      & $\dn$ & $\varnothing$ & \\
-  $der\, c\, (\epsilon)$           & $\dn$ & $\varnothing$ & \\
-  $der\, c\, (d)$                     & $\dn$ & if $c = d$ then $\epsilon$ else $\varnothing$ & \\
-  $der\, c\, (r_1 + r_2)$        & $\dn$ & $der\, c\, r_1 + der\, c\, r_2$ & \\
+\begin{tabular}{l@ {\hspace{2mm}}c@ {\hspace{2mm}}l}
+  $der\, c\, (\varnothing)$      & $\dn$ & $\varnothing$\\
+  $der\, c\, (\epsilon)$         & $\dn$ & $\varnothing$ \\
+  $der\, c\, (d)$                & $\dn$ & if $c = d$ then $\epsilon$ else $\varnothing$\\
+  $der\, c\, (r_1 + r_2)$        & $\dn$ & $der\, c\, r_1 + der\, c\, r_2$\\
   $der\, c\, (r_1 \cdot r_2)$  & $\dn$  & if $nullable (r_1)$\\
   & & then $(der\,c\,r_1) \cdot r_2 + der\, c\, r_2$\\ 
   & & else $(der\, c\, r_1) \cdot r_2$\\
-  $der\, c\, (r^*)$          & $\dn$ & $(der\,c\,r) \cdot (r^*)$ &
+  $der\, c\, (r^*)$          & $\dn$ & $(der\,c\,r) \cdot (r^*)$
   \end{tabular}
 \end{center}
 
-\noindent
-The first two clauses can be rationalised as follows: recall that
-$der$ should calculate a regular expression, if the ``input'' regular
-expression can match a string of the form $c\!::\!s$. Since neither
-$\varnothing$ nor $\epsilon$ can match such a string we return
-$\varnothing$. In the third case we have to make a case-distinction:
-In case the regular expression is $c$, then clearly it can recognise a
+\noindent The first two clauses can be rationalised as
+follows: recall that $der$ should calculate a regular
+expression, if the ``input'' regular expression can match a
+string of the form $c\!::\!s$. Since neither $\varnothing$ nor
+$\epsilon$ can match such a string we return $\varnothing$. In
+the third case we have to make a case-distinction: In case the
+regular expression is $c$, then clearly it can recognise a
 string of the form $c\!::\!s$, just that $s$ is the empty
-string. Therefore we return the $\epsilon$-regular expression. In the
-other case we again return $\varnothing$ since no string of the
-$c\!::\!s$ can be matched.  The $+$-case is relatively
-straightforward: all strings of the form $c\!::\!s$ are either matched
-by the regular expression $r_1$ or $r_2$. So we just have to
-recursively call $der$ with these two regular expressions and compose
-the results again with $+$. The $\cdot$-case is more complicated: if
-$r_1\cdot r_2$ matches a string of the form $c\!::\!s$, then the first
-part must be matched by $r_1$.  Consequently, it makes sense to
-construct the regular expression for $s$ by calling $der$ with $r_1$
-and ``appending'' $r_2$. There is however one exception to this simple
-rule: if $r_1$ can match the empty string, then all of $c\!::\!s$ is
-matched by $r_2$. So in case $r_1$ is nullable (that is can match the
-empty string) we have to allow the choice $der\,c\,r_2$ for
-calculating the regular expression that can match $s$. The $*$-case is
-again simple: if $r^*$ matches a string of the form $c\!::\!s$, then
-the first part must be ``matched'' by a single copy of $r$. Therefore
-we call recursively $der\,c\,r$ and ``append'' $r^*$ in order to match
-the rest of $s$.
+string. Therefore we return the $\epsilon$-regular expression.
+In the other case we again return $\varnothing$ since no
+string of the $c\!::\!s$ can be matched. Next come the
+recursive cases. Fortunately, the $+$-case is still relatively
+straightforward: all strings of the form $c\!::\!s$ are either
+matched by the regular expression $r_1$ or $r_2$. So we just
+have to recursively call $der$ with these two regular
+expressions and compose the results again with $+$. Yes, makes
+sense? The $\cdot$-case is more complicated: if $r_1\cdot r_2$
+matches a string of the form $c\!::\!s$, then the first part
+must be matched by $r_1$. Consequently, it makes sense to
+construct the regular expression for $s$ by calling $der$ with
+$r_1$ and ``appending'' $r_2$. There is however one exception
+to this simple rule: if $r_1$ can match the empty string, then
+all of $c\!::\!s$ is matched by $r_2$. So in case $r_1$ is
+nullable (that is can match the empty string) we have to allow
+the choice $der\,c\,r_2$ for calculating the regular
+expression that can match $s$. Therefore we have to 
+add the regular expression $der\,c\,r_2$.
+The $*$-case is again simple:
+if $r^*$ matches a string of the form $c\!::\!s$, then the
+first part must be ``matched'' by a single copy of $r$.
+Therefore we call recursively $der\,c\,r$ and ``append'' $r^*$
+in order to match the rest of $s$.
 
-Another way to rationalise the definition of $der$ is to consider the
-following operation on sets:
+If this did not make sense, here is another way to rationalise
+the definition of $der$ by considering the following operation
+on sets:
 
 \[
 Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
@@ -117,11 +303,11 @@
 \noindent
 which essentially transforms a set of strings $A$ by filtering out all
 strings that do not start with $c$ and then strips off the $c$ from
-all the remaining strings. For example suppose $A = \{"f\!oo", "bar",
-"f\!rak"\}$ then
+all the remaining strings. For example suppose $A = \{f\!oo, bar,
+f\!rak\}$ then
 \[
-Der\,f\,A = \{"oo", "rak"\}\quad,\quad
-Der\,b\,A = \{"ar"\}  \quad \text{and} \quad
+Der\,f\,A = \{oo, rak\}\quad,\quad
+Der\,b\,A = \{ar\}  \quad \text{and} \quad
 Der\,a\,A = \varnothing
 \]
  
@@ -141,61 +327,126 @@
 from the remaining strings---this is exactly the language that
 $der\,c\,r$ can match.
 
-If we want to find out whether the string $"abc"$ is matched by the
-regular expression $r$ then we can iteratively apply $Der$ as follows
+If we want to find out whether the string $abc$ is matched by
+the regular expression $r_1$ then we can iteratively apply $der$
+as follows
+
+\begin{center}
+\begin{tabular}{rll}
+Input: $r_1$, $abc$\medskip\\
+Step 1: & build derivative of $a$ and $r_1$ & $(r_2 = der\,a\,r_1)$\smallskip\\
+Step 2: & build derivative of $b$ and $r_2$ & $(r_3 = der\,b\,r_2)$\smallskip\\
+Step 3: & build derivative of $c$ and $r_3$ & $(r_4 = der\,b\,r_3)$\smallskip\\
+Step 4: & the string is exhausted; test & ($nullable(r_4)$)\\
+        & whether $r_4$ can recognise the\\
+        & empty string\smallskip\\
+Output: & result of the test $\Rightarrow true \,\text{or}\, \textit{false}$\\        
+\end{tabular}
+\end{center}
 
-\begin{enumerate}
-\item $Der\,a\,(L(r))$
-\item $Der\,b\,(Der\,a\,(L(r)))$
-\item $Der\,c\,(Der\,b\,(Der\,a\,(L(r))))$
-\end{enumerate}
+\noindent Again the operation $Der$ might help to rationalise
+this algorithm. We want to know whether $abc \in L(r_1)$. We
+do not know yet. But lets assume it is. Then $Der\,a\,L(r_1)$
+builds the set where all the strings not starting with $a$ are
+filtered out. Of the remaining strings, the $a$ is stripped
+off. Then we continue with filtering out all strings not
+starting with $b$ and stripping off the $b$ from the remaining
+strings, that means we build $Der\,b\,(Der\,a\,(L(r_1)))$.
+Finally we filter out all strings not starting with $c$ and
+strip off $c$ from the remaining string. This is
+$Der\,c\,(Der\,b\,(Der\,a\,(L(r))))$. Now if $abc$ was in the 
+original set ($L(r_1)$), then in $Der\,c\,(Der\,b\,(Der\,a\,(L(r))))$ 
+must be the empty string. If not then $abc$ was not in the 
+language we started with.
 
-\noindent
-In the last step we need to test whether the empty string is in the
-set. Our matching algorithm will work similarly, just using regular
-expression instead of sets. For this we need to lift the notion of
-derivatives from characters to strings. This can be done using the
-following function, taking a string and regular expression as input
-and a regular expression as output.
+Our matching algorithm using $der$ and $nullable$ works
+similarly, just using regular expression instead of sets. For
+this we need to extend the notion of derivatives from
+characters to strings. This can be done using the following
+function, taking a string and regular expression as input and
+a regular expression as output.
 
 \begin{center}
 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
-  $der\!s\, []\, r$     & $\dn$ & $r$ & \\
-  $der\!s\, (c\!::\!s)\, r$ & $\dn$ & $der\!s\,s\,(der\,c\,r)$ & \\
+  $\textit{ders}\, []\, r$     & $\dn$ & $r$ & \\
+  $\textit{ders}\, (c\!::\!s)\, r$ & $\dn$ & $\textit{ders}\,s\,(der\,c\,r)$ & \\
   \end{tabular}
 \end{center}
 
-\noindent
-Having $ders$ in place, we can finally define our matching algorithm:
+\noindent This function essentially iterates $der$ taking one
+character at the time from the original string until it is
+exhausted. Having $ders$ in place, we can finally define our
+matching algorithm:
 
 \[
-match\,s\,r = nullable(ders\,s\,r)
-\]
-
-\noindent
-We claim that
-
-\[
-match\,s\,r\quad\text{if and only if}\quad s\in L(r)
+matches\,s\,r = nullable(ders\,s\,r)
 \]
 
 \noindent
-holds, which means our algorithm satisfies the specification. This
-algorithm was introduced by Janus Brzozowski in 1964. Its main
-attractions are simplicity and being fast, as well as being easily
-extendable for other regular expressions such as $r^{\{n\}}$, $r^?$,
-$\sim{}r$ and so on.
+We can claim that
+
+\[
+matches\,s\,r\quad\text{if and only if}\quad s\in L(r)
+\]
+
+\noindent holds, which means our algorithm satisfies the
+specification. Of course we can claim many things\ldots
+whether the claim holds any water is a different question,
+which for example is the point of the Strand-2 Coursework.
+
+This algorithm was introduced by Janus Brzozowski in 1964. Its
+main attractions are simplicity and being fast, as well as
+being easily extendable for other regular expressions such as
+$r^{\{n\}}$, $r^?$, $\sim{}r$ and so on (this is subject of
+Strand-1 Coursework 1).
 
 \subsection*{The Matching Algorithm in Scala}
 
-
+Another attraction of the algorithm is that it can be easily
+implemented in a functional programming language, like Scala.
+Given the implementation of regular expressions in Scala given
+in the first lecture and handout, the functions for
+\pcode{matches} are shown in Figure~\ref{scala1}.
 
 \begin{figure}[p]
-{\lstset{language=Scala}\texttt{\lstinputlisting{../progs/app5.scala}}}
-{\lstset{language=Scala}\texttt{\lstinputlisting{../progs/app6.scala}}}
-\caption{Scala implementation of the nullable and derivatives functions.}
+\lstinputlisting{../progs/app5.scala}
+\caption{Scala implementation of the nullable and 
+  derivatives functions.\label{scala1}}
 \end{figure}
 
+For running the algorithm with our favourite example, the evil
+regular expression $a?^{\{n\}}a^{\{n\}}$, we need to implement
+the optional regular expression and the exactly $n$-times
+regular expression. This can be done with the translations
+
+\lstinputlisting[numbers=none]{../progs/app51.scala}
+
+\noindent Running the matcher with the example, we find it is
+slightly worse then the matcher in Ruby and Python.
+Ooops\ldots\medskip
+
+\noindent Analysing this failure a bit we notice that 
+for $a^{\{n\}}$ we generate quite big regular expressions:
+
+\begin{center}
+\begin{tabular}{rl}
+1: & $a$\\
+2: & $a\cdot a$\\
+3: & $a\cdot a\cdot a$\\
+& \ldots\\
+13: & $a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a$\\
+& \ldots\\
+20:
+\end{tabular}
+\end{center}
+
+\noindent Our algorithm traverses such regular expressions at
+least once every time a derivative is calculated. So having
+large regular expressions, will cause problems. This problem
+is aggravated with $a?$ being represented as $a + \epsilon$.
+
+
+
 \end{document}
 
 %%% Local Variables: 
--- a/progs/app5.scala	Sat Sep 27 00:37:02 2014 +0100
+++ b/progs/app5.scala	Sun Sep 28 18:07:58 2014 +0100
@@ -6,3 +6,22 @@
   case SEQ(r1, r2) => nullable(r1) && nullable(r2)
   case STAR(_) => true
 }
+
+def der (c: Char, r: Rexp) : Rexp = r match {
+  case NULL => NULL
+  case EMPTY => NULL
+  case CHAR(d) => if (c == d) EMPTY else NULL
+  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
+  case SEQ(r1, r2) => 
+    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
+    else SEQ(der(c, r1), r2)
+  case STAR(r) => SEQ(der(c, r), STAR(r))
+}
+
+def ders (s: List[Char], r: Rexp) : Rexp = s match {
+  case Nil => r
+  case c::s => ders(s, der(c, r))
+}
+
+def matches(r: Rexp, s: String) : Boolean = 
+  nullable(ders(s.toList, r))
--- a/progs/app51.scala	Sat Sep 27 00:37:02 2014 +0100
+++ b/progs/app51.scala	Sun Sep 28 18:07:58 2014 +0100
@@ -1,8 +1,8 @@
-abstract class Rexp
+def OPT(r: Rexp) = ALT(r, EMPTY)
 
-case object NULL extends Rexp
-case object EMPTY extends Rexp
-case class CHAR(c: Char) extends Rexp
-case class ALT(r1: Rexp, r2: Rexp) extends Rexp
-case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
-case class STAR(r: Rexp) extends Rexp
+def NTIMES(r: Rexp, n: Int) : Rexp = n match {
+  case 0 => EMPTY
+  case 1 => r
+  case n => SEQ(r, NTIMES(r, n - 1))
+}
+
--- a/progs/re1.scala	Sat Sep 27 00:37:02 2014 +0100
+++ b/progs/re1.scala	Sun Sep 28 18:07:58 2014 +0100
@@ -36,7 +36,7 @@
 }
 
 // main matcher function
-def matcher(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
+def matches(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
 
 //example from the homework
 //val r = STAR(ALT(SEQ(CHAR('a'), CHAR('b')), CHAR('b')))
@@ -66,5 +66,5 @@
 }
 
 for (i <- 1 to 29) {
-  println(i + ": " + "%.5f".format(time_needed(1, matcher(EVIL(i), "a" * i))))
+  println(i + ": " + "%.5f".format(time_needed(1, matches(EVIL(i), "a" * i))))
 }
--- a/progs/re2.scala	Sat Sep 27 00:37:02 2014 +0100
+++ b/progs/re2.scala	Sun Sep 28 18:07:58 2014 +0100
@@ -35,7 +35,7 @@
   case c::s => ders(s, der(c, r))
 }
 
-def matcher(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
+def matches(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
 
 
 //optional: one or zero times
@@ -51,12 +51,12 @@
 }
 
 for (i <- 1 to 100) {
-  println(i + ": " + "%.5f".format(time_needed(1, matcher(EVIL(i), "a" * i))))
+  println(i + ": " + "%.5f".format(time_needed(1, matches(EVIL(i), "a" * i))))
 }
 
 //a bit bolder test
 for (i <- 1 to 1000 by 50) {
-  println(i + " " + "%.5f".format(time_needed(1, matcher(EVIL(i), "a" * i))))
+  println(i + " " + "%.5f".format(time_needed(1, matches(EVIL(i), "a" * i))))
 }
 
 
--- a/progs/re3.scala	Sat Sep 27 00:37:02 2014 +0100
+++ b/progs/re3.scala	Sun Sep 28 18:07:58 2014 +0100
@@ -60,7 +60,7 @@
   case c::s => ders(s, simp(der(c, r)))
 }
 
-def matcher(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
+def matches(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
 
 
 //one or zero
@@ -77,7 +77,7 @@
 
 
 for (i <- 1 to 12001 by 500) {
-  println(i + " " + "%.5f".format(time_needed(1, matcher(EVIL(i), "a" * i))))
+  println(i + " " + "%.5f".format(time_needed(1, matches(EVIL(i), "a" * i))))
 }
 
 
Binary file slides/slides02.pdf has changed
--- a/slides/slides02.tex	Sat Sep 27 00:37:02 2014 +0100
+++ b/slides/slides02.tex	Sun Sep 28 18:07:58 2014 +0100
@@ -4,32 +4,6 @@
 \usepackage{../langs}
 \usepackage{../data}
 
-%\usepackage{beamerthemeplaincu}
-%\usepackage[absolute,overlay]{textpos}
-%\usepackage{ifthen}
-%\usepackage{tikz}
-%\usepackage{pgf}
-%\usepackage{calc} 
-%\usepackage{ulem}
-%\usepackage{courier}
-%\usepackage{listings}
-%\renewcommand{\uline}[1]{#1}
-%\usetikzlibrary{arrows}
-%\usetikzlibrary{automata}
-%\usetikzlibrary{shapes}
-%\usetikzlibrary{shadows}
-%\usetikzlibrary{positioning}
-%\usetikzlibrary{plotmarks}
-%\usetikzlibrary{calc}
-%\usepackage{graphicx} 
-%\usepackage{../langs}
-%\usepackage{../data}
-
-%\makeatletter
-%\lst@CCPutMacro\lst@ProcessOther {"2D}{\lst@ttfamily{-{}}{-{}}}
-%\@empty\z@\@empty
-%\makeatother
-
 \hfuzz=220pt 
 
 \lstset{language=Scala,
@@ -47,8 +21,7 @@
 \begin{document}
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\mode<presentation>{
-\begin{frame}<1>[t]
+\begin{frame}[t]
 \frametitle{%
   \begin{tabular}{@ {}c@ {}}
   \\[-3mm]
@@ -56,13 +29,7 @@
   \LARGE Formal Languages (2)\\[3mm] 
   \end{tabular}}
 
-  %\begin{center}
-  %\includegraphics[scale=0.3]{pics/ante1.jpg}\hspace{5mm}
-  %\includegraphics[scale=0.31]{pics/ante2.jpg}\\
-  %\footnotesize\textcolor{gray}{Antikythera automaton, 100 BC (Archimedes?)}
-  %\end{center}
-
-\normalsize
+  \normalsize
   \begin{center}
   \begin{tabular}{ll}
   Email:  & christian.urban at kcl.ac.uk\\
@@ -71,20 +38,24 @@
   \end{tabular}
   \end{center}
 
-
-\end{frame}}
- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%     
+\end{frame}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%     
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\mode<presentation>{
 \begin{frame}[c]
-\frametitle{\begin{tabular}{c}Languages\end{tabular}}
+\frametitle{Languages, Strings}
 
-A \alert{language} is a set of strings.\bigskip
+\begin{itemize}
+\item A \alert{language} is a set of strings.\medskip
+\begin{center}
+\bl{\{[], hello, foobar, a, abc\}}
+\end{center}\bigskip
 
-A \alert{regular expression} specifies a set of strings, or language.
-  
-\end{frame}}
+\item The \alert{meaning} of a regular expression is a set of 
+  strings, or language.
+\end{itemize}  
+
+\end{frame}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%   
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%