updated
authorChristian Urban <urbanc@in.tum.de>
Tue, 25 Apr 2017 12:28:07 +0100
changeset 485 bd6d999ab7b6
parent 484 e61ffb28994d
child 486 8178fcf377dc
updated
handouts/ho03.pdf
handouts/ho03.tex
progs/dfa.scala
progs/enfa.scala
progs/nfa.scala
Binary file handouts/ho03.pdf has changed
--- a/handouts/ho03.tex	Sat Apr 15 22:03:59 2017 +0800
+++ b/handouts/ho03.tex	Tue Apr 25 12:28:07 2017 +0100
@@ -23,12 +23,13 @@
 recently. Still let us in this lecture have a closer look at automata
 and their relation to regular expressions. This will help us with
 understanding why the regular expression matchers in Python, Ruby and
-Java are so slow with certain regular expressions.
+Java are so slow with certain regular expressions. On the way we will
+also see what are the limitations of regular expressions.
 
 
 \subsection*{Deterministic Finite Automata}
 
-The central definition is:\medskip
+Lets start\ldots the central definition is:\medskip
 
 \noindent 
 A \emph{deterministic finite automaton} (DFA), say $A$, is
@@ -76,7 +77,7 @@
 \noindent In this graphical notation, the accepting state $Q_4$ is
 indicated with double circles. Note that there can be more than one
 accepting state. It is also possible that a DFA has no accepting
-states at all, or that the starting state is also an accepting
+state at all, or that the starting state is also an accepting
 state. In the case above the transition function is defined everywhere
 and can also be given as a table as follows:
 
@@ -107,15 +108,15 @@
 \]
 
 \noindent This lifted transition function is often called
-``delta-hat''. Given a string, we start in the starting state and take
-the first character of the string, follow to the next state, then take
-the second character and so on. Once the string is exhausted and we
-end up in an accepting state, then this string is accepted by the
+\emph{delta-hat}. Given a string, we start in the starting state and
+take the first character of the string, follow to the next state, then
+take the second character and so on. Once the string is exhausted and
+we end up in an accepting state, then this string is accepted by the
 automaton. Otherwise it is not accepted. This also means that if along
 the way we hit the case where the transition function $\delta$ is not
 defined, we need to raise an error. In our implementation we will deal
-with this case elegantly by using Scala's \texttt{Try}. So a string
-$s$ is in the \emph{language accepted by the automaton} ${\cal
+with this case elegantly by using Scala's \texttt{Try}.  Summing up: a
+string $s$ is in the \emph{language accepted by the automaton} ${\cal
   A}(\varSigma, Q, Q_0, F, \delta)$ iff
 
 \[
@@ -132,16 +133,16 @@
                   {../progs/dfa.scala}
 \caption{A Scala implementation of DFAs using partial functions.
   Notice some subtleties: \texttt{deltas} implements the delta-hat
-  construction by lifting the transition (partial) function to
-  lists of characters. Since \texttt{delta} is given
-  as a partial function, it can obviously go ``wrong'' in which
-  case the \texttt{Try} in \texttt{accepts} catches the error and
-  returns \texttt{false}---that means the string is not accepted.
-  The example \texttt{delta} implements the DFA example shown
-  earlier in the handout.\label{dfa}}
+  construction by lifting the transition (partial) function to lists
+  of characters. Since \texttt{delta} is given as a partial function,
+  it can obviously go ``wrong'' in which case the \texttt{Try} in
+  \texttt{accepts} catches the error and returns \texttt{false}---that
+  means the string is not accepted.  The example \texttt{delta} in
+  Line 28--38 implements the DFA example shown earlier in the
+  handout.\label{dfa}}
 \end{figure}
 
-My take of a simple Scala implementation for DFAs is given in
+My take on an implementation of DFAs in Scala is given in
 Figure~\ref{dfa}. As you can see, there are many features of the
 mathematical definition that are quite closely reflected in the
 code. In the DFA-class, there is a starting state, called
@@ -151,10 +152,19 @@
 input (of polymorphic type \texttt{C}) and produce a new state (of
 type \texttt{A}). For the moment it is OK to assume that \texttt{A} is
 some arbitrary type for states and the input is just characters.  (The
-reason for having polymorphic types for the states and the input of
-DFAs will become clearer later on.)
+reason for not having concrete types, but polymorphic types for the
+states and the input of DFAs will become clearer later on.)
 
-The most important point in this implemnetation is that I use Scala's
+The DFA-class has also an argument for specifying final states. In the
+implementation it is not a set of states, as in the mathematical
+definition, but a function from states to booleans (this function is
+supposed to return true whenever a state is final; false
+otherwise). While this boolean function is different from the sets of
+states, Scala allows to use sets for such functions (see Line 40 where
+the DFA is initialised). Again it will become clear later on why I use
+functions for final states, rather than sets.
+
+The most important point in the implementation is that I use Scala's
 partial functions for representing the transitions; alternatives would
 have been \texttt{Maps} or even \texttt{Lists}. One of the main
 advantages of using partial functions is that transitions can be quite
@@ -176,36 +186,29 @@
   }
 \end{lstlisting}}  
 
-\noindent I let you think what this DFA looks like in the graphical
-notation.
+\noindent I let you think what the corresponding DFA looks like in the
+graph notation.
 
-The DFA-class has also an argument for specifying final states. In the
-implementation it not a set of states, as in the matemathical
-definition, but a function from states to booleans (this function is
-supposed to return true whenever a state is final; false
-otherwise). While this boolean function is different from the sets of
-states, Scala allows to use sets for such functions (see Line 40 where
-the DFA is initialised). Again it will become clear later on why I use
-functions for final states, rather than sets.
-
-I let you ponder whether this is a good implementation of DFAs. In
-doing so I hope you notice that the $\varSigma$ and $Qs$ components (the
-alphabet and the set of finite states, respectively) are missing from
-the class definition. This means that the implementation allows you to
-do some fishy things you are not meant to do with DFAs. Which fishy
-things could that be?
+Finally, I let you ponder whether this is a good implementation of
+DFAs or not. In doing so I hope you notice that the $\varSigma$ and
+$Qs$ components (the alphabet and the set of finite states,
+respectively) are missing from the class definition. This means that
+the implementation allows you to do some fishy things you are not
+meant to do with DFAs. Which fishy things could that be?
 
 
 
 \subsection*{Non-Deterministic Finite Automata}
 
-While with DFAs it is always be clear that given a state and a
-character what the next state is (potentially none), it will be useful
-to relax this restriction. That means we allow states to have several
+Remember we want to find out what the relation is between regular
+expressions and automata. To do this with DFAs is a bit unwieldy.
+While with DFAs it is always clear that given a state and a character
+what the next state is (potentially none), it will be convenient to
+relax this restriction. That means we allow states to have several
 potential successor states. We even allow more than one starting
 state. The resulting construction is called a \emph{Non-Deterministic
-  Finite Automaton} (NFA) given also as a five-tuple ${\cal A}(\varSigma,
-Qs, Q_{0s}, F, \rho)$ where
+  Finite Automaton} (NFA) given also as a five-tuple ${\cal
+  A}(\varSigma, Qs, Q_{0s}, F, \rho)$ where
 
 \begin{itemize}
 \item $\varSigma$ is an alphabet,  
@@ -246,7 +249,7 @@
 let you think which kind of strings the above NFA accepts.
 
 
-There are a number of additional points you should note with
+There are a number of additional points you should note about
 NFAs. Every DFA is a NFA, but not vice versa. The $\rho$ in NFAs is a
 transition \emph{relation} (DFAs have a transition function). The
 difference between a function and a relation is that a function has
@@ -259,162 +262,210 @@
 
 My implementation of NFAs in Scala is shown in Figure~\ref{nfa}.
 Perhaps interestingly, I do not actually use relations for my NFAs,
-and I also do not use transition functions that return sets of states
-(another popular choice for implementing NFAs).  For reasons that
-become clear in a moment, I use sets of partial functions
-instead---see Line 7 in Figure~\ref{nfa}. DFAs have only one such
-partial function; my NFAs have a set.  Another parameter,
-\texttt{starts}, is in NFAs a set of states; \texttt{fins} is again a
-function from states to booleans. The \texttt{next} function
-calculates the set of next states reachable from a single state
-\texttt{q} by a character~\texttt{c}---this is calculated by going
-through all the partial functions in the \texttt{delta}-set and apply
-\texttt{q} and \texttt{c} (see Line 13). This gives a set of
-\texttt{Some}s (in case the application succeeded) and possibly some
-\texttt{None}s (in case the partial function is not defined or produces an
-error).  The \texttt{None}s are filtered out by the \texttt{flatMap},
-leaving the values inside the \texttt{Some}s. The function
-\texttt{nexts} just lifts this function to sets of
-states. \texttt{Deltas} and \texttt{accept} are similar to the DFA
-definitions.
+but use transition functions that return sets of states.  DFAs have
+partial transition functions that return a single state; my NFAs
+return a set. I let you think about this representation for
+NFA-transitions and how it corresponds to the relations used in the
+mathematical definition of NFAs.
 
+Like in the mathematical definition, \texttt{starts} is in NFAs a set
+of states; \texttt{fins} is again a function from states to
+booleans. The \texttt{next} function calculates the set of next states
+reachable from a single state \texttt{q} by a character~\texttt{c}. In
+case there is no such state---the partial transition function is
+undefined---the empty set is returned (see function
+\texttt{applyOrElse} in Lines 9 and 10). The function \texttt{nexts}
+just lifts this function to sets of states.
+ 
 \begin{figure}[p]
 \small
 \lstinputlisting[numbers=left,linebackgroundcolor=
                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
                   {../progs/nfa.scala}
-\caption{A Scala implementation of NFAs using sets of partial functions.
-  Notice some subtleties: Since \texttt{delta} is given
-  as a set of partial functions, each of them can obviously go ``wrong'' in which
-  case the \texttt{Try}. The function \texttt{accepts} implements the
-  acceptance of a string in a breath-first fashion. This can be costly
-  way of deciding whether a string is accepted in practical contexts.\label{nfa}}
+\caption{A Scala implementation of NFAs using partial functions.
+  Notice that the function \texttt{accepts} implements the
+  acceptance of a string in a breath-first search fashion. This can be a costly
+  way of deciding whether a string is accepted or not in applications that need to handle
+  large NFAs and large inputs.\label{nfa}}
 \end{figure}
 
-The reason for using sets of partial functions for specifying the
-transitions in NFAs has to do with pattern matching. Consider the
-following example: a popular benchmark regular expression is
-$(.)^*\cdot a\cdot (.)^{\{n\}}\cdot b\cdot c$. The important point to
-note is that it uses $.$ in order to represent the regular expression
-that accepts any character. A NFA that accepts the same strings as
-this regular expression (for $n=3$) is as follows:
-
-\begin{center}
-\begin{tikzpicture}[>=stealth',very thick, auto, node distance=7mm,
-    every state/.style={minimum size=0pt,inner sep=1pt,
-      draw=blue!50,very thick,fill=blue!20},scale=0.5]
-\node[state,initial]  (Q_0)  {$Q_0$};
-\node[state] (Q_1) [right=of Q_0] {$Q_1$};
-\node[state] (Q_2) [right=of Q_1] {$Q_2$};
-\node[state] (Q_3) [right=of Q_2] {$Q_3$};
-\node[state] (Q_4) [right=of Q_3] {$Q_4$};
-\node[state] (Q_5) [right=of Q_4] {$Q_5$};
-\node[state,accepting] (Q_6) [right=of Q_5] {$Q_6$};
-\path[->] (Q_0) edge [loop above] node  {$.$} ();
-\path[->] (Q_0) edge node [above]  {$a$} (Q_1);
-\path[->] (Q_1) edge node [above]  {$.$} (Q_2);
-\path[->] (Q_2) edge node [above]  {$.$} (Q_3);
-\path[->] (Q_3) edge node [above]  {$.$} (Q_4);
-\path[->] (Q_4) edge node [above]  {$b$} (Q_5);
-\path[->] (Q_5) edge node [above]  {$c$} (Q_6);
-\end{tikzpicture}
-\end{center}
-
-\noindent
-Also here the $.$ stands for accepting any single character: for example if we
-are in $Q_0$ and read an $a$ we can either stay in $Q_0$ (since any
-character will do for this) or advance to $Q_1$ (but only if it is an
-$a$). Why this is a good benchmark regular expression is irrelevant
-here. The point is that this NFA can be conveniently represented by
-the code:
-
-{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
-                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
-val delta = Set[(State, Char) :=> State](
-  { case (Q0, 'a') => Q1 },
-  { case (Q0, _)   => Q0 },
-  { case (Q1, _)   => Q2 },
-  { case (Q2, _)   => Q3 },
-  { case (Q3, _)   => Q4 },
-  { case (Q4, 'b') => Q5 },
-  { case (Q5, 'c') => Q6 }
-)
-
-NFA(Set[State](Q0), delta, Set[State](Q6))
-\end{lstlisting}}
-
-\noindent
-where the $.$-transitions translate into a
-underscore-pattern-matching. Recall that in $Q_0$ if we read an $a$ we
-can go to $Q_1$ (by the first partial function in the set) and also
-stay in $Q_0$ (by the second partial function). Representing such
-transitions in any other way in Scala seems to be somehow awkward; the
-set of partial function representation makes them easy to implement.
-
-Look very careful again at the \texttt{accepts} and \texttt{deltas}
-functions in NFAs and remember that when accepting a string by an NFA
+Look very careful at the \texttt{accepts} and \texttt{deltas}
+functions in NFAs and remember that when accepting a string by a NFA
 we might have to explore all possible transitions (recall which state
-to go to is not unique anymore with NFAs). The implementation achieves
-this exploration in a \emph{breadth-first search} manner. This is fine
-for very small NFAs, but can lead to problems when the NFAs are
-bigger. Take for example the regular expression $(.)^*\cdot a\cdot
-(.)^{\{n\}}\cdot b\cdot c$ from above. If $n$ is large, say 100 or
-1000, then the corresponding NFA will have 104, respectively 1004,
-nodes. The problem is that with certain strings this can lead to 1000
-``active'' nodes in the breadth-first search, all of which we need to
-analyse when determining the next states. This can be a real memory
-strain in practical applications. As result, some regular expression
-matching engines resort to a \emph{depth-first search} with
-\emph{backtracking} in unsuccessful cases. In our implementation we
-can implement a depth-first version of \texttt{accepts} using Scala's
-\texttt{exists} as follows:
+to go to is not unique anymore with NFAs\ldots{}we need to explore
+potentially all next states). The implementation achieves this
+exploration in a \emph{breadth-first search} manner. This is fine for
+small NFAs, but can lead to real memory problems when the NFAs are
+bigger and larger strings need to be processed. As result, some
+regular expression matching engines resort to a \emph{depth-first
+  search} with \emph{backtracking} in unsuccessful cases. In our
+implementation we can implement a depth-first version of
+\texttt{accepts} using Scala's \texttt{exists}-function as follows:
 
 
 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
 def search(q: A, s: List[C]) : Boolean = s match {
   case Nil => fins(q)
-  case c::cs =>
-    delta.exists(d => Try(search(d(q, c), cs)) getOrElse false)
+  case c::cs => next(q, c).exists(search(_, cs)) 
 }
 
-def accepts(s: List[C]) : Boolean = 
+def accepts2(s: List[C]) : Boolean = 
   starts.exists(search(_, s))
 \end{lstlisting}}
 
 \noindent
 This depth-first way of exploration seems to work efficiently in many
-examples and is much less of strain on memory. The problem is that the
-backtracking can get ``catastrophic'' in some examples---remember the
-catastrophic backtracking from earlier lectures. This depth-first
+examples and is much less of a strain on memory. The problem is that
+the backtracking can get ``catastrophic'' in some examples---remember
+the catastrophic backtracking from earlier lectures. This depth-first
 search with backtracking is the reason for the abysmal performance of
-some regular expression macthings in Java, Ruby and Python. I like to
+some regular expression matchings in Java, Ruby and Python. I like to
 show you this next.
 
-%This means if
-%we need to decide whether a string is accepted by a NFA, we might have
-%to explore all possibilities. Also there is the special silent
-%transition in NFAs. As mentioned already this transition means you do
-%not have to ``consume'' any part of the input string, but ``silently''
-%change to a different state. In the left picture, for example, if you
-%are in the starting state, you can silently move either to $Q_1$ or
-%%$Q_2$. This silent transition is also often called
-%\emph{$\epsilon$-transition}.
-
 
 \subsubsection*{Thompson Construction}
 
-In order to get an idea what calculations are done in Java \& friends,
-we need a method for translating regular expressions into
-automata. The simplest and most well-known method is called
-\emph{Thompson Construction}, after the Turing Award winner Ken
-Thompson who implemented this method in early versions of grep????
+In order to get an idea what calculations are performed by Java \&
+friends, we need a method for transforming a regular expression into
+an automaton. This automaton should accept exactly those strings that
+are accepted by the regular expression.  The simplest and most
+well-known method for this is called \emph{Thompson Construction},
+after the Turing Award winner Ken Thompson. This method is by
+recursion over regular expressions and uses the non-determinism in
+NFAs. You will see shortly why this construction works well with NFAs,
+but is not so straightforward with DFAs. Unfortunately we are still
+one step away from our intended target though---because this
+construction uses a version of NFAs that allows ``silent
+transitions''. The idea behind silent transitions is that they
+allow us to go from one state to the next without having to consume a
+character. We label such silent transition with the letter
+$\epsilon$ and call the automata $\epsilon$NFAs. Two
+typical examples of $\epsilon$NFAs are
 
 
-The reason for introducing NFAs is that there is a relatively
-simple (recursive) translation of regular expressions into
-NFAs. Consider the simple regular expressions $\ZERO$,
-$\ONE$ and $c$. They can be translated as follows:
+\begin{center}
+\begin{tabular}[t]{c@{\hspace{9mm}}c}
+\begin{tikzpicture}[>=stealth',very thick,
+    every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},]
+\node[state,initial]  (Q_0)  {$Q_0$};
+\node[state] (Q_1) [above=of Q_0] {$Q_1$};
+\node[state, accepting] (Q_2) [below=of Q_0] {$Q_2$};
+\path[->] (Q_0) edge node [left]  {$\epsilon$} (Q_1);
+\path[->] (Q_0) edge node [left]  {$\epsilon$} (Q_2);
+\path[->] (Q_0) edge [loop right] node  {$a$} ();
+\path[->] (Q_1) edge [loop right] node  {$a$} ();
+\path[->] (Q_2) edge [loop right] node  {$b$} ();
+\end{tikzpicture} &
+
+\raisebox{20mm}{
+\begin{tikzpicture}[>=stealth',very thick,
+    every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},]
+\node[state,initial]  (r_1)  {$R_1$};
+\node[state] (r_2) [above=of r_1] {$R_2$};
+\node[state, accepting] (r_3) [right=of r_1] {$R_3$};
+\path[->] (r_1) edge node [below]  {$b$} (r_3);
+\path[->] (r_2) edge [bend left] node [above]  {$a$} (r_3);
+\path[->] (r_1) edge [bend left] node  [left] {$\epsilon$} (r_2);
+\path[->] (r_2) edge [bend left] node  [right] {$a$} (r_1);
+\end{tikzpicture}}
+\end{tabular}
+\end{center}
+
+\noindent
+Consider the $\epsilon$NFA on the left-hand side: an
+$\epsilon$-transition means you do not have to ``consume'' any part of
+the input string, but ``silently'' change to a different state. For
+example, if you are in the starting state $Q_0$, you can silently move
+either to $Q_1$ or $Q_2$. In this example you can see that once you
+are in $Q_1$, respectively $Q_2$, you cannot ``go back'' to the other
+states.
+
+On first appearances, $\epsilon$-transitions might look rather
+strange, or even silly. To start with, silent transitions make the
+decision whether a string is accepted by an automaton even harder:
+with $\epsilon$NFAs we have to look whether we can do first some
+$\epsilon$-transitions and then do a ``proper''-transition; and after
+any ``proper''-transition we again have to check whether we can do
+again some silent transitions. Even worse, if there is a silent
+transition pointing back to the same state, then we have to be careful
+our decision procedure for strings does not loop (remember the
+depth-first search for exploring all states).
+
+The obvious question is: Do we get anything in return for this hassle
+with silent transitions?  Well, we still have to work for it\ldots
+unfortunately.  If we were to follow the many textbooks on the
+subject, we would now start with defining what $\epsilon$NFAs
+are---that would require extending the transition relation of
+NFAs. Next, show that the $\epsilon$NFAs are equivalent to NFAs and so
+on. Once we have done all this on paper, we would need to implement
+$\epsilon$NFAs. Lets try to take a shortcut---we are not really
+interested in $\epsilon$NFAs; they are only a convenient tool for
+translating regular expressions into automata. We are not going to
+implementing them explicitly, but translate them directly into NFAs
+(in a sense $\epsilon$NFAs are just a convenient API for lazy people).
+How does the translation work: well we have to find all transitions of
+the form
+
+\[
+q\stackrel{\epsilon}{\longrightarrow}\ldots\stackrel{\epsilon}{\longrightarrow}
+\;\stackrel{a}{\longrightarrow}\;
+\stackrel{\epsilon}{\longrightarrow}\ldots\stackrel{\epsilon}{\longrightarrow} q'
+\]
+
+\noindent and replace them with $q \stackrel{a}{\longrightarrow}
+q'$. Doing this to the $\epsilon$NFA on the left-hand side above gives
+the NFA
+
+\begin{center}
+\begin{tikzpicture}[>=stealth',very thick,
+    every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},]
+\node[state,initial]  (r_1)  {$R_1$};
+\node[state] (r_2) [above=of r_1] {$R_2$};
+\node[state, accepting] (r_3) [right=of r_1] {$R_3$};
+\path[->] (r_1) edge node [above]  {$b$} (r_3);
+\path[->] (r_2) edge [bend left] node [above]  {$a$} (r_3);
+\path[->] (r_1) edge [bend left] node  [left] {$a$} (r_2);
+\path[->] (r_2) edge [bend left] node  [right] {$a$} (r_1);
+\path[->] (r_1) edge [loop below] node  {$a$} ();
+\path[->] (r_1) edge [bend right] node [below]  {$a$} (r_3);
+\end{tikzpicture}
+\end{center}
+
+\noindent where the single the $\epsilon$-transition is replaced by
+three more $a$-transitions. So whenever we are given $\epsilon$NFA we
+will, similarly, replace it by an equivalent NFA. The code for this is
+given in Figure~\ref{enfa}. At the core is a function that calculates
+a fixpoint of function (Lines 5--10). This is used for ``discovering''
+states reachable by $\epsilon$-transitions. Once no new state is
+reachable state, a fixpoint is reached. This is used for example when
+calculating the starting states of an equivalent NFA---see Line 36.
+We starts with all starting states of the $\epsilon$NFA and then look
+for all other states that can be reached by
+$\epsilon$-transition. This is what the $\epsilon$-closure, called
+\texttt{ecl}, calculates.
+
+\begin{figure}[p]
+\small
+\lstinputlisting[numbers=left,linebackgroundcolor=
+                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
+                  {../progs/enfa.scala}
+
+\caption{A Scala function that translates $\epsilon$NFA into NFAs. The
+  transtions of $\epsilon$NFA take as input an \texttt{Option[C]}.
+  \texttt{None} stands for an $\epsilon$-transition; \texttt{Some(c)}
+  for a ``proper'' transition. The functions in Lines 18--26 calculate
+  all states reachable by one or more $\epsilon$-transition for a given
+  set of states. The NFA is constructed in in Lines 36--38.\label{enfa}}
+\end{figure}
+
+
+Now having a translation of $\epsilon$NFAs to NFAs in place, we can
+finally make headway with the problem of translating regular
+expressions into equivalent NFAs. By equivalent we mean that the NFAs
+recognise the same language. Consider the simple regular expressions
+$\ZERO$, $\ONE$ and $c$. They can be translated into equivalent NFAs
+as follows:
 
 \begin{center}
 \begin{tabular}[t]{l@{\hspace{10mm}}l}
@@ -435,6 +486,15 @@
 \end{tabular}
 \end{center}
 
+\begin{figure}[p]
+\small
+\lstinputlisting[numbers=left,linebackgroundcolor=
+                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
+                  {../progs/thompson.scala}
+\caption{A Scala XXX\label{thompson}}
+\end{figure}
+
+
 \noindent The case for the sequence regular expression $r_1
 \cdot r_2$ is as follows: We are given by recursion two
 automata representing $r_1$ and $r_2$ respectively. 
@@ -606,7 +666,7 @@
 What is interesting is that for every NFA we can find a DFA
 which recognises the same language. This can, for example, be
 done by the \emph{subset construction}. Consider again the NFA
-below on the left, consisting of nodes labeled $0$, $1$ and $2$. 
+below on the left, consisting of nodes labelled $0$, $1$ and $2$. 
 
 \begin{center}
 \begin{tabular}{c@{\hspace{10mm}}c}
@@ -794,7 +854,7 @@
 edges for $Q_2$. Therefore we have two terms, namely $Q_1\,a$ and
 $Q_2\,a$. These terms are separated by $+$. The first states
 that if in state $Q_1$ consuming an $a$ will bring you to
-$Q_2$, and the secont that being in $Q_2$ and consuming an $a$
+$Q_2$, and the second that being in $Q_2$ and consuming an $a$
 will make you stay in $Q_2$. The right-hand side of the
 first equation is constructed similarly: there are three 
 incoming edges, therefore there are three terms. There is
@@ -815,7 +875,7 @@
 Q_2 & = & Q_0\,a\,a + Q_2\,a
 \end{eqnarray}
 
-\noindent where in Equation (4) we have two occurences
+\noindent where in Equation (4) we have two occurrences
 of $Q_0$. Like the laws about $+$ and $\cdot$, we can simplify 
 Equation (4) to obtain the following two equations:
 
@@ -1165,32 +1225,3 @@
 %%% End: 
 
 
-\noindent
-Two typical examples of NFAs are
-\begin{center}
-\begin{tabular}[t]{c@{\hspace{9mm}}c}
-\begin{tikzpicture}[scale=0.7,>=stealth',very thick,
-                             every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},]
-\node[state,initial]  (Q_0)  {$Q_0$};
-\node[state] (Q_1) [above=of Q_0] {$Q_1$};
-\node[state, accepting] (Q_2) [below=of Q_0] {$Q_2$};
-\path[->] (Q_0) edge node [left]  {$\epsilon$} (Q_1);
-\path[->] (Q_0) edge node [left]  {$\epsilon$} (Q_2);
-\path[->] (Q_0) edge [loop right] node  {$a$} ();
-\path[->] (Q_1) edge [loop above] node  {$a$} ();
-\path[->] (Q_2) edge [loop below] node  {$b$} ();
-\end{tikzpicture} &
-
-\raisebox{20mm}{
-\begin{tikzpicture}[scale=0.7,>=stealth',very thick,
-                             every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},]
-\node[state,initial]  (r_1)  {$r_1$};
-\node[state] (r_2) [above=of r_1] {$r_2$};
-\node[state, accepting] (r_3) [right=of r_1] {$r_3$};
-\path[->] (r_1) edge node [below]  {$b$} (r_3);
-\path[->] (r_2) edge [bend left] node [above]  {$a$} (r_3);
-\path[->] (r_1) edge [bend left] node  [left] {$\epsilon$} (r_2);
-\path[->] (r_2) edge [bend left] node  [right] {$a$} (r_1);
-\end{tikzpicture}}
-\end{tabular}
-\end{center}
--- a/progs/dfa.scala	Sat Apr 15 22:03:59 2017 +0800
+++ b/progs/dfa.scala	Tue Apr 25 12:28:07 2017 +0100
@@ -1,4 +1,4 @@
-// DFAs in Scala based on partial functions
+// DFAs in Scala  using partial functions
 import scala.util.Try
 
 // type abbreviation for partial functions
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/progs/enfa.scala	Tue Apr 25 12:28:07 2017 +0100
@@ -0,0 +1,39 @@
+// epsilon NFAs...immediately translated into NFAs
+// (needs nfa.scala)
+
+// fixpoint construction
+import scala.annotation.tailrec
+@tailrec
+def fixpT[A](f: A => A, x: A): A = {
+  val fx = f(x)
+  if (fx == x) x else fixpT(f, fx) 
+}
+
+// translates eNFAs directly into NFAs 
+def eNFA[A, C](starts: Set[A],                     // starting states
+               delta: (A, Option[C]) :=> Set[A],   // epsilon-transitions
+               fins: A => Boolean) : NFA[A, C] = { // final states 
+
+  // epsilon transitions
+  def enext(q: A) : Set[A] = 
+    applyOrElse(delta, (q, None))
+
+  def enexts(qs: Set[A]) : Set[A] = 
+    qs | qs.flatMap(enext(_))
+
+  // epsilon closure
+  def ecl(qs: Set[A]) : Set[A] = 
+    fixpT(enexts, qs)
+
+  // "normal" transitions
+  def next(q: A, c: C) : Set[A] = 
+    applyOrElse(delta, (q, Some(c)))
+
+  def nexts(qs: Set[A], c: C) : Set[A] = 
+    ecl(ecl(qs).flatMap(next(_, c)))
+
+  // result NFA
+  NFA(ecl(starts), 
+      { case (q, c) => nexts(Set(q), c) }, 
+      q => ecl(Set(q)) exists fins)
+}
--- a/progs/nfa.scala	Sat Apr 15 22:03:59 2017 +0800
+++ b/progs/nfa.scala	Tue Apr 25 12:28:07 2017 +0100
@@ -1,26 +1,36 @@
-// NFAs in Scala based on sets of partial functions
+// NFAs in Scala using partial functions (returning
+// sets of states)
+import scala.util.Try
 
 // type abbreviation for partial functions
 type :=>[A, B] = PartialFunction[A, B]
 
+// return an empty set when not defined
+def applyOrElse[A, B](f: A :=> Set[B], x: A) : Set[B] =
+  Try(f(x)) getOrElse Set[B]()
+
+
+// NFAs
 case class NFA[A, C](starts: Set[A],            // starting states
-                     delta: Set[(A, C) :=> A],  // transitions
+                     delta: (A, C) :=> Set[A],  // transition function
                      fins:  A => Boolean) {     // final states 
 
-  // given a state and a character, what is the set of next states?
-  // if there is none => empty set
+  // given a state and a character, what is the set of 
+  // next states? if there is none => empty set
   def next(q: A, c: C) : Set[A] = 
-    delta.flatMap(d => Try(d(q, c)).toOption)
+    applyOrElse(delta, (q, c))
 
   def nexts(qs: Set[A], c: C) : Set[A] =
     qs.flatMap(next(_, c))
 
+  // given some states and a string, what is the set 
+  // of next states?
   def deltas(qs: Set[A], s: List[C]) : Set[A] = s match {
     case Nil => qs
     case c::cs => deltas(nexts(qs, c), cs)
   }
 
+  // is a string accepted by an NFA?
   def accepts(s: List[C]) : Boolean = 
     deltas(starts, s).exists(fins)
 }
-