are easier to reason about and the notion of derivatives, although
already quite old, only became more widely known rather
recently. Still, let us in this lecture have a closer look at automata
and their relation to regular expressions. This will help us with
understanding why the regular expression matchers in Python, Ruby and
Java are so slow with certain regular expressions. On the way we will
also see what the limitations of regular expressions are.


\subsection*{Deterministic Finite Automata}

Let's start\ldots the central definition is:\medskip

\noindent 
A \emph{deterministic finite automaton} (DFA), say $A$, is
given by a five-tuple written ${\cal A}(\varSigma, Qs, Q_0, F, \delta)$ where

\end{center}

\noindent In this graphical notation, the accepting state $Q_4$ is
indicated with double circles. Note that there can be more than one
accepting state. It is also possible that a DFA has no accepting
state at all, or that the starting state is also an accepting
state. In the case above the transition function is defined everywhere
and can also be given as a table as follows:

\[
\begin{array}{lcl}
\widehat{\delta}(q, c\!::\!s) & \dn & \widehat{\delta}(\delta(q, c), s)\\
\end{array}
\]

\noindent This lifted transition function is often called
\emph{delta-hat}. Given a string, we start in the starting state and
take the first character of the string, follow the transition to the
next state, then take the second character and so on. Once the string
is exhausted and we end up in an accepting state, then this string is
accepted by the automaton. Otherwise it is not accepted. This also
means that if along the way we hit the case where the transition
function $\delta$ is not defined, we need to raise an error. In our
implementation we will deal with this case elegantly by using Scala's
\texttt{Try}.  Summing up: a string $s$ is in the \emph{language
  accepted by the automaton} ${\cal A}(\varSigma, Qs, Q_0, F, \delta)$
iff

\[
\widehat{\delta}(Q_0, s) \in F 
\]
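
\noindent For example, for a two-character string, say
$a\!::\!b\!::\![]$, this test unfolds as follows (using the clause for
$\widehat{\delta}$ shown above together with the obvious base case
$\widehat{\delta}(q, []) \dn q$):

\[
\widehat{\delta}(Q_0, a\!::\!b\!::\![]) \;=\;
\widehat{\delta}(\delta(Q_0, a), b\!::\![]) \;=\;
\widehat{\delta}(\delta(\delta(Q_0, a), b), []) \;=\;
\delta(\delta(Q_0, a), b)
\]

\noindent That is, the string is accepted iff the state
$\delta(\delta(Q_0, a), b)$ is one of the accepting states in $F$.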
\lstinputlisting[numbers=left,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
                  {../progs/dfa.scala}
\caption{A Scala implementation of DFAs using partial functions.
  Notice some subtleties: \texttt{deltas} implements the delta-hat
  construction by lifting the transition (partial) function to lists
  of characters. Since \texttt{delta} is given as a partial function,
  it can obviously go ``wrong'' in which case the \texttt{Try} in
  \texttt{accepts} catches the error and returns \texttt{false}---that
  means the string is not accepted.  The example \texttt{delta} in
  Lines 28--38 implements the DFA example shown earlier in the
  handout.\label{dfa}}
\end{figure}

My take on an implementation of DFAs in Scala is given in
Figure~\ref{dfa}. As you can see, there are many features of the
mathematical definition that are quite closely reflected in the
code. In the DFA-class, there is a starting state, called
\texttt{start}, with the polymorphic type \texttt{A}.  There is a
partial function \texttt{delta} for specifying the transitions---these
partial functions take a state (of polymorphic type \texttt{A}) and an
input (of polymorphic type \texttt{C}) and produce a new state (of
type \texttt{A}). For the moment it is OK to assume that \texttt{A} is
some arbitrary type for states and the input is just characters.  (The
reason for not having concrete types, but polymorphic types for the
states and the input of DFAs will become clearer later on.)

The DFA-class also has an argument for specifying final states. In the
implementation it is not a set of states, as in the mathematical
definition, but a function from states to booleans (this function is
supposed to return true whenever a state is final; false
otherwise). While this boolean function is different from a set of
states, Scala allows us to use sets for such functions (see Line 40
where the DFA is initialised). Again it will become clear later on why
I use functions for final states, rather than sets.
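
\noindent To make this description more concrete, here is a minimal
sketch of what such a DFA-class could look like. This is \emph{not}
the code of Figure~\ref{dfa}---names and details are only meant to
illustrate the ingredients just described:

{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
import scala.util.Try

// a sketch of a DFA class: a start state, a partial transition
// function and final states given as a boolean function
case class DFA[A, C](start: A,
                     delta: PartialFunction[(A, C), A],
                     fins: A => Boolean) {

  // delta-hat: lift delta from single characters to lists of characters
  def deltas(q: A, s: List[C]): A = s match {
    case Nil => q
    case c :: cs => deltas(delta((q, c)), cs)
  }

  // if delta is undefined somewhere along the way, Try catches the
  // resulting exception and the string counts as not accepted
  def accepts(s: List[C]): Boolean =
    Try(fins(deltas(start, s))) getOrElse false
}
\end{lstlisting}}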
       
The most important point in the implementation is that I use Scala's
partial functions for representing the transitions; alternatives would
have been \texttt{Maps} or even \texttt{Lists}. One of the main
advantages of using partial functions is that transitions can be quite
nicely defined by a series of \texttt{case} statements (see Lines 28
-- 38 for an example). If you need to represent an automaton with a
    case (S1, 'a') => S2
    case _ => Sink
  }
\end{lstlisting}}  

\noindent I let you think what the corresponding DFA looks like in the
graphical notation.
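
\noindent Such a DFA could then be created and run along the following
lines. This is a made-up example using the DFA-class sketched earlier
(it is neither the automaton from the fragment above nor the one from
Figure~\ref{dfa}); its point is merely that a \texttt{Set} can be
supplied where a function from states to booleans is expected:

{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
// a made-up three-state automaton with a sink state
abstract class MyState
case object M0 extends MyState
case object M1 extends MyState
case object M2 extends MyState
case object MSink extends MyState

val mdelta: PartialFunction[(MyState, Char), MyState] =
  { case (M0, 'a') => M1
    case (M1, 'a') => M2
    case _         => MSink }

// a Set can be used where a function MyState => Boolean is expected,
// because Scala sets are such functions
val mdfa = DFA(M0, mdelta, Set[MyState](M2))

mdfa.accepts("aa".toList)    // true:  M0 -a-> M1 -a-> M2
mdfa.accepts("aab".toList)   // false: ends in the sink state
\end{lstlisting}}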

Finally, I let you ponder whether this is a good implementation of
DFAs or not. In doing so I hope you notice that the $\varSigma$ and
$Qs$ components (the alphabet and the finite set of states,
respectively) are missing from the class definition. This means that
the implementation allows you to do some fishy things you are not
meant to do with DFAs. Which fishy things could these be?


\subsection*{Non-Deterministic Finite Automata}

Remember we want to find out what the relation is between regular
expressions and automata. To do this with DFAs is a bit unwieldy.
While with DFAs it is always clear, given a state and a character,
what the next state is (potentially none), it will be convenient to
relax this restriction. That means we allow states to have several
potential successor states. We even allow more than one starting
state. The resulting construction is called a \emph{Non-Deterministic
  Finite Automaton} (NFA), given also as a five-tuple ${\cal
  A}(\varSigma, Qs, Q_{0s}, F, \rho)$ where

\begin{itemize}
\item $\varSigma$ is an alphabet,  
\item $Qs$ is a finite set of states
\item $Q_{0s}$ is a set of start states ($Q_{0s} \subseteq Qs$)
this choice is that when it comes to deciding whether a string is
accepted by an NFA we potentially have to explore all possibilities. I
let you think which kinds of strings the above NFA accepts.


There are a number of additional points you should note about
NFAs. Every DFA is an NFA, but not vice versa. The $\rho$ in NFAs is a
transition \emph{relation} (DFAs have a transition function). The
difference between a function and a relation is that a function
always has a single output, while a relation gives, roughly speaking,
several outputs. Look again at the NFA above: if you are currently in
not determined.  This non-determinism can be represented by a
relation.
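
\noindent To illustrate the difference, such a relation can be
recorded, for example, as a set of triples of the form (state,
character, successor state). The states and transitions below are made
up for illustration and are not the NFA from the picture above:

{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
// a transition relation as a set of triples: from state 1, reading an
// 'a' can lead to state 1 *or* to state 2---two "outputs" for one input
val rho = Set((1, 'a', 1), (1, 'a', 2), (2, 'b', 2))
\end{lstlisting}}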

My implementation of NFAs in Scala is shown in Figure~\ref{nfa}.
Perhaps interestingly, I do not actually use relations for my NFAs,
but use transition functions that return sets of states.  DFAs have
partial transition functions that return a single state; my NFAs
return a set. I let you think about this representation for
NFA-transitions and how it corresponds to the relations used in the
mathematical definition of NFAs.

Like in the mathematical definition, \texttt{starts} is in NFAs a set
of states; \texttt{fins} is again a function from states to
booleans. The \texttt{next} function calculates the set of next states
reachable from a single state \texttt{q} by a character~\texttt{c}. In
case there is no such state---the partial transition function is
undefined---the empty set is returned (see function
\texttt{applyOrElse} in Lines 9 and 10). The function \texttt{nexts}
just lifts this function to sets of states.
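
\noindent Again, to make this more concrete, here is a minimal sketch
of what such an NFA-class could look like. As before, this is
\emph{not} the code of Figure~\ref{nfa}; names and details are
assumptions made only for illustration:

{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
// a sketch of an NFA class: a set of start states, a partial
// transition function returning *sets* of states, and final states
// given as a boolean function
case class NFA[A, C](starts: Set[A],
                     delta: PartialFunction[(A, C), Set[A]],
                     fins: A => Boolean) {

  // next states from a single state; the empty set if delta is undefined
  def next(q: A, c: C): Set[A] =
    delta.applyOrElse((q, c), (_: (A, C)) => Set[A]())

  // lifted to sets of states
  def nexts(qs: Set[A], c: C): Set[A] = qs.flatMap(next(_, c))

  // breadth-first: keep track of *all* currently reachable states
  def deltas(qs: Set[A], s: List[C]): Set[A] = s match {
    case Nil => qs
    case c :: cs => deltas(nexts(qs, c), cs)
  }

  def accepts(s: List[C]): Boolean = deltas(starts, s).exists(fins)
}
\end{lstlisting}}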
       

\begin{figure}[p]
\small
\lstinputlisting[numbers=left,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
                  {../progs/nfa.scala}
\caption{A Scala implementation of NFAs using partial functions.
  Notice that the function \texttt{accepts} implements the acceptance
  of a string in a breadth-first search fashion. This can be a costly
  way of deciding whether a string is accepted or not in applications
  that need to handle large NFAs and large inputs.\label{nfa}}
\end{figure}

Look very carefully at the \texttt{accepts} and \texttt{deltas}
functions in NFAs and remember that when accepting a string by an NFA
we might have to explore all possible transitions (recall which state
to go to is not unique anymore with NFAs\ldots{}we need to explore
potentially all next states). The implementation achieves this
exploration in a \emph{breadth-first search} manner. This is fine for
small NFAs, but can lead to real memory problems when the NFAs are
bigger and larger strings need to be processed. As a result, some
regular expression matching engines resort to a \emph{depth-first
  search} with \emph{backtracking} in unsuccessful cases. In our
implementation we can implement a depth-first version of
\texttt{accepts} using Scala's \texttt{exists}-function as follows:

{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
def search(q: A, s: List[C]) : Boolean = s match {
  case Nil => fins(q)
  case c::cs => next(q, c).exists(search(_, cs)) 
}

def accepts2(s: List[C]) : Boolean = 
  starts.exists(search(_, s))
\end{lstlisting}}

\noindent
This depth-first way of exploration seems to work efficiently in many
examples and is much less of a strain on memory. The problem is that
the backtracking can get ``catastrophic'' in some examples---remember
the catastrophic backtracking from earlier lectures. This depth-first
search with backtracking is the reason for the abysmal performance of
some regular expression matchers in Java, Ruby and Python. I would
like to show you this next.
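
\noindent As a first taste of this phenomenon, the following snippet
measures Java's regular expression matcher (called here directly from
Scala) on an expression that is prone to catastrophic backtracking.
The concrete expression and string lengths are my own choice for
illustration and may well differ from the ones used in the lecture:

{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
import java.util.regex.Pattern

// a regular expression that is prone to catastrophic backtracking
val evil = Pattern.compile("(a*)*b")

// measure the running time of a piece of code in seconds
def time(code: => Unit): Double = {
  val start = System.nanoTime()
  code
  (System.nanoTime() - start) / 1e9
}

// strings consisting only of a's can never match, but a backtracking
// matcher explores very many ways of splitting the a's before giving up
for (n <- 15 to 30 by 5) {
  val s = "a" * n
  println(n + ": " + time(evil.matcher(s).matches()) + " secs")
}
\end{lstlisting}}
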
\subsubsection*{Thompson Construction}

In order to get an idea what calculations are performed by Java \&
friends, we need a method for transforming a regular expression into
an automaton. This automaton should accept exactly those strings that
are accepted by the regular expression.  The simplest and most
well-known method for this is called \emph{Thompson Construction},
after the Turing Award winner Ken Thompson. This method works by
recursion over regular expressions and uses the non-determinism in
NFAs. You will see shortly why this construction works well with NFAs,
but is not so straightforward with DFAs. Unfortunately we are still
one step away from our intended target though---because this
construction uses a version of NFAs that allows ``silent
transitions''. The idea behind silent transitions is that they
allow us to go from one state to the next without having to consume a
character. We label such silent transitions with the letter
$\epsilon$ and call the automata $\epsilon$NFAs. Two
typical examples of $\epsilon$NFAs are


\begin{center}
\begin{tabular}[t]{c@{\hspace{9mm}}c}
\begin{tikzpicture}[>=stealth',very thick,
    every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},]
\node[state,initial]  (Q_0)  {$Q_0$};
\node[state] (Q_1) [above=of Q_0] {$Q_1$};
\node[state, accepting] (Q_2) [below=of Q_0] {$Q_2$};
\path[->] (Q_0) edge node [left]  {$\epsilon$} (Q_1);
\path[->] (Q_0) edge node [left]  {$\epsilon$} (Q_2);
\path[->] (Q_0) edge [loop right] node  {$a$} ();
\path[->] (Q_1) edge [loop right] node  {$a$} ();
\path[->] (Q_2) edge [loop right] node  {$b$} ();
\end{tikzpicture} &

\raisebox{20mm}{
\begin{tikzpicture}[>=stealth',very thick,
    every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},]
\node[state,initial]  (r_1)  {$R_1$};
\node[state] (r_2) [above=of r_1] {$R_2$};
\node[state, accepting] (r_3) [right=of r_1] {$R_3$};
\path[->] (r_1) edge node [below]  {$b$} (r_3);
\path[->] (r_2) edge [bend left] node [above]  {$a$} (r_3);
\path[->] (r_1) edge [bend left] node  [left] {$\epsilon$} (r_2);
\path[->] (r_2) edge [bend left] node  [right] {$a$} (r_1);
\end{tikzpicture}}
\end{tabular}
\end{center}

\noindent
Consider the $\epsilon$NFA on the left-hand side: an
$\epsilon$-transition means you do not have to ``consume'' any part of
the input string, but ``silently'' change to a different state. For
example, if you are in the starting state $Q_0$, you can silently move
either to $Q_1$ or $Q_2$. In this example you can see that once you
are in $Q_1$, respectively $Q_2$, you cannot ``go back'' to the other
states.

On first appearances, $\epsilon$-transitions might look rather
strange, or even silly. To start with, silent transitions make the
decision whether a string is accepted by an automaton even harder:
with $\epsilon$NFAs we have to check whether we can first do some
$\epsilon$-transitions and then do a ``proper''-transition; and after
any ``proper''-transition we again have to check whether we can do
some silent transitions. Even worse, if there is a silent
transition pointing back to the same state, then we have to be careful
that our decision procedure for strings does not loop (remember the
depth-first search for exploring all states).

The obvious question is: Do we get anything in return for this hassle
with silent transitions?  Well, we still have to work for it\ldots
unfortunately.  If we were to follow the many textbooks on the
subject, we would now start with defining what $\epsilon$NFAs
are---that would require extending the transition relation of
NFAs. Next, we would show that $\epsilon$NFAs are equivalent to NFAs
and so on. Once we have done all this on paper, we would need to
implement $\epsilon$NFAs. Let's try to take a shortcut---we are not
really interested in $\epsilon$NFAs; they are only a convenient tool
for translating regular expressions into automata. We are not going to
implement them explicitly, but translate them directly into NFAs
(in a sense $\epsilon$NFAs are just a convenient API for lazy people).
How does the translation work? Well, we have to find all transitions of
the form

\[
q\stackrel{\epsilon}{\longrightarrow}\ldots\stackrel{\epsilon}{\longrightarrow}
\;\stackrel{a}{\longrightarrow}\;
\stackrel{\epsilon}{\longrightarrow}\ldots\stackrel{\epsilon}{\longrightarrow} q'
\]

\noindent and replace them with $q \stackrel{a}{\longrightarrow}
q'$. Doing this to the $\epsilon$NFA on the right-hand side above gives
the NFA

\begin{center}
\begin{tikzpicture}[>=stealth',very thick,
    every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},]
\node[state,initial]  (r_1)  {$R_1$};
\node[state] (r_2) [above=of r_1] {$R_2$};
\node[state, accepting] (r_3) [right=of r_1] {$R_3$};
\path[->] (r_1) edge node [above]  {$b$} (r_3);
\path[->] (r_2) edge [bend left] node [above]  {$a$} (r_3);
\path[->] (r_1) edge [bend left] node  [left] {$a$} (r_2);
\path[->] (r_2) edge [bend left] node  [right] {$a$} (r_1);
\path[->] (r_1) edge [loop below] node  {$a$} ();
\path[->] (r_1) edge [bend right] node [below]  {$a$} (r_3);
\end{tikzpicture}
\end{center}

\noindent where the single $\epsilon$-transition is replaced by
three more $a$-transitions. So whenever we are given an $\epsilon$NFA
we will, similarly, replace it by an equivalent NFA. The code for this
is given in Figure~\ref{enfa}. At the core is a function that
calculates a fixpoint of a function (Lines 5--10). This is used for
``discovering'' states reachable by $\epsilon$-transitions. Once no
new state is reachable, a fixpoint is reached. This is used, for
example, when calculating the starting states of an equivalent
NFA---see Line 36. We start with all starting states of the
$\epsilon$NFA and then look for all other states that can be reached
by $\epsilon$-transitions. This is what the $\epsilon$-closure, called
\texttt{ecl}, calculates.
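
\noindent The following is a small sketch of this fixpoint idea. The
names \texttt{fixpT} and \texttt{ecl} and the representation of the
$\epsilon$-moves as a function are assumptions for illustration only;
the actual code in Figure~\ref{enfa} is organised differently:

{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
import scala.annotation.tailrec

// iterate a function until the value does not change any more
@tailrec
def fixpT[A](f: A => A, x: A): A = {
  val fx = f(x)
  if (fx == x) x else fixpT(f, fx)
}

// epsilon-closure: all states reachable from qs by zero or more
// silent transitions, where enext gives the one-step epsilon-successors
def ecl[A](enext: A => Set[A], qs: Set[A]): Set[A] =
  fixpT((s: Set[A]) => s | s.flatMap(enext), qs)
\end{lstlisting}}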
       

\begin{figure}[p]
\small
\lstinputlisting[numbers=left,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
                  {../progs/enfa.scala}

\caption{A Scala function that translates $\epsilon$NFAs into NFAs. The
  transitions of $\epsilon$NFAs take as input an \texttt{Option[C]}.
  \texttt{None} stands for an $\epsilon$-transition; \texttt{Some(c)}
  for a ``proper'' transition. The functions in Lines 18--26 calculate
  all states reachable by one or more $\epsilon$-transitions for a given
  set of states. The NFA is constructed in Lines 36--38.\label{enfa}}
\end{figure}
       

Now having a translation of $\epsilon$NFAs to NFAs in place, we can
finally make headway with the problem of translating regular
expressions into equivalent NFAs. By equivalent we mean that the NFAs
recognise the same language. Consider the simple regular expressions
$\ZERO$, $\ONE$ and $c$. They can be translated into equivalent NFAs
as follows:

\begin{center}
\begin{tabular}[t]{l@{\hspace{10mm}}l}
\raisebox{1mm}{$\ZERO$} & 
\begin{tikzpicture}[scale=0.7,>=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
\node[state, accepting]  (Q_1)  [right=of Q_0] {$\mbox{}$};
\path[->] (Q_0) edge node [below]  {$c$} (Q_1);
\end{tikzpicture}\\\\
\end{tabular}
\end{center}
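
\noindent In code, these base cases are small enough that they can be
sketched directly with the hypothetical NFA-class from above (without
any $\epsilon$-transitions yet); this is only an illustration and not
the code of Figure~\ref{thompson}:

{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
// states are plain integers in this sketch
def zeroNFA: NFA[Int, Char] =                      // accepts nothing
  NFA(Set(0), PartialFunction.empty, Set[Int]())

def oneNFA: NFA[Int, Char] =                       // accepts only ""
  NFA(Set(0), PartialFunction.empty, Set(0))

def charNFA(c: Char): NFA[Int, Char] =             // accepts only "c"
  NFA(Set(0), { case (0, d) if d == c => Set(1) }, Set(1))
\end{lstlisting}}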
       

\begin{figure}[p]
\small
\lstinputlisting[numbers=left,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
                  {../progs/thompson.scala}
\caption{A Scala implementation of the Thompson construction, which
  translates regular expressions into NFAs.\label{thompson}}
\end{figure}


\noindent The case for the sequence regular expression $r_1
\cdot r_2$ is as follows: We are given by recursion two
automata representing $r_1$ and $r_2$ respectively. 

\subsubsection*{Subset Construction}

What is interesting is that for every NFA we can find a DFA
which recognises the same language. This can, for example, be
done by the \emph{subset construction}.
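
\noindent The idea behind the subset construction is that the states
of the DFA are \emph{sets} of NFA states. Reusing the two
(hypothetical) classes sketched earlier, the idea can be expressed
roughly as follows; note that this sketch does not explicitly
enumerate the finitely many reachable sets of states:

{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
// a rough sketch of the subset construction: DFA states are sets of
// NFA states
def subset[A, C](nfa: NFA[A, C]): DFA[Set[A], C] =
  DFA(nfa.starts,                            // start: the set of NFA start states
      { case (qs, c) => nfa.nexts(qs, c) },  // transition on sets of states
      (qs: Set[A]) => qs.exists(nfa.fins))   // final iff it contains a final state
\end{lstlisting}}
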
\noindent Consider again the NFA
below on the left, consisting of nodes labelled $0$, $1$ and $2$. 

\begin{center}
\begin{tabular}{c@{\hspace{10mm}}c}
\begin{tikzpicture}[scale=0.7,>=stealth',very thick,
                    every state/.style={minimum size=0pt,
state followed by character---in this case $Q_0\,a$. Now let's
have a look at the third equation: there are two incoming
edges for $Q_2$. Therefore we have two terms, namely $Q_1\,a$ and
$Q_2\,a$. These terms are separated by $+$. The first states
that being in state $Q_1$ and consuming an $a$ will bring you to
$Q_2$, and the second that being in $Q_2$ and consuming an $a$
will make you stay in $Q_2$. The right-hand side of the
first equation is constructed similarly: there are three 
incoming edges, therefore there are three terms. There is
one exception in that we also ``add'' $\ONE$ to the
first equation, because it corresponds to the starting state
\begin{eqnarray}
Q_0 & = & \ONE + Q_0\,b + Q_0\,a\,b +  Q_2\,b\\
Q_2 & = & Q_0\,a\,a + Q_2\,a
\end{eqnarray}

\noindent where in Equation (4) we have two occurrences
of $Q_0$. Using the laws about $+$ and $\cdot$, we can simplify 
Equation (4) to obtain the following two equations:

\begin{eqnarray}
Q_0 & = & \ONE + Q_0\,(b + a\,b) +  Q_2\,b\\
%%% mode: latex
%%% TeX-master: t
%%% End: 