handouts/ho03.tex
changeset 483 6f508bcdaa30
parent 482 0f6e3c5a1751
child 484 e61ffb28994d
equal deleted inserted replaced
482:0f6e3c5a1751 483:6f508bcdaa30
    27 Java are so slow with certain regular expressions.
    27 Java are so slow with certain regular expressions.
    28 
    28 
    29 
    29 
    30 \subsection*{Deterministic Finite Automata}
    30 \subsection*{Deterministic Finite Automata}
    31 
    31 
    32 Their definition is as follows:\medskip
    32 The central definition is:\medskip
    33 
    33 
    34 \noindent 
    34 \noindent 
    35 A \emph{deterministic finite automaton} (DFA), say $A$, is
    35 A \emph{deterministic finite automaton} (DFA), say $A$, is
    36 given by a four-tuple written $A(Qs, Q_0, F, \delta)$ where
    36 given by a five-tuple written $A(\Sigma, Qs, Q_0, F, \delta)$ where
    37 
    37 
    38 \begin{itemize}
    38 \begin{itemize}
       
    39 \item $\Sigma$ is an alphabet,  
    39 \item $Qs$ is a finite set of states,
    40 \item $Qs$ is a finite set of states,
    40 \item $Q_0 \in Qs$ is the start state,
    41 \item $Q_0 \in Qs$ is the start state,
    41 \item $F \subseteq Qs$ are the accepting states, and
    42 \item $F \subseteq Qs$ are the accepting states, and
    42 \item $\delta$ is the transition function.
    43 \item $\delta$ is the transition function.
    43 \end{itemize}
    44 \end{itemize}
    45 \noindent The transition function determines how to ``transition''
    46 \noindent The transition function determines how to ``transition''
    46 from one state to the next state with respect to a character. We have
    47 from one state to the next state with respect to a character. We have
    47 the assumption that these transition functions do not need to be
    48 the assumption that these transition functions do not need to be
    48 defined everywhere: so it can be the case that given a character there
    49 defined everywhere: so it can be the case that given a character there
    49 is no next state, in which case we need to raise a kind of ``failure
    50 is no next state, in which case we need to raise a kind of ``failure
    50 exception''.  That means actually we have partial functions as
    51 exception''.  That means actually we have \emph{partial} functions as
    51 transitions---see our implementation later on.  A typical example of a
    52 transitions---see the implementation later on.  A typical example of a
    52 DFA is
    53 DFA is
    53 
    54 
    54 \begin{center}
    55 \begin{center}
    55 \begin{tikzpicture}[>=stealth',very thick,auto,
    56 \begin{tikzpicture}[>=stealth',very thick,auto,
    56                     every state/.style={minimum size=0pt,
    57                     every state/.style={minimum size=0pt,
   113 end up in an accepting state, then this string is accepted by the
   114 end up in an accepting state, then this string is accepted by the
   114 automaton. Otherwise it is not accepted. This also means that if along
   115 automaton. Otherwise it is not accepted. This also means that if along
   115 the way we hit the case where the transition function $\delta$ is not
   116 the way we hit the case where the transition function $\delta$ is not
   116 defined, we need to raise an error. In our implementation we will deal
   117 defined, we need to raise an error. In our implementation we will deal
   117 with this case elegantly by using Scala's \texttt{Try}. So a string
   118 with this case elegantly by using Scala's \texttt{Try}. So a string
   118 $s$ is in the \emph{language accepted by the automaton} $A(Q, Q_0, F,
   119 $s$ is in the \emph{language accepted by the automaton} $A(\Sigma, Q, Q_0, F,
   119 \delta)$ iff
   120 \delta)$ iff
   120 
   121 
   121 \[
   122 \[
   122 \hat{\delta}(Q_0, s) \in F 
   123 \hat{\delta}(Q_0, s) \in F 
   123 \]
   124 \]
   124 
   125 
   125 \noindent I let you think about a definition that describes
   126 \noindent I let you think about a definition that describes
   126 the set of strings accepted by an automaton. 
   127 the set of all strings accepted by an automaton. 
   127 
   128 
   128 \begin{figure}[p]
   129 \begin{figure}[p]
   129 \small
   130 \small
   130 \lstinputlisting[numbers=left,linebackgroundcolor=
   131 \lstinputlisting[numbers=left,linebackgroundcolor=
   131                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
   132                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
   132                   {../progs/dfa.scala}
   133                   {../progs/dfa.scala}
   133 \caption{A Scala implementation of DFAs using partial functions.
   134 \caption{A Scala implementation of DFAs using partial functions.
   134   Notice some subtleties: \texttt{deltas} implements the delta-hat
   135   Notice some subtleties: \texttt{deltas} implements the delta-hat
   135   construction by lifting the transition (partial) function to
   136   construction by lifting the transition (partial) function to
   136   lists of \texttt{C}haracters. Since \texttt{delta} is given
   137   lists of characters. Since \texttt{delta} is given
   137   as partial function, it can obviously go ``wrong'' in which
   138   as a partial function, it can obviously go ``wrong'' in which
   138   case the \texttt{Try} in \texttt{accepts} catches the error and
   139   case the \texttt{Try} in \texttt{accepts} catches the error and
   139   returns \texttt{false}---that means the string is not accepted.
   140   returns \texttt{false}---that means the string is not accepted.
   140   The example \texttt{delta} implements the DFA example shown
   141   The example \texttt{delta} implements the DFA example shown
   141   earlier in the handout.\label{dfa}}
   142   earlier in the handout.\label{dfa}}
   142 \end{figure}
   143 \end{figure}
   143 
   144 
   144 A simple Scala implementation for DFAs is given in
   145 A simple Scala implementation for DFAs is given in
   145 Figure~\ref{dfa}. As you can see, there many features of the
   146 Figure~\ref{dfa}. As you can see, there are many features of the
   146 mathematical definition that are quite closely reflected in the
   147 mathematical definition that are quite closely reflected in the
   147 code. In the DFA-class, there is a starting state, called
   148 code. In the DFA-class, there is a starting state, called
   148 \texttt{start}, with the polymorphic type \texttt{A}. (The reason for
   149 \texttt{start}, with the polymorphic type \texttt{A}.  There is a
   149 having a polymorphic types for the states and the input of DFAs will
   150 partial function \texttt{delta} for specifying the transitions---these
   150 become clearer later.)  There is a partial function \texttt{delta} for
   151 partial functions take a state (of polymorphic type \texttt{A}) and an
   151 specifying the transitions. I used partial functions for representing
   152 input (of polymorphic type \texttt{C}) and produce a new state (of
   152 the transitions in Scala; alternatives would have been \texttt{Maps}
   153 type \texttt{A}). For the moment it is OK to assume that \texttt{A} is
   153 or even \texttt{Lists}. One of the main advantages of using partial
   154 some arbitrary type for states and the input is just characters.  (The
   154 functions is that transitions can be quite nicely defined by a series
   155 reason for having a polymorphic types for the states and the input of
   155 of \texttt{case} statements (see Lines 28 -- 38). If you need to
   156 DFAs will become clearer later on.)
   156 represent an automaton with a sink state (catch-all-state), you can
   157 
   157 write something like
   158 I used partial functions for representing the transitions in Scala;
       
   159 alternatives would have been \texttt{Maps} or even \texttt{Lists}. One
       
   160 of the main advantages of using partial functions is that transitions
       
   161 can be quite nicely defined by a series of \texttt{case} statements
       
   162 (see Lines 28 -- 38 for an example). If you need to represent an
       
   163 automaton with a sink state (catch-all-state), you can use Scala's
       
   164 pattern matching and write something like
   158 
   165 
   159 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
   166 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
   160                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
   167                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
   161 abstract class State
   168 abstract class State
   162 ...
   169 ...
   168     case _ => Sink
   175     case _ => Sink
   169   }
   176   }
   170 \end{lstlisting}}  
   177 \end{lstlisting}}  
   171 
   178 
   172 \noindent The DFA-class has also an argument for specifying final
   179 \noindent The DFA-class has also an argument for specifying final
   173 states. In the implementation it is a function from states to booleans
   180 states. In the implementation it not a set of states, as in the
   174 (returns true whenever a state is meant to be final; false
   181 matemathical definition, but is a function from states to booleans
   175 otherwise). While this boolean function is different from the sets of
   182 (this function is supposed to return true whenever a state is meant to
   176 states used in the mathematical definition, Scala allows me to use
   183 be final; false otherwise). While this boolean function is different
   177 sets for such functions (see Line 40 where I initialise the DFA).
   184 from the sets of states, Scala allows to use sets for such functions
       
   185 (see Line 40 where the DFA is initialised). Again it will become clear
       
   186 later on why I use functions for final states, rather than sets.
   178 
   187 
   179 I let you ponder whether this is a good implementation of DFAs. In
   188 I let you ponder whether this is a good implementation of DFAs. In
   180 doing so I hope you notice that the $Qs$ component (the set of finite
   189 doing so I hope you notice that the $Qs$ component (the set of finite
   181 states) is missing from the class definition. This means that the
   190 states) is missing from the class definition. This means that the
   182 implementation allows you to do some fishy things you are not meant to
   191 implementation allows you to do some fishy things you are not meant to
   189 While with DFAs it will always be clear that given a state and a
   198 While with DFAs it will always be clear that given a state and a
   190 character what the next state is (potentially none), it will be useful
   199 character what the next state is (potentially none), it will be useful
   191 to relax this restriction. That means we allow to have several
   200 to relax this restriction. That means we allow to have several
   192 potential successor states. We even allow more than one starting
   201 potential successor states. We even allow more than one starting
   193 state. The resulting construction is called a \emph{non-deterministic
   202 state. The resulting construction is called a \emph{non-deterministic
   194   finite automaton} (NFA) given also as a four-tuple $A(Qs, Q_{0s}, F,
   203   finite automaton} (NFA) given also as a five-tuple $A(\Sigma, Qs, Q_{0s}, F,
   195 \rho)$ where
   204 \rho)$ where
   196 
   205 
   197 \begin{itemize}
   206 \begin{itemize}
       
   207 \item $\Sigma$ is an alphabet,  
   198 \item $Qs$ is a finite set of states
   208 \item $Qs$ is a finite set of states
   199 \item $Q_{0s}$ is a set of start states ($Q_{0s} \subseteq Qs$)
   209 \item $Q_{0s}$ is a set of start states ($Q_{0s} \subseteq Qs$)
   200 \item $F$ are some accepting states with $F \subseteq Qs$, and
   210 \item $F$ are some accepting states with $F \subseteq Qs$, and
   201 \item $\rho$ is a transition relation.
   211 \item $\rho$ is a transition relation.
   202 \end{itemize}
   212 \end{itemize}
   203 
   213 
   204 \noindent
   214 \noindent
   205 A typical example of an NFA is
   215 A typical example of a NFA is
   206 
   216 
   207 % A NFA for (ab* + b)*a
   217 % A NFA for (ab* + b)*a
   208 \begin{center}
   218 \begin{center}
   209 \begin{tikzpicture}[scale=0.8,>=stealth',very thick,
   219 \begin{tikzpicture}[>=stealth',very thick, auto,
   210 every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},]
   220     every state/.style={minimum size=0pt,inner sep=3pt,
       
   221       draw=blue!50,very thick,fill=blue!20},scale=2]
   211 \node[state,initial]  (Q_0)  {$Q_0$};
   222 \node[state,initial]  (Q_0)  {$Q_0$};
   212 \node[state] (Q_1) [right=of Q_0] {$Q_1$};
   223 \node[state] (Q_1) [right=of Q_0] {$Q_1$};
   213 \node[state, accepting] (Q_2) [right=of Q_1] {$Q_2$};
   224 \node[state, accepting] (Q_2) [right=of Q_1] {$Q_2$};
   214 \path[->] (Q_0) edge [loop above] node  {$b$} ();
   225 \path[->] (Q_0) edge [loop above] node  {$b$} ();
   215 \path[<-] (Q_0) edge node [below]  {$b$} (Q_1);
   226 \path[<-] (Q_0) edge node [below]  {$b$} (Q_1);
   221 \end{center}
   232 \end{center}
   222 
   233 
   223 \noindent
   234 \noindent
   224 This NFA happens to have only one starting state, but in general there
   235 This NFA happens to have only one starting state, but in general there
   225 could be more.  Notice that in state $Q_0$ we might go to state $Q_1$
   236 could be more.  Notice that in state $Q_0$ we might go to state $Q_1$
   226 \emph{or} to state $Q_2$ when receiving an $a$. Similarly in state $Q_1$
   237 \emph{or} to state $Q_2$ when receiving an $a$. Similarly in state
   227 and receiving an $a$, we can stay in state $Q_1$ or go to $Q_2$. This
   238 $Q_1$ and receiving an $a$, we can stay in state $Q_1$ or go to
   228 kind of choice is not allowed with DFAs. When it comes to deciding
   239 $Q_2$. This kind of choice is not allowed with DFAs. The downside is
   229 whether a string is accepted by an NFA we potentially need to explore
   240 that when it comes to deciding whether a string is accepted by a NFA
   230 all possibilities. I let you think which kind of strings this NFA
   241 we potentially have to explore all possibilities. I let you think
   231 accepts.
   242 which kind of strings the above NFA accepts.
   232 
   243 
   233 
   244 
   234 There are however a number of additional points you should note. Every
   245 There are a number of additional points you should note with
   235 DFA is a NFA, but not vice versa. The $\rho$ in NFAs is a transition
   246 NFAs. Every DFA is a NFA, but not vice versa. The $\rho$ in NFAs is a
   236 \emph{relation} (DFAs have a transition function). The difference
   247 transition \emph{relation} (DFAs have a transition function). The
   237 between a function and a relation is that a function has always a
   248 difference between a function and a relation is that a function has
   238 single output, while a relation gives, roughly speaking, several
   249 always a single output, while a relation gives, roughly speaking,
   239 outputs. Look again at the NFA above: if you are currently in the
   250 several outputs. Look again at the NFA above: if you are currently in
   240 state $Q_1$ and you read a character $b$, then you can transition to
   251 the state $Q_1$ and you read a character $b$, then you can transition
   241 either $Q_0$ \emph{or} $Q_2$. Which route, or output, you take is not
   252 to either $Q_0$ \emph{or} $Q_2$. Which route, or output, you take is
   242 determined.  This non-determinism can be represented by a relation.
   253 not determined.  This non-determinism can be represented by a
   243 
   254 relation.
   244 My implementation of NFAs is shown in Figure~\ref{nfa}.  Perhaps
   255 
   245 interestingly, I do not use relations for my implementation of NFAs in
   256 My implementation of NFAs in Scala is shown in Figure~\ref{nfa}.
   246 Scala, and I also do not use transition functions which return a set
   257 Perhaps interestingly, I do not actually use relations for my NFAs,
   247 of states (another popular choice for implementing NFAs).  For reasons
   258 and I also do not use transition functions that return sets of states
   248 that become clear in a moment, I use sets of partial functions---see
   259 (another popular choice for implementing NFAs).  For reasons that
   249 Line 7 in Figure~\ref{nfa}. \texttt{Starts} is now a set of states;
   260 become clear in a moment, I use sets of partial functions
   250 \texttt{fins} is again a function from states to booleans. The
   261 instead---see Line 7 in Figure~\ref{nfa}. DFAs have only one such
   251 \texttt{next} function calculates the set of next states reachable
   262 partial function; my NFAs have a set.  Another parameter,
   252 from a state \texttt{q} by a character \texttt{c}---we go through all
   263 \texttt{Starts}, is in NFAs a set of states; \texttt{fins} is again a
   253 partial functions in the \texttt{delta}-set, lift it (this transformes
   264 function from states to booleans. The \texttt{next} function
   254 the partial function into the corresponding \texttt{Option}-function
   265 calculates the set of next states reachable from a single state
   255 and then apply \texttt{q} and \texttt{c}. This gives us a set of
   266 \texttt{q} by a character \texttt{c}---this is calculated by going
   256 \texttt{Some}s and \texttt{None}s, where the \texttt{Some}s are
   267 through all the partial functions in the \texttt{delta}-set and apply
   257 filtered out by the \texttt{flatMap}. Teh function \texttt{nexts} just
   268 \texttt{q} and \texttt{c} (see Line 13). This gives us a set of
   258 lifts this to sets of states. \texttt{Deltas} and \texttt{accept} are
   269 \texttt{Some}s (in case the application succeeded) and possibly some
   259 similar to the DFA definitions.
   270 \texttt{None}s (in case the partial function is not defined or produces an
       
   271 error).  The \texttt{None}s are filtered out by the \texttt{flatMap},
       
   272 leaving the values inside the \texttt{Some}s. The function
       
   273 \texttt{nexts} just lifts this function to sets of
       
   274 states. \texttt{Deltas} and \texttt{accept} are similar to the DFA
       
   275 definitions.
   260 
   276 
   261 \begin{figure}[t]
   277 \begin{figure}[t]
   262 \small
   278 \small
   263 \lstinputlisting[numbers=left,linebackgroundcolor=
   279 \lstinputlisting[numbers=left,linebackgroundcolor=
   264                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
   280                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
   265                   {../progs/nfa.scala}
   281                   {../progs/nfa.scala}
   266 \caption{A Scala implementation of NFAs using sets of partial functions.
   282 \caption{A Scala implementation of NFAs using sets of partial functions.
   267   Notice some subtleties: \texttt{deltas} implements the delta-hat
   283   Notice some subtleties: Since \texttt{delta} is given
   268   construction by lifting the transition (partial) function to
   284   as a set of partial functions, each of them can obviously go ``wrong'' in which
   269   lists of \texttt{C}haracters. Since \texttt{delta} is given
   285   case the \texttt{Try}. The function \texttt{accepts} implements the
   270   as partial function, it can obviously go ``wrong'' in which
   286   acceptance of a string in a breath-first fashion. This can be costly
   271   case the \texttt{Try} in \texttt{accepts} catches the error and
   287   way of deciding whether a string is accepted in practical contexts.\label{nfa}}
   272   returns \texttt{false}---that means the string is not accepted.
       
   273   The example \texttt{delta} implements the DFA example shown
       
   274   earlier in the handout.\label{nfa}}
       
   275 \end{figure}
   288 \end{figure}
   276 
   289 
   277 
   290 The reason for using sets of partial functions for specifying the
       
   291 transitions in NFAs has to do with examples like this one: a
       
   292 popular benchmark regular expression is $(.)^*\cdot a\cdot
       
   293 (.)^{\{n\}}\cdot b\cdot c$. A NFA that accepts the same strings
       
   294 (for $n=3$) is as follows:
       
   295 
       
   296 \begin{center}
       
   297 \begin{tikzpicture}[>=stealth',very thick, auto, node distance=7mm,
       
   298     every state/.style={minimum size=0pt,inner sep=1pt,
       
   299       draw=blue!50,very thick,fill=blue!20},scale=0.5]
       
   300 \node[state,initial]  (Q_0)  {$Q_0$};
       
   301 \node[state] (Q_1) [right=of Q_0] {$Q_1$};
       
   302 \node[state] (Q_2) [right=of Q_1] {$Q_2$};
       
   303 \node[state] (Q_3) [right=of Q_2] {$Q_3$};
       
   304 \node[state] (Q_4) [right=of Q_3] {$Q_4$};
       
   305 \node[state] (Q_5) [right=of Q_4] {$Q_5$};
       
   306 \node[state,accepting] (Q_6) [right=of Q_5] {$Q_6$};
       
   307 \path[->] (Q_0) edge [loop above] node  {$.$} ();
       
   308 \path[->] (Q_0) edge node [above]  {$a$} (Q_1);
       
   309 \path[->] (Q_1) edge node [above]  {$.$} (Q_2);
       
   310 \path[->] (Q_2) edge node [above]  {$.$} (Q_3);
       
   311 \path[->] (Q_3) edge node [above]  {$.$} (Q_4);
       
   312 \path[->] (Q_4) edge node [above]  {$b$} (Q_5);
       
   313 \path[->] (Q_5) edge node [above]  {$c$} (Q_6);
       
   314 \end{tikzpicture}
       
   315 \end{center}
       
   316 
       
   317 \noindent
       
   318 The $.$ stands for accepting any single character: for example if we
       
   319 are in $Q_0$ and read an $a$ we can either stay in $Q_0$ (since any
       
   320 character will do for this) or advance to $Q_1$ (but only if it is an
       
   321 $a$). Why this is a good benchmark regular expression is irrelevant
       
   322 here. The point is that this NFA can be conveniently represented by
       
   323 the code:
       
   324 
       
   325 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
       
   326                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
       
   327 val delta = Set[(State, Char) :=> State](
       
   328   { case (Q0, 'a') => Q1 },
       
   329   { case (Q0, _)   => Q0 },
       
   330   { case (Q1, _)   => Q2 },
       
   331   { case (Q2, _)   => Q3 },
       
   332   { case (Q3, _)   => Q4 },
       
   333   { case (Q4, 'b') => Q5 },
       
   334   { case (Q5, 'c') => Q6 }
       
   335 )
       
   336 
       
   337 NFA(Set[State](Q0), delta, Set[State](Q6))
       
   338 \end{lstlisting}}
       
   339 
       
   340 \noindent
       
   341 where the $.$-transitions translate into a
       
   342 underscore-pattern-matching. Recall that in $Q_0$ if we read an $a$ we
       
   343 can go to $Q_1$ (by the first partial function in the set) and also
       
   344 stay in $Q_0$ (by the second partial function). Representing such
       
   345 transitions in any other way is somehow awkward; the set of partial
       
   346 function representation makes them easy to implement.
       
   347 
       
   348 Look very careful again at the \texttt{accepts} and \texttt{deltas}
       
   349 functions in NFAs and remember that when accepting a string by an NFA
       
   350 we might have to explore all possible transitions (which state to go
       
   351 to is not unique anymore). The shown implementation achieves this
       
   352 exploration in a \emph{breath-first search}. This is fine for very
       
   353 small NFAs, but can lead to problems when the NFAs are bigger. Take
       
   354 for example the regular expression $(.)^*\cdot a\cdot (.)^{\{n\}}\cdot
       
   355 b\cdot c$ from above. If $n$ is large, say 100 or 1000, then the
       
   356 corresponding NFA will have 104, respectively 1004, nodes. The problem
       
   357 is that with certain strings this can lead to 1000 ``active'' nodes
       
   358 which we need to analyse when determining the next states. This can be
       
   359 a real memory strain in practical applications. As a result, some
       
   360 regular expression matching engines resort to a \emph{depth-first
       
   361   search} with \emph{backtracking} in unsuccessful cases. In our
       
   362 implementation we could implement a depth-first version of
       
   363 \texttt{accepts} using Scala's \texttt{exists} as follows:
       
   364 
       
   365 
       
   366 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
       
   367                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
       
   368 def search(q: A, s: List[C]) : Boolean = s match {
       
   369   case Nil => fins(q)
       
   370   case c::cs =>
       
   371     delta.exists((d) => Try(search(d(q, c), cs)) getOrElse false)
       
   372 }
       
   373 
       
   374 def accepts(s: List[C]) : Boolean = 
       
   375   starts.exists(search(_, s))
       
   376 \end{lstlisting}}
       
   377 
       
   378 \noindent
       
   379 This depth-first way of exploration seems to work efficiently in many
       
   380 examples and is much less of strain on memory. The problem is that the
       
   381 backtracking can get catastrophic in some examples---remember the
       
   382 catastrophic backtracking from earlier lectures. This depth-first
       
   383 search with backtracking is the reason for the abysmal performance of
       
   384 regular expression macthing in Java, Ruby and Python.
   278 
   385 
   279 %This means if
   386 %This means if
   280 %we need to decide whether a string is accepted by a NFA, we might have
   387 %we need to decide whether a string is accepted by a NFA, we might have
   281 %to explore all possibilities. Also there is the special silent
   388 %to explore all possibilities. Also there is the special silent
   282 %transition in NFAs. As mentioned already this transition means you do
   389 %transition in NFAs. As mentioned already this transition means you do
   778 for the purposes of this module, we omit this.
   885 for the purposes of this module, we omit this.
   779 
   886 
   780 \subsubsection*{Automata Minimization}
   887 \subsubsection*{Automata Minimization}
   781 
   888 
   782 As seen in the subset construction, the translation 
   889 As seen in the subset construction, the translation 
   783 of an NFA to a DFA can result in a rather ``inefficient'' 
   890 of a NFA to a DFA can result in a rather ``inefficient'' 
   784 DFA. Meaning there are states that are not needed. A
   891 DFA. Meaning there are states that are not needed. A
   785 DFA can be \emph{minimised} by the following algorithm:
   892 DFA can be \emph{minimised} by the following algorithm:
   786 
   893 
   787 \begin{enumerate}
   894 \begin{enumerate}
   788 \item Take all pairs $(q, p)$ with $q \not= p$
   895 \item Take all pairs $(q, p)$ with $q \not= p$
   972 with the size of the regular expression. The problem with NFAs
  1079 with the size of the regular expression. The problem with NFAs
   973 is that the problem of deciding whether a string is accepted
  1080 is that the problem of deciding whether a string is accepted
   974 or not is computationally not cheap. Remember with NFAs we
  1081 or not is computationally not cheap. Remember with NFAs we
   975 have potentially many next states even for the same input and
  1082 have potentially many next states even for the same input and
   976 also have the silent $\epsilon$-transitions. If we want to
  1083 also have the silent $\epsilon$-transitions. If we want to
   977 find a path from the starting state of an NFA to an accepting
  1084 find a path from the starting state of a NFA to an accepting
   978 state, we need to consider all possibilities. In Ruby and
  1085 state, we need to consider all possibilities. In Ruby and
   979 Python this is done by a depth-first search, which in turn
  1086 Python this is done by a depth-first search, which in turn
   980 means that if a ``wrong'' choice is made, the algorithm has to
  1087 means that if a ``wrong'' choice is made, the algorithm has to
   981 backtrack and thus explore all potential candidates. This is
  1088 backtrack and thus explore all potential candidates. This is
   982 exactly the reason why Ruby and Python are so slow for evil
  1089 exactly the reason why Ruby and Python are so slow for evil
  1006 regular expressions because they are useful for recognising
  1113 regular expressions because they are useful for recognising
  1007 comments. But in principle we did not need to. The argument
  1114 comments. But in principle we did not need to. The argument
  1008 for this is as follows: take a regular expression, translate
  1115 for this is as follows: take a regular expression, translate
  1009 it into a NFA and then a DFA that both recognise the same
  1116 it into a NFA and then a DFA that both recognise the same
  1010 language. Once you have the DFA it is very easy to construct
  1117 language. Once you have the DFA it is very easy to construct
  1011 the automaton for the language not recognised by an DFA. If
  1118 the automaton for the language not recognised by a DFA. If
  1012 the DFA is completed (this is important!), then you just need
  1119 the DFA is completed (this is important!), then you just need
  1013 to exchange the accepting and non-accepting states. You can
  1120 to exchange the accepting and non-accepting states. You can
  1014 then translate this DFA back into a regular expression and
  1121 then translate this DFA back into a regular expression and
  1015 that will be the regular expression that can match all strings
  1122 that will be the regular expression that can match all strings
  1016 the original regular expression could \emph{not} match.
  1123 the original regular expression could \emph{not} match.