handouts/ho03.tex
changeset 269 83e6cb90216d
parent 268 18bef085a7ca
child 270 4dbeaf43031d
equal deleted inserted replaced
268:18bef085a7ca 269:83e6cb90216d
   178 ``silently'' change to a different state. In the left picture,
   178 ``silently'' change to a different state. In the left picture,
   179 for example, if you are in the starting state, you can 
   179 for example, if you are in the starting state, you can 
   180 silently move either to $q_1$ or $q_2$.
   180 silently move either to $q_1$ or $q_2$.
   181 
   181 
   182 
   182 
   183 \subsection*{Thompson Construction}
   183 \subsubsection*{Thompson Construction}
   184 
   184 
   185 The reason for introducing NFAs is that there is a relatively
   185 The reason for introducing NFAs is that there is a relatively
   186 simple (recursive) translation of regular expressions into
   186 simple (recursive) translation of regular expressions into
   187 NFAs. Consider the simple regular expressions $\varnothing$,
   187 NFAs. Consider the simple regular expressions $\varnothing$,
   188 $\epsilon$ and $c$. They can be translated as follows:
   188 $\epsilon$ and $c$. They can be translated as follows:
   370 
   370 
   371 \noindent This construction of a NFA from a regular expression
   371 \noindent This construction of a NFA from a regular expression
   372 was invented by Ken Thompson in 1968.
   372 was invented by Ken Thompson in 1968.
   373 
   373 
   374 
   374 
   375 \subsection*{Subset Construction}
   375 \subsubsection*{Subset Construction}
   376 
   376 
   377 What is interesting that for every NFA we can find a DFA which
   377 What is interesting that for every NFA we can find a DFA which
   378 recognises the same language. This can be done by the
   378 recognises the same language. This can be done by the
   379 \emph{subset construction}. Consider again the NFA on the 
   379 \emph{subset construction}. Consider again the NFA on the 
   380 left, consisting of nodes labeled $0$, $1$ and $2$. 
   380 left, consisting of nodes labeled $0$, $1$ and $2$. 
   456 of nodes by going from NFAs to DFAs exponentially increases,
   456 of nodes by going from NFAs to DFAs exponentially increases,
   457 namely by $2^n$ (which is the number of subsets you can form
   457 namely by $2^n$ (which is the number of subsets you can form
   458 for $n$ nodes). 
   458 for $n$ nodes). 
   459 
   459 
   460 
   460 
   461 \subsection*{Brzozowski's Method}
   461 \subsubsection*{Brzozowski's Method}
   462 
   462 
   463 \subsection*{Automata Minimization}
   463 DFA -> NFA
   464 
   464 
   465 \subsection*{Regular Languages and Automata}
   465 \subsubsection*{Automata Minimization}
       
   466 
       
   467 \subsubsection*{Regular Languages}
       
   468 
       
   469 Given the constructions in the previous sections we obtain 
       
   470 the following picture:
       
   471 
       
   472 \begin{center}
       
   473 \begin{tikzpicture}
       
   474 \node (rexp)  {\bf Regexps};
       
   475 \node (nfa) [right=of rexp] {\bf NFAs};
       
   476 \node (dfa) [right=of nfa] {\bf DFAs};
       
   477 \node (mdfa) [right=of dfa] {\bf\begin{tabular}{c}minimal\\ DFAs\end{tabular}};
       
   478 \path[->,line width=1mm] (rexp) edge node [above=4mm, black] {\begin{tabular}{c@{\hspace{9mm}}}Thompson's\\[-1mm] construction\end{tabular}} (nfa);
       
   479 \path[->,line width=1mm] (nfa) edge node [above=4mm, black] {\begin{tabular}{c}subset\\[-1mm] construction\end{tabular}}(dfa);
       
   480 \path[->,line width=1mm] (dfa) edge node [below=5mm, black] {minimisation} (mdfa);
       
   481 \path[->,line width=1mm] (dfa) edge [bend left=45] (rexp);
       
   482 \end{tikzpicture}
       
   483 \end{center}
       
   484 
       
   485 \noindent By going from regular expressions over NFAs to DFAs,
       
   486 we can always ensure that for every regular expression there
       
   487 exists a NFA and DFA that can recognise the same language.
       
   488 Although we did not prove this fact. Similarly by going from
       
   489 DFAs to regular expressions, we can make sure for every DFA 
       
   490 there exists a regular expression that can recognise the same
       
   491 language. Again we did not prove this fact. 
       
   492 
       
   493 The interesting conclusion is that automata and regular 
       
   494 expressions can recognise the same set of languages:
       
   495 
       
   496 \begin{quote} A language is \emph{regular} iff there exists a
       
   497 regular expression that recognises all its strings.
       
   498 \end{quote}
       
   499 
       
   500 \noindent or equivalently 
       
   501 
       
   502 \begin{quote} A language is \emph{regular} iff there exists an
       
   503 automaton that recognises all its strings.
       
   504 \end{quote}
       
   505 
       
   506 \noindent So for deciding whether a string is recognised by a
       
   507 regular expression, we could use our algorithm based on
       
   508 derivatives or NFAs or DFAs. But let us quickly look at what
       
   509 the differences mean in computational terms. Translating a
       
   510 regular expression into a NFA gives us an automaton that has
       
   511 $O(n)$ nodes---that means the size of the NFA grows linearly
       
   512 with the size of the regular expression. The problem with NFAs
       
   513 is that the problem of deciding whether a string is accepted
       
   514 is computationally not cheap. Remember with NFAs we have
       
   515 potentially many next states even for the same input and also
       
   516 have the silent $\epsilon$-transitions. If we want to find a
       
   517 path from the starting state of an NFA to an accepting state,
       
   518 we need to consider all possibilities. In Ruby and Python this
       
   519 is done by a depth-first search, which in turn means that if a
       
   520 ``wrong'' choice is made, the algorithm has to backtrack and
       
   521 thus explore all potential candidates. This is exactly the
       
   522 reason why Ruby and Python are so slow for evil regular
       
   523 expressions. The alternative is to explore the search space
       
   524 in a breadth-first fashion, but this might incur a memory
       
   525 penalty.  
       
   526 
       
   527 To avoid the problems with NFAs, we can translate them 
       
   528 into DFAs. With DFAs the problem of deciding whether a
       
   529 string is recognised or not is much simpler, because in
       
   530 each state it is completely determined what the next
       
   531 state will be for a given input. So no search is needed.
       
   532 The problem with this is that the translation to DFAs
       
   533 can explode exponentially the number of states. Therefore when 
       
   534 this route is taken, we definitely need to minimise the
       
   535 resulting DFAs in order to have an acceptable memory 
       
   536 and runtime behaviour.
       
   537 
       
   538 But this does not mean that everything is bad with automata.
       
   539 Recall the problem of finding a regular expressions for the
       
   540 language that is \emph{not} recognised by a regular
       
   541 expression. In our implementation we added explicitly such a
       
   542 regular expressions because they are useful for recognising
       
   543 comments. But in principle we did not need to. The argument
       
   544 for this is as follows: take a regular expression, translate
       
   545 it into a NFA and DFA that recognise the same language. Once
       
   546 you have the DFA it is very easy to construct the automaton
       
   547 for the language not recognised by an DFA. If the DAF is
       
   548 completed (this is important!), then you just need to exchange
       
   549 the accepting and non-accepting states. You can then translate
       
   550 this DFA back into a regular expression. 
   466 
   551 
   467 \end{document}
   552 \end{document}
   468 
   553 
   469 %%% Local Variables: 
   554 %%% Local Variables: 
   470 %%% mode: latex
   555 %%% mode: latex