afl-material: comparison handouts/ho03.tex

equal deleted inserted replaced

-:18bef085a7ca
+:83e6cb90216d
 ``silently'' change to a different state. In the left picture,
 for example, if you are in the starting state, you can
 silently move either to $q_1$ or $q_2$.
-\subsection*{Thompson Construction}
+\subsubsection*{Thompson Construction}
 The reason for introducing NFAs is that there is a relatively
 simple (recursive) translation of regular expressions into
 NFAs. Consider the simple regular expressions $\varnothing$,
 $\epsilon$ and $c$. They can be translated as follows:
 \noindent This construction of a NFA from a regular expression
 was invented by Ken Thompson in 1968.
-\subsection*{Subset Construction}
+\subsubsection*{Subset Construction}
 What is interesting that for every NFA we can find a DFA which
 recognises the same language. This can be done by the
 \emph{subset construction}. Consider again the NFA on the
 left, consisting of nodes labeled $0$, $1$ and $2$.
 of nodes by going from NFAs to DFAs exponentially increases,
 namely by $2^n$ (which is the number of subsets you can form
 for $n$ nodes).
-\subsection*{Brzozowski's Method}
+\subsubsection*{Brzozowski's Method}
-\subsection*{Automata Minimization}
+DFA -> NFA
-\subsection*{Regular Languages and Automata}
+\subsubsection*{Automata Minimization}
+\subsubsection*{Regular Languages}
+Given the constructions in the previous sections we obtain
+the following picture:
+\begin{center}
+\begin{tikzpicture}
+\node (rexp)  {\bf Regexps};
+\node (nfa) [right=of rexp] {\bf NFAs};
+\node (dfa) [right=of nfa] {\bf DFAs};
+\node (mdfa) [right=of dfa] {\bf\begin{tabular}{c}minimal\\ DFAs\end{tabular}};
+\path[->,line width=1mm] (rexp) edge node [above=4mm, black] {\begin{tabular}{c@{\hspace{9mm}}}Thompson's\\[-1mm] construction\end{tabular}} (nfa);
+\path[->,line width=1mm] (nfa) edge node [above=4mm, black] {\begin{tabular}{c}subset\\[-1mm] construction\end{tabular}}(dfa);
+\path[->,line width=1mm] (dfa) edge node [below=5mm, black] {minimisation} (mdfa);
+\path[->,line width=1mm] (dfa) edge [bend left=45] (rexp);
+\end{tikzpicture}
+\end{center}
+\noindent By going from regular expressions over NFAs to DFAs,
+we can always ensure that for every regular expression there
+exists a NFA and DFA that can recognise the same language.
+Although we did not prove this fact. Similarly by going from
+DFAs to regular expressions, we can make sure for every DFA
+there exists a regular expression that can recognise the same
+language. Again we did not prove this fact.
+The interesting conclusion is that automata and regular
+expressions can recognise the same set of languages:
+\begin{quote} A language is \emph{regular} iff there exists a
+regular expression that recognises all its strings.
+\end{quote}
+\noindent or equivalently
+\begin{quote} A language is \emph{regular} iff there exists an
+automaton that recognises all its strings.
+\end{quote}
+\noindent So for deciding whether a string is recognised by a
+regular expression, we could use our algorithm based on
+derivatives or NFAs or DFAs. But let us quickly look at what
+the differences mean in computational terms. Translating a
+regular expression into a NFA gives us an automaton that has
+$O(n)$ nodes---that means the size of the NFA grows linearly
+with the size of the regular expression. The problem with NFAs
+is that the problem of deciding whether a string is accepted
+is computationally not cheap. Remember with NFAs we have
+potentially many next states even for the same input and also
+have the silent $\epsilon$-transitions. If we want to find a
+path from the starting state of an NFA to an accepting state,
+we need to consider all possibilities. In Ruby and Python this
+is done by a depth-first search, which in turn means that if a
+``wrong'' choice is made, the algorithm has to backtrack and
+thus explore all potential candidates. This is exactly the
+reason why Ruby and Python are so slow for evil regular
+expressions. The alternative is to explore the search space
+in a breadth-first fashion, but this might incur a memory
+penalty.
+To avoid the problems with NFAs, we can translate them
+into DFAs. With DFAs the problem of deciding whether a
+string is recognised or not is much simpler, because in
+each state it is completely determined what the next
+state will be for a given input. So no search is needed.
+The problem with this is that the translation to DFAs
+can explode exponentially the number of states. Therefore when
+this route is taken, we definitely need to minimise the
+resulting DFAs in order to have an acceptable memory
+and runtime behaviour.
+But this does not mean that everything is bad with automata.
+Recall the problem of finding a regular expressions for the
+language that is \emph{not} recognised by a regular
+expression. In our implementation we added explicitly such a
+regular expressions because they are useful for recognising
+comments. But in principle we did not need to. The argument
+for this is as follows: take a regular expression, translate
+it into a NFA and DFA that recognise the same language. Once
+you have the DFA it is very easy to construct the automaton
+for the language not recognised by an DFA. If the DAF is
+completed (this is important!), then you just need to exchange
+the accepting and non-accepting states. You can then translate
+this DFA back into a regular expression.
 \end{document}
 %%% Local Variables:
 %%% mode: latex

changeset 269	83e6cb90216d
parent 268	18bef085a7ca
child 270	4dbeaf43031d