diff -r 18bef085a7ca -r 83e6cb90216d handouts/ho03.tex
--- a/handouts/ho03.tex	Sat Oct 11 01:13:13 2014 +0100
+++ b/handouts/ho03.tex	Sat Oct 11 13:09:15 2014 +0100
@@ -180,7 +180,7 @@
 silently move either to $q_1$ or $q_2$.
 
-\subsection*{Thompson Construction}
+\subsubsection*{Thompson Construction}
 
 The reason for introducing NFAs is that there is a relatively
 simple (recursive) translation of regular expressions into
@@ -372,7 +372,7 @@
 was invented by Ken Thompson in 1968.
 
-\subsection*{Subset Construction}
+\subsubsection*{Subset Construction}
 
 What is interesting is that for every NFA we can find a DFA
 which recognises the same language. This can be done by the
@@ -458,11 +458,96 @@
 for $n$ nodes).
 
-\subsection*{Brzozowski's Method}
+\subsubsection*{Brzozowski's Method}
+
+DFA -> Regexps
+
+\subsubsection*{Automata Minimisation}
+
+\subsubsection*{Regular Languages}
+
+Given the constructions in the previous sections, we obtain
+the following picture:
+
+\begin{center}
+\begin{tikzpicture}
+\node (rexp) {\bf Regexps};
+\node (nfa) [right=of rexp] {\bf NFAs};
+\node (dfa) [right=of nfa] {\bf DFAs};
+\node (mdfa) [right=of dfa] {\bf\begin{tabular}{c}minimal\\ DFAs\end{tabular}};
+\path[->,line width=1mm] (rexp) edge node [above=4mm, black] {\begin{tabular}{c@{\hspace{9mm}}}Thompson's\\[-1mm] construction\end{tabular}} (nfa);
+\path[->,line width=1mm] (nfa) edge node [above=4mm, black] {\begin{tabular}{c}subset\\[-1mm] construction\end{tabular}} (dfa);
+\path[->,line width=1mm] (dfa) edge node [below=5mm, black] {minimisation} (mdfa);
+\path[->,line width=1mm] (dfa) edge [bend left=45] (rexp);
+\end{tikzpicture}
+\end{center}
+
+\noindent By going from regular expressions over NFAs to DFAs,
+we can ensure that for every regular expression there exist an
+NFA and a DFA that recognise the same language (although we
+did not prove this fact). Similarly, by going from DFAs to
+regular expressions, we can ensure that for every DFA there
+exists a regular expression that recognises the same language
+(again, we did not prove this fact).
+
+The interesting conclusion is that automata and regular
+expressions can recognise the same set of languages:
+
+\begin{quote} A language is \emph{regular} iff there exists a
+regular expression that recognises all its strings.
+\end{quote}
+
+\noindent or equivalently
+
+\begin{quote} A language is \emph{regular} iff there exists an
+automaton that recognises all its strings.
+\end{quote}
 
-\subsection*{Automata Minimization}
+\noindent So for deciding whether a string is recognised by a
+regular expression, we could use our algorithm based on
+derivatives, or NFAs, or DFAs. But let us quickly look at what
+these differences mean in computational terms. Translating a
+regular expression into an NFA gives us an automaton that has
+$O(n)$ nodes---that means the size of the NFA grows linearly
+with the size of the regular expression. The problem with NFAs
+is that deciding whether a string is accepted is
+computationally not cheap. Remember that with NFAs we
+potentially have many next states even for the same input, and
+on top of that the silent $\epsilon$-transitions. If we want
+to find a path from the starting state of an NFA to an
+accepting state, we need to consider all possibilities. In
+Ruby and Python this is done by a depth-first search, which in
+turn means that if a ``wrong'' choice is made, the algorithm
+has to backtrack and thus explore all potential candidates.
+This is exactly the reason why Ruby and Python are so slow for
+evil regular expressions. The alternative is to explore the
+search space in a breadth-first fashion, but this might incur
+a memory penalty.
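+
+To see what the breadth-first alternative looks like, here is
+a minimal sketch in Scala; the representation of NFAs and all
+the names below are illustrative assumptions, not our actual
+implementation. Instead of backtracking over individual
+states, we keep track of the \emph{set} of all states the NFA
+can currently be in and advance this whole set for each input
+character:
+
+\begin{verbatim}
+// states are just integers in this sketch
+case class NFA(start: Int,
+               accepting: Set[Int],
+               delta: Map[(Int, Char), Set[Int]], // transitions
+               eps: Map[Int, Set[Int]]) {         // silent ones
+
+  // epsilon-closure: all states reachable by silent transitions
+  def closure(qs: Set[Int]): Set[Int] = {
+    val next = qs ++ qs.flatMap(q => eps.getOrElse(q, Set[Int]()))
+    if (next == qs) qs else closure(next)
+  }
+
+  // one breadth-first step on the input character c
+  def step(qs: Set[Int], c: Char): Set[Int] =
+    closure(qs.flatMap(q => delta.getOrElse((q, c), Set[Int]())))
+
+  // accepted if, after the whole string, some state is accepting
+  def accepts(s: String): Boolean =
+    s.foldLeft(closure(Set(start)))(step).exists(accepting.contains)
+}
+\end{verbatim}
+
+\noindent With this, every input character is processed
+exactly once, so there is no backtracking; the price is
+carrying a set of up to $n$ states around---the memory penalty
+mentioned above. The sets computed here are, in fact, exactly
+the states the subset construction would introduce.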
+
-\subsection*{Regular Languages and Automata}
+To avoid the problems with NFAs, we can translate them into
+DFAs. With DFAs, deciding whether a string is recognised or
+not is much simpler, because in each state it is completely
+determined what the next state for a given input will be. So
+no search is needed. The problem with this is that the
+translation to DFAs can lead to an exponential blow-up in the
+number of states. Therefore, when this route is taken, we
+definitely need to minimise the resulting DFAs in order to
+obtain acceptable memory and runtime behaviour.
+
+But this does not mean that everything is bad with automata.
+Recall the problem of finding a regular expression for the
+language that is \emph{not} recognised by a given regular
+expression. In our implementation we explicitly added such
+regular expressions because they are useful for recognising
+comments. But in principle we did not need to. The argument
+for this is as follows: take a regular expression and
+translate it into an NFA and a DFA that recognise the same
+language. Once you have the DFA, it is very easy to construct
+the automaton for the language \emph{not} recognised by the
+DFA. If the DFA is complete (this is important!), then you
+just need to exchange the accepting and non-accepting states.
+You can then translate this DFA back into a regular
+expression.
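+
+As a small illustration, here is a sketch of this complement
+construction in Scala (again, the names and the representation
+of DFAs are illustrative assumptions): exchanging the
+accepting and non-accepting states of a \emph{complete} DFA
+yields a DFA for the complement language.
+
+\begin{verbatim}
+case class DFA(states: Set[Int],
+               start: Int,
+               accepting: Set[Int],
+               delta: Map[(Int, Char), Int]) { // assumed total
+
+  def accepts(s: String): Boolean =
+    accepting.contains(s.foldLeft(start)((q, c) => delta((q, c))))
+
+  // exchange accepting and non-accepting states; this recognises
+  // the complement language only because delta is total
+  def complement: DFA = copy(accepting = states -- accepting)
+}
+\end{verbatim}
+
+\noindent If the DFA were not complete, a string that ``gets
+stuck'' in some state would be rejected by both the DFA and
+its ``complement''---which is why completeness is important.
 
 \end{document}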