handouts/ho03.tex
changeset 269 83e6cb90216d
parent 268 18bef085a7ca
child 270 4dbeaf43031d
--- a/handouts/ho03.tex	Sat Oct 11 01:13:13 2014 +0100
+++ b/handouts/ho03.tex	Sat Oct 11 13:09:15 2014 +0100
@@ -180,7 +180,7 @@
 silently move either to $q_1$ or $q_2$.
 
 
-\subsection*{Thompson Construction}
+\subsubsection*{Thompson Construction}
 
 The reason for introducing NFAs is that there is a relatively
 simple (recursive) translation of regular expressions into
@@ -372,7 +372,7 @@
 was invented by Ken Thompson in 1968.
 
 
-\subsection*{Subset Construction}
+\subsubsection*{Subset Construction}
 
 What is interesting that for every NFA we can find a DFA which
 recognises the same language. This can be done by the
@@ -458,11 +458,96 @@
 for $n$ nodes). 
 
 
-\subsection*{Brzozowski's Method}
+\subsubsection*{Brzozowski's Method}
+
+DFA -> NFA
+
+\subsubsection*{Automata Minimization}
+
+\subsubsection*{Regular Languages}
+
+Given the constructions in the previous sections we obtain 
+the following picture:
+
+\begin{center}
+\begin{tikzpicture}
+\node (rexp)  {\bf Regexps};
+\node (nfa) [right=of rexp] {\bf NFAs};
+\node (dfa) [right=of nfa] {\bf DFAs};
+\node (mdfa) [right=of dfa] {\bf\begin{tabular}{c}minimal\\ DFAs\end{tabular}};
+\path[->,line width=1mm] (rexp) edge node [above=4mm, black] {\begin{tabular}{c@{\hspace{9mm}}}Thompson's\\[-1mm] construction\end{tabular}} (nfa);
+\path[->,line width=1mm] (nfa) edge node [above=4mm, black] {\begin{tabular}{c}subset\\[-1mm] construction\end{tabular}}(dfa);
+\path[->,line width=1mm] (dfa) edge node [below=5mm, black] {minimisation} (mdfa);
+\path[->,line width=1mm] (dfa) edge [bend left=45] (rexp);
+\end{tikzpicture}
+\end{center}
+
+\noindent By going from regular expressions over NFAs to DFAs,
+we can always ensure that for every regular expression there
+exists a NFA and DFA that can recognise the same language.
+Although we did not prove this fact. Similarly by going from
+DFAs to regular expressions, we can make sure for every DFA 
+there exists a regular expression that can recognise the same
+language. Again we did not prove this fact. 
+
+The interesting conclusion is that automata and regular 
+expressions can recognise the same set of languages:
+
+\begin{quote} A language is \emph{regular} iff there exists a
+regular expression that recognises all its strings.
+\end{quote}
+
+\noindent or equivalently 
+
+\begin{quote} A language is \emph{regular} iff there exists an
+automaton that recognises all its strings.
+\end{quote}
 
-\subsection*{Automata Minimization}
+\noindent So for deciding whether a string is recognised by a
+regular expression, we could use our algorithm based on
+derivatives or NFAs or DFAs. But let us quickly look at what
+the differences mean in computational terms. Translating a
+regular expression into a NFA gives us an automaton that has
+$O(n)$ nodes---that means the size of the NFA grows linearly
+with the size of the regular expression. The problem with NFAs
+is that the problem of deciding whether a string is accepted
+is computationally not cheap. Remember with NFAs we have
+potentially many next states even for the same input and also
+have the silent $\epsilon$-transitions. If we want to find a
+path from the starting state of an NFA to an accepting state,
+we need to consider all possibilities. In Ruby and Python this
+is done by a depth-first search, which in turn means that if a
+``wrong'' choice is made, the algorithm has to backtrack and
+thus explore all potential candidates. This is exactly the
+reason why Ruby and Python are so slow for evil regular
+expressions. The alternative is to explore the search space
+in a breadth-first fashion, but this might incur a memory
+penalty.  
 
-\subsection*{Regular Languages and Automata}
+To avoid the problems with NFAs, we can translate them 
+into DFAs. With DFAs the problem of deciding whether a
+string is recognised or not is much simpler, because in
+each state it is completely determined what the next
+state will be for a given input. So no search is needed.
+The problem with this is that the translation to DFAs
+can explode exponentially the number of states. Therefore when 
+this route is taken, we definitely need to minimise the
+resulting DFAs in order to have an acceptable memory 
+and runtime behaviour.
+
+But this does not mean that everything is bad with automata.
+Recall the problem of finding a regular expressions for the
+language that is \emph{not} recognised by a regular
+expression. In our implementation we added explicitly such a
+regular expressions because they are useful for recognising
+comments. But in principle we did not need to. The argument
+for this is as follows: take a regular expression, translate
+it into a NFA and DFA that recognise the same language. Once
+you have the DFA it is very easy to construct the automaton
+for the language not recognised by an DFA. If the DAF is
+completed (this is important!), then you just need to exchange
+the accepting and non-accepting states. You can then translate
+this DFA back into a regular expression. 
 
 \end{document}