afl-material: comparison handouts/ho03.tex

equal deleted inserted replaced

-:539b2e88f5b9
+:408fd5994288
 \subsubsection*{Subset Construction}
 What is interesting is that for every NFA we can find a DFA
 which recognises the same language. This can, for example, be
 done by the \emph{subset construction}. Consider again the NFA
-on the left, consisting of nodes labeled $0$, $1$ and $2$.
+below on the left, consisting of nodes labeled $0$, $1$ and $2$.
 \begin{center}
 \begin{tabular}{c@{\hspace{10mm}}c}
 \begin{tikzpicture}[scale=0.7,>=stealth',very thick,
 every state/.style={minimum size=0pt,
 \end{tikzpicture}
 &
 \begin{tabular}{r|cl}
 nodes & $a$ & $b$\\
 \hline
-$\varnothing\phantom{\star}$ & $\varnothing$ & $\varnothing$\\
+$\{\}\phantom{\star}$ & $\{\}$ & $\{\}$\\
 $\{0\}\phantom{\star}$       & $\{0,1,2\}$   & $\{2\}$\\
-$\{1\}\phantom{\star}$       & $\{1\}$       & $\varnothing$\\
+$\{1\}\phantom{\star}$       & $\{1\}$       & $\{\}$\\
-$\{2\}\star$  & $\varnothing$ & $\{2\}$\\
+$\{2\}\star$  & $\{\}$ & $\{2\}$\\
 $\{0,1\}\phantom{\star}$     & $\{0,1,2\}$   & $\{2\}$\\
 $\{0,2\}\star$ & $\{0,1,2\}$   & $\{2\}$\\
 $\{1,2\}\star$ & $\{1\}$       & $\{2\}$\\
 s: $\{0,1,2\}\star$ & $\{0,1,2\}$ & $\{2\}$\\
 \end{tabular}
 \end{center}
 \noindent The nodes of the DFA are given by calculating all
 subsets of the set of nodes of the NFA (seen in the nodes
 column on the right). The table shows the transition function
-for the DFA. The first row states that $\varnothing$ is the
+for the DFA. The first row states that $\{\}$ is the
 sink node which has transitions for $a$ and $b$ to itself.
 The next three lines are calculated as follows:
 \begin{itemize}
 \item suppose you calculate the entry for the transition for
 set of nodes, in this case $\{0,1,2\}$
 \item filter out all notes that do not allow an $a$-transition
 from this set, this excludes $2$ which does not permit a
 $a$-transition
 \item from the remaining set, do as many $\epsilon$-transition
-as you can, this yields $\{0,1,2\}$
+as you can, this yields again $\{0,1,2\}$
 \item the resulting set specifies the transition from $\{0\}$
 when given an $a$
 \end{itemize}
-\noindent Similarly for the other entries in the rows for
+\noindent So the transition from the state $\{0\}$ reading an
-$\{0\}$, $\{1\}$ and $\{2\}$. The other rows are calculated by
+$a$ goes to the state $\{0,1,2\}$. Similarly for the other
-just taking the union of the single node entries. For example
+entries in the rows for $\{0\}$, $\{1\}$ and $\{2\}$. The
-for $a$ and $\{0,1\}$ we need to take the union of $\{0,1,2\}$
+other rows are calculated by just taking the union of the
-(for $0$) and $\{1\}$ (for $1$). The starting state of the DFA
+single node entries. For example for $a$ and $\{0,1\}$ we need
-can be calculated from the starting state of the NFA, that is
+to take the union of $\{0,1,2\}$ (for $0$) and $\{1\}$ (for
-$0$, and then do as many $\epsilon$-transitions as possible.
+$1$). The starting state of the DFA can be calculated from the
-This gives $\{0,1,2\}$ which is the starting state of the DFA.
+starting state of the NFA, that is $0$, and then do as many
-The terminal states in the DFA are given by all sets that
+$\epsilon$-transitions as possible. This gives $\{0,1,2\}$
-contain a $2$, which is the terminal state of the NFA. This
+which is the starting state of the DFA. The terminal states in
-completes the subset construction.
+the DFA are given by all sets that contain a $2$, which is the
+terminal state of the NFA. This completes the subset
+construction. So the corresponding DFA to the NFA from
+above is
+\begin{center}
+\begin{tikzpicture}[scale=0.7,>=stealth',very thick,
+every state/.style={minimum size=0pt,
+draw=blue!50,very thick,fill=blue!20},
+baseline=0mm]
+\node[state,initial,accepting]  (q012)  {$0,1,2$};
+\node[state,accepting] (q02) [right=of q012] {$0,2$};
+\node[state] (q01) [above=of q02] {$0,1$};
+\node[state,accepting] (q12) [below=of q02] {$1,2$};
+\node[state] (q0) [right=2cm of q01] {$0$};
+\node[state] (q1) [right=2.5cm of q02] {$1$};
+\node[state,accepting] (q2) [right=1.5cm of q12] {$2$};
+\node[state] (qn) [right=of q1] {$\{\}$};
+\path[->] (q012) edge [loop below] node  {$a$} ();
+\path[->] (q012) edge node [above]  {$b$} (q2);
+\path[->] (q12) edge [bend left] node [below,pos=0.4]  {$a$} (q1);
+\path[->] (q12) edge node [below]  {$b$} (q2);
+\path[->] (q02) edge node [above]  {$a$} (q012);
+\path[->] (q02) edge [bend left] node [above, pos=0.8]  {$b$} (q2);
+\path[->] (q01) edge node [below]  {$a$} (q012);
+\path[->] (q01) edge [bend left] node [above]  {$b$} (q2);
+\path[->] (q0) edge node [below]  {$a$} (q012);
+\path[->] (q0) edge node [right, pos=0.2]  {$b$} (q2);
+\path[->] (q1) edge [loop above] node  {$a$} ();
+\path[->] (q1) edge node [above]  {$b$} (qn);
+\path[->] (q2) edge [loop right] node  {$b$} ();
+\path[->] (q2) edge node [below]  {$a$} (qn);
+\path[->] (qn) edge [loop above] node  {$a,b$} ();
+\end{tikzpicture}
+\end{center}
 There are two points to note: One is that very often the
 resulting DFA contains a number of ``dead'' nodes that are
-never reachable from the starting state (that is that the
+never reachable from the starting state. For example
-calculated DFA in this example is not a minimal DFA). Such
+there is no way to reach node $\{0,2\}$ from the starting
+state $\{0,1,2\}$. I let you find the other dead states.
+In effect the DFA in this example is not a minimal DFA. Such
 dead nodes can be safely removed without changing the language
 that is recognised by the DFA. Another point is that in some
 cases, however, the subset construction produces a DFA that
 does \emph{not} contain any dead nodes\ldots{}that means it
 calculates a minimal DFA. Which in turn means that in some
 cases the number of nodes by going from NFAs to DFAs
 exponentially increases, namely by $2^n$ (which is the number
 of subsets you can form for $n$ nodes).
+Removing all the dead states in the automaton above,
+gives a much more legible automaton, namely
+\begin{center}
+\begin{tikzpicture}[scale=0.7,>=stealth',very thick,
+every state/.style={minimum size=0pt,
+draw=blue!50,very thick,fill=blue!20},
+baseline=0mm]
+\node[state,initial,accepting]  (q012)  {$0,1,2$};
+\node[state,accepting] (q2) [right=of q012] {$2$};
+\node[state] (qn) [right=of q2] {$\{\}$};
+\path[->] (q012) edge [loop below] node  {$a$} ();
+\path[->] (q012) edge node [above]  {$b$} (q2);
+\path[->] (q2) edge [loop below] node  {$b$} ();
+\path[->] (q2) edge node [below]  {$a$} (qn);
+\path[->] (qn) edge [loop above] node  {$a,b$} ();
+\end{tikzpicture}
+\end{center}
+\noindent Now the big question is whether this DFA
+can recognise the same language as the NFA we started with.
+I let you ponder about this question.
 \subsubsection*{Brzozowski's Method}
 As said before, we can also go into the other direction---from
 DFAs to regular expressions. Brzozowski's method calculates
 \node (dfa) [right=of nfa] {\bf DFAs};
 \node (mdfa) [right=of dfa] {\bf\begin{tabular}{c}minimal\\ DFAs\end{tabular}};
 \path[->,line width=1mm] (rexp) edge node [above=4mm, black] {\begin{tabular}{c@{\hspace{9mm}}}Thompson's\\[-1mm] construction\end{tabular}} (nfa);
 \path[->,line width=1mm] (nfa) edge node [above=4mm, black] {\begin{tabular}{c}subset\\[-1mm] construction\end{tabular}}(dfa);
 \path[->,line width=1mm] (dfa) edge node [below=5mm, black] {minimisation} (mdfa);
-\path[->,line width=1mm] (dfa) edge [bend left=45] (rexp);
+\path[->,line width=1mm] (dfa) edge [bend left=45] node [below] {\begin{tabular}{l}Brzozowski's\\ method\end{tabular}} (rexp);
 \end{tikzpicture}
 \end{center}
 \noindent By going from regular expressions over NFAs to DFAs,
 we can always ensure that for every regular expression there
 the differences mean in computational terms. Translating a
 regular expression into a NFA gives us an automaton that has
 $O(n)$ nodes---that means the size of the NFA grows linearly
 with the size of the regular expression. The problem with NFAs
 is that the problem of deciding whether a string is accepted
-is computationally not cheap. Remember with NFAs we have
+or not is computationally not cheap. Remember with NFAs we have
 potentially many next states even for the same input and also
 have the silent $\epsilon$-transitions. If we want to find a
 path from the starting state of an NFA to an accepting state,
 we need to consider all possibilities. In Ruby and Python this
 is done by a depth-first search, which in turn means that if a
 ``wrong'' choice is made, the algorithm has to backtrack and
 thus explore all potential candidates. This is exactly the
 reason why Ruby and Python are so slow for evil regular
 expressions. The alternative is to explore the search space
-in a breadth-first fashion, but this might incur a memory
+in a breadth-first fashion, but this might incur a big memory
 penalty.
 To avoid the problems with NFAs, we can translate them
 into DFAs. With DFAs the problem of deciding whether a
 string is recognised or not is much simpler, because in
 state will be for a given input. So no search is needed.
 The problem with this is that the translation to DFAs
 can explode exponentially the number of states. Therefore when
 this route is taken, we definitely need to minimise the
 resulting DFAs in order to have an acceptable memory
-and runtime behaviour.
+and runtime behaviour. But remember the subset construction
+in the worst case explodes the number of states by $2^n$.
 But this does not mean that everything is bad with automata.
 Recall the problem of finding a regular expressions for the
 language that is \emph{not} recognised by a regular
 expression. In our implementation we added explicitly such a
 expression for this language and also not an automaton. One
 can actually prove that there is no regular expression nor
 automaton for this language, but again that would lead us too
 far afield for what we want to do in this module.
-\section{Further Reading}
+\section*{Further Reading}
 Compare what a ``human expert'' would create as automaton for the
 regular expression $a (b + c)^*$ and what the Thomson
 algorithm generates.

changeset 344	408fd5994288
parent 333	8890852e18b7
child 349	434891622131