425 set of nodes, in this case $\{0,1,2\}$ |
425 set of nodes, in this case $\{0,1,2\}$ |
426 \item filter out all notes that do not allow an $a$-transition |
426 \item filter out all notes that do not allow an $a$-transition |
427 from this set, this excludes $2$ which does not permit a |
427 from this set, this excludes $2$ which does not permit a |
428 $a$-transition |
428 $a$-transition |
429 \item from the remaining set, do as many $\epsilon$-transition |
429 \item from the remaining set, do as many $\epsilon$-transition |
430 as you can, this yields $\{0,1,2\}$ |
430 as you can, this yields again $\{0,1,2\}$ |
431 \item the resulting set specifies the transition from $\{0\}$ |
431 \item the resulting set specifies the transition from $\{0\}$ |
432 when given an $a$ |
432 when given an $a$ |
433 \end{itemize} |
433 \end{itemize} |
434 |
434 |
435 \noindent Similarly for the other entries in the rows for |
435 \noindent So the transition from the state $\{0\}$ reading an |
436 $\{0\}$, $\{1\}$ and $\{2\}$. The other rows are calculated by |
436 $a$ goes to the state $\{0,1,2\}$. Similarly for the other |
437 just taking the union of the single node entries. For example |
437 entries in the rows for $\{0\}$, $\{1\}$ and $\{2\}$. The |
438 for $a$ and $\{0,1\}$ we need to take the union of $\{0,1,2\}$ |
438 other rows are calculated by just taking the union of the |
439 (for $0$) and $\{1\}$ (for $1$). The starting state of the DFA |
439 single node entries. For example for $a$ and $\{0,1\}$ we need |
440 can be calculated from the starting state of the NFA, that is |
440 to take the union of $\{0,1,2\}$ (for $0$) and $\{1\}$ (for |
441 $0$, and then do as many $\epsilon$-transitions as possible. |
441 $1$). The starting state of the DFA can be calculated from the |
442 This gives $\{0,1,2\}$ which is the starting state of the DFA. |
442 starting state of the NFA, that is $0$, and then do as many |
443 The terminal states in the DFA are given by all sets that |
443 $\epsilon$-transitions as possible. This gives $\{0,1,2\}$ |
444 contain a $2$, which is the terminal state of the NFA. This |
444 which is the starting state of the DFA. The terminal states in |
445 completes the subset construction. |
445 the DFA are given by all sets that contain a $2$, which is the |
|
446 terminal state of the NFA. This completes the subset |
|
447 construction. So the corresponding DFA to the NFA from |
|
448 above is |
|
449 |
|
450 \begin{center} |
|
451 \begin{tikzpicture}[scale=0.7,>=stealth',very thick, |
|
452 every state/.style={minimum size=0pt, |
|
453 draw=blue!50,very thick,fill=blue!20}, |
|
454 baseline=0mm] |
|
455 \node[state,initial,accepting] (q012) {$0,1,2$}; |
|
456 \node[state,accepting] (q02) [right=of q012] {$0,2$}; |
|
457 \node[state] (q01) [above=of q02] {$0,1$}; |
|
458 \node[state,accepting] (q12) [below=of q02] {$1,2$}; |
|
459 \node[state] (q0) [right=2cm of q01] {$0$}; |
|
460 \node[state] (q1) [right=2.5cm of q02] {$1$}; |
|
461 \node[state,accepting] (q2) [right=1.5cm of q12] {$2$}; |
|
462 \node[state] (qn) [right=of q1] {$\{\}$}; |
|
463 |
|
464 \path[->] (q012) edge [loop below] node {$a$} (); |
|
465 \path[->] (q012) edge node [above] {$b$} (q2); |
|
466 \path[->] (q12) edge [bend left] node [below,pos=0.4] {$a$} (q1); |
|
467 \path[->] (q12) edge node [below] {$b$} (q2); |
|
468 \path[->] (q02) edge node [above] {$a$} (q012); |
|
469 \path[->] (q02) edge [bend left] node [above, pos=0.8] {$b$} (q2); |
|
470 \path[->] (q01) edge node [below] {$a$} (q012); |
|
471 \path[->] (q01) edge [bend left] node [above] {$b$} (q2); |
|
472 \path[->] (q0) edge node [below] {$a$} (q012); |
|
473 \path[->] (q0) edge node [right, pos=0.2] {$b$} (q2); |
|
474 \path[->] (q1) edge [loop above] node {$a$} (); |
|
475 \path[->] (q1) edge node [above] {$b$} (qn); |
|
476 \path[->] (q2) edge [loop right] node {$b$} (); |
|
477 \path[->] (q2) edge node [below] {$a$} (qn); |
|
478 \path[->] (qn) edge [loop above] node {$a,b$} (); |
|
479 \end{tikzpicture} |
|
480 \end{center} |
|
481 |
|
482 |
446 |
483 |
447 There are two points to note: One is that very often the |
484 There are two points to note: One is that very often the |
448 resulting DFA contains a number of ``dead'' nodes that are |
485 resulting DFA contains a number of ``dead'' nodes that are |
449 never reachable from the starting state (that is that the |
486 never reachable from the starting state. For example |
450 calculated DFA in this example is not a minimal DFA). Such |
487 there is no way to reach node $\{0,2\}$ from the starting |
|
488 state $\{0,1,2\}$. I let you find the other dead states. |
|
489 In effect the DFA in this example is not a minimal DFA. Such |
451 dead nodes can be safely removed without changing the language |
490 dead nodes can be safely removed without changing the language |
452 that is recognised by the DFA. Another point is that in some |
491 that is recognised by the DFA. Another point is that in some |
453 cases, however, the subset construction produces a DFA that |
492 cases, however, the subset construction produces a DFA that |
454 does \emph{not} contain any dead nodes\ldots{}that means it |
493 does \emph{not} contain any dead nodes\ldots{}that means it |
455 calculates a minimal DFA. Which in turn means that in some |
494 calculates a minimal DFA. Which in turn means that in some |
456 cases the number of nodes by going from NFAs to DFAs |
495 cases the number of nodes by going from NFAs to DFAs |
457 exponentially increases, namely by $2^n$ (which is the number |
496 exponentially increases, namely by $2^n$ (which is the number |
458 of subsets you can form for $n$ nodes). |
497 of subsets you can form for $n$ nodes). |
459 |
498 |
|
499 Removing all the dead states in the automaton above, |
|
500 gives a much more legible automaton, namely |
|
501 |
|
502 \begin{center} |
|
503 \begin{tikzpicture}[scale=0.7,>=stealth',very thick, |
|
504 every state/.style={minimum size=0pt, |
|
505 draw=blue!50,very thick,fill=blue!20}, |
|
506 baseline=0mm] |
|
507 \node[state,initial,accepting] (q012) {$0,1,2$}; |
|
508 \node[state,accepting] (q2) [right=of q012] {$2$}; |
|
509 \node[state] (qn) [right=of q2] {$\{\}$}; |
|
510 |
|
511 \path[->] (q012) edge [loop below] node {$a$} (); |
|
512 \path[->] (q012) edge node [above] {$b$} (q2); |
|
513 \path[->] (q2) edge [loop below] node {$b$} (); |
|
514 \path[->] (q2) edge node [below] {$a$} (qn); |
|
515 \path[->] (qn) edge [loop above] node {$a,b$} (); |
|
516 \end{tikzpicture} |
|
517 \end{center} |
|
518 |
|
519 \noindent Now the big question is whether this DFA |
|
520 can recognise the same language as the NFA we started with. |
|
521 I let you ponder about this question. |
460 |
522 |
461 \subsubsection*{Brzozowski's Method} |
523 \subsubsection*{Brzozowski's Method} |
462 |
524 |
463 As said before, we can also go into the other direction---from |
525 As said before, we can also go into the other direction---from |
464 DFAs to regular expressions. Brzozowski's method calculates |
526 DFAs to regular expressions. Brzozowski's method calculates |
800 the differences mean in computational terms. Translating a |
862 the differences mean in computational terms. Translating a |
801 regular expression into a NFA gives us an automaton that has |
863 regular expression into a NFA gives us an automaton that has |
802 $O(n)$ nodes---that means the size of the NFA grows linearly |
864 $O(n)$ nodes---that means the size of the NFA grows linearly |
803 with the size of the regular expression. The problem with NFAs |
865 with the size of the regular expression. The problem with NFAs |
804 is that the problem of deciding whether a string is accepted |
866 is that the problem of deciding whether a string is accepted |
805 is computationally not cheap. Remember with NFAs we have |
867 or not is computationally not cheap. Remember with NFAs we have |
806 potentially many next states even for the same input and also |
868 potentially many next states even for the same input and also |
807 have the silent $\epsilon$-transitions. If we want to find a |
869 have the silent $\epsilon$-transitions. If we want to find a |
808 path from the starting state of an NFA to an accepting state, |
870 path from the starting state of an NFA to an accepting state, |
809 we need to consider all possibilities. In Ruby and Python this |
871 we need to consider all possibilities. In Ruby and Python this |
810 is done by a depth-first search, which in turn means that if a |
872 is done by a depth-first search, which in turn means that if a |
811 ``wrong'' choice is made, the algorithm has to backtrack and |
873 ``wrong'' choice is made, the algorithm has to backtrack and |
812 thus explore all potential candidates. This is exactly the |
874 thus explore all potential candidates. This is exactly the |
813 reason why Ruby and Python are so slow for evil regular |
875 reason why Ruby and Python are so slow for evil regular |
814 expressions. The alternative is to explore the search space |
876 expressions. The alternative is to explore the search space |
815 in a breadth-first fashion, but this might incur a memory |
877 in a breadth-first fashion, but this might incur a big memory |
816 penalty. |
878 penalty. |
817 |
879 |
818 To avoid the problems with NFAs, we can translate them |
880 To avoid the problems with NFAs, we can translate them |
819 into DFAs. With DFAs the problem of deciding whether a |
881 into DFAs. With DFAs the problem of deciding whether a |
820 string is recognised or not is much simpler, because in |
882 string is recognised or not is much simpler, because in |