438 for $a$ and $\{0,1\}$ we need to take the union of $\{0,1,2\}$ |
438 for $a$ and $\{0,1\}$ we need to take the union of $\{0,1,2\}$ |
439 (for $0$) and $\{1\}$ (for $1$). The starting state of the DFA |
439 (for $0$) and $\{1\}$ (for $1$). The starting state of the DFA |
440 can be calculated from the starting state of the NFA, that is |
440 can be calculated from the starting state of the NFA, that is |
441 $0$, and then do as many $\epsilon$-transitions as possible. |
441 $0$, and then do as many $\epsilon$-transitions as possible. |
442 This gives $\{0,1,2\}$ which is the starting state of the DFA. |
442 This gives $\{0,1,2\}$ which is the starting state of the DFA. |
443 One terminal states in the DFA are given by all sets that |
443 The terminal states in the DFA are given by all sets that |
444 contain a $2$, which is the terminal state of the NFA. This |
444 contain a $2$, which is the terminal state of the NFA. This |
445 completes the subset construction. |
445 completes the subset construction. |
446 |
446 |
447 There are two points to note: One is that the resulting DFA |
447 There are two points to note: One is that very often the |
448 contains a number of ``dead'' nodes that are never reachable |
448 resulting DFA contains a number of ``dead'' nodes that are |
449 from the starting state (that is that the calculated DFA in |
449 never reachable from the starting state (that is that the |
450 this example is not a minimal DFA). Such dead nodes can be |
450 calculated DFA in this example is not a minimal DFA). Such |
451 safely removed without changing the language that is |
451 dead nodes can be safely removed without changing the language |
452 recognised by the DFA. Another point is that in some cases the |
452 that is recognised by the DFA. Another point is that in some |
453 subset construction produces a DFA that does \emph{not} |
453 cases, however, the subset construction produces a DFA that |
454 contain any dead nodes\ldots{}that means it calculates a |
454 does \emph{not} contain any dead nodes\ldots{}that means it |
455 minimal DFA. Which in turn means that in some cases the number |
455 calculates a minimal DFA. Which in turn means that in some |
456 of nodes by going from NFAs to DFAs exponentially increases, |
456 cases the number of nodes by going from NFAs to DFAs |
457 namely by $2^n$ (which is the number of subsets you can form |
457 exponentially increases, namely by $2^n$ (which is the number |
458 for $n$ nodes). |
458 of subsets you can form for $n$ nodes). |
459 |
459 |
460 |
460 |
461 \subsubsection*{Brzozowski's Method} |
461 \subsubsection*{Brzozowski's Method} |
462 |
462 |
463 DFA -> NFA |
463 As said before, we can also go into the other direction---from |
|
464 DFAs to regular expressions. Brzozowski's method calculates |
|
465 a regular expression using familiar transformations for |
|
466 solving equational systems. Consider the DFA: |
|
467 |
|
468 \begin{center} |
|
469 \begin{tikzpicture}[scale=1.5,>=stealth',very thick,auto, |
|
470 every state/.style={minimum size=0pt, |
|
471 inner sep=2pt,draw=blue!50,very thick, |
|
472 fill=blue!20}] |
|
473 \node[state, initial] (q0) at ( 0,1) {$q_0$}; |
|
474 \node[state] (q1) at ( 1,1) {$q_1$}; |
|
475 \node[state, accepting] (q2) at ( 2,1) {$q_2$}; |
|
476 \path[->] (q0) edge[bend left] node[above] {$a$} (q1) |
|
477 (q1) edge[bend left] node[above] {$b$} (q0) |
|
478 (q2) edge[bend left=50] node[below] {$b$} (q0) |
|
479 (q1) edge node[above] {$a$} (q2) |
|
480 (q2) edge [loop right] node {$a$} () |
|
481 (q0) edge [loop below] node {$b$} (); |
|
482 \end{tikzpicture} |
|
483 \end{center} |
|
484 |
|
485 \noindent for which we can set up the following equational |
|
486 system |
|
487 |
|
488 \begin{eqnarray} |
|
489 q_0 & = & \epsilon + q_0\,b + q_1\,b + q_2\,b\\ |
|
490 q_1 & = & q_0\,a\\ |
|
491 q_2 & = & q_1\,a + q_2\,a |
|
492 \end{eqnarray} |
|
493 |
|
494 \noindent There is an equation for each node in the DFA. Let |
|
495 us have a look how the right-hand sides of the equations are |
|
496 constructed. First have a look at the second equation: the |
|
497 left-hand side is $q_1$ and the right-hand side $q_0\,a$. The |
|
498 right-hand side is essentially all possible ways how to end up |
|
499 in $q_1$. There is only one incoming edge from $q_0$ consuming |
|
500 an $a$. Therefore we say: if we are in $q_0$ consuming an $a$ |
|
501 then we end up in $q_1$. Therefore the right hand side is |
|
502 state followed by character---in this case $q_0\,a$. Now lets |
|
503 have a look at the third equation: there are two incoming |
|
504 edges. Therefore we have two terms, namely $q_1\,a$ and |
|
505 $q_2\,a$. These terms are separated by $+$. The first states |
|
506 that if in state $q_1$ consuming an $a$ will bring you to |
|
507 $q_2$, and the secont that being in $q_2$ and consuming an $a$ |
|
508 will make you stay in $q_2$. The right-hand side of the |
|
509 first equation is constructed similarly: there are three |
|
510 incoming edges, therefore there are three terms. There is |
|
511 one exception in that we also ``add'' $\epsilon$ to the |
|
512 first equation, because it corresponds to the starting state |
|
513 in the DFA. |
|
514 |
|
515 Having constructed the equational system, the question is |
|
516 how to solve it? Remarkably the rules are very similar to |
|
517 solving usual linear equational systems. For example the |
|
518 second equation does not contain the variable $q_1$ on the |
|
519 right-hand side of the equation. We can therefore eliminate |
|
520 $q_1$ from the system by just substituting this equation |
|
521 into the other two. This gives |
|
522 |
|
523 \begin{eqnarray} |
|
524 q_0 & = & \epsilon + q_0\,b + q_0\,a\,b + q_2\,b\\ |
|
525 q_2 & = & q_0\,a\,a + q_2\,a |
|
526 \end{eqnarray} |
|
527 |
|
528 \noindent where in Equation (4) we have two occurences |
|
529 of $q_0$. Like the laws about $+$ and $\cdot$, we can simplify |
|
530 Equation (4) to obtain the following two equations: |
|
531 |
|
532 \begin{eqnarray} |
|
533 q_0 & = & \epsilon + q_0\,(b + a\,b) + q_2\,b\\ |
|
534 q_2 & = & q_0\,a\,a + q_2\,a |
|
535 \end{eqnarray} |
|
536 |
|
537 \noindent Unfortunately we cannot make any more progress with |
|
538 substituting equations, because both (6) and (7) contain the |
|
539 variable on the left-hand side also on the right-hand side. |
|
540 Here we need to now use a law that is different from the usual |
|
541 laws. It is called \emph{Arden's rule}. It states that |
|
542 if an equation is of the form $q = q\,r + s$ then it can be |
|
543 transformed to $q = s\, r^*$. Since we can assume $+$ is |
|
544 symmetric, equation (7) is of that form: $s$ is $q_0\,a\,a$ |
|
545 and $r$ is $a$. That means we can transform Equation (7) |
|
546 to obtain the two new equations |
|
547 |
|
548 \begin{eqnarray} |
|
549 q_0 & = & \epsilon + q_0\,(b + a\,b) + q_2\,b\\ |
|
550 q_2 & = & q_0\,a\,a\,(a^*) |
|
551 \end{eqnarray} |
|
552 |
|
553 \noindent Now again we can substitute the second equation into |
|
554 the first in order to eliminate the variable $q_2$. |
|
555 |
|
556 \begin{eqnarray} |
|
557 q_0 & = & \epsilon + q_0\,(b + a\,b) + q_0\,a\,a\,(a^*)\,b |
|
558 \end{eqnarray} |
|
559 |
|
560 \noindent Pulling $q_0$ out as a single factor gives: |
|
561 |
|
562 \begin{eqnarray} |
|
563 q_0 & = & \epsilon + q_0\,(b + a\,b + a\,a\,(a^*)\,b) |
|
564 \end{eqnarray} |
|
565 |
|
566 \noindent This equation is again of the form so that we can |
|
567 apply Arden's rule ($r$ is $b + a\,b + a\,a\,(a^*)\,b$ and $s$ |
|
568 is $\epsilon$). This gives as solution for $q_0$ the following |
|
569 regular expression: |
|
570 |
|
571 \begin{eqnarray} |
|
572 q_0 & = & \epsilon\,(b + a\,b + a\,a\,(a^*)\,b)^* |
|
573 \end{eqnarray} |
|
574 |
|
575 \noindent SInce this is a regular expression, we can simplify |
|
576 away the $\epsilon$ to obtain the slightly simpler regular |
|
577 expression |
|
578 |
|
579 \begin{eqnarray} |
|
580 q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^* |
|
581 \end{eqnarray} |
|
582 |
|
583 \noindent |
|
584 Now we can unwind this process and obtain the solutions |
|
585 for the other equations. This gives: |
|
586 |
|
587 \begin{eqnarray} |
|
588 q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\\ |
|
589 q_1 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\,a\\ |
|
590 q_2 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\,a\,a\,(a)^* |
|
591 \end{eqnarray} |
|
592 |
|
593 \noindent Finally, we only need to ``add'' up the equations |
|
594 which correspond to a terminal state. In our running example, |
|
595 this is just $q_2$. Consequently, a regular expression |
|
596 that recognises the same language as the automaton is |
|
597 |
|
598 \[ |
|
599 (b + a\,b + a\,a\,(a^*)\,b)^*\,a\,a\,(a)^* |
|
600 \] |
|
601 |
|
602 \noindent You can somewhat crosscheck your solution |
|
603 by taking a string the regular expression can match and |
|
604 and see whether it can be matched by the automaton. |
|
605 One string for example is $aaa$ and \emph{voila} this |
|
606 string is also matched by the automaton. |
|
607 |
|
608 We should prove that Brzozowski's method really produces |
|
609 an equivalent regular expression for the automaton. But |
|
610 for the purposes of this module, we omit this. |
464 |
611 |
465 \subsubsection*{Automata Minimization} |
612 \subsubsection*{Automata Minimization} |
466 |
613 |
467 As seen in the subset construction, the translation |
614 As seen in the subset construction, the translation |
468 of an NFA to a DFA can result in a rather ``inefficient'' |
615 of an NFA to a DFA can result in a rather ``inefficient'' |