diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho03.tex --- a/handouts/ho03.tex Tue Oct 28 06:01:00 2014 +0000 +++ b/handouts/ho03.tex Tue Oct 28 12:24:11 2014 +0000 @@ -113,7 +113,7 @@ the set of strings accepted by an automaton. -While with DFAs it will always clear that given a character +While with DFAs it will always be clear that given a character what the next state is (potentially none), it will be useful to relax this restriction. That means we have several potential successor states. We even allow ``silent @@ -374,10 +374,10 @@ \subsubsection*{Subset Construction} -What is interesting that for every NFA we can find a DFA which -recognises the same language. This can be done by the -\emph{subset construction}. Consider again the NFA on the -left, consisting of nodes labeled $0$, $1$ and $2$. +What is interesting is that for every NFA we can find a DFA +which recognises the same language. This can, for example, be +done by the \emph{subset construction}. Consider again the NFA +on the left, consisting of nodes labeled $0$, $1$ and $2$. \begin{center} \begin{tabular}{c@{\hspace{10mm}}c} @@ -440,27 +440,174 @@ can be calculated from the starting state of the NFA, that is $0$, and then do as many $\epsilon$-transitions as possible. This gives $\{0,1,2\}$ which is the starting state of the DFA. -One terminal states in the DFA are given by all sets that +The terminal states in the DFA are given by all sets that contain a $2$, which is the terminal state of the NFA. This completes the subset construction. -There are two points to note: One is that the resulting DFA -contains a number of ``dead'' nodes that are never reachable -from the starting state (that is that the calculated DFA in -this example is not a minimal DFA). Such dead nodes can be -safely removed without changing the language that is -recognised by the DFA. Another point is that in some cases the -subset construction produces a DFA that does \emph{not} -contain any dead nodes\ldots{}that means it calculates a -minimal DFA. Which in turn means that in some cases the number -of nodes by going from NFAs to DFAs exponentially increases, -namely by $2^n$ (which is the number of subsets you can form -for $n$ nodes). +There are two points to note: One is that very often the +resulting DFA contains a number of ``dead'' nodes that are +never reachable from the starting state (that is that the +calculated DFA in this example is not a minimal DFA). Such +dead nodes can be safely removed without changing the language +that is recognised by the DFA. Another point is that in some +cases, however, the subset construction produces a DFA that +does \emph{not} contain any dead nodes\ldots{}that means it +calculates a minimal DFA. Which in turn means that in some +cases the number of nodes by going from NFAs to DFAs +exponentially increases, namely by $2^n$ (which is the number +of subsets you can form for $n$ nodes). \subsubsection*{Brzozowski's Method} -DFA -> NFA +As said before, we can also go into the other direction---from +DFAs to regular expressions. Brzozowski's method calculates +a regular expression using familiar transformations for +solving equational systems. Consider the DFA: + +\begin{center} +\begin{tikzpicture}[scale=1.5,>=stealth',very thick,auto, + every state/.style={minimum size=0pt, + inner sep=2pt,draw=blue!50,very thick, + fill=blue!20}] + \node[state, initial] (q0) at ( 0,1) {$q_0$}; + \node[state] (q1) at ( 1,1) {$q_1$}; + \node[state, accepting] (q2) at ( 2,1) {$q_2$}; + \path[->] (q0) edge[bend left] node[above] {$a$} (q1) + (q1) edge[bend left] node[above] {$b$} (q0) + (q2) edge[bend left=50] node[below] {$b$} (q0) + (q1) edge node[above] {$a$} (q2) + (q2) edge [loop right] node {$a$} () + (q0) edge [loop below] node {$b$} (); +\end{tikzpicture} +\end{center} + +\noindent for which we can set up the following equational +system + +\begin{eqnarray} +q_0 & = & \epsilon + q_0\,b + q_1\,b + q_2\,b\\ +q_1 & = & q_0\,a\\ +q_2 & = & q_1\,a + q_2\,a +\end{eqnarray} + +\noindent There is an equation for each node in the DFA. Let +us have a look how the right-hand sides of the equations are +constructed. First have a look at the second equation: the +left-hand side is $q_1$ and the right-hand side $q_0\,a$. The +right-hand side is essentially all possible ways how to end up +in $q_1$. There is only one incoming edge from $q_0$ consuming +an $a$. Therefore we say: if we are in $q_0$ consuming an $a$ +then we end up in $q_1$. Therefore the right hand side is +state followed by character---in this case $q_0\,a$. Now lets +have a look at the third equation: there are two incoming +edges. Therefore we have two terms, namely $q_1\,a$ and +$q_2\,a$. These terms are separated by $+$. The first states +that if in state $q_1$ consuming an $a$ will bring you to +$q_2$, and the secont that being in $q_2$ and consuming an $a$ +will make you stay in $q_2$. The right-hand side of the +first equation is constructed similarly: there are three +incoming edges, therefore there are three terms. There is +one exception in that we also ``add'' $\epsilon$ to the +first equation, because it corresponds to the starting state +in the DFA. + +Having constructed the equational system, the question is +how to solve it? Remarkably the rules are very similar to +solving usual linear equational systems. For example the +second equation does not contain the variable $q_1$ on the +right-hand side of the equation. We can therefore eliminate +$q_1$ from the system by just substituting this equation +into the other two. This gives + +\begin{eqnarray} +q_0 & = & \epsilon + q_0\,b + q_0\,a\,b + q_2\,b\\ +q_2 & = & q_0\,a\,a + q_2\,a +\end{eqnarray} + +\noindent where in Equation (4) we have two occurences +of $q_0$. Like the laws about $+$ and $\cdot$, we can simplify +Equation (4) to obtain the following two equations: + +\begin{eqnarray} +q_0 & = & \epsilon + q_0\,(b + a\,b) + q_2\,b\\ +q_2 & = & q_0\,a\,a + q_2\,a +\end{eqnarray} + +\noindent Unfortunately we cannot make any more progress with +substituting equations, because both (6) and (7) contain the +variable on the left-hand side also on the right-hand side. +Here we need to now use a law that is different from the usual +laws. It is called \emph{Arden's rule}. It states that +if an equation is of the form $q = q\,r + s$ then it can be +transformed to $q = s\, r^*$. Since we can assume $+$ is +symmetric, equation (7) is of that form: $s$ is $q_0\,a\,a$ +and $r$ is $a$. That means we can transform Equation (7) +to obtain the two new equations + +\begin{eqnarray} +q_0 & = & \epsilon + q_0\,(b + a\,b) + q_2\,b\\ +q_2 & = & q_0\,a\,a\,(a^*) +\end{eqnarray} + +\noindent Now again we can substitute the second equation into +the first in order to eliminate the variable $q_2$. + +\begin{eqnarray} +q_0 & = & \epsilon + q_0\,(b + a\,b) + q_0\,a\,a\,(a^*)\,b +\end{eqnarray} + +\noindent Pulling $q_0$ out as a single factor gives: + +\begin{eqnarray} +q_0 & = & \epsilon + q_0\,(b + a\,b + a\,a\,(a^*)\,b) +\end{eqnarray} + +\noindent This equation is again of the form so that we can +apply Arden's rule ($r$ is $b + a\,b + a\,a\,(a^*)\,b$ and $s$ +is $\epsilon$). This gives as solution for $q_0$ the following +regular expression: + +\begin{eqnarray} +q_0 & = & \epsilon\,(b + a\,b + a\,a\,(a^*)\,b)^* +\end{eqnarray} + +\noindent SInce this is a regular expression, we can simplify +away the $\epsilon$ to obtain the slightly simpler regular +expression + +\begin{eqnarray} +q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^* +\end{eqnarray} + +\noindent +Now we can unwind this process and obtain the solutions +for the other equations. This gives: + +\begin{eqnarray} +q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\\ +q_1 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\,a\\ +q_2 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\,a\,a\,(a)^* +\end{eqnarray} + +\noindent Finally, we only need to ``add'' up the equations +which correspond to a terminal state. In our running example, +this is just $q_2$. Consequently, a regular expression +that recognises the same language as the automaton is + +\[ +(b + a\,b + a\,a\,(a^*)\,b)^*\,a\,a\,(a)^* +\] + +\noindent You can somewhat crosscheck your solution +by taking a string the regular expression can match and +and see whether it can be matched by the automaton. +One string for example is $aaa$ and \emph{voila} this +string is also matched by the automaton. + +We should prove that Brzozowski's method really produces +an equivalent regular expression for the automaton. But +for the purposes of this module, we omit this. \subsubsection*{Automata Minimization} @@ -694,7 +841,18 @@ the accepting and non-accepting states. You can then translate this DFA back into a regular expression. -Not all languages are regular\ldots{} +Not all languages are regular. The most well-known example +of a language that is not regular consists of all the strings +of the form + +\[a^n\,b^n\] + +\noindent meaning strings that have the same number of $a$s +and $b$s. You can try, but you cannot find a regular +expression for this language and also not an automaton. One +can actually prove that there is no regular expression nor +automaton for this language, but again that would lead us too +far afield for what we want to do in this module. \end{document}