afl-material: comparison handouts/ho03.tex

equal deleted inserted replaced

-:201c2c6d8696
+:7ed2a25dd115
 \noindent I let you think about a definition that describes
 the set of strings accepted by an automaton.
-While with DFAs it will always clear that given a character
+While with DFAs it will always be clear that given a character
 what the next state is (potentially none), it will be useful
 to relax this restriction. That means we have several
 potential successor states. We even allow ``silent
 transitions'', also called epsilon-transitions. They allow us
 to go from one state to the next without having a character
 was invented by Ken Thompson in 1968.
 \subsubsection*{Subset Construction}
-What is interesting that for every NFA we can find a DFA which
+What is interesting is that for every NFA we can find a DFA
-recognises the same language. This can be done by the
+which recognises the same language. This can, for example, be
-\emph{subset construction}. Consider again the NFA on the
+done by the \emph{subset construction}. Consider again the NFA
-left, consisting of nodes labeled $0$, $1$ and $2$.
+on the left, consisting of nodes labeled $0$, $1$ and $2$.
 \begin{center}
 \begin{tabular}{c@{\hspace{10mm}}c}
 \begin{tikzpicture}[scale=0.7,>=stealth',very thick,
 every state/.style={minimum size=0pt,
 for $a$ and $\{0,1\}$ we need to take the union of $\{0,1,2\}$
 (for $0$) and $\{1\}$ (for $1$). The starting state of the DFA
 can be calculated from the starting state of the NFA, that is
 $0$, and then do as many $\epsilon$-transitions as possible.
 This gives $\{0,1,2\}$ which is the starting state of the DFA.
-One terminal states in the DFA are given by all sets that
+The terminal states in the DFA are given by all sets that
 contain a $2$, which is the terminal state of the NFA. This
 completes the subset construction.
-There are two points to note: One is that the resulting DFA
+There are two points to note: One is that very often the
-contains a number of ``dead'' nodes that are never reachable
+resulting DFA contains a number of ``dead'' nodes that are
-from the starting state (that is that the calculated DFA in
+never reachable from the starting state (that is that the
-this example is not a minimal DFA). Such dead nodes can be
+calculated DFA in this example is not a minimal DFA). Such
-safely removed without changing the language that is
+dead nodes can be safely removed without changing the language
-recognised by the DFA. Another point is that in some cases the
+that is recognised by the DFA. Another point is that in some
-subset construction produces a DFA that does \emph{not}
+cases, however, the subset construction produces a DFA that
-contain any dead nodes\ldots{}that means it calculates a
+does \emph{not} contain any dead nodes\ldots{}that means it
-minimal DFA. Which in turn means that in some cases the number
+calculates a minimal DFA. Which in turn means that in some
-of nodes by going from NFAs to DFAs exponentially increases,
+cases the number of nodes by going from NFAs to DFAs
-namely by $2^n$ (which is the number of subsets you can form
+exponentially increases, namely by $2^n$ (which is the number
-for $n$ nodes).
+of subsets you can form for $n$ nodes).
 \subsubsection*{Brzozowski's Method}
-DFA -> NFA
+As said before, we can also go into the other direction---from
+DFAs to regular expressions. Brzozowski's method calculates
+a regular expression using familiar transformations for
+solving equational systems. Consider the DFA:
+\begin{center}
+\begin{tikzpicture}[scale=1.5,>=stealth',very thick,auto,
+every state/.style={minimum size=0pt,
+inner sep=2pt,draw=blue!50,very thick,
+fill=blue!20}]
+\node[state, initial]        (q0) at ( 0,1) {$q_0$};
+\node[state]                    (q1) at ( 1,1) {$q_1$};
+\node[state, accepting] (q2) at ( 2,1) {$q_2$};
+\path[->] (q0) edge[bend left] node[above] {$a$} (q1)
+(q1) edge[bend left] node[above] {$b$} (q0)
+(q2) edge[bend left=50] node[below] {$b$} (q0)
+(q1) edge node[above] {$a$} (q2)
+(q2) edge [loop right] node {$a$} ()
+(q0) edge [loop below] node {$b$} ();
+\end{tikzpicture}
+\end{center}
+\noindent for which we can set up the following equational
+system
+\begin{eqnarray}
+q_0 & = & \epsilon + q_0\,b + q_1\,b +  q_2\,b\\
+q_1 & = & q_0\,a\\
+q_2 & = & q_1\,a + q_2\,a
+\end{eqnarray}
+\noindent There is an equation for each node in the DFA. Let
+us have a look how the right-hand sides of the equations are
+constructed. First have a look at the second equation: the
+left-hand side is $q_1$ and the right-hand side $q_0\,a$. The
+right-hand side is essentially all possible ways how to end up
+in $q_1$. There is only one incoming edge from $q_0$ consuming
+an $a$. Therefore we say: if we are in $q_0$ consuming an $a$
+then we end up in $q_1$. Therefore the right hand side is
+state followed by character---in this case $q_0\,a$. Now lets
+have a look at the third equation: there are two incoming
+edges. Therefore we have two terms, namely $q_1\,a$ and
+$q_2\,a$. These terms are separated by $+$. The first states
+that if in state $q_1$ consuming an $a$ will bring you to
+$q_2$, and the secont that being in $q_2$ and consuming an $a$
+will make you stay in $q_2$. The right-hand side of the
+first equation is constructed similarly: there are three
+incoming edges, therefore there are three terms. There is
+one exception in that we also ``add'' $\epsilon$ to the
+first equation, because it corresponds to the starting state
+in the DFA.
+Having constructed the equational system, the question is
+how to solve it? Remarkably the rules are very similar to
+solving usual linear equational systems. For example the
+second equation does not contain the variable $q_1$ on the
+right-hand side of the equation. We can therefore eliminate
+$q_1$ from the system by just substituting this equation
+into the other two. This gives
+\begin{eqnarray}
+q_0 & = & \epsilon + q_0\,b + q_0\,a\,b +  q_2\,b\\
+q_2 & = & q_0\,a\,a + q_2\,a
+\end{eqnarray}
+\noindent where in Equation (4) we have two occurences
+of $q_0$. Like the laws about $+$ and $\cdot$, we can simplify
+Equation (4) to obtain the following two equations:
+\begin{eqnarray}
+q_0 & = & \epsilon + q_0\,(b + a\,b) +  q_2\,b\\
+q_2 & = & q_0\,a\,a + q_2\,a
+\end{eqnarray}
+\noindent Unfortunately we cannot make any more progress with
+substituting equations, because both (6) and (7) contain the
+variable on the left-hand side also on the right-hand side.
+Here we need to now use a law that is different from the usual
+laws. It is called \emph{Arden's rule}. It states that
+if an equation is of the form $q = q\,r + s$ then it can be
+transformed to $q = s\, r^*$. Since we can assume $+$ is
+symmetric, equation (7) is of that form: $s$ is $q_0\,a\,a$
+and $r$ is $a$. That means we can transform Equation (7)
+to obtain the two new equations
+\begin{eqnarray}
+q_0 & = & \epsilon + q_0\,(b + a\,b) +  q_2\,b\\
+q_2 & = & q_0\,a\,a\,(a^*)
+\end{eqnarray}
+\noindent Now again we can substitute the second equation into
+the first in order to eliminate the variable $q_2$.
+\begin{eqnarray}
+q_0 & = & \epsilon + q_0\,(b + a\,b) +  q_0\,a\,a\,(a^*)\,b
+\end{eqnarray}
+\noindent Pulling $q_0$ out as a single factor gives:
+\begin{eqnarray}
+q_0 & = & \epsilon + q_0\,(b + a\,b + a\,a\,(a^*)\,b)
+\end{eqnarray}
+\noindent This equation is again of the form so that we can
+apply Arden's rule ($r$ is $b + a\,b + a\,a\,(a^*)\,b$ and $s$
+is $\epsilon$). This gives as solution for $q_0$ the following
+regular expression:
+\begin{eqnarray}
+q_0 & = & \epsilon\,(b + a\,b + a\,a\,(a^*)\,b)^*
+\end{eqnarray}
+\noindent SInce this is a regular expression, we can simplify
+away the $\epsilon$ to obtain the slightly simpler regular
+expression
+\begin{eqnarray}
+q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^*
+\end{eqnarray}
+\noindent
+Now we can unwind this process and obtain the solutions
+for the other equations. This gives:
+\begin{eqnarray}
+q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\\
+q_1 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\,a\\
+q_2 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\,a\,a\,(a)^*
+\end{eqnarray}
+\noindent Finally, we only need to ``add'' up the equations
+which correspond to a terminal state. In our running example,
+this is just $q_2$. Consequently, a regular expression
+that recognises the same language as the automaton is
+\[
+(b + a\,b + a\,a\,(a^*)\,b)^*\,a\,a\,(a)^*
+\]
+\noindent You can somewhat crosscheck your solution
+by taking a string the regular expression can match and
+and see whether it can be matched by the automaton.
+One string for example is $aaa$ and \emph{voila} this
+string is also matched by the automaton.
+We should prove that Brzozowski's method really produces
+an equivalent  regular expression for the automaton. But
+for the purposes of this module, we omit this.
 \subsubsection*{Automata Minimization}
 As seen in the subset construction, the translation
 of an NFA to a DFA can result in a rather ``inefficient''
 for the language not recognised by an DFA. If the DAF is
 completed (this is important!), then you just need to exchange
 the accepting and non-accepting states. You can then translate
 this DFA back into a regular expression.
-Not all languages are regular\ldots{}
+Not all languages are regular. The most well-known example
+of a language that is not regular consists of all the strings
+of the form
+\[a^n\,b^n\]
+\noindent meaning strings that have the same number of $a$s
+and $b$s. You can try, but you cannot find a regular
+expression for this language and also not an automaton. One
+can actually prove that there is no regular expression nor
+automaton for this language, but again that would lead us too
+far afield for what we want to do in this module.
 \end{document}
 %%% Local Variables:

changeset 292	7ed2a25dd115
parent 270	4dbeaf43031d
child 318	7975e4f0d4de