afl-material: comparison handouts/ho03.tex

equal deleted inserted replaced

-:598741d39d21
+:e28d7a327870
 combines an $\epsilon$NFA transition with a NFA transition
 (both given as partial functions).\label{thompson2}}
 \end{figure}
 The case for the sequence regular expression $r_1 \cdot r_2$ is a bit more
-complicated: We are given by recursion two NFAs representing the regular
+complicated: Say, we are given by recursion two NFAs representing the regular
 expressions $r_1$ and $r_2$ respectively.
 \begin{center}
 \begin{tikzpicture}[node distance=3mm,
 >=stealth',very thick,
 \end{pgfonlayer}
 \end{tikzpicture}
 \end{center}
 \noindent The first NFA has some accepting states and the second some
-starting states. We obtain $\epsilon$NFA for $r_1\cdot r_2$ by
+starting states. We obtain an $\epsilon$NFA for $r_1\cdot r_2$ by
-connecting these accepting states with $\epsilon$-transitions to the
+connecting the accepting states of the first NFA with
-starting states of the second automaton. By doing so we make them
+$\epsilon$-transitions to the starting states of the second
-non-accepting like so:
+automaton. By doing so we make the accepting states of the first NFA
+to be non-accepting like so:
 \begin{center}
 \begin{tikzpicture}[node distance=3mm,
 >=stealth',very thick,
 every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
 \node [yshift=2mm] at (3.north) {$r_1\cdot r_2$};
 \end{pgfonlayer}
 \end{tikzpicture}
 \end{center}
-\noindent The case for the choice regular expression $r_1 +
+\noindent The idea behind this construction is that the start of any
-r_2$ is slightly different: We are given by recursion two
+string is first recognised by the first NFA, then we silently change
-automata representing $r_1$ and $r_2$ respectively.
+to the second NFA; the ending of the string is recognised by the
+second NFA...just like matching of a string by the regular expression
+$r_1\cdot r_2$. The Scala code for this constrction is given in
+Figure~\ref{thompson2} in Lines 16--23. The starting states of the
+$\epsilon$NFA are the starting states of the first NFA (corresponding
+to $r_1$); the accepting function is the accepting function of the
+second NFA (corresponding to $r_2$). The new transition function is
+all the ``old'' transitions plus the $\epsilon$-transitions connecting
+the accepting states of the first NFA to the starting states of the
+first NFA (Lines 18 and 19). The $\epsilon$NFA is then immedately
+translated in a NFA.
+The case for the choice regular expression $r_1 + r_2$ is slightly
+different: We are given by recursion two NFAs representing $r_1$ and
+$r_2$ respectively.
 \begin{center}
 \begin{tikzpicture}[node distance=3mm,
 >=stealth',very thick,
 every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
 \node at (0,0)  (1)  {$\mbox{}$};
-\node[state, initial]  (2)  [above right=16mm of 1] {$\mbox{}$};
+\node  (2)  [above=10mm of 1] {};
-\node[state, initial]  (3)  [below right=16mm of 1] {$\mbox{}$};
+\node[state, initial]  (4)  [above=1mm of 2] {$\mbox{}$};
+\node[state, initial]  (5)  [below=1mm of 2] {$\mbox{}$};
-\node (a)  [right=of 2] {$\ldots$};
+\node[state, initial]  (3)  [below=10mm of 1] {$\mbox{}$};
-\node[state, accepting]  (a1)  [right=of a] {$\mbox{}$};
+\node (a)  [right=of 2] {$\ldots\,$};
+\node  (a1)  [right=of a] {$$};
 \node[state, accepting]  (a2)  [above=of a1] {$\mbox{}$};
 \node[state, accepting]  (a3)  [below=of a1] {$\mbox{}$};
 \node (b)  [right=of 3] {$\ldots$};
 \node[state, accepting]  (b1)  [right=of b] {$\mbox{}$};
 \node [yshift=3mm] at (2.north) {$r_2$};
 \end{pgfonlayer}
 \end{tikzpicture}
 \end{center}
-\noindent Each automaton has a single start state and
+\noindent Each NFA has some starting states and some accepting
-potentially several accepting states. We obtain a NFA for the
+states. We obtain a NFA for the regular expression $r_1 + r_2$
-regular expression $r_1 + r_2$ by introducing a new starting
+by composing the transition functions (this crucially depends
-state and connecting it with an $\epsilon$-transition to the
+on knowing that the states of each component NFA are distinct);
-two starting states above, like so
+and also combine the starting states and accepting functions:
 \begin{center}
-\hspace{2cm}\begin{tikzpicture}[node distance=3mm,
+\begin{tikzpicture}[node distance=3mm,
->=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
+>=stealth',very thick,
-\node at (0,0) [state, initial]  (1)  {$\mbox{}$};
+every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
-\node[state]  (2)  [above right=16mm of 1] {$\mbox{}$};
+\node at (0,0) (1)  {$\mbox{}$};
-\node[state]  (3)  [below right=16mm of 1] {$\mbox{}$};
+\node (2)  [above=10mm of 1] {$$};
+\node[state, initial]  (4)  [above=1mm of 2] {$\mbox{}$};
-\node (a)  [right=of 2] {$\ldots$};
+\node[state, initial]  (5)  [below=1mm of 2] {$\mbox{}$};
-\node[state, accepting]  (a1)  [right=of a] {$\mbox{}$};
+\node[state, initial]  (3)  [below=10mm of 1] {$\mbox{}$};
+\node (a)  [right=of 2] {$\ldots\,$};
+\node (a1)  [right=of a] {$$};
 \node[state, accepting]  (a2)  [above=of a1] {$\mbox{}$};
 \node[state, accepting]  (a3)  [below=of a1] {$\mbox{}$};
 \node (b)  [right=of 3] {$\ldots$};
 \node[state, accepting]  (b1)  [right=of b] {$\mbox{}$};
 \node[state, accepting]  (b2)  [above=of b1] {$\mbox{}$};
 \node[state, accepting]  (b3)  [below=of b1] {$\mbox{}$};
-\path[->] (1) edge node [above]  {$\epsilon$} (2);
+%\path[->] (1) edge node [above]  {$\epsilon$} (2);
-\path[->] (1) edge node [below]  {$\epsilon$} (3);
+%\path[->] (1) edge node [below]  {$\epsilon$} (3);
 \begin{pgfonlayer}{background}
 \node (3) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (1) (a2) (a3) (b2) (b3)] {};
 \node [yshift=3mm] at (3.north) {$r_1+ r_2$};
 \end{pgfonlayer}
 \end{tikzpicture}
 \end{center}
-\noindent
+\noindent The code for this construction is in Figure~\ref{thompson2}
-Finally for the $*$-case we have an automaton for $r$
+in Lines 25--33. Finally for the $*$-case we have a NFA for $r$
 \begin{center}
 \begin{tikzpicture}[node distance=3mm,
 >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
 \node at (0,0)  (1)  {$\mbox{}$};
 \node [yshift=3mm] at (1.north) {$r$};
 \end{pgfonlayer}
 \end{tikzpicture}
 \end{center}
-\noindent and connect its accepting states to a new starting
+\noindent and connect its accepting states to a new starting state via
-state via $\epsilon$-transitions. This new starting state is
+$\epsilon$-transitions. This new starting state is also an accepting
-also an accepting state, because $r^*$ can recognise the
+state, because $r^*$ can recognise the empty string. This gives the
-empty string. This gives the following automaton for $r^*$:
+following $\epsilon$NFA for $r^*$ (the corresponding code is in
+Figure~\ref{thompson2} in Lines 35--43:
 \begin{center}
 \begin{tikzpicture}[node distance=3mm,
->=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
+>=stealth',very thick,
+every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
 \node at (0,0) [state, initial,accepting]  (1)  {$\mbox{}$};
 \node[state]  (2)  [right=16mm of 1] {$\mbox{}$};
 \node (a)  [right=of 2] {$\ldots$};
 \node[state]  (a1)  [right=of a] {$\mbox{}$};
 \node[state]  (a2)  [above=of a1] {$\mbox{}$};
 \node [yshift=3mm] at (2.north) {$r^*$};
 \end{pgfonlayer}
 \end{tikzpicture}
 \end{center}
+To sum ap, you can see in the sequence and star cases the need of
+having silent $\epsilon$-transitions. Similarly the alternative case
+shows the need of the NFA-nondeterminsim. It seems awkward to form the
+`alternative' composition of two DFAs, because DFA do not allow
+several starting and successor states. All these constructions can now
+be put together in the following recursive function:
 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
 {\ifodd\value{lstnumber}\color{capri!3}\fi}]
 def thompson (r: Rexp) : NFAt = r match {
 case ZERO => NFA_ZERO()
 case ONE => NFA_ONE()
 case SEQ(r1, r2) => NFA_SEQ(thompson(r1), thompson(r2))
 case STAR(r1) => NFA_STAR(thompson(r1))
 }
 \end{lstlisting}}
+\noindent
+It calculates a NFA from a regular expressions. At last we can run a
+NFA for the our evil regular expression examples.
 \begin{center}
 \begin{tabular}{@{\hspace{-1mm}}c@{\hspace{1mm}}c@{}}
+\begin{tikzpicture}
+\begin{axis}[
+title={Graph: $\texttt{a?\{n\}\,a{\{n\}}}$ and strings
+$\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
+xlabel={$n$},
+x label style={at={(1.05,0.0)}},
+ylabel={time in secs},
+enlargelimits=false,
+xtick={0,5,...,30},
+xmax=33,
+ymax=35,
+ytick={0,5,...,30},
+scaled ticks=false,
+axis lines=left,
+width=5.5cm,
+height=4.5cm,
+legend entries={Python,Ruby},
+legend pos=south east,
+legend cell align=left]
+\addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
+\addplot[brown,mark=triangle*, mark options={fill=white}] table {re-ruby.data};
+% breath-first search in NFAs
+\addplot[red,mark=*, mark options={fill=white}] table {
+1 0.00586
+2 0.01209
+3 0.03076
+4 0.08269
+5 0.12881
+6 0.25146
+7 0.51377
+8 0.89079
+9 1.62802
+10 3.05326
+11 5.92437
+12 11.67863
+13 24.00568
+};
+\end{axis}
+\end{tikzpicture}
+&
 \begin{tikzpicture}
 \begin{axis}[
 title={Graph: $\texttt{(a*)*\,b}$ and strings
 $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
 xlabel={$n$},
 6 8.04894
 7 32.63549
 };
 \end{axis}
 \end{tikzpicture}
-&
-\begin{tikzpicture}
-\begin{axis}[
-title={Graph: $\texttt{a?\{n\}\,a{\{n\}}}$ and strings
-$\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
-xlabel={$n$},
-x label style={at={(1.05,0.0)}},
-ylabel={time in secs},
-enlargelimits=false,
-xtick={0,5,...,30},
-xmax=33,
-ymax=35,
-ytick={0,5,...,30},
-scaled ticks=false,
-axis lines=left,
-width=5.5cm,
-height=4.5cm,
-legend entries={Python,Ruby},
-legend pos=south east,
-legend cell align=left]
-\addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
-\addplot[brown,mark=triangle*, mark options={fill=white}] table {re-ruby.data};
-% breath-first search in NFAs
-\addplot[red,mark=*, mark options={fill=white}] table {
-1  0.00741
-2  0.02657
-3  0.08317
-4  0.22133
-5  0.54463
-6  1.42062
-7  4.04926
-8  12.70395
-9  39.21555
-10  112.05466
-};
-\end{axis}
-\end{tikzpicture}
 \end{tabular}
 \end{center}
-\noindent This construction of a NFA from a regular expression
-was invented by Ken Thompson in 1968.
 \subsubsection*{Subset Construction}
 Remember that we did not bother with defining and implementing

changeset 489	e28d7a327870
parent 488	598741d39d21
child 490	4fee50f38305