handouts/ho03.tex
changeset 489 e28d7a327870
parent 488 598741d39d21
child 490 4fee50f38305
equal deleted inserted replaced
488:598741d39d21 489:e28d7a327870
   568   combines an $\epsilon$NFA transition with a NFA transition
   568   combines an $\epsilon$NFA transition with a NFA transition
   569   (both given as partial functions).\label{thompson2}}
   569   (both given as partial functions).\label{thompson2}}
   570 \end{figure}
   570 \end{figure}
   571 
   571 
   572 The case for the sequence regular expression $r_1 \cdot r_2$ is a bit more
   572 The case for the sequence regular expression $r_1 \cdot r_2$ is a bit more
   573 complicated: We are given by recursion two NFAs representing the regular
   573 complicated: Say, we are given by recursion two NFAs representing the regular
   574 expressions $r_1$ and $r_2$ respectively.
   574 expressions $r_1$ and $r_2$ respectively.
   575 
   575 
   576 \begin{center}
   576 \begin{center}
   577 \begin{tikzpicture}[node distance=3mm,
   577 \begin{tikzpicture}[node distance=3mm,
   578     >=stealth',very thick,
   578     >=stealth',very thick,
   603 \end{pgfonlayer}
   603 \end{pgfonlayer}
   604 \end{tikzpicture}
   604 \end{tikzpicture}
   605 \end{center}
   605 \end{center}
   606 
   606 
   607 \noindent The first NFA has some accepting states and the second some
   607 \noindent The first NFA has some accepting states and the second some
   608 starting states. We obtain $\epsilon$NFA for $r_1\cdot r_2$ by
   608 starting states. We obtain an $\epsilon$NFA for $r_1\cdot r_2$ by
   609 connecting these accepting states with $\epsilon$-transitions to the
   609 connecting the accepting states of the first NFA with
   610 starting states of the second automaton. By doing so we make them
   610 $\epsilon$-transitions to the starting states of the second
   611 non-accepting like so:
   611 automaton. By doing so we make the accepting states of the first NFA
       
   612 to be non-accepting like so:
   612 
   613 
   613 \begin{center}
   614 \begin{center}
   614 \begin{tikzpicture}[node distance=3mm,
   615 \begin{tikzpicture}[node distance=3mm,
   615     >=stealth',very thick,
   616     >=stealth',very thick,
   616     every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
   617     every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
   643 \node [yshift=2mm] at (3.north) {$r_1\cdot r_2$};
   644 \node [yshift=2mm] at (3.north) {$r_1\cdot r_2$};
   644 \end{pgfonlayer}
   645 \end{pgfonlayer}
   645 \end{tikzpicture}
   646 \end{tikzpicture}
   646 \end{center}
   647 \end{center}
   647 
   648 
   648 \noindent The case for the choice regular expression $r_1 +
   649 \noindent The idea behind this construction is that the start of any
   649 r_2$ is slightly different: We are given by recursion two
   650 string is first recognised by the first NFA, then we silently change
   650 automata representing $r_1$ and $r_2$ respectively. 
   651 to the second NFA; the ending of the string is recognised by the
       
   652 second NFA...just like matching of a string by the regular expression
       
   653 $r_1\cdot r_2$. The Scala code for this constrction is given in
       
   654 Figure~\ref{thompson2} in Lines 16--23. The starting states of the
       
   655 $\epsilon$NFA are the starting states of the first NFA (corresponding
       
   656 to $r_1$); the accepting function is the accepting function of the
       
   657 second NFA (corresponding to $r_2$). The new transition function is
       
   658 all the ``old'' transitions plus the $\epsilon$-transitions connecting
       
   659 the accepting states of the first NFA to the starting states of the
       
   660 first NFA (Lines 18 and 19). The $\epsilon$NFA is then immedately
       
   661 translated in a NFA.
       
   662 
       
   663 
       
   664 The case for the choice regular expression $r_1 + r_2$ is slightly
       
   665 different: We are given by recursion two NFAs representing $r_1$ and
       
   666 $r_2$ respectively.
   651 
   667 
   652 \begin{center}
   668 \begin{center}
   653 \begin{tikzpicture}[node distance=3mm,
   669 \begin{tikzpicture}[node distance=3mm,
   654     >=stealth',very thick,
   670     >=stealth',very thick,
   655     every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
   671     every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
   656 \node at (0,0)  (1)  {$\mbox{}$};
   672 \node at (0,0)  (1)  {$\mbox{}$};
   657 \node[state, initial]  (2)  [above right=16mm of 1] {$\mbox{}$};
   673 \node  (2)  [above=10mm of 1] {};
   658 \node[state, initial]  (3)  [below right=16mm of 1] {$\mbox{}$};
   674 \node[state, initial]  (4)  [above=1mm of 2] {$\mbox{}$};
   659 
   675 \node[state, initial]  (5)  [below=1mm of 2] {$\mbox{}$};
   660 \node (a)  [right=of 2] {$\ldots$};
   676 \node[state, initial]  (3)  [below=10mm of 1] {$\mbox{}$};
   661 \node[state, accepting]  (a1)  [right=of a] {$\mbox{}$};
   677 
       
   678 \node (a)  [right=of 2] {$\ldots\,$};
       
   679 \node  (a1)  [right=of a] {$$};
   662 \node[state, accepting]  (a2)  [above=of a1] {$\mbox{}$};
   680 \node[state, accepting]  (a2)  [above=of a1] {$\mbox{}$};
   663 \node[state, accepting]  (a3)  [below=of a1] {$\mbox{}$};
   681 \node[state, accepting]  (a3)  [below=of a1] {$\mbox{}$};
   664 
   682 
   665 \node (b)  [right=of 3] {$\ldots$};
   683 \node (b)  [right=of 3] {$\ldots$};
   666 \node[state, accepting]  (b1)  [right=of b] {$\mbox{}$};
   684 \node[state, accepting]  (b1)  [right=of b] {$\mbox{}$};
   673 \node [yshift=3mm] at (2.north) {$r_2$};
   691 \node [yshift=3mm] at (2.north) {$r_2$};
   674 \end{pgfonlayer}
   692 \end{pgfonlayer}
   675 \end{tikzpicture}
   693 \end{tikzpicture}
   676 \end{center}
   694 \end{center}
   677 
   695 
   678 \noindent Each automaton has a single start state and
   696 \noindent Each NFA has some starting states and some accepting
   679 potentially several accepting states. We obtain a NFA for the
   697 states. We obtain a NFA for the regular expression $r_1 + r_2$
   680 regular expression $r_1 + r_2$ by introducing a new starting
   698 by composing the transition functions (this crucially depends
   681 state and connecting it with an $\epsilon$-transition to the
   699 on knowing that the states of each component NFA are distinct);
   682 two starting states above, like so
   700 and also combine the starting states and accepting functions:
   683 
   701 
   684 \begin{center}
   702 \begin{center}
   685 \hspace{2cm}\begin{tikzpicture}[node distance=3mm,
   703 \begin{tikzpicture}[node distance=3mm,
   686                              >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
   704     >=stealth',very thick,
   687 \node at (0,0) [state, initial]  (1)  {$\mbox{}$};
   705     every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
   688 \node[state]  (2)  [above right=16mm of 1] {$\mbox{}$};
   706 \node at (0,0) (1)  {$\mbox{}$};
   689 \node[state]  (3)  [below right=16mm of 1] {$\mbox{}$};
   707 \node (2)  [above=10mm of 1] {$$};
   690 
   708 \node[state, initial]  (4)  [above=1mm of 2] {$\mbox{}$};
   691 \node (a)  [right=of 2] {$\ldots$};
   709 \node[state, initial]  (5)  [below=1mm of 2] {$\mbox{}$};
   692 \node[state, accepting]  (a1)  [right=of a] {$\mbox{}$};
   710 \node[state, initial]  (3)  [below=10mm of 1] {$\mbox{}$};
       
   711 
       
   712 \node (a)  [right=of 2] {$\ldots\,$};
       
   713 \node (a1)  [right=of a] {$$};
   693 \node[state, accepting]  (a2)  [above=of a1] {$\mbox{}$};
   714 \node[state, accepting]  (a2)  [above=of a1] {$\mbox{}$};
   694 \node[state, accepting]  (a3)  [below=of a1] {$\mbox{}$};
   715 \node[state, accepting]  (a3)  [below=of a1] {$\mbox{}$};
   695 
   716 
   696 \node (b)  [right=of 3] {$\ldots$};
   717 \node (b)  [right=of 3] {$\ldots$};
   697 \node[state, accepting]  (b1)  [right=of b] {$\mbox{}$};
   718 \node[state, accepting]  (b1)  [right=of b] {$\mbox{}$};
   698 \node[state, accepting]  (b2)  [above=of b1] {$\mbox{}$};
   719 \node[state, accepting]  (b2)  [above=of b1] {$\mbox{}$};
   699 \node[state, accepting]  (b3)  [below=of b1] {$\mbox{}$};
   720 \node[state, accepting]  (b3)  [below=of b1] {$\mbox{}$};
   700 
   721 
   701 \path[->] (1) edge node [above]  {$\epsilon$} (2);
   722 %\path[->] (1) edge node [above]  {$\epsilon$} (2);
   702 \path[->] (1) edge node [below]  {$\epsilon$} (3);
   723 %\path[->] (1) edge node [below]  {$\epsilon$} (3);
   703 
   724 
   704 \begin{pgfonlayer}{background}
   725 \begin{pgfonlayer}{background}
   705 \node (3) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (1) (a2) (a3) (b2) (b3)] {};
   726 \node (3) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (1) (a2) (a3) (b2) (b3)] {};
   706 \node [yshift=3mm] at (3.north) {$r_1+ r_2$};
   727 \node [yshift=3mm] at (3.north) {$r_1+ r_2$};
   707 \end{pgfonlayer}
   728 \end{pgfonlayer}
   708 \end{tikzpicture}
   729 \end{tikzpicture}
   709 \end{center}
   730 \end{center}
   710 
   731 
   711 \noindent 
   732 \noindent The code for this construction is in Figure~\ref{thompson2}
   712 Finally for the $*$-case we have an automaton for $r$
   733 in Lines 25--33. Finally for the $*$-case we have a NFA for $r$
   713 
   734 
   714 \begin{center}
   735 \begin{center}
   715 \begin{tikzpicture}[node distance=3mm,
   736 \begin{tikzpicture}[node distance=3mm,
   716                              >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
   737                              >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
   717 \node at (0,0)  (1)  {$\mbox{}$};
   738 \node at (0,0)  (1)  {$\mbox{}$};
   725 \node [yshift=3mm] at (1.north) {$r$};
   746 \node [yshift=3mm] at (1.north) {$r$};
   726 \end{pgfonlayer}
   747 \end{pgfonlayer}
   727 \end{tikzpicture}
   748 \end{tikzpicture}
   728 \end{center}
   749 \end{center}
   729 
   750 
   730 \noindent and connect its accepting states to a new starting
   751 \noindent and connect its accepting states to a new starting state via
   731 state via $\epsilon$-transitions. This new starting state is
   752 $\epsilon$-transitions. This new starting state is also an accepting
   732 also an accepting state, because $r^*$ can recognise the
   753 state, because $r^*$ can recognise the empty string. This gives the
   733 empty string. This gives the following automaton for $r^*$:
   754 following $\epsilon$NFA for $r^*$ (the corresponding code is in
       
   755 Figure~\ref{thompson2} in Lines 35--43:
   734 
   756 
   735 \begin{center}  
   757 \begin{center}  
   736 \begin{tikzpicture}[node distance=3mm,
   758 \begin{tikzpicture}[node distance=3mm,
   737                              >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
   759     >=stealth',very thick,
       
   760     every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
   738 \node at (0,0) [state, initial,accepting]  (1)  {$\mbox{}$};
   761 \node at (0,0) [state, initial,accepting]  (1)  {$\mbox{}$};
   739 \node[state]  (2)  [right=16mm of 1] {$\mbox{}$};
   762 \node[state]  (2)  [right=16mm of 1] {$\mbox{}$};
   740 \node (a)  [right=of 2] {$\ldots$};
   763 \node (a)  [right=of 2] {$\ldots$};
   741 \node[state]  (a1)  [right=of a] {$\mbox{}$};
   764 \node[state]  (a1)  [right=of a] {$\mbox{}$};
   742 \node[state]  (a2)  [above=of a1] {$\mbox{}$};
   765 \node[state]  (a2)  [above=of a1] {$\mbox{}$};
   750 \node [yshift=3mm] at (2.north) {$r^*$};
   773 \node [yshift=3mm] at (2.north) {$r^*$};
   751 \end{pgfonlayer}
   774 \end{pgfonlayer}
   752 \end{tikzpicture}
   775 \end{tikzpicture}
   753 \end{center}
   776 \end{center}
   754 
   777 
       
   778 
       
   779 To sum ap, you can see in the sequence and star cases the need of
       
   780 having silent $\epsilon$-transitions. Similarly the alternative case
       
   781 shows the need of the NFA-nondeterminsim. It seems awkward to form the
       
   782 `alternative' composition of two DFAs, because DFA do not allow
       
   783 several starting and successor states. All these constructions can now
       
   784 be put together in the following recursive function:
       
   785 
       
   786 
   755 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
   787 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
   756                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
   788                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
   757 def thompson (r: Rexp) : NFAt = r match {
   789 def thompson (r: Rexp) : NFAt = r match {
   758   case ZERO => NFA_ZERO()
   790   case ZERO => NFA_ZERO()
   759   case ONE => NFA_ONE()
   791   case ONE => NFA_ONE()
   762   case SEQ(r1, r2) => NFA_SEQ(thompson(r1), thompson(r2))
   794   case SEQ(r1, r2) => NFA_SEQ(thompson(r1), thompson(r2))
   763   case STAR(r1) => NFA_STAR(thompson(r1))
   795   case STAR(r1) => NFA_STAR(thompson(r1))
   764 }
   796 }
   765 \end{lstlisting}}
   797 \end{lstlisting}}
   766 
   798 
       
   799 \noindent
       
   800 It calculates a NFA from a regular expressions. At last we can run a
       
   801 NFA for the our evil regular expression examples.
       
   802 
   767 
   803 
   768 \begin{center}
   804 \begin{center}
   769 \begin{tabular}{@{\hspace{-1mm}}c@{\hspace{1mm}}c@{}}  
   805 \begin{tabular}{@{\hspace{-1mm}}c@{\hspace{1mm}}c@{}}  
       
   806 \begin{tikzpicture}
       
   807 \begin{axis}[
       
   808     title={Graph: $\texttt{a?\{n\}\,a{\{n\}}}$ and strings 
       
   809            $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
       
   810     xlabel={$n$},
       
   811     x label style={at={(1.05,0.0)}},
       
   812     ylabel={time in secs},
       
   813     enlargelimits=false,
       
   814     xtick={0,5,...,30},
       
   815     xmax=33,
       
   816     ymax=35,
       
   817     ytick={0,5,...,30},
       
   818     scaled ticks=false,
       
   819     axis lines=left,
       
   820     width=5.5cm,
       
   821     height=4.5cm, 
       
   822     legend entries={Python,Ruby},  
       
   823     legend pos=south east,
       
   824     legend cell align=left]
       
   825   \addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
       
   826   \addplot[brown,mark=triangle*, mark options={fill=white}] table {re-ruby.data};
       
   827   % breath-first search in NFAs
       
   828   \addplot[red,mark=*, mark options={fill=white}] table {
       
   829     1 0.00586
       
   830     2 0.01209
       
   831     3 0.03076
       
   832     4 0.08269
       
   833     5 0.12881
       
   834     6 0.25146
       
   835     7 0.51377
       
   836     8 0.89079
       
   837     9 1.62802
       
   838     10 3.05326
       
   839     11 5.92437
       
   840     12 11.67863
       
   841     13 24.00568
       
   842   };
       
   843 \end{axis}
       
   844 \end{tikzpicture}
       
   845 &
   770 \begin{tikzpicture}
   846 \begin{tikzpicture}
   771 \begin{axis}[
   847 \begin{axis}[
   772     title={Graph: $\texttt{(a*)*\,b}$ and strings 
   848     title={Graph: $\texttt{(a*)*\,b}$ and strings 
   773            $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
   849            $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
   774     xlabel={$n$},
   850     xlabel={$n$},
   798     6 8.04894
   874     6 8.04894
   799     7 32.63549
   875     7 32.63549
   800   };
   876   };
   801 \end{axis}
   877 \end{axis}
   802 \end{tikzpicture}
   878 \end{tikzpicture}
   803 &
       
   804 \begin{tikzpicture}
       
   805 \begin{axis}[
       
   806     title={Graph: $\texttt{a?\{n\}\,a{\{n\}}}$ and strings 
       
   807            $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
       
   808     xlabel={$n$},
       
   809     x label style={at={(1.05,0.0)}},
       
   810     ylabel={time in secs},
       
   811     enlargelimits=false,
       
   812     xtick={0,5,...,30},
       
   813     xmax=33,
       
   814     ymax=35,
       
   815     ytick={0,5,...,30},
       
   816     scaled ticks=false,
       
   817     axis lines=left,
       
   818     width=5.5cm,
       
   819     height=4.5cm, 
       
   820     legend entries={Python,Ruby},  
       
   821     legend pos=south east,
       
   822     legend cell align=left]
       
   823   \addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
       
   824   \addplot[brown,mark=triangle*, mark options={fill=white}] table {re-ruby.data};
       
   825   % breath-first search in NFAs
       
   826   \addplot[red,mark=*, mark options={fill=white}] table {
       
   827     1  0.00741
       
   828     2  0.02657
       
   829     3  0.08317
       
   830     4  0.22133
       
   831     5  0.54463
       
   832     6  1.42062
       
   833     7  4.04926
       
   834     8  12.70395
       
   835     9  39.21555
       
   836     10  112.05466
       
   837   };
       
   838 \end{axis}
       
   839 \end{tikzpicture}
       
   840 \end{tabular}
   879 \end{tabular}
   841 \end{center}
   880 \end{center}
   842 
   881 
   843 
       
   844 \noindent This construction of a NFA from a regular expression
       
   845 was invented by Ken Thompson in 1968.
       
   846 
   882 
   847 
   883 
   848 \subsubsection*{Subset Construction}
   884 \subsubsection*{Subset Construction}
   849 
   885 
   850 Remember that we did not bother with defining and implementing
   886 Remember that we did not bother with defining and implementing