603 \end{pgfonlayer} |
603 \end{pgfonlayer} |
604 \end{tikzpicture} |
604 \end{tikzpicture} |
605 \end{center} |
605 \end{center} |
606 |
606 |
607 \noindent The first NFA has some accepting states and the second some |
607 \noindent The first NFA has some accepting states and the second some |
608 starting states. We obtain $\epsilon$NFA for $r_1\cdot r_2$ by |
608 starting states. We obtain an $\epsilon$NFA for $r_1\cdot r_2$ by |
609 connecting these accepting states with $\epsilon$-transitions to the |
609 connecting the accepting states of the first NFA with |
610 starting states of the second automaton. By doing so we make them |
610 $\epsilon$-transitions to the starting states of the second |
611 non-accepting like so: |
611 automaton. By doing so we make the accepting states of the first NFA |
|
612 to be non-accepting like so: |
612 |
613 |
613 \begin{center} |
614 \begin{center} |
614 \begin{tikzpicture}[node distance=3mm, |
615 \begin{tikzpicture}[node distance=3mm, |
615 >=stealth',very thick, |
616 >=stealth',very thick, |
616 every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
617 every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
643 \node [yshift=2mm] at (3.north) {$r_1\cdot r_2$}; |
644 \node [yshift=2mm] at (3.north) {$r_1\cdot r_2$}; |
644 \end{pgfonlayer} |
645 \end{pgfonlayer} |
645 \end{tikzpicture} |
646 \end{tikzpicture} |
646 \end{center} |
647 \end{center} |
647 |
648 |
648 \noindent The case for the choice regular expression $r_1 + |
649 \noindent The idea behind this construction is that the start of any |
649 r_2$ is slightly different: We are given by recursion two |
650 string is first recognised by the first NFA, then we silently change |
650 automata representing $r_1$ and $r_2$ respectively. |
651 to the second NFA; the ending of the string is recognised by the |
|
652 second NFA...just like matching of a string by the regular expression |
|
653 $r_1\cdot r_2$. The Scala code for this constrction is given in |
|
654 Figure~\ref{thompson2} in Lines 16--23. The starting states of the |
|
655 $\epsilon$NFA are the starting states of the first NFA (corresponding |
|
656 to $r_1$); the accepting function is the accepting function of the |
|
657 second NFA (corresponding to $r_2$). The new transition function is |
|
658 all the ``old'' transitions plus the $\epsilon$-transitions connecting |
|
659 the accepting states of the first NFA to the starting states of the |
|
660 first NFA (Lines 18 and 19). The $\epsilon$NFA is then immedately |
|
661 translated in a NFA. |
|
662 |
|
663 |
|
664 The case for the choice regular expression $r_1 + r_2$ is slightly |
|
665 different: We are given by recursion two NFAs representing $r_1$ and |
|
666 $r_2$ respectively. |
651 |
667 |
652 \begin{center} |
668 \begin{center} |
653 \begin{tikzpicture}[node distance=3mm, |
669 \begin{tikzpicture}[node distance=3mm, |
654 >=stealth',very thick, |
670 >=stealth',very thick, |
655 every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
671 every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
656 \node at (0,0) (1) {$\mbox{}$}; |
672 \node at (0,0) (1) {$\mbox{}$}; |
657 \node[state, initial] (2) [above right=16mm of 1] {$\mbox{}$}; |
673 \node (2) [above=10mm of 1] {}; |
658 \node[state, initial] (3) [below right=16mm of 1] {$\mbox{}$}; |
674 \node[state, initial] (4) [above=1mm of 2] {$\mbox{}$}; |
659 |
675 \node[state, initial] (5) [below=1mm of 2] {$\mbox{}$}; |
660 \node (a) [right=of 2] {$\ldots$}; |
676 \node[state, initial] (3) [below=10mm of 1] {$\mbox{}$}; |
661 \node[state, accepting] (a1) [right=of a] {$\mbox{}$}; |
677 |
|
678 \node (a) [right=of 2] {$\ldots\,$}; |
|
679 \node (a1) [right=of a] {$$}; |
662 \node[state, accepting] (a2) [above=of a1] {$\mbox{}$}; |
680 \node[state, accepting] (a2) [above=of a1] {$\mbox{}$}; |
663 \node[state, accepting] (a3) [below=of a1] {$\mbox{}$}; |
681 \node[state, accepting] (a3) [below=of a1] {$\mbox{}$}; |
664 |
682 |
665 \node (b) [right=of 3] {$\ldots$}; |
683 \node (b) [right=of 3] {$\ldots$}; |
666 \node[state, accepting] (b1) [right=of b] {$\mbox{}$}; |
684 \node[state, accepting] (b1) [right=of b] {$\mbox{}$}; |
673 \node [yshift=3mm] at (2.north) {$r_2$}; |
691 \node [yshift=3mm] at (2.north) {$r_2$}; |
674 \end{pgfonlayer} |
692 \end{pgfonlayer} |
675 \end{tikzpicture} |
693 \end{tikzpicture} |
676 \end{center} |
694 \end{center} |
677 |
695 |
678 \noindent Each automaton has a single start state and |
696 \noindent Each NFA has some starting states and some accepting |
679 potentially several accepting states. We obtain a NFA for the |
697 states. We obtain a NFA for the regular expression $r_1 + r_2$ |
680 regular expression $r_1 + r_2$ by introducing a new starting |
698 by composing the transition functions (this crucially depends |
681 state and connecting it with an $\epsilon$-transition to the |
699 on knowing that the states of each component NFA are distinct); |
682 two starting states above, like so |
700 and also combine the starting states and accepting functions: |
683 |
701 |
684 \begin{center} |
702 \begin{center} |
685 \hspace{2cm}\begin{tikzpicture}[node distance=3mm, |
703 \begin{tikzpicture}[node distance=3mm, |
686 >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
704 >=stealth',very thick, |
687 \node at (0,0) [state, initial] (1) {$\mbox{}$}; |
705 every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
688 \node[state] (2) [above right=16mm of 1] {$\mbox{}$}; |
706 \node at (0,0) (1) {$\mbox{}$}; |
689 \node[state] (3) [below right=16mm of 1] {$\mbox{}$}; |
707 \node (2) [above=10mm of 1] {$$}; |
690 |
708 \node[state, initial] (4) [above=1mm of 2] {$\mbox{}$}; |
691 \node (a) [right=of 2] {$\ldots$}; |
709 \node[state, initial] (5) [below=1mm of 2] {$\mbox{}$}; |
692 \node[state, accepting] (a1) [right=of a] {$\mbox{}$}; |
710 \node[state, initial] (3) [below=10mm of 1] {$\mbox{}$}; |
|
711 |
|
712 \node (a) [right=of 2] {$\ldots\,$}; |
|
713 \node (a1) [right=of a] {$$}; |
693 \node[state, accepting] (a2) [above=of a1] {$\mbox{}$}; |
714 \node[state, accepting] (a2) [above=of a1] {$\mbox{}$}; |
694 \node[state, accepting] (a3) [below=of a1] {$\mbox{}$}; |
715 \node[state, accepting] (a3) [below=of a1] {$\mbox{}$}; |
695 |
716 |
696 \node (b) [right=of 3] {$\ldots$}; |
717 \node (b) [right=of 3] {$\ldots$}; |
697 \node[state, accepting] (b1) [right=of b] {$\mbox{}$}; |
718 \node[state, accepting] (b1) [right=of b] {$\mbox{}$}; |
698 \node[state, accepting] (b2) [above=of b1] {$\mbox{}$}; |
719 \node[state, accepting] (b2) [above=of b1] {$\mbox{}$}; |
699 \node[state, accepting] (b3) [below=of b1] {$\mbox{}$}; |
720 \node[state, accepting] (b3) [below=of b1] {$\mbox{}$}; |
700 |
721 |
701 \path[->] (1) edge node [above] {$\epsilon$} (2); |
722 %\path[->] (1) edge node [above] {$\epsilon$} (2); |
702 \path[->] (1) edge node [below] {$\epsilon$} (3); |
723 %\path[->] (1) edge node [below] {$\epsilon$} (3); |
703 |
724 |
704 \begin{pgfonlayer}{background} |
725 \begin{pgfonlayer}{background} |
705 \node (3) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (1) (a2) (a3) (b2) (b3)] {}; |
726 \node (3) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (1) (a2) (a3) (b2) (b3)] {}; |
706 \node [yshift=3mm] at (3.north) {$r_1+ r_2$}; |
727 \node [yshift=3mm] at (3.north) {$r_1+ r_2$}; |
707 \end{pgfonlayer} |
728 \end{pgfonlayer} |
708 \end{tikzpicture} |
729 \end{tikzpicture} |
709 \end{center} |
730 \end{center} |
710 |
731 |
711 \noindent |
732 \noindent The code for this construction is in Figure~\ref{thompson2} |
712 Finally for the $*$-case we have an automaton for $r$ |
733 in Lines 25--33. Finally for the $*$-case we have a NFA for $r$ |
713 |
734 |
714 \begin{center} |
735 \begin{center} |
715 \begin{tikzpicture}[node distance=3mm, |
736 \begin{tikzpicture}[node distance=3mm, |
716 >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
737 >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
717 \node at (0,0) (1) {$\mbox{}$}; |
738 \node at (0,0) (1) {$\mbox{}$}; |
725 \node [yshift=3mm] at (1.north) {$r$}; |
746 \node [yshift=3mm] at (1.north) {$r$}; |
726 \end{pgfonlayer} |
747 \end{pgfonlayer} |
727 \end{tikzpicture} |
748 \end{tikzpicture} |
728 \end{center} |
749 \end{center} |
729 |
750 |
730 \noindent and connect its accepting states to a new starting |
751 \noindent and connect its accepting states to a new starting state via |
731 state via $\epsilon$-transitions. This new starting state is |
752 $\epsilon$-transitions. This new starting state is also an accepting |
732 also an accepting state, because $r^*$ can recognise the |
753 state, because $r^*$ can recognise the empty string. This gives the |
733 empty string. This gives the following automaton for $r^*$: |
754 following $\epsilon$NFA for $r^*$ (the corresponding code is in |
|
755 Figure~\ref{thompson2} in Lines 35--43: |
734 |
756 |
735 \begin{center} |
757 \begin{center} |
736 \begin{tikzpicture}[node distance=3mm, |
758 \begin{tikzpicture}[node distance=3mm, |
737 >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
759 >=stealth',very thick, |
|
760 every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
738 \node at (0,0) [state, initial,accepting] (1) {$\mbox{}$}; |
761 \node at (0,0) [state, initial,accepting] (1) {$\mbox{}$}; |
739 \node[state] (2) [right=16mm of 1] {$\mbox{}$}; |
762 \node[state] (2) [right=16mm of 1] {$\mbox{}$}; |
740 \node (a) [right=of 2] {$\ldots$}; |
763 \node (a) [right=of 2] {$\ldots$}; |
741 \node[state] (a1) [right=of a] {$\mbox{}$}; |
764 \node[state] (a1) [right=of a] {$\mbox{}$}; |
742 \node[state] (a2) [above=of a1] {$\mbox{}$}; |
765 \node[state] (a2) [above=of a1] {$\mbox{}$}; |
750 \node [yshift=3mm] at (2.north) {$r^*$}; |
773 \node [yshift=3mm] at (2.north) {$r^*$}; |
751 \end{pgfonlayer} |
774 \end{pgfonlayer} |
752 \end{tikzpicture} |
775 \end{tikzpicture} |
753 \end{center} |
776 \end{center} |
754 |
777 |
|
778 |
|
779 To sum ap, you can see in the sequence and star cases the need of |
|
780 having silent $\epsilon$-transitions. Similarly the alternative case |
|
781 shows the need of the NFA-nondeterminsim. It seems awkward to form the |
|
782 `alternative' composition of two DFAs, because DFA do not allow |
|
783 several starting and successor states. All these constructions can now |
|
784 be put together in the following recursive function: |
|
785 |
|
786 |
755 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor= |
787 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor= |
756 {\ifodd\value{lstnumber}\color{capri!3}\fi}] |
788 {\ifodd\value{lstnumber}\color{capri!3}\fi}] |
757 def thompson (r: Rexp) : NFAt = r match { |
789 def thompson (r: Rexp) : NFAt = r match { |
758 case ZERO => NFA_ZERO() |
790 case ZERO => NFA_ZERO() |
759 case ONE => NFA_ONE() |
791 case ONE => NFA_ONE() |
762 case SEQ(r1, r2) => NFA_SEQ(thompson(r1), thompson(r2)) |
794 case SEQ(r1, r2) => NFA_SEQ(thompson(r1), thompson(r2)) |
763 case STAR(r1) => NFA_STAR(thompson(r1)) |
795 case STAR(r1) => NFA_STAR(thompson(r1)) |
764 } |
796 } |
765 \end{lstlisting}} |
797 \end{lstlisting}} |
766 |
798 |
|
799 \noindent |
|
800 It calculates a NFA from a regular expressions. At last we can run a |
|
801 NFA for the our evil regular expression examples. |
|
802 |
767 |
803 |
768 \begin{center} |
804 \begin{center} |
769 \begin{tabular}{@{\hspace{-1mm}}c@{\hspace{1mm}}c@{}} |
805 \begin{tabular}{@{\hspace{-1mm}}c@{\hspace{1mm}}c@{}} |
|
806 \begin{tikzpicture} |
|
807 \begin{axis}[ |
|
808 title={Graph: $\texttt{a?\{n\}\,a{\{n\}}}$ and strings |
|
809 $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$}, |
|
810 xlabel={$n$}, |
|
811 x label style={at={(1.05,0.0)}}, |
|
812 ylabel={time in secs}, |
|
813 enlargelimits=false, |
|
814 xtick={0,5,...,30}, |
|
815 xmax=33, |
|
816 ymax=35, |
|
817 ytick={0,5,...,30}, |
|
818 scaled ticks=false, |
|
819 axis lines=left, |
|
820 width=5.5cm, |
|
821 height=4.5cm, |
|
822 legend entries={Python,Ruby}, |
|
823 legend pos=south east, |
|
824 legend cell align=left] |
|
825 \addplot[blue,mark=*, mark options={fill=white}] table {re-python.data}; |
|
826 \addplot[brown,mark=triangle*, mark options={fill=white}] table {re-ruby.data}; |
|
827 % breath-first search in NFAs |
|
828 \addplot[red,mark=*, mark options={fill=white}] table { |
|
829 1 0.00586 |
|
830 2 0.01209 |
|
831 3 0.03076 |
|
832 4 0.08269 |
|
833 5 0.12881 |
|
834 6 0.25146 |
|
835 7 0.51377 |
|
836 8 0.89079 |
|
837 9 1.62802 |
|
838 10 3.05326 |
|
839 11 5.92437 |
|
840 12 11.67863 |
|
841 13 24.00568 |
|
842 }; |
|
843 \end{axis} |
|
844 \end{tikzpicture} |
|
845 & |
770 \begin{tikzpicture} |
846 \begin{tikzpicture} |
771 \begin{axis}[ |
847 \begin{axis}[ |
772 title={Graph: $\texttt{(a*)*\,b}$ and strings |
848 title={Graph: $\texttt{(a*)*\,b}$ and strings |
773 $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$}, |
849 $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$}, |
774 xlabel={$n$}, |
850 xlabel={$n$}, |
798 6 8.04894 |
874 6 8.04894 |
799 7 32.63549 |
875 7 32.63549 |
800 }; |
876 }; |
801 \end{axis} |
877 \end{axis} |
802 \end{tikzpicture} |
878 \end{tikzpicture} |
803 & |
|
804 \begin{tikzpicture} |
|
805 \begin{axis}[ |
|
806 title={Graph: $\texttt{a?\{n\}\,a{\{n\}}}$ and strings |
|
807 $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$}, |
|
808 xlabel={$n$}, |
|
809 x label style={at={(1.05,0.0)}}, |
|
810 ylabel={time in secs}, |
|
811 enlargelimits=false, |
|
812 xtick={0,5,...,30}, |
|
813 xmax=33, |
|
814 ymax=35, |
|
815 ytick={0,5,...,30}, |
|
816 scaled ticks=false, |
|
817 axis lines=left, |
|
818 width=5.5cm, |
|
819 height=4.5cm, |
|
820 legend entries={Python,Ruby}, |
|
821 legend pos=south east, |
|
822 legend cell align=left] |
|
823 \addplot[blue,mark=*, mark options={fill=white}] table {re-python.data}; |
|
824 \addplot[brown,mark=triangle*, mark options={fill=white}] table {re-ruby.data}; |
|
825 % breath-first search in NFAs |
|
826 \addplot[red,mark=*, mark options={fill=white}] table { |
|
827 1 0.00741 |
|
828 2 0.02657 |
|
829 3 0.08317 |
|
830 4 0.22133 |
|
831 5 0.54463 |
|
832 6 1.42062 |
|
833 7 4.04926 |
|
834 8 12.70395 |
|
835 9 39.21555 |
|
836 10 112.05466 |
|
837 }; |
|
838 \end{axis} |
|
839 \end{tikzpicture} |
|
840 \end{tabular} |
879 \end{tabular} |
841 \end{center} |
880 \end{center} |
842 |
881 |
843 |
|
844 \noindent This construction of a NFA from a regular expression |
|
845 was invented by Ken Thompson in 1968. |
|
846 |
882 |
847 |
883 |
848 \subsubsection*{Subset Construction} |
884 \subsubsection*{Subset Construction} |
849 |
885 |
850 Remember that we did not bother with defining and implementing |
886 Remember that we did not bother with defining and implementing |