6 |
6 |
7 \begin{document} |
7 \begin{document} |
8 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017} |
8 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017} |
9 |
9 |
10 \section*{Handout 3 (Finite Automata)} |
10 \section*{Handout 3 (Finite Automata)} |
|
11 |
11 |
12 |
12 Every formal language and compiler course I know of bombards you first |
13 Every formal language and compiler course I know of bombards you first |
13 with automata and then to a much, much smaller extend with regular |
14 with automata and then to a much, much smaller extend with regular |
14 expressions. As you can see, this course is turned upside down: |
15 expressions. As you can see, this course is turned upside down: |
15 regular expressions come first. The reason is that regular expressions |
16 regular expressions come first. The reason is that regular expressions |
16 are easier to reason about and the notion of derivatives, although |
17 are easier to reason about and the notion of derivatives, although |
17 already quite old, only became more widely known rather |
18 already quite old, only became more widely known rather |
18 recently. Still let us in this lecture have a closer look at automata |
19 recently. Still, let us in this lecture have a closer look at automata |
19 and their relation to regular expressions. This will help us with |
20 and their relation to regular expressions. This will help us with |
20 understanding why the regular expression matchers in Python, Ruby and |
21 understanding why the regular expression matchers in Python, Ruby and |
21 Java are so slow with certain regular expressions. On the way we will |
22 Java are so slow with certain regular expressions. On the way we will |
22 also see what are the limitations of regular expressions. |
23 also see what are the limitations of regular |
|
24 expressions. Unfortunately, they cannot be used for \emph{everything}. |
23 |
25 |
24 |
26 |
25 \subsection*{Deterministic Finite Automata} |
27 \subsection*{Deterministic Finite Automata} |
26 |
28 |
27 Lets start\ldots the central definition is:\medskip |
29 Lets start\ldots the central definition is:\medskip |
116 |
119 |
117 \[ |
120 \[ |
118 \widehat{\delta}(Q_0, s) \in F |
121 \widehat{\delta}(Q_0, s) \in F |
119 \] |
122 \] |
120 |
123 |
121 \noindent I let you think about a definition that describes |
124 \noindent I let you think about a definition that describes the set of |
122 the set of all strings accepted by an automaton. |
125 all strings accepted by a determinsitic finite automaton. |
123 |
126 |
124 \begin{figure}[p] |
127 \begin{figure}[p] |
125 \small |
128 \small |
126 \lstinputlisting[numbers=left,linebackgroundcolor= |
129 \lstinputlisting[numbers=left,linebackgroundcolor= |
127 {\ifodd\value{lstnumber}\color{capri!3}\fi}] |
130 {\ifodd\value{lstnumber}\color{capri!3}\fi}] |
128 {../progs/display/dfa.scala} |
131 {../progs/display/dfa.scala} |
129 \caption{A Scala implementation of DFAs using partial functions. |
132 \caption{A Scala implementation of DFAs using partial functions. |
130 Notice some subtleties: \texttt{deltas} implements the delta-hat |
133 Note some subtleties: \texttt{deltas} implements the delta-hat |
131 construction by lifting the (partial) transition function to lists |
134 construction by lifting the (partial) transition function to lists |
132 of characters. Since \texttt{delta} is given as a partial function, |
135 of characters. Since \texttt{delta} is given as a partial function, |
133 it can obviously go ``wrong'' in which case the \texttt{Try} in |
136 it can obviously go ``wrong'' in which case the \texttt{Try} in |
134 \texttt{accepts} catches the error and returns \texttt{false}---that |
137 \texttt{accepts} catches the error and returns \texttt{false}---that |
135 means the string is not accepted. The example \texttt{delta} in |
138 means the string is not accepted. The example \texttt{delta} in |
257 |
260 |
258 My implementation of NFAs in Scala is shown in Figure~\ref{nfa}. |
261 My implementation of NFAs in Scala is shown in Figure~\ref{nfa}. |
259 Perhaps interestingly, I do not actually use relations for my NFAs, |
262 Perhaps interestingly, I do not actually use relations for my NFAs, |
260 but use transition functions that return sets of states. DFAs have |
263 but use transition functions that return sets of states. DFAs have |
261 partial transition functions that return a single state; my NFAs |
264 partial transition functions that return a single state; my NFAs |
262 return a set. I let you think about this representation for |
265 return a set of states. I let you think about this representation for |
263 NFA-transitions and how it corresponds to the relations used in the |
266 NFA-transitions and how it corresponds to the relations used in the |
264 mathematical definition of NFAs. An example of a transition function |
267 mathematical definition of NFAs. An example of a transition function |
265 in Scala for the NFA above is |
268 in Scala for the NFA shown above is |
266 |
269 |
267 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor= |
270 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor= |
268 {\ifodd\value{lstnumber}\color{capri!3}\fi}] |
271 {\ifodd\value{lstnumber}\color{capri!3}\fi}] |
269 val nfa_delta : (State, Char) :=> Set[State] = |
272 val nfa_delta : (State, Char) :=> Set[State] = |
270 { case (Q0, 'a') => Set(Q1, Q2) |
273 { case (Q0, 'a') => Set(Q1, Q2) |
402 The obvious question is: Do we get anything in return for this hassle |
405 The obvious question is: Do we get anything in return for this hassle |
403 with silent transitions? Well, we still have to work for it\ldots |
406 with silent transitions? Well, we still have to work for it\ldots |
404 unfortunately. If we were to follow the many textbooks on the |
407 unfortunately. If we were to follow the many textbooks on the |
405 subject, we would now start with defining what $\epsilon$NFAs |
408 subject, we would now start with defining what $\epsilon$NFAs |
406 are---that would require extending the transition relation of |
409 are---that would require extending the transition relation of |
407 NFAs. Next, show that the $\epsilon$NFAs are equivalent to NFAs and so |
410 NFAs. Next, we woudl show that the $\epsilon$NFAs are equivalent to |
408 on. Once we have done all this on paper, we would need to implement |
411 NFAs and so on. Once we have done all this on paper, we would need to |
409 $\epsilon$NFAs. Lets try to take a shortcut instead. We are not really |
412 implement $\epsilon$NFAs. Lets try to take a shortcut instead. We are |
410 interested in $\epsilon$NFAs; they are only a convenient tool for |
413 not really interested in $\epsilon$NFAs; they are only a convenient |
411 translating regular expressions into automata. So we are not going to |
414 tool for translating regular expressions into automata. So we are not |
412 implementing them explicitly, but translate them immediately into NFAs |
415 going to implementing them explicitly, but translate them immediately |
413 (in a sense $\epsilon$NFAs are just a convenient API for lazy people ;o). |
416 into NFAs (in a sense $\epsilon$NFAs are just a convenient API for |
414 How does the translation work? Well we have to find all transitions of |
417 lazy people ;o). How does the translation work? Well we have to find |
415 the form |
418 all transitions of the form |
416 |
419 |
417 \[ |
420 \[ |
418 q\stackrel{\epsilon}{\longrightarrow}\ldots\stackrel{\epsilon}{\longrightarrow} |
421 q\stackrel{\epsilon}{\longrightarrow}\ldots\stackrel{\epsilon}{\longrightarrow} |
419 \;\stackrel{a}{\longrightarrow}\; |
422 \;\stackrel{a}{\longrightarrow}\; |
420 \stackrel{\epsilon}{\longrightarrow}\ldots\stackrel{\epsilon}{\longrightarrow} q' |
423 \stackrel{\epsilon}{\longrightarrow}\ldots\stackrel{\epsilon}{\longrightarrow} q' |
442 \noindent where the single $\epsilon$-transition is replaced by |
445 \noindent where the single $\epsilon$-transition is replaced by |
443 three additional $a$-transitions. Please do the calculations yourself |
446 three additional $a$-transitions. Please do the calculations yourself |
444 and verify that I did not forget any transition. |
447 and verify that I did not forget any transition. |
445 |
448 |
446 So in what follows, whenever we are given an $\epsilon$NFA we will |
449 So in what follows, whenever we are given an $\epsilon$NFA we will |
447 replace it by an equivalent NFA. The code for this is given in |
450 replace it by an equivalent NFA. The Scala code for this translation |
448 Figure~\ref{enfa}. The main workhorse in this code is a function that |
451 is given in Figure~\ref{enfa}. The main workhorse in this code is a |
449 calculates a fixpoint of function (Lines 5--10). This function is used |
452 function that calculates a fixpoint of function (Lines 5--10). This |
450 for ``discovering'' which states are reachable by |
453 function is used for ``discovering'' which states are reachable by |
451 $\epsilon$-transitions. Once no new state is discovered, a fixpoint is |
454 $\epsilon$-transitions. Once no new state is discovered, a fixpoint is |
452 reached. This is used for example when calculating the starting states |
455 reached. This is used for example when calculating the starting states |
453 of an equivalent NFA (see Line 36): we start with all starting states |
456 of an equivalent NFA (see Line 36): we start with all starting states |
454 of the $\epsilon$NFA and then look for all additional states that can |
457 of the $\epsilon$NFA and then look for all additional states that can |
455 be reached by $\epsilon$-transitions. We keep on doing this until no |
458 be reached by $\epsilon$-transitions. We keep on doing this until no |
522 \begin{tikzpicture}[scale=0.7,>=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
526 \begin{tikzpicture}[scale=0.7,>=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
523 \node[state, initial] (Q_0) {$\mbox{}$}; |
527 \node[state, initial] (Q_0) {$\mbox{}$}; |
524 \node[state, accepting] (Q_1) [right=of Q_0] {$\mbox{}$}; |
528 \node[state, accepting] (Q_1) [right=of Q_0] {$\mbox{}$}; |
525 \path[->] (Q_0) edge node [below] {$c$} (Q_1); |
529 \path[->] (Q_0) edge node [below] {$c$} (Q_1); |
526 \end{tikzpicture}\\ |
530 \end{tikzpicture}\\ |
527 \end{tabular} |
531 \end{tabular}}\label{simplecases} |
528 \end{center} |
532 \end{equation} |
529 |
533 |
530 \noindent |
534 \noindent |
531 I let you think whether the NFAs can match exactly those strings the |
535 I let you think whether the NFAs can match exactly those strings the |
532 regular expressions can match. To do this translation in code we need |
536 regular expressions can match. To do this translation in code we need |
533 a way to construct states programatically...and as an additional |
537 a way to construct states programatically...and as an additional |
534 constrain Scala needs to recognise these states as being distinct. |
538 constrain Scala needs to recognise that these states are being distinct. |
535 For this I implemented in Figure~\ref{thompson1} a class |
539 For this I implemented in Figure~\ref{thompson1} a class |
536 \texttt{TState} that includes a counter and a companion object that |
540 \texttt{TState} that includes a counter and a companion object that |
537 increases this counter whenever a state is created.\footnote{You might |
541 increases this counter whenever a new state is created.\footnote{You might |
538 have to read up what \emph{companion objects} are in Scala.} |
542 have to read up what \emph{companion objects} do in Scala.} |
539 |
543 |
540 \begin{figure}[p] |
544 \begin{figure}[p] |
541 \small |
545 \small |
542 \lstinputlisting[numbers=left,linebackgroundcolor= |
546 \lstinputlisting[numbers=left,linebackgroundcolor= |
543 {\ifodd\value{lstnumber}\color{capri!3}\fi}] |
547 {\ifodd\value{lstnumber}\color{capri!3}\fi}] |
544 {../progs/display/thompson1.scala} |
548 {../progs/display/thompson1.scala} |
545 \caption{The first part of the Thompson Construction. Lines 7--16 |
549 \caption{The first part of the Thompson Construction. Lines 7--16 |
546 implement a way how to create states that are all |
550 implement a way of how to create new states that are all |
547 distinct by virtue of a counter. This counter is |
551 distinct by virtue of a counter. This counter is |
548 increased in the companion object of \texttt{TState} |
552 increased in the companion object of \texttt{TState} |
549 whenever a new state is created. The code in Lines 24--40 |
553 whenever a new state is created. The code in Lines 24--40 |
550 constructs NFAs for the simple regular expressions. |
554 constructs NFAs for the simple regular expressions $\ZERO$, $\ONE$ and $c$. |
|
555 Compare the pictures given in \eqref{simplecases}. |
551 \label{thompson1}} |
556 \label{thompson1}} |
552 \end{figure} |
557 \end{figure} |
553 |
558 |
554 \begin{figure}[p] |
559 \begin{figure}[p] |
555 \small |
560 \small |
562 implements the infix operation \texttt{+++} which |
567 implements the infix operation \texttt{+++} which |
563 combines an $\epsilon$NFA transition with a NFA transition |
568 combines an $\epsilon$NFA transition with a NFA transition |
564 (both given as partial functions).\label{thompson2}} |
569 (both given as partial functions).\label{thompson2}} |
565 \end{figure} |
570 \end{figure} |
566 |
571 |
567 The case for the sequence regular expression $r_1 \cdot r_2$ is as |
572 The case for the sequence regular expression $r_1 \cdot r_2$ is a bit more |
568 follows: We are given by recursion two automata representing $r_1$ and |
573 complicated: We are given by recursion two NFAs representing the regular |
569 $r_2$ respectively. |
574 expressions $r_1$ and $r_2$ respectively. |
570 |
575 |
571 \begin{center} |
576 \begin{center} |
572 \begin{tikzpicture}[node distance=3mm, |
577 \begin{tikzpicture}[node distance=3mm, |
573 >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
578 >=stealth',very thick, |
|
579 every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
574 \node[state, initial] (Q_0) {$\mbox{}$}; |
580 \node[state, initial] (Q_0) {$\mbox{}$}; |
575 \node (r_1) [right=of Q_0] {$\ldots$}; |
581 \node[state, initial] (Q_01) [below=1mm of Q_0] {$\mbox{}$}; |
576 \node[state, accepting] (t_1) [right=of r_1] {$\mbox{}$}; |
582 \node[state, initial] (Q_02) [above=1mm of Q_0] {$\mbox{}$}; |
577 \node[state, accepting] (t_2) [above=of t_1] {$\mbox{}$}; |
583 \node (R_1) [right=of Q_0] {$\ldots$}; |
578 \node[state, accepting] (t_3) [below=of t_1] {$\mbox{}$}; |
584 \node[state, accepting] (T_1) [right=of R_1] {$\mbox{}$}; |
579 \node[state, initial] (a_0) [right=2.5cm of t_1] {$\mbox{}$}; |
585 \node[state, accepting] (T_2) [above=of T_1] {$\mbox{}$}; |
580 \node (b_1) [right=of a_0] {$\ldots$}; |
586 \node[state, accepting] (T_3) [below=of T_1] {$\mbox{}$}; |
|
587 |
|
588 \node (A_0) [right=2.5cm of T_1] {$\mbox{}$}; |
|
589 \node[state, initial] (A_01) [above=1mm of A_0] {$\mbox{}$}; |
|
590 \node[state, initial] (A_02) [below=1mm of A_0] {$\mbox{}$}; |
|
591 |
|
592 \node (b_1) [right=of A_0] {$\ldots$}; |
581 \node[state, accepting] (c_1) [right=of b_1] {$\mbox{}$}; |
593 \node[state, accepting] (c_1) [right=of b_1] {$\mbox{}$}; |
582 \node[state, accepting] (c_2) [above=of c_1] {$\mbox{}$}; |
594 \node[state, accepting] (c_2) [above=of c_1] {$\mbox{}$}; |
583 \node[state, accepting] (c_3) [below=of c_1] {$\mbox{}$}; |
595 \node[state, accepting] (c_3) [below=of c_1] {$\mbox{}$}; |
584 \begin{pgfonlayer}{background} |
596 \begin{pgfonlayer}{background} |
585 \node (1) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (Q_0) (r_1) (t_1) (t_2) (t_3)] {}; |
597 \node (1) [rounded corners, inner sep=1mm, thick, |
586 \node (2) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (a_0) (b_1) (c_1) (c_2) (c_3)] {}; |
598 draw=black!60, fill=black!20, fit= (Q_0) (R_1) (T_1) (T_2) (T_3)] {}; |
|
599 \node (2) [rounded corners, inner sep=1mm, thick, |
|
600 draw=black!60, fill=black!20, fit= (A_0) (b_1) (c_1) (c_2) (c_3)] {}; |
587 \node [yshift=2mm] at (1.north) {$r_1$}; |
601 \node [yshift=2mm] at (1.north) {$r_1$}; |
588 \node [yshift=2mm] at (2.north) {$r_2$}; |
602 \node [yshift=2mm] at (2.north) {$r_2$}; |
589 \end{pgfonlayer} |
603 \end{pgfonlayer} |
590 \end{tikzpicture} |
604 \end{tikzpicture} |
591 \end{center} |
605 \end{center} |
592 |
606 |
593 \noindent The first automaton has some accepting states. We |
607 \noindent The first NFA has some accepting states and the second some |
594 obtain an automaton for $r_1\cdot r_2$ by connecting these |
608 starting states. We obtain $\epsilon$NFA for $r_1\cdot r_2$ by |
595 accepting states with $\epsilon$-transitions to the starting |
609 connecting these accepting states with $\epsilon$-transitions to the |
596 state of the second automaton. By doing so we make them |
610 starting states of the second automaton. By doing so we make them |
597 non-accepting like so: |
611 non-accepting like so: |
598 |
612 |
599 \begin{center} |
613 \begin{center} |
600 \begin{tikzpicture}[node distance=3mm, |
614 \begin{tikzpicture}[node distance=3mm, |
601 >=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
615 >=stealth',very thick, |
|
616 every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},] |
602 \node[state, initial] (Q_0) {$\mbox{}$}; |
617 \node[state, initial] (Q_0) {$\mbox{}$}; |
|
618 \node[state, initial] (Q_01) [below=1mm of Q_0] {$\mbox{}$}; |
|
619 \node[state, initial] (Q_02) [above=1mm of Q_0] {$\mbox{}$}; |
603 \node (r_1) [right=of Q_0] {$\ldots$}; |
620 \node (r_1) [right=of Q_0] {$\ldots$}; |
604 \node[state] (t_1) [right=of r_1] {$\mbox{}$}; |
621 \node[state] (t_1) [right=of r_1] {$\mbox{}$}; |
605 \node[state] (t_2) [above=of t_1] {$\mbox{}$}; |
622 \node[state] (t_2) [above=of t_1] {$\mbox{}$}; |
606 \node[state] (t_3) [below=of t_1] {$\mbox{}$}; |
623 \node[state] (t_3) [below=of t_1] {$\mbox{}$}; |
607 \node[state] (a_0) [right=2.5cm of t_1] {$\mbox{}$}; |
624 |
608 \node (b_1) [right=of a_0] {$\ldots$}; |
625 \node (A_0) [right=2.5cm of t_1] {$\mbox{}$}; |
|
626 \node[state] (A_01) [above=1mm of A_0] {$\mbox{}$}; |
|
627 \node[state] (A_02) [below=1mm of A_0] {$\mbox{}$}; |
|
628 |
|
629 \node (b_1) [right=of A_0] {$\ldots$}; |
609 \node[state, accepting] (c_1) [right=of b_1] {$\mbox{}$}; |
630 \node[state, accepting] (c_1) [right=of b_1] {$\mbox{}$}; |
610 \node[state, accepting] (c_2) [above=of c_1] {$\mbox{}$}; |
631 \node[state, accepting] (c_2) [above=of c_1] {$\mbox{}$}; |
611 \node[state, accepting] (c_3) [below=of c_1] {$\mbox{}$}; |
632 \node[state, accepting] (c_3) [below=of c_1] {$\mbox{}$}; |
612 \path[->] (t_1) edge node [above, pos=0.3] {$\epsilon$} (a_0); |
633 \path[->] (t_1) edge (A_01); |
613 \path[->] (t_2) edge node [above] {$\epsilon$} (a_0); |
634 \path[->] (t_2) edge node [above] {$\epsilon$} (A_01); |
614 \path[->] (t_3) edge node [below] {$\epsilon$} (a_0); |
635 \path[->] (t_3) edge (A_01); |
615 |
636 \path[->] (t_1) edge (A_02); |
|
637 \path[->] (t_2) edge (A_02); |
|
638 \path[->] (t_3) edge node [below] {$\epsilon$} (A_02); |
|
639 |
616 \begin{pgfonlayer}{background} |
640 \begin{pgfonlayer}{background} |
617 \node (3) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (Q_0) (c_1) (c_2) (c_3)] {}; |
641 \node (3) [rounded corners, inner sep=1mm, thick, |
|
642 draw=black!60, fill=black!20, fit= (Q_0) (c_1) (c_2) (c_3)] {}; |
618 \node [yshift=2mm] at (3.north) {$r_1\cdot r_2$}; |
643 \node [yshift=2mm] at (3.north) {$r_1\cdot r_2$}; |
619 \end{pgfonlayer} |
644 \end{pgfonlayer} |
620 \end{tikzpicture} |
645 \end{tikzpicture} |
621 \end{center} |
646 \end{center} |
622 |
647 |
724 \node [yshift=3mm] at (2.north) {$r^*$}; |
750 \node [yshift=3mm] at (2.north) {$r^*$}; |
725 \end{pgfonlayer} |
751 \end{pgfonlayer} |
726 \end{tikzpicture} |
752 \end{tikzpicture} |
727 \end{center} |
753 \end{center} |
728 |
754 |
|
755 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor= |
|
756 {\ifodd\value{lstnumber}\color{capri!3}\fi}] |
|
757 def thompson (r: Rexp) : NFAt = r match { |
|
758 case ZERO => NFA_ZERO() |
|
759 case ONE => NFA_ONE() |
|
760 case CHAR(c) => NFA_CHAR(c) |
|
761 case ALT(r1, r2) => NFA_ALT(thompson(r1), thompson(r2)) |
|
762 case SEQ(r1, r2) => NFA_SEQ(thompson(r1), thompson(r2)) |
|
763 case STAR(r1) => NFA_STAR(thompson(r1)) |
|
764 } |
|
765 \end{lstlisting}} |
|
766 |
|
767 |
|
768 \begin{center} |
|
769 \begin{tabular}{@{\hspace{-1mm}}c@{\hspace{1mm}}c@{}} |
|
770 \begin{tikzpicture} |
|
771 \begin{axis}[ |
|
772 title={Graph: $\texttt{(a*)*\,b}$ and strings |
|
773 $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$}, |
|
774 xlabel={$n$}, |
|
775 x label style={at={(1.05,0.0)}}, |
|
776 ylabel={time in secs}, |
|
777 enlargelimits=false, |
|
778 xtick={0,5,...,30}, |
|
779 xmax=33, |
|
780 ymax=35, |
|
781 ytick={0,5,...,30}, |
|
782 scaled ticks=false, |
|
783 axis lines=left, |
|
784 width=5.5cm, |
|
785 height=4.5cm, |
|
786 legend entries={Python, Java}, |
|
787 legend pos=outer north east, |
|
788 legend cell align=left] |
|
789 \addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data}; |
|
790 \addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data}; |
|
791 % depth-first search in NFAs |
|
792 \addplot[red,mark=*, mark options={fill=white}] table { |
|
793 1 0.00605 |
|
794 2 0.03086 |
|
795 3 0.11994 |
|
796 4 0.45389 |
|
797 5 2.06192 |
|
798 6 8.04894 |
|
799 7 32.63549 |
|
800 }; |
|
801 \end{axis} |
|
802 \end{tikzpicture} |
|
803 & |
|
804 \begin{tikzpicture} |
|
805 \begin{axis}[ |
|
806 title={Graph: $\texttt{a?\{n\}\,a{\{n\}}}$ and strings |
|
807 $\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$}, |
|
808 xlabel={$n$}, |
|
809 x label style={at={(1.05,0.0)}}, |
|
810 ylabel={time in secs}, |
|
811 enlargelimits=false, |
|
812 xtick={0,5,...,30}, |
|
813 xmax=33, |
|
814 ymax=35, |
|
815 ytick={0,5,...,30}, |
|
816 scaled ticks=false, |
|
817 axis lines=left, |
|
818 width=5.5cm, |
|
819 height=4.5cm, |
|
820 legend entries={Python,Ruby}, |
|
821 legend pos=south east, |
|
822 legend cell align=left] |
|
823 \addplot[blue,mark=*, mark options={fill=white}] table {re-python.data}; |
|
824 \addplot[brown,mark=triangle*, mark options={fill=white}] table {re-ruby.data}; |
|
825 % breath-first search in NFAs |
|
826 \addplot[red,mark=*, mark options={fill=white}] table { |
|
827 1 0.00741 |
|
828 2 0.02657 |
|
829 3 0.08317 |
|
830 4 0.22133 |
|
831 5 0.54463 |
|
832 6 1.42062 |
|
833 7 4.04926 |
|
834 8 12.70395 |
|
835 9 39.21555 |
|
836 10 112.05466 |
|
837 }; |
|
838 \end{axis} |
|
839 \end{tikzpicture} |
|
840 \end{tabular} |
|
841 \end{center} |
|
842 |
|
843 |
729 \noindent This construction of a NFA from a regular expression |
844 \noindent This construction of a NFA from a regular expression |
730 was invented by Ken Thompson in 1968. |
845 was invented by Ken Thompson in 1968. |
731 |
846 |
732 |
847 |
733 \subsubsection*{Subset Construction} |
848 \subsubsection*{Subset Construction} |
734 |
849 |
735 What is interesting is that for every NFA we can find a DFA |
850 Remember that we did not bother with defining and implementing |
736 which recognises the same language. This can, for example, be |
851 $\epsilon$NFA; we immediately translated them into equivalent |
737 done by the \emph{subset construction}. Consider again the NFA |
852 NFAs. Equivalent in the sense of accepting the same language (though |
738 below on the left, consisting of nodes labelled $0$, $1$ and $2$. |
853 we only claimed this and did not prove it rigorously). Remember also |
|
854 that NFAs have a non-deterministic transitions, given as a relation. |
|
855 This non-determinism makes it harder to decide when a string is |
|
856 accepted or not; such a decision is rather straightforward with DFAs |
|
857 (remember their transition function). |
|
858 |
|
859 What is interesting is that for every NFA we can find a DFA that also |
|
860 recognises the same language. This might sound like a bit paradoxical, |
|
861 but I litke to show you this next. There are a number of ways of |
|
862 transforming a NFA into an equivalent DFA, but the most famous is |
|
863 \emph{subset construction}. Consider again the NFA below on the left, |
|
864 consisting of nodes labelled, say, with $0$, $1$ and $2$. |
739 |
865 |
740 \begin{center} |
866 \begin{center} |
741 \begin{tabular}{c@{\hspace{10mm}}c} |
867 \begin{tabular}{c@{\hspace{10mm}}c} |
742 \begin{tikzpicture}[scale=0.7,>=stealth',very thick, |
868 \begin{tikzpicture}[scale=0.7,>=stealth',very thick, |
743 every state/.style={minimum size=0pt, |
869 every state/.style={minimum size=0pt, |