15 regular expressions are easier to reason about and the notion |
15 regular expressions are easier to reason about and the notion |
16 of derivatives, although already quite old, only became more |
16 of derivatives, although already quite old, only became more |
17 widely known rather recently. Still let us in this lecture |
17 widely known rather recently. Still let us in this lecture |
18 have a closer look at automata and their relation to regular |
18 have a closer look at automata and their relation to regular |
19 expressions. This will help us with understanding why the |
19 expressions. This will help us with understanding why the |
20 regular expression matchers in Python and Ruby are so slow |
20 regular expression matchers in Python, Ruby and Java are so slow |
21 with certain regular expressions. The central definition |
21 with certain regular expressions. The central definition |
22 is:\medskip |
22 is:\medskip |
23 |
23 |
24 \noindent |
24 \noindent |
25 A \emph{deterministic finite automaton} (DFA), say $A$, is |
25 A \emph{deterministic finite automaton} (DFA), say $A$, is |
175 possibilities. Also there is the special silent transition in |
175 possibilities. Also there is the special silent transition in |
176 NFAs. As mentioned already this transition means you do not |
176 NFAs. As mentioned already this transition means you do not |
177 have to ``consume'' any part of the input string, but |
177 have to ``consume'' any part of the input string, but |
178 ``silently'' change to a different state. In the left picture, |
178 ``silently'' change to a different state. In the left picture, |
179 for example, if you are in the starting state, you can |
179 for example, if you are in the starting state, you can |
180 silently move either to $q_1$ or $q_2$. |
180 silently move either to $q_1$ or $q_2$. This silent |
|
181 transition is also often called \emph{$\epsilon$-transition}. |
181 |
182 |
182 |
183 |
183 \subsubsection*{Thompson Construction} |
184 \subsubsection*{Thompson Construction} |
184 |
185 |
185 The reason for introducing NFAs is that there is a relatively |
186 The reason for introducing NFAs is that there is a relatively |
556 \noindent There is an equation for each node in the DFA. Let |
557 \noindent There is an equation for each node in the DFA. Let |
557 us have a look how the right-hand sides of the equations are |
558 us have a look how the right-hand sides of the equations are |
558 constructed. First have a look at the second equation: the |
559 constructed. First have a look at the second equation: the |
559 left-hand side is $q_1$ and the right-hand side $q_0\,a$. The |
560 left-hand side is $q_1$ and the right-hand side $q_0\,a$. The |
560 right-hand side is essentially all possible ways how to end up |
561 right-hand side is essentially all possible ways how to end up |
561 in $q_1$. There is only one incoming edge from $q_0$ consuming |
562 in node $q_1$. There is only one incoming edge from $q_0$ consuming |
562 an $a$. Therefore the right hand side is this |
563 an $a$. Therefore the right hand side is this |
563 state followed by character---in this case $q_0\,a$. Now lets |
564 state followed by character---in this case $q_0\,a$. Now lets |
564 have a look at the third equation: there are two incoming |
565 have a look at the third equation: there are two incoming |
565 edges for $q_2$. Therefore we have two terms, namely $q_1\,a$ and |
566 edges for $q_2$. Therefore we have two terms, namely $q_1\,a$ and |
566 $q_2\,a$. These terms are separated by $+$. The first states |
567 $q_2\,a$. These terms are separated by $+$. The first states |