handouts/ho03.tex
changeset 459 780486571e38
parent 444 3056a4c071b0
child 471 9476086849ad
equal deleted inserted replaced
458:896a5f91838d 459:780486571e38
    15 regular expressions are easier to reason about and the notion
    15 regular expressions are easier to reason about and the notion
    16 of derivatives, although already quite old, only became more
    16 of derivatives, although already quite old, only became more
    17 widely known rather recently. Still let us in this lecture
    17 widely known rather recently. Still let us in this lecture
    18 have a closer look at automata and their relation to regular
    18 have a closer look at automata and their relation to regular
    19 expressions. This will help us with understanding why the
    19 expressions. This will help us with understanding why the
    20 regular expression matchers in Python and Ruby are so slow
    20 regular expression matchers in Python, Ruby and Java are so slow
    21 with certain regular expressions. The central definition
    21 with certain regular expressions. The central definition
    22 is:\medskip 
    22 is:\medskip 
    23 
    23 
    24 \noindent 
    24 \noindent 
    25 A \emph{deterministic finite automaton} (DFA), say $A$, is
    25 A \emph{deterministic finite automaton} (DFA), say $A$, is
   175 possibilities. Also there is the special silent transition in
   175 possibilities. Also there is the special silent transition in
   176 NFAs. As mentioned already this transition means you do not
   176 NFAs. As mentioned already this transition means you do not
   177 have to ``consume'' any part of the input string, but
   177 have to ``consume'' any part of the input string, but
   178 ``silently'' change to a different state. In the left picture,
   178 ``silently'' change to a different state. In the left picture,
   179 for example, if you are in the starting state, you can 
   179 for example, if you are in the starting state, you can 
   180 silently move either to $q_1$ or $q_2$.
   180 silently move either to $q_1$ or $q_2$. This silent
       
   181 transition is also often called \emph{$\epsilon$-transition}.
   181 
   182 
   182 
   183 
   183 \subsubsection*{Thompson Construction}
   184 \subsubsection*{Thompson Construction}
   184 
   185 
   185 The reason for introducing NFAs is that there is a relatively
   186 The reason for introducing NFAs is that there is a relatively
   556 \noindent There is an equation for each node in the DFA. Let
   557 \noindent There is an equation for each node in the DFA. Let
   557 us have a look how the right-hand sides of the equations are
   558 us have a look how the right-hand sides of the equations are
   558 constructed. First have a look at the second equation: the
   559 constructed. First have a look at the second equation: the
   559 left-hand side is $q_1$ and the right-hand side $q_0\,a$. The
   560 left-hand side is $q_1$ and the right-hand side $q_0\,a$. The
   560 right-hand side is essentially all possible ways how to end up
   561 right-hand side is essentially all possible ways how to end up
   561 in $q_1$. There is only one incoming edge from $q_0$ consuming
   562 in node $q_1$. There is only one incoming edge from $q_0$ consuming
   562 an $a$.  Therefore the right hand side is this
   563 an $a$.  Therefore the right hand side is this
   563 state followed by character---in this case $q_0\,a$. Now lets
   564 state followed by character---in this case $q_0\,a$. Now lets
   564 have a look at the third equation: there are two incoming
   565 have a look at the third equation: there are two incoming
   565 edges for $q_2$. Therefore we have two terms, namely $q_1\,a$ and
   566 edges for $q_2$. Therefore we have two terms, namely $q_1\,a$ and
   566 $q_2\,a$. These terms are separated by $+$. The first states
   567 $q_2\,a$. These terms are separated by $+$. The first states