handouts/ho03.tex
changeset 349 434891622131
parent 344 408fd5994288
child 444 3056a4c071b0
equal deleted inserted replaced
348:31e89128ccd2 349:434891622131
   597  
   597  
   598 \noindent Unfortunately we cannot make any more progress with
   598 \noindent Unfortunately we cannot make any more progress with
   599 substituting equations, because both (6) and (7) contain the
   599 substituting equations, because both (6) and (7) contain the
   600 variable on the left-hand side also on the right-hand side.
   600 variable on the left-hand side also on the right-hand side.
   601 Here we need to now use a law that is different from the usual
   601 Here we need to now use a law that is different from the usual
   602 laws. It is called \emph{Arden's rule}. It states that
   602 laws about linear equations. It is called \emph{Arden's rule}.
   603 if an equation is of the form $q = q\,r + s$ then it can be
   603 It states that if an equation is of the form $q = q\,r + s$
   604 transformed to $q = s\, r^*$. Since we can assume $+$ is 
   604 then it can be transformed to $q = s\, r^*$. Since we can
   605 symmetric, equation (7) is of that form: $s$ is $q_0\,a\,a$
   605 assume $+$ is symmetric, Equation (7) is of that form: $s$ is
   606 and $r$ is $a$. That means we can transform Equation (7)
   606 $q_0\,a\,a$ and $r$ is $a$. That means we can transform
   607 to obtain the two new equations
   607 (7) to obtain the two new equations
   608 
   608 
   609 \begin{eqnarray}
   609 \begin{eqnarray}
   610 q_0 & = & \epsilon + q_0\,(b + a\,b) +  q_2\,b\\
   610 q_0 & = & \epsilon + q_0\,(b + a\,b) +  q_2\,b\\
   611 q_2 & = & q_0\,a\,a\,(a^*)
   611 q_2 & = & q_0\,a\,a\,(a^*)
   612 \end{eqnarray}
   612 \end{eqnarray}
   631 
   631 
   632 \begin{eqnarray}
   632 \begin{eqnarray}
   633 q_0 & = & \epsilon\,(b + a\,b + a\,a\,(a^*)\,b)^*
   633 q_0 & = & \epsilon\,(b + a\,b + a\,a\,(a^*)\,b)^*
   634 \end{eqnarray}
   634 \end{eqnarray}
   635 
   635 
   636 \noindent SInce this is a regular expression, we can simplify
   636 \noindent Since this is a regular expression, we can simplify
   637 away the $\epsilon$ to obtain the slightly simpler regular
   637 away the $\epsilon$ to obtain the slightly simpler regular
   638 expression
   638 expression
   639 
   639 
   640 \begin{eqnarray}
   640 \begin{eqnarray}
   641 q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^*
   641 q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^*
   818 \end{center}
   818 \end{center}
   819 
   819 
   820 \subsubsection*{Regular Languages}
   820 \subsubsection*{Regular Languages}
   821 
   821 
   822 Given the constructions in the previous sections we obtain 
   822 Given the constructions in the previous sections we obtain 
   823 the following picture:
   823 the following overall picture:
   824 
   824 
   825 \begin{center}
   825 \begin{center}
   826 \begin{tikzpicture}
   826 \begin{tikzpicture}
   827 \node (rexp)  {\bf Regexps};
   827 \node (rexp)  {\bf Regexps};
   828 \node (nfa) [right=of rexp] {\bf NFAs};
   828 \node (nfa) [right=of rexp] {\bf NFAs};
   835 \end{tikzpicture}
   835 \end{tikzpicture}
   836 \end{center}
   836 \end{center}
   837 
   837 
   838 \noindent By going from regular expressions over NFAs to DFAs,
   838 \noindent By going from regular expressions over NFAs to DFAs,
   839 we can always ensure that for every regular expression there
   839 we can always ensure that for every regular expression there
   840 exists a NFA and DFA that can recognise the same language.
   840 exists a NFA and a DFA that can recognise the same language.
   841 Although we did not prove this fact. Similarly by going from
   841 Although we did not prove this fact. Similarly by going from
   842 DFAs to regular expressions, we can make sure for every DFA 
   842 DFAs to regular expressions, we can make sure for every DFA 
   843 there exists a regular expression that can recognise the same
   843 there exists a regular expression that can recognise the same
   844 language. Again we did not prove this fact. 
   844 language. Again we did not prove this fact. 
   845 
   845 
   862 the differences mean in computational terms. Translating a
   862 the differences mean in computational terms. Translating a
   863 regular expression into a NFA gives us an automaton that has
   863 regular expression into a NFA gives us an automaton that has
   864 $O(n)$ nodes---that means the size of the NFA grows linearly
   864 $O(n)$ nodes---that means the size of the NFA grows linearly
   865 with the size of the regular expression. The problem with NFAs
   865 with the size of the regular expression. The problem with NFAs
   866 is that the problem of deciding whether a string is accepted
   866 is that the problem of deciding whether a string is accepted
   867 or not is computationally not cheap. Remember with NFAs we have
   867 or not is computationally not cheap. Remember with NFAs we
   868 potentially many next states even for the same input and also
   868 have potentially many next states even for the same input and
   869 have the silent $\epsilon$-transitions. If we want to find a
   869 also have the silent $\epsilon$-transitions. If we want to
   870 path from the starting state of an NFA to an accepting state,
   870 find a path from the starting state of an NFA to an accepting
   871 we need to consider all possibilities. In Ruby and Python this
   871 state, we need to consider all possibilities. In Ruby and
   872 is done by a depth-first search, which in turn means that if a
   872 Python this is done by a depth-first search, which in turn
   873 ``wrong'' choice is made, the algorithm has to backtrack and
   873 means that if a ``wrong'' choice is made, the algorithm has to
   874 thus explore all potential candidates. This is exactly the
   874 backtrack and thus explore all potential candidates. This is
   875 reason why Ruby and Python are so slow for evil regular
   875 exactly the reason why Ruby and Python are so slow for evil
   876 expressions. The alternative is to explore the search space
   876 regular expressions. An alternative to the potentially slow
   877 in a breadth-first fashion, but this might incur a big memory
   877 depth-first search is to explore the search space in a
       
   878 breadth-first fashion, but this might incur a big memory
   878 penalty.  
   879 penalty.  
   879 
   880 
   880 To avoid the problems with NFAs, we can translate them 
   881 To avoid the problems with NFAs, we can translate them 
   881 into DFAs. With DFAs the problem of deciding whether a
   882 into DFAs. With DFAs the problem of deciding whether a
   882 string is recognised or not is much simpler, because in
   883 string is recognised or not is much simpler, because in
   886 can explode exponentially the number of states. Therefore when 
   887 can explode exponentially the number of states. Therefore when 
   887 this route is taken, we definitely need to minimise the
   888 this route is taken, we definitely need to minimise the
   888 resulting DFAs in order to have an acceptable memory 
   889 resulting DFAs in order to have an acceptable memory 
   889 and runtime behaviour. But remember the subset construction
   890 and runtime behaviour. But remember the subset construction
   890 in the worst case explodes the number of states by $2^n$.
   891 in the worst case explodes the number of states by $2^n$.
       
   892 Effectively also the translation to DFAs can incur a big
       
   893 runtime penalty.
   891 
   894 
   892 But this does not mean that everything is bad with automata.
   895 But this does not mean that everything is bad with automata.
   893 Recall the problem of finding a regular expressions for the
   896 Recall the problem of finding a regular expressions for the
   894 language that is \emph{not} recognised by a regular
   897 language that is \emph{not} recognised by a regular
   895 expression. In our implementation we added explicitly such a
   898 expression. In our implementation we added explicitly such a
   896 regular expressions because they are useful for recognising
   899 regular expressions because they are useful for recognising
   897 comments. But in principle we did not need to. The argument
   900 comments. But in principle we did not need to. The argument
   898 for this is as follows: take a regular expression, translate
   901 for this is as follows: take a regular expression, translate
   899 it into a NFA and DFA that recognise the same language. Once
   902 it into a NFA and then a DFA that both recognise the same
   900 you have the DFA it is very easy to construct the automaton
   903 language. Once you have the DFA it is very easy to construct
   901 for the language not recognised by an DFA. If the DFA is
   904 the automaton for the language not recognised by an DFA. If
   902 completed (this is important!), then you just need to exchange
   905 the DFA is completed (this is important!), then you just need
   903 the accepting and non-accepting states. You can then translate
   906 to exchange the accepting and non-accepting states. You can
   904 this DFA back into a regular expression. 
   907 then translate this DFA back into a regular expression and
   905 
   908 that will be the regular expression that can match all strings
   906 Not all languages are regular. The most well-known example
   909 the original regular expression could \emph{not} match.
   907 of a language that is not regular consists of all the strings
   910 
   908 of the form
   911 It is also interesting that not all languages are regular. The
       
   912 most well-known example of a language that is not regular
       
   913 consists of all the strings of the form
   909 
   914 
   910 \[a^n\,b^n\]
   915 \[a^n\,b^n\]
   911 
   916 
   912 \noindent meaning strings that have the same number of $a$s
   917 \noindent meaning strings that have the same number of $a$s
   913 and $b$s. You can try, but you cannot find a regular
   918 and $b$s. You can try, but you cannot find a regular