diff -r 31e89128ccd2 -r 434891622131 handouts/ho03.tex --- a/handouts/ho03.tex Thu Oct 08 22:27:10 2015 +0100 +++ b/handouts/ho03.tex Thu Oct 08 22:44:16 2015 +0100 @@ -599,12 +599,12 @@ substituting equations, because both (6) and (7) contain the variable on the left-hand side also on the right-hand side. Here we need to now use a law that is different from the usual -laws. It is called \emph{Arden's rule}. It states that -if an equation is of the form $q = q\,r + s$ then it can be -transformed to $q = s\, r^*$. Since we can assume $+$ is -symmetric, equation (7) is of that form: $s$ is $q_0\,a\,a$ -and $r$ is $a$. That means we can transform Equation (7) -to obtain the two new equations +laws about linear equations. It is called \emph{Arden's rule}. +It states that if an equation is of the form $q = q\,r + s$ +then it can be transformed to $q = s\, r^*$. Since we can +assume $+$ is symmetric, Equation (7) is of that form: $s$ is +$q_0\,a\,a$ and $r$ is $a$. That means we can transform +(7) to obtain the two new equations \begin{eqnarray} q_0 & = & \epsilon + q_0\,(b + a\,b) + q_2\,b\\ @@ -633,7 +633,7 @@ q_0 & = & \epsilon\,(b + a\,b + a\,a\,(a^*)\,b)^* \end{eqnarray} -\noindent SInce this is a regular expression, we can simplify +\noindent Since this is a regular expression, we can simplify away the $\epsilon$ to obtain the slightly simpler regular expression @@ -820,7 +820,7 @@ \subsubsection*{Regular Languages} Given the constructions in the previous sections we obtain -the following picture: +the following overall picture: \begin{center} \begin{tikzpicture} @@ -837,7 +837,7 @@ \noindent By going from regular expressions over NFAs to DFAs, we can always ensure that for every regular expression there -exists a NFA and DFA that can recognise the same language. +exists a NFA and a DFA that can recognise the same language. Although we did not prove this fact. Similarly by going from DFAs to regular expressions, we can make sure for every DFA there exists a regular expression that can recognise the same @@ -864,17 +864,18 @@ $O(n)$ nodes---that means the size of the NFA grows linearly with the size of the regular expression. The problem with NFAs is that the problem of deciding whether a string is accepted -or not is computationally not cheap. Remember with NFAs we have -potentially many next states even for the same input and also -have the silent $\epsilon$-transitions. If we want to find a -path from the starting state of an NFA to an accepting state, -we need to consider all possibilities. In Ruby and Python this -is done by a depth-first search, which in turn means that if a -``wrong'' choice is made, the algorithm has to backtrack and -thus explore all potential candidates. This is exactly the -reason why Ruby and Python are so slow for evil regular -expressions. The alternative is to explore the search space -in a breadth-first fashion, but this might incur a big memory +or not is computationally not cheap. Remember with NFAs we +have potentially many next states even for the same input and +also have the silent $\epsilon$-transitions. If we want to +find a path from the starting state of an NFA to an accepting +state, we need to consider all possibilities. In Ruby and +Python this is done by a depth-first search, which in turn +means that if a ``wrong'' choice is made, the algorithm has to +backtrack and thus explore all potential candidates. This is +exactly the reason why Ruby and Python are so slow for evil +regular expressions. An alternative to the potentially slow +depth-first search is to explore the search space in a +breadth-first fashion, but this might incur a big memory penalty. To avoid the problems with NFAs, we can translate them @@ -888,6 +889,8 @@ resulting DFAs in order to have an acceptable memory and runtime behaviour. But remember the subset construction in the worst case explodes the number of states by $2^n$. +Effectively also the translation to DFAs can incur a big +runtime penalty. But this does not mean that everything is bad with automata. Recall the problem of finding a regular expressions for the @@ -896,16 +899,18 @@ regular expressions because they are useful for recognising comments. But in principle we did not need to. The argument for this is as follows: take a regular expression, translate -it into a NFA and DFA that recognise the same language. Once -you have the DFA it is very easy to construct the automaton -for the language not recognised by an DFA. If the DFA is -completed (this is important!), then you just need to exchange -the accepting and non-accepting states. You can then translate -this DFA back into a regular expression. +it into a NFA and then a DFA that both recognise the same +language. Once you have the DFA it is very easy to construct +the automaton for the language not recognised by an DFA. If +the DFA is completed (this is important!), then you just need +to exchange the accepting and non-accepting states. You can +then translate this DFA back into a regular expression and +that will be the regular expression that can match all strings +the original regular expression could \emph{not} match. -Not all languages are regular. The most well-known example -of a language that is not regular consists of all the strings -of the form +It is also interesting that not all languages are regular. The +most well-known example of a language that is not regular +consists of all the strings of the form \[a^n\,b^n\]