diff -r b79e704acb72 -r 5b5a68df6d16 handouts/ho03.tex --- a/handouts/ho03.tex Mon Sep 15 07:25:17 2014 +0100 +++ b/handouts/ho03.tex Mon Sep 15 09:36:02 2014 +0100 @@ -1,9 +1,7 @@ \documentclass{article} -\usepackage{hyperref} -\usepackage{amssymb} -\usepackage{amsmath} -\usepackage[T1]{fontenc} -\usepackage{listings} +\usepackage{../style} +\usepackage{../langs} + \usepackage{xcolor} \usepackage{tikz} \usetikzlibrary{arrows} @@ -14,19 +12,19 @@ \usetikzlibrary{calc} \usetikzlibrary{fit} \usetikzlibrary{backgrounds} -\usepackage{../langs} - -\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% \begin{document} \section*{Handout 3} -Let us have a closer look at automata and their relation to regular expressions. This will help us to understand -why the regular expression matchers in Python and Ruby are so slow with certain regular expressions. +Let us have a closer look at automata and their relation to +regular expressions. This will help us to understand why the +regular expression matchers in Python and Ruby are so slow +with certain regular expressions. -A \emph{deterministic finite automaton} (DFA), say $A$, is defined by a four-tuple written $A(Q, q_0, F, \delta)$ where +A \emph{deterministic finite automaton} (DFA), say $A$, is +defined by a four-tuple written $A(Q, q_0, F, \delta)$ where \begin{itemize} \item $Q$ is a set of states, @@ -35,10 +33,12 @@ \item $\delta$ is the transition function. \end{itemize} -\noindent -The transition function determines how to ``transition'' from one state to the next state with respect to a character. -We have the assumption that these functions do not need to be defined everywhere: so it can be the case that -given a character there is no next state, in which case we need to raise a kind of ``raise an exception''. A typical +\noindent The transition function determines how to +``transition'' from one state to the next state with respect +to a character. We have the assumption that these functions do +not need to be defined everywhere: so it can be the case that +given a character there is no next state, in which case we +need to raise a kind of ``raise an exception''. A typical example of a DFA is \begin{center} @@ -61,11 +61,11 @@ \end{tikzpicture} \end{center} -\noindent -The accepting state $q_4$ is indicated with double circles. It is possible that a DFA has no -accepting states at all, or that the starting state is also an accepting state. -In the case above the transition function is defined everywhere and can be given as a table -as follows: +\noindent The accepting state $q_4$ is indicated with double +circles. It is possible that a DFA has no accepting states at +all, or that the starting state is also an accepting state. In +the case above the transition function is defined everywhere +and can be given as a table as follows: \[ \begin{array}{lcl} @@ -82,9 +82,9 @@ \end{array} \] -\noindent -We need to define the notion of what language is accepted by an automaton. For this we -lift the transition function $\delta$ from characters to strings as follows: +\noindent We need to define the notion of what language is +accepted by an automaton. For this we lift the transition +function $\delta$ from characters to strings as follows: \[ \begin{array}{lcl} @@ -93,20 +93,24 @@ \end{array} \] -\noindent -Given a string this means we start in the starting state and take the first character of the string, -follow to the next sate, then take the second character and so on. Once the string is exhausted -and we end up in an accepting state, then this string is accepted. Otherwise it is not accepted. -So $s$ in the \emph{language accepted by the automaton} $A(Q, q_0, F, \delta)$ iff +\noindent Given a string this means we start in the starting +state and take the first character of the string, follow to +the next sate, then take the second character and so on. Once +the string is exhausted and we end up in an accepting state, +then this string is accepted. Otherwise it is not accepted. So +$s$ in the \emph{language accepted by the automaton} $A(Q, +q_0, F, \delta)$ iff \[ \hat{\delta}(q_0, s) \in F \] -While with DFA it will always clear that given a character what the next state is, it will be useful to relax -this restriction. The resulting construction is called a \emph{non-deterministic finite automaton} (NFA) given -as a four-tuple $A(Q, q_0, F, \rho)$ where +While with DFA it will always clear that given a character +what the next state is, it will be useful to relax this +restriction. The resulting construction is called a +\emph{non-deterministic finite automaton} (NFA) given as a +four-tuple $A(Q, q_0, F, \rho)$ where \begin{itemize} \item $Q$ is a finite set of states @@ -145,21 +149,26 @@ \end{tabular} \end{center} -\noindent -There are a number of points you should note. Every DFA is a NFA, but not vice versa. -The $\rho$ in NFAs is a transition \emph{relation} -(DFAs have a transition function). The difference between a function and a relation is that a function -has always a single output, while a relation gives, roughly speaking, several outputs. Look -at the NFA on the right-hand side above: if you are currently in the state $r_2$ and you read a -character $a$, then you can transition to $r_1$ \emph{or} $r_3$. Which route you take is not -determined. This means if we need to decide whether a string is accepted by a NFA, we might have -to explore all possibilities. Also there is a special transition in NFAs which is called \emph{epsilon-transition} -or \emph{silent transition}. This transition means you do not have to ``consume'' no part of -the input string, but ``silently'' change to a different state. +\noindent There are a number of points you should note. Every +DFA is a NFA, but not vice versa. The $\rho$ in NFAs is a +transition \emph{relation} (DFAs have a transition function). +The difference between a function and a relation is that a +function has always a single output, while a relation gives, +roughly speaking, several outputs. Look at the NFA on the +right-hand side above: if you are currently in the state $r_2$ +and you read a character $a$, then you can transition to $r_1$ +\emph{or} $r_3$. Which route you take is not determined. This +means if we need to decide whether a string is accepted by a +NFA, we might have to explore all possibilities. Also there is +a special transition in NFAs which is called +\emph{epsilon-transition} or \emph{silent transition}. This +transition means you do not have to ``consume'' no part of the +input string, but ``silently'' change to a different state. -The reason for introducing NFAs is that there is a relatively simple (recursive) translation of regular expressions into -NFAs. Consider the simple regular expressions $\varnothing$, $\epsilon$ and $c$. They can be translated -as follows: +The reason for introducing NFAs is that there is a relatively +simple (recursive) translation of regular expressions into +NFAs. Consider the simple regular expressions $\varnothing$, +$\epsilon$ and $c$. They can be translated as follows: \begin{center} \begin{tabular}[t]{l@{\hspace{10mm}}l} @@ -180,9 +189,9 @@ \end{tabular} \end{center} -\noindent -The case for the sequence regular expression $r_1 \cdot r_2$ is as follows: We are given by recursion -two automata representing $r_1$ and $r_2$ respectively. +\noindent The case for the sequence regular expression $r_1 +\cdot r_2$ is as follows: We are given by recursion two +automata representing $r_1$ and $r_2$ respectively. \begin{center} \begin{tikzpicture}[node distance=3mm, @@ -206,10 +215,11 @@ \end{tikzpicture} \end{center} -\noindent -The first automaton has some accepting states. We obtain an automaton for $r_1\cdot r_2$ by connecting -these accepting states with $\epsilon$-transitions to the starting state of the second automaton. By doing -so we make them non-accepting like so: +\noindent The first automaton has some accepting states. We +obtain an automaton for $r_1\cdot r_2$ by connecting these +accepting states with $\epsilon$-transitions to the starting +state of the second automaton. By doing so we make them +non-accepting like so: \begin{center} \begin{tikzpicture}[node distance=3mm, @@ -235,9 +245,9 @@ \end{tikzpicture} \end{center} -\noindent -The case for the choice regular expression $r_1 + r_2$ is slightly different: We are given by recursion -two automata representing $r_1$ and $r_2$ respectively. +\noindent The case for the choice regular expression $r_1 + +r_2$ is slightly different: We are given by recursion two +automata representing $r_1$ and $r_2$ respectively. \begin{center} \begin{tikzpicture}[node distance=3mm, @@ -264,10 +274,11 @@ \end{tikzpicture} \end{center} -\noindent -Each automaton has a single start state and potentially several accepting states. We obtain a -NFA for the regular expression $r_1 + r_2$ by introducing a new starting state and connecting it -with an $\epsilon$-transition to the two starting states above, like so +\noindent Each automaton has a single start state and +potentially several accepting states. We obtain a NFA for the +regular expression $r_1 + r_2$ by introducing a new starting +state and connecting it with an $\epsilon$-transition to the +two starting states above, like so \begin{center} \hspace{2cm}\begin{tikzpicture}[node distance=3mm, @@ -315,10 +326,10 @@ \end{tikzpicture} \end{center} -\noindent -and connect its accepting states to a new starting state via $\epsilon$-transitions. This new -starting state is also an accepting state, because $r^*$ can also recognise the empty string. -This gives the following automaton for $r^*$: +\noindent and connect its accepting states to a new starting +state via $\epsilon$-transitions. This new starting state is +also an accepting state, because $r^*$ can also recognise the +empty string. This gives the following automaton for $r^*$: \begin{center} \begin{tikzpicture}[node distance=3mm, @@ -340,8 +351,8 @@ \end{tikzpicture} \end{center} -\noindent -This construction of a NFA from a regular expression was invented by Ken Thompson in 1968. +\noindent This construction of a NFA from a regular expression +was invented by Ken Thompson in 1968. \end{document}