handouts/ho03.tex
changeset 480 9e42ccbbd1e6
parent 471 9476086849ad
child 482 0f6e3c5a1751
equal deleted inserted replaced
478:48b842c997c7 480:9e42ccbbd1e6
     2 \usepackage{../style}
     2 \usepackage{../style}
     3 \usepackage{../langs}
     3 \usepackage{../langs}
     4 \usepackage{../graphics}
     4 \usepackage{../graphics}
     5 
     5 
     6 
     6 
       
     7 %We can even allow ``silent
       
     8 %transitions'', also called epsilon-transitions. They allow us
       
     9 %to go from one state to the next without having a character
       
    10 %consumed. We label such silent transition with the letter
       
    11 %$\epsilon$.
       
    12 
     7 \begin{document}
    13 \begin{document}
       
    14 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017}
     8 
    15 
     9 \section*{Handout 3 (Automata)}
    16 \section*{Handout 3 (Automata)}
    10 
    17 
    11 Every formal language course I know of bombards you first with
    18 Every formal language and compiler course I know of bombards you first
    12 automata and then to a much, much smaller extend with regular
    19 with automata and then to a much, much smaller extend with regular
    13 expressions. As you can see, this course is turned upside
    20 expressions. As you can see, this course is turned upside down:
    14 down: regular expressions come first. The reason is that
    21 regular expressions come first. The reason is that regular expressions
    15 regular expressions are easier to reason about and the notion
    22 are easier to reason about and the notion of derivatives, although
    16 of derivatives, although already quite old, only became more
    23 already quite old, only became more widely known rather
    17 widely known rather recently. Still let us in this lecture
    24 recently. Still let us in this lecture have a closer look at automata
    18 have a closer look at automata and their relation to regular
    25 and their relation to regular expressions. This will help us with
    19 expressions. This will help us with understanding why the
    26 understanding why the regular expression matchers in Python, Ruby and
    20 regular expression matchers in Python, Ruby and Java are so slow
    27 Java are so slow with certain regular expressions. The central
    21 with certain regular expressions. The central definition
    28 definition is:\medskip
    22 is:\medskip 
       
    23 
    29 
    24 \noindent 
    30 \noindent 
    25 A \emph{deterministic finite automaton} (DFA), say $A$, is
    31 A \emph{deterministic finite automaton} (DFA), say $A$, is
    26 defined by a four-tuple written $A(Q, q_0, F, \delta)$ where
    32 given by a four-tuple written $A(Q, q_0, F, \delta)$ where
    27 
    33 
    28 \begin{itemize}
    34 \begin{itemize}
    29 \item $Q$ is a finite set of states,
    35 \item $Q$ is a finite set of states,
    30 \item $q_0 \in Q$ is the start state,
    36 \item $q_0 \in Q$ is the start state,
    31 \item $F \subseteq Q$ are the accepting states, and
    37 \item $F \subseteq Q$ are the accepting states, and
    95 \hat{\delta}(q, c\!::\!s) & \dn & \hat{\delta}(\delta(q, c), s)\\
   101 \hat{\delta}(q, c\!::\!s) & \dn & \hat{\delta}(\delta(q, c), s)\\
    96 \end{array}
   102 \end{array}
    97 \]
   103 \]
    98 
   104 
    99 \noindent This lifted transition function is often called
   105 \noindent This lifted transition function is often called
   100 ``delta-hat''. Given a string, we start in the starting state
   106 ``delta-hat''. Given a string, we start in the starting state and take
   101 and take the first character of the string, follow to the next
   107 the first character of the string, follow to the next sate, then take
   102 sate, then take the second character and so on. Once the
   108 the second character and so on. Once the string is exhausted and we
   103 string is exhausted and we end up in an accepting state, then
   109 end up in an accepting state, then this string is accepted by the
   104 this string is accepted by the automaton. Otherwise it is not
   110 automaton. Otherwise it is not accepted. This also means that if along
   105 accepted. So $s$ is in the \emph{language accepted by the
   111 the way we hit the case where the transition function $\delta$ is not
   106 automaton} $A(Q, q_0, F, \delta)$ iff
   112 defined, we need to raise an error. In our implementation we will deal
       
   113 with this case elegantly by using Scala's \texttt{Try}. So $s$ is in
       
   114 the \emph{language accepted by the automaton} $A(Q, q_0, F, \delta)$
       
   115 iff
   107 
   116 
   108 \[
   117 \[
   109 \hat{\delta}(q_0, s) \in F 
   118 \hat{\delta}(q_0, s) \in F 
   110 \]
   119 \]
   111 
   120 
   112 \noindent I let you think about a definition that describes
   121 \noindent I let you think about a definition that describes
   113 the set of strings accepted by an automaton.
   122 the set of strings accepted by an automaton. 
   114   
   123 
   115 
   124 \begin{figure}[p]
   116 While with DFAs it will always be clear that given a character
   125 \small
   117 what the next state is (potentially none), it will be useful
   126 \lstinputlisting[numbers=left,linebackgroundcolor=
   118 to relax this restriction. That means we have several
   127                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
   119 potential successor states. We even allow ``silent
   128                   {../progs/dfa.scala}
   120 transitions'', also called epsilon-transitions. They allow us
   129 \caption{Scala implementation of DFAs using partial functions.
   121 to go from one state to the next without having a character
   130   Notice some subtleties: \texttt{deltas} implements the delta-hat
   122 consumed. We label such silent transition with the letter
   131   construction by lifting the transition (partial) function to
   123 $\epsilon$. The resulting construction is called a
   132   lists of \texttt{C}haracters. Since \texttt{delta} is given
   124 \emph{non-deterministic finite automaton} (NFA) given also as
   133   as partial function, it can obviously go ``wrong'' in which
   125 a four-tuple $A(Q, q_0, F, \rho)$ where
   134   case the \texttt{Try} in \texttt{accepts} catches the error and
       
   135   returns \texttt{false}, that is the string is not accepted.
       
   136   The example implements the DFA example shown above.\label{dfa}}
       
   137 \end{figure}
       
   138 
       
   139 A simple Scala implementation of DFAs is given in Figure~\ref{dfa}. As
       
   140 you can see, there many features of the mathematical definition that
       
   141 are quite closely reflected in the code. In the DFA-class, there is a
       
   142 starting state, called \texttt{start}, with the polymorphic type
       
   143 \texttt{A}. There is a partial function \texttt{delta} for specifying
       
   144 the transitions. I used partial functions for transitions in Scala;
       
   145 alternatives would have been \texttt{Maps} or even \texttt{Lists}. One
       
   146 of the main advantages of using partial functions is that transitions
       
   147 can be quite nicely defined by a series of \texttt{case} statements
       
   148 (see Lines 28 -- 38). If you need to represent an automaton
       
   149 with a sink state (catch-all-state), you can write something like
       
   150 
       
   151 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
       
   152                   {\ifodd\value{lstnumber}\color{capri!3}\fi}]
       
   153 abstract class State
       
   154 ...
       
   155 case object Sink extends State
       
   156 
       
   157 val delta : (State, Char) :=> State = 
       
   158   { case (S0, 'a') => S1
       
   159     case (S1, 'a') => S2
       
   160     case _ => Sink
       
   161   }
       
   162 \end{lstlisting}}  
       
   163 
       
   164 \noindent The DFA-class has also an argument for final states. It is a
       
   165 function from states to booleans (returns true whenever a state is
       
   166 meant to be finbal; false otherwise). While this is a boolean
       
   167 function, Scala allows me to use sets of states for such functions
       
   168 (see Line 40 where I initialise the DFA). 
       
   169 
       
   170 I let you ponder whether this is a good implementation of DFAs. In
       
   171 doing so I hope you notice that the $Q$ component (the set of finite
       
   172 states) is missing from the class definition. This means that the
       
   173 implementation allows you to do some fishy things you are not meant to
       
   174 do with DFAs. Which fishy things could that be?
       
   175 
       
   176 While with DFAs it will always be clear that given a character what
       
   177 the next state is (potentially none), it will be useful to relax this
       
   178 restriction. That means we have several potential successor states. We
       
   179 even allow more than one starting state. The resulting construction is
       
   180 called a \emph{non-deterministic finite automaton} (NFA) given also as
       
   181 a four-tuple $A(Q, Q_{0s}, F, \rho)$ where
   126 
   182 
   127 \begin{itemize}
   183 \begin{itemize}
   128 \item $Q$ is a finite set of states
   184 \item $Q$ is a finite set of states
   129 \item $q_0$ is a start state
   185 \item $Q_{0s}$ are the start states ($Q_{0s} \subseteq Q$)
   130 \item $F$ are some accepting states with $F \subseteq Q$, and
   186 \item $F$ are some accepting states with $F \subseteq Q$, and
   131 \item $\rho$ is a transition relation.
   187 \item $\rho$ is a transition relation.
   132 \end{itemize}
   188 \end{itemize}
   133 
   189 
   134 \noindent
   190 \noindent