diff -r 48b842c997c7 -r 9e42ccbbd1e6 handouts/ho03.tex --- a/handouts/ho03.tex Wed Mar 15 14:34:10 2017 +0000 +++ b/handouts/ho03.tex Wed Mar 22 14:10:01 2017 +0000 @@ -4,26 +4,32 @@ \usepackage{../graphics} +%We can even allow ``silent +%transitions'', also called epsilon-transitions. They allow us +%to go from one state to the next without having a character +%consumed. We label such silent transition with the letter +%$\epsilon$. + \begin{document} +\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017} \section*{Handout 3 (Automata)} -Every formal language course I know of bombards you first with -automata and then to a much, much smaller extend with regular -expressions. As you can see, this course is turned upside -down: regular expressions come first. The reason is that -regular expressions are easier to reason about and the notion -of derivatives, although already quite old, only became more -widely known rather recently. Still let us in this lecture -have a closer look at automata and their relation to regular -expressions. This will help us with understanding why the -regular expression matchers in Python, Ruby and Java are so slow -with certain regular expressions. The central definition -is:\medskip +Every formal language and compiler course I know of bombards you first +with automata and then to a much, much smaller extend with regular +expressions. As you can see, this course is turned upside down: +regular expressions come first. The reason is that regular expressions +are easier to reason about and the notion of derivatives, although +already quite old, only became more widely known rather +recently. Still let us in this lecture have a closer look at automata +and their relation to regular expressions. This will help us with +understanding why the regular expression matchers in Python, Ruby and +Java are so slow with certain regular expressions. The central +definition is:\medskip \noindent A \emph{deterministic finite automaton} (DFA), say $A$, is -defined by a four-tuple written $A(Q, q_0, F, \delta)$ where +given by a four-tuple written $A(Q, q_0, F, \delta)$ where \begin{itemize} \item $Q$ is a finite set of states, @@ -97,36 +103,86 @@ \] \noindent This lifted transition function is often called -``delta-hat''. Given a string, we start in the starting state -and take the first character of the string, follow to the next -sate, then take the second character and so on. Once the -string is exhausted and we end up in an accepting state, then -this string is accepted by the automaton. Otherwise it is not -accepted. So $s$ is in the \emph{language accepted by the -automaton} $A(Q, q_0, F, \delta)$ iff +``delta-hat''. Given a string, we start in the starting state and take +the first character of the string, follow to the next sate, then take +the second character and so on. Once the string is exhausted and we +end up in an accepting state, then this string is accepted by the +automaton. Otherwise it is not accepted. This also means that if along +the way we hit the case where the transition function $\delta$ is not +defined, we need to raise an error. In our implementation we will deal +with this case elegantly by using Scala's \texttt{Try}. So $s$ is in +the \emph{language accepted by the automaton} $A(Q, q_0, F, \delta)$ +iff \[ \hat{\delta}(q_0, s) \in F \] \noindent I let you think about a definition that describes -the set of strings accepted by an automaton. - +the set of strings accepted by an automaton. + +\begin{figure}[p] +\small +\lstinputlisting[numbers=left,linebackgroundcolor= + {\ifodd\value{lstnumber}\color{capri!3}\fi}] + {../progs/dfa.scala} +\caption{Scala implementation of DFAs using partial functions. + Notice some subtleties: \texttt{deltas} implements the delta-hat + construction by lifting the transition (partial) function to + lists of \texttt{C}haracters. Since \texttt{delta} is given + as partial function, it can obviously go ``wrong'' in which + case the \texttt{Try} in \texttt{accepts} catches the error and + returns \texttt{false}, that is the string is not accepted. + The example implements the DFA example shown above.\label{dfa}} +\end{figure} + +A simple Scala implementation of DFAs is given in Figure~\ref{dfa}. As +you can see, there many features of the mathematical definition that +are quite closely reflected in the code. In the DFA-class, there is a +starting state, called \texttt{start}, with the polymorphic type +\texttt{A}. There is a partial function \texttt{delta} for specifying +the transitions. I used partial functions for transitions in Scala; +alternatives would have been \texttt{Maps} or even \texttt{Lists}. One +of the main advantages of using partial functions is that transitions +can be quite nicely defined by a series of \texttt{case} statements +(see Lines 28 -- 38). If you need to represent an automaton +with a sink state (catch-all-state), you can write something like -While with DFAs it will always be clear that given a character -what the next state is (potentially none), it will be useful -to relax this restriction. That means we have several -potential successor states. We even allow ``silent -transitions'', also called epsilon-transitions. They allow us -to go from one state to the next without having a character -consumed. We label such silent transition with the letter -$\epsilon$. The resulting construction is called a -\emph{non-deterministic finite automaton} (NFA) given also as -a four-tuple $A(Q, q_0, F, \rho)$ where +{\small\begin{lstlisting}[language=Scala,linebackgroundcolor= + {\ifodd\value{lstnumber}\color{capri!3}\fi}] +abstract class State +... +case object Sink extends State + +val delta : (State, Char) :=> State = + { case (S0, 'a') => S1 + case (S1, 'a') => S2 + case _ => Sink + } +\end{lstlisting}} + +\noindent The DFA-class has also an argument for final states. It is a +function from states to booleans (returns true whenever a state is +meant to be finbal; false otherwise). While this is a boolean +function, Scala allows me to use sets of states for such functions +(see Line 40 where I initialise the DFA). + +I let you ponder whether this is a good implementation of DFAs. In +doing so I hope you notice that the $Q$ component (the set of finite +states) is missing from the class definition. This means that the +implementation allows you to do some fishy things you are not meant to +do with DFAs. Which fishy things could that be? + +While with DFAs it will always be clear that given a character what +the next state is (potentially none), it will be useful to relax this +restriction. That means we have several potential successor states. We +even allow more than one starting state. The resulting construction is +called a \emph{non-deterministic finite automaton} (NFA) given also as +a four-tuple $A(Q, Q_{0s}, F, \rho)$ where \begin{itemize} \item $Q$ is a finite set of states -\item $q_0$ is a start state +\item $Q_{0s}$ are the start states ($Q_{0s} \subseteq Q$) \item $F$ are some accepting states with $F \subseteq Q$, and \item $\rho$ is a transition relation. \end{itemize}