2 \usepackage{../style} |
2 \usepackage{../style} |
3 \usepackage{../langs} |
3 \usepackage{../langs} |
4 \usepackage{../graphics} |
4 \usepackage{../graphics} |
5 |
5 |
6 |
6 |
|
7 %We can even allow ``silent |
|
8 %transitions'', also called epsilon-transitions. They allow us |
|
9 %to go from one state to the next without having a character |
|
10 %consumed. We label such silent transition with the letter |
|
11 %$\epsilon$. |
|
12 |
7 \begin{document} |
13 \begin{document} |
|
14 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017} |
8 |
15 |
9 \section*{Handout 3 (Automata)} |
16 \section*{Handout 3 (Automata)} |
10 |
17 |
11 Every formal language course I know of bombards you first with |
18 Every formal language and compiler course I know of bombards you first |
12 automata and then to a much, much smaller extend with regular |
19 with automata and then to a much, much smaller extend with regular |
13 expressions. As you can see, this course is turned upside |
20 expressions. As you can see, this course is turned upside down: |
14 down: regular expressions come first. The reason is that |
21 regular expressions come first. The reason is that regular expressions |
15 regular expressions are easier to reason about and the notion |
22 are easier to reason about and the notion of derivatives, although |
16 of derivatives, although already quite old, only became more |
23 already quite old, only became more widely known rather |
17 widely known rather recently. Still let us in this lecture |
24 recently. Still let us in this lecture have a closer look at automata |
18 have a closer look at automata and their relation to regular |
25 and their relation to regular expressions. This will help us with |
19 expressions. This will help us with understanding why the |
26 understanding why the regular expression matchers in Python, Ruby and |
20 regular expression matchers in Python, Ruby and Java are so slow |
27 Java are so slow with certain regular expressions. The central |
21 with certain regular expressions. The central definition |
28 definition is:\medskip |
22 is:\medskip |
|
23 |
29 |
24 \noindent |
30 \noindent |
25 A \emph{deterministic finite automaton} (DFA), say $A$, is |
31 A \emph{deterministic finite automaton} (DFA), say $A$, is |
26 defined by a four-tuple written $A(Q, q_0, F, \delta)$ where |
32 given by a four-tuple written $A(Q, q_0, F, \delta)$ where |
27 |
33 |
28 \begin{itemize} |
34 \begin{itemize} |
29 \item $Q$ is a finite set of states, |
35 \item $Q$ is a finite set of states, |
30 \item $q_0 \in Q$ is the start state, |
36 \item $q_0 \in Q$ is the start state, |
31 \item $F \subseteq Q$ are the accepting states, and |
37 \item $F \subseteq Q$ are the accepting states, and |
95 \hat{\delta}(q, c\!::\!s) & \dn & \hat{\delta}(\delta(q, c), s)\\ |
101 \hat{\delta}(q, c\!::\!s) & \dn & \hat{\delta}(\delta(q, c), s)\\ |
96 \end{array} |
102 \end{array} |
97 \] |
103 \] |
98 |
104 |
99 \noindent This lifted transition function is often called |
105 \noindent This lifted transition function is often called |
100 ``delta-hat''. Given a string, we start in the starting state |
106 ``delta-hat''. Given a string, we start in the starting state and take |
101 and take the first character of the string, follow to the next |
107 the first character of the string, follow to the next sate, then take |
102 sate, then take the second character and so on. Once the |
108 the second character and so on. Once the string is exhausted and we |
103 string is exhausted and we end up in an accepting state, then |
109 end up in an accepting state, then this string is accepted by the |
104 this string is accepted by the automaton. Otherwise it is not |
110 automaton. Otherwise it is not accepted. This also means that if along |
105 accepted. So $s$ is in the \emph{language accepted by the |
111 the way we hit the case where the transition function $\delta$ is not |
106 automaton} $A(Q, q_0, F, \delta)$ iff |
112 defined, we need to raise an error. In our implementation we will deal |
|
113 with this case elegantly by using Scala's \texttt{Try}. So $s$ is in |
|
114 the \emph{language accepted by the automaton} $A(Q, q_0, F, \delta)$ |
|
115 iff |
107 |
116 |
108 \[ |
117 \[ |
109 \hat{\delta}(q_0, s) \in F |
118 \hat{\delta}(q_0, s) \in F |
110 \] |
119 \] |
111 |
120 |
112 \noindent I let you think about a definition that describes |
121 \noindent I let you think about a definition that describes |
113 the set of strings accepted by an automaton. |
122 the set of strings accepted by an automaton. |
114 |
123 |
115 |
124 \begin{figure}[p] |
116 While with DFAs it will always be clear that given a character |
125 \small |
117 what the next state is (potentially none), it will be useful |
126 \lstinputlisting[numbers=left,linebackgroundcolor= |
118 to relax this restriction. That means we have several |
127 {\ifodd\value{lstnumber}\color{capri!3}\fi}] |
119 potential successor states. We even allow ``silent |
128 {../progs/dfa.scala} |
120 transitions'', also called epsilon-transitions. They allow us |
129 \caption{Scala implementation of DFAs using partial functions. |
121 to go from one state to the next without having a character |
130 Notice some subtleties: \texttt{deltas} implements the delta-hat |
122 consumed. We label such silent transition with the letter |
131 construction by lifting the transition (partial) function to |
123 $\epsilon$. The resulting construction is called a |
132 lists of \texttt{C}haracters. Since \texttt{delta} is given |
124 \emph{non-deterministic finite automaton} (NFA) given also as |
133 as partial function, it can obviously go ``wrong'' in which |
125 a four-tuple $A(Q, q_0, F, \rho)$ where |
134 case the \texttt{Try} in \texttt{accepts} catches the error and |
|
135 returns \texttt{false}, that is the string is not accepted. |
|
136 The example implements the DFA example shown above.\label{dfa}} |
|
137 \end{figure} |
|
138 |
|
139 A simple Scala implementation of DFAs is given in Figure~\ref{dfa}. As |
|
140 you can see, there many features of the mathematical definition that |
|
141 are quite closely reflected in the code. In the DFA-class, there is a |
|
142 starting state, called \texttt{start}, with the polymorphic type |
|
143 \texttt{A}. There is a partial function \texttt{delta} for specifying |
|
144 the transitions. I used partial functions for transitions in Scala; |
|
145 alternatives would have been \texttt{Maps} or even \texttt{Lists}. One |
|
146 of the main advantages of using partial functions is that transitions |
|
147 can be quite nicely defined by a series of \texttt{case} statements |
|
148 (see Lines 28 -- 38). If you need to represent an automaton |
|
149 with a sink state (catch-all-state), you can write something like |
|
150 |
|
151 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor= |
|
152 {\ifodd\value{lstnumber}\color{capri!3}\fi}] |
|
153 abstract class State |
|
154 ... |
|
155 case object Sink extends State |
|
156 |
|
157 val delta : (State, Char) :=> State = |
|
158 { case (S0, 'a') => S1 |
|
159 case (S1, 'a') => S2 |
|
160 case _ => Sink |
|
161 } |
|
162 \end{lstlisting}} |
|
163 |
|
164 \noindent The DFA-class has also an argument for final states. It is a |
|
165 function from states to booleans (returns true whenever a state is |
|
166 meant to be finbal; false otherwise). While this is a boolean |
|
167 function, Scala allows me to use sets of states for such functions |
|
168 (see Line 40 where I initialise the DFA). |
|
169 |
|
170 I let you ponder whether this is a good implementation of DFAs. In |
|
171 doing so I hope you notice that the $Q$ component (the set of finite |
|
172 states) is missing from the class definition. This means that the |
|
173 implementation allows you to do some fishy things you are not meant to |
|
174 do with DFAs. Which fishy things could that be? |
|
175 |
|
176 While with DFAs it will always be clear that given a character what |
|
177 the next state is (potentially none), it will be useful to relax this |
|
178 restriction. That means we have several potential successor states. We |
|
179 even allow more than one starting state. The resulting construction is |
|
180 called a \emph{non-deterministic finite automaton} (NFA) given also as |
|
181 a four-tuple $A(Q, Q_{0s}, F, \rho)$ where |
126 |
182 |
127 \begin{itemize} |
183 \begin{itemize} |
128 \item $Q$ is a finite set of states |
184 \item $Q$ is a finite set of states |
129 \item $q_0$ is a start state |
185 \item $Q_{0s}$ are the start states ($Q_{0s} \subseteq Q$) |
130 \item $F$ are some accepting states with $F \subseteq Q$, and |
186 \item $F$ are some accepting states with $F \subseteq Q$, and |
131 \item $\rho$ is a transition relation. |
187 \item $\rho$ is a transition relation. |
132 \end{itemize} |
188 \end{itemize} |
133 |
189 |
134 \noindent |
190 \noindent |