|    113 end up in an accepting state, then this string is accepted by the |    114 end up in an accepting state, then this string is accepted by the | 
|    114 automaton. Otherwise it is not accepted. This also means that if along |    115 automaton. Otherwise it is not accepted. This also means that if along | 
|    115 the way we hit the case where the transition function $\delta$ is not |    116 the way we hit the case where the transition function $\delta$ is not | 
|    116 defined, we need to raise an error. In our implementation we will deal |    117 defined, we need to raise an error. In our implementation we will deal | 
|    117 with this case elegantly by using Scala's \texttt{Try}. So a string |    118 with this case elegantly by using Scala's \texttt{Try}. So a string | 
|    118 $s$ is in the \emph{language accepted by the automaton} $A(Q, Q_0, F, |    119 $s$ is in the \emph{language accepted by the automaton} $A(\Sigma, Q, Q_0, F, | 
|    119 \delta)$ iff |    120 \delta)$ iff | 
|    120  |    121  | 
|    121 \[ |    122 \[ | 
|    122 \hat{\delta}(Q_0, s) \in F  |    123 \hat{\delta}(Q_0, s) \in F  | 
|    123 \] |    124 \] | 
|    124  |    125  | 
|    125 \noindent I let you think about a definition that describes |    126 \noindent I let you think about a definition that describes | 
|    126 the set of strings accepted by an automaton.  |    127 the set of all strings accepted by an automaton.  | 
|    127  |    128  | 
|    128 \begin{figure}[p] |    129 \begin{figure}[p] | 
|    129 \small |    130 \small | 
|    130 \lstinputlisting[numbers=left,linebackgroundcolor= |    131 \lstinputlisting[numbers=left,linebackgroundcolor= | 
|    131                   {\ifodd\value{lstnumber}\color{capri!3}\fi}] |    132                   {\ifodd\value{lstnumber}\color{capri!3}\fi}] | 
|    132                   {../progs/dfa.scala} |    133                   {../progs/dfa.scala} | 
|    133 \caption{A Scala implementation of DFAs using partial functions. |    134 \caption{A Scala implementation of DFAs using partial functions. | 
|    134   Notice some subtleties: \texttt{deltas} implements the delta-hat |    135   Notice some subtleties: \texttt{deltas} implements the delta-hat | 
|    135   construction by lifting the transition (partial) function to |    136   construction by lifting the transition (partial) function to | 
|    136   lists of \texttt{C}haracters. Since \texttt{delta} is given |    137   lists of characters. Since \texttt{delta} is given | 
|    137   as partial function, it can obviously go ``wrong'' in which |    138   as a partial function, it can obviously go ``wrong'' in which | 
|    138   case the \texttt{Try} in \texttt{accepts} catches the error and |    139   case the \texttt{Try} in \texttt{accepts} catches the error and | 
|    139   returns \texttt{false}---that means the string is not accepted. |    140   returns \texttt{false}---that means the string is not accepted. | 
|    140   The example \texttt{delta} implements the DFA example shown |    141   The example \texttt{delta} implements the DFA example shown | 
|    141   earlier in the handout.\label{dfa}} |    142   earlier in the handout.\label{dfa}} | 
|    142 \end{figure} |    143 \end{figure} | 
|    143  |    144  | 
|    144 A simple Scala implementation for DFAs is given in |    145 A simple Scala implementation for DFAs is given in | 
|    145 Figure~\ref{dfa}. As you can see, there many features of the |    146 Figure~\ref{dfa}. As you can see, there are many features of the | 
|    146 mathematical definition that are quite closely reflected in the |    147 mathematical definition that are quite closely reflected in the | 
|    147 code. In the DFA-class, there is a starting state, called |    148 code. In the DFA-class, there is a starting state, called | 
|    148 \texttt{start}, with the polymorphic type \texttt{A}. (The reason for |    149 \texttt{start}, with the polymorphic type \texttt{A}.  There is a | 
|    149 having a polymorphic types for the states and the input of DFAs will |    150 partial function \texttt{delta} for specifying the transitions---these | 
|    150 become clearer later.)  There is a partial function \texttt{delta} for |    151 partial functions take a state (of polymorphic type \texttt{A}) and an | 
|    151 specifying the transitions. I used partial functions for representing |    152 input (of polymorphic type \texttt{C}) and produce a new state (of | 
|    152 the transitions in Scala; alternatives would have been \texttt{Maps} |    153 type \texttt{A}). For the moment it is OK to assume that \texttt{A} is | 
|    153 or even \texttt{Lists}. One of the main advantages of using partial |    154 some arbitrary type for states and the input is just characters.  (The | 
|    154 functions is that transitions can be quite nicely defined by a series |    155 reason for having a polymorphic types for the states and the input of | 
|    155 of \texttt{case} statements (see Lines 28 -- 38). If you need to |    156 DFAs will become clearer later on.) | 
|    156 represent an automaton with a sink state (catch-all-state), you can |    157  | 
|    157 write something like |    158 I used partial functions for representing the transitions in Scala; | 
|         |    159 alternatives would have been \texttt{Maps} or even \texttt{Lists}. One | 
|         |    160 of the main advantages of using partial functions is that transitions | 
|         |    161 can be quite nicely defined by a series of \texttt{case} statements | 
|         |    162 (see Lines 28 -- 38 for an example). If you need to represent an | 
|         |    163 automaton with a sink state (catch-all-state), you can use Scala's | 
|         |    164 pattern matching and write something like | 
|    158  |    165  | 
|    159 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor= |    166 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor= | 
|    160                   {\ifodd\value{lstnumber}\color{capri!3}\fi}] |    167                   {\ifodd\value{lstnumber}\color{capri!3}\fi}] | 
|    161 abstract class State |    168 abstract class State | 
|    162 ... |    169 ... | 
|    189 While with DFAs it will always be clear that given a state and a |    198 While with DFAs it will always be clear that given a state and a | 
|    190 character what the next state is (potentially none), it will be useful |    199 character what the next state is (potentially none), it will be useful | 
|    191 to relax this restriction. That means we allow to have several |    200 to relax this restriction. That means we allow to have several | 
|    192 potential successor states. We even allow more than one starting |    201 potential successor states. We even allow more than one starting | 
|    193 state. The resulting construction is called a \emph{non-deterministic |    202 state. The resulting construction is called a \emph{non-deterministic | 
|    194   finite automaton} (NFA) given also as a four-tuple $A(Qs, Q_{0s}, F, |    203   finite automaton} (NFA) given also as a five-tuple $A(\Sigma, Qs, Q_{0s}, F, | 
|    195 \rho)$ where |    204 \rho)$ where | 
|    196  |    205  | 
|    197 \begin{itemize} |    206 \begin{itemize} | 
|         |    207 \item $\Sigma$ is an alphabet,   | 
|    198 \item $Qs$ is a finite set of states |    208 \item $Qs$ is a finite set of states | 
|    199 \item $Q_{0s}$ is a set of start states ($Q_{0s} \subseteq Qs$) |    209 \item $Q_{0s}$ is a set of start states ($Q_{0s} \subseteq Qs$) | 
|    200 \item $F$ are some accepting states with $F \subseteq Qs$, and |    210 \item $F$ are some accepting states with $F \subseteq Qs$, and | 
|    201 \item $\rho$ is a transition relation. |    211 \item $\rho$ is a transition relation. | 
|    202 \end{itemize} |    212 \end{itemize} | 
|    203  |    213  | 
|    204 \noindent |    214 \noindent | 
|    205 A typical example of an NFA is |    215 A typical example of a NFA is | 
|    206  |    216  | 
|    207 % A NFA for (ab* + b)*a |    217 % A NFA for (ab* + b)*a | 
|    208 \begin{center} |    218 \begin{center} | 
|    209 \begin{tikzpicture}[scale=0.8,>=stealth',very thick, |    219 \begin{tikzpicture}[>=stealth',very thick, auto, | 
|    210 every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},] |    220     every state/.style={minimum size=0pt,inner sep=3pt, | 
|         |    221       draw=blue!50,very thick,fill=blue!20},scale=2] | 
|    211 \node[state,initial]  (Q_0)  {$Q_0$}; |    222 \node[state,initial]  (Q_0)  {$Q_0$}; | 
|    212 \node[state] (Q_1) [right=of Q_0] {$Q_1$}; |    223 \node[state] (Q_1) [right=of Q_0] {$Q_1$}; | 
|    213 \node[state, accepting] (Q_2) [right=of Q_1] {$Q_2$}; |    224 \node[state, accepting] (Q_2) [right=of Q_1] {$Q_2$}; | 
|    214 \path[->] (Q_0) edge [loop above] node  {$b$} (); |    225 \path[->] (Q_0) edge [loop above] node  {$b$} (); | 
|    215 \path[<-] (Q_0) edge node [below]  {$b$} (Q_1); |    226 \path[<-] (Q_0) edge node [below]  {$b$} (Q_1); | 
|    221 \end{center} |    232 \end{center} | 
|    222  |    233  | 
|    223 \noindent |    234 \noindent | 
|    224 This NFA happens to have only one starting state, but in general there |    235 This NFA happens to have only one starting state, but in general there | 
|    225 could be more.  Notice that in state $Q_0$ we might go to state $Q_1$ |    236 could be more.  Notice that in state $Q_0$ we might go to state $Q_1$ | 
|    226 \emph{or} to state $Q_2$ when receiving an $a$. Similarly in state $Q_1$ |    237 \emph{or} to state $Q_2$ when receiving an $a$. Similarly in state | 
|    227 and receiving an $a$, we can stay in state $Q_1$ or go to $Q_2$. This |    238 $Q_1$ and receiving an $a$, we can stay in state $Q_1$ or go to | 
|    228 kind of choice is not allowed with DFAs. When it comes to deciding |    239 $Q_2$. This kind of choice is not allowed with DFAs. The downside is | 
|    229 whether a string is accepted by an NFA we potentially need to explore |    240 that when it comes to deciding whether a string is accepted by a NFA | 
|    230 all possibilities. I let you think which kind of strings this NFA |    241 we potentially have to explore all possibilities. I let you think | 
|    231 accepts. |    242 which kind of strings the above NFA accepts. | 
|    232  |    243  | 
|    233  |    244  | 
|    234 There are however a number of additional points you should note. Every |    245 There are a number of additional points you should note with | 
|    235 DFA is a NFA, but not vice versa. The $\rho$ in NFAs is a transition |    246 NFAs. Every DFA is a NFA, but not vice versa. The $\rho$ in NFAs is a | 
|    236 \emph{relation} (DFAs have a transition function). The difference |    247 transition \emph{relation} (DFAs have a transition function). The | 
|    237 between a function and a relation is that a function has always a |    248 difference between a function and a relation is that a function has | 
|    238 single output, while a relation gives, roughly speaking, several |    249 always a single output, while a relation gives, roughly speaking, | 
|    239 outputs. Look again at the NFA above: if you are currently in the |    250 several outputs. Look again at the NFA above: if you are currently in | 
|    240 state $Q_1$ and you read a character $b$, then you can transition to |    251 the state $Q_1$ and you read a character $b$, then you can transition | 
|    241 either $Q_0$ \emph{or} $Q_2$. Which route, or output, you take is not |    252 to either $Q_0$ \emph{or} $Q_2$. Which route, or output, you take is | 
|    242 determined.  This non-determinism can be represented by a relation. |    253 not determined.  This non-determinism can be represented by a | 
|    243  |    254 relation. | 
|    244 My implementation of NFAs is shown in Figure~\ref{nfa}.  Perhaps |    255  | 
|    245 interestingly, I do not use relations for my implementation of NFAs in |    256 My implementation of NFAs in Scala is shown in Figure~\ref{nfa}. | 
|    246 Scala, and I also do not use transition functions which return a set |    257 Perhaps interestingly, I do not actually use relations for my NFAs, | 
|    247 of states (another popular choice for implementing NFAs).  For reasons |    258 and I also do not use transition functions that return sets of states | 
|    248 that become clear in a moment, I use sets of partial functions---see |    259 (another popular choice for implementing NFAs).  For reasons that | 
|    249 Line 7 in Figure~\ref{nfa}. \texttt{Starts} is now a set of states; |    260 become clear in a moment, I use sets of partial functions | 
|    250 \texttt{fins} is again a function from states to booleans. The |    261 instead---see Line 7 in Figure~\ref{nfa}. DFAs have only one such | 
|    251 \texttt{next} function calculates the set of next states reachable |    262 partial function; my NFAs have a set.  Another parameter, | 
|    252 from a state \texttt{q} by a character \texttt{c}---we go through all |    263 \texttt{Starts}, is in NFAs a set of states; \texttt{fins} is again a | 
|    253 partial functions in the \texttt{delta}-set, lift it (this transformes |    264 function from states to booleans. The \texttt{next} function | 
|    254 the partial function into the corresponding \texttt{Option}-function |    265 calculates the set of next states reachable from a single state | 
|    255 and then apply \texttt{q} and \texttt{c}. This gives us a set of |    266 \texttt{q} by a character \texttt{c}---this is calculated by going | 
|    256 \texttt{Some}s and \texttt{None}s, where the \texttt{Some}s are |    267 through all the partial functions in the \texttt{delta}-set and apply | 
|    257 filtered out by the \texttt{flatMap}. Teh function \texttt{nexts} just |    268 \texttt{q} and \texttt{c} (see Line 13). This gives us a set of | 
|    258 lifts this to sets of states. \texttt{Deltas} and \texttt{accept} are |    269 \texttt{Some}s (in case the application succeeded) and possibly some | 
|    259 similar to the DFA definitions. |    270 \texttt{None}s (in case the partial function is not defined or produces an | 
|         |    271 error).  The \texttt{None}s are filtered out by the \texttt{flatMap}, | 
|         |    272 leaving the values inside the \texttt{Some}s. The function | 
|         |    273 \texttt{nexts} just lifts this function to sets of | 
|         |    274 states. \texttt{Deltas} and \texttt{accept} are similar to the DFA | 
|         |    275 definitions. | 
|    260  |    276  | 
|    261 \begin{figure}[t] |    277 \begin{figure}[t] | 
|    262 \small |    278 \small | 
|    263 \lstinputlisting[numbers=left,linebackgroundcolor= |    279 \lstinputlisting[numbers=left,linebackgroundcolor= | 
|    264                   {\ifodd\value{lstnumber}\color{capri!3}\fi}] |    280                   {\ifodd\value{lstnumber}\color{capri!3}\fi}] | 
|    265                   {../progs/nfa.scala} |    281                   {../progs/nfa.scala} | 
|    266 \caption{A Scala implementation of NFAs using sets of partial functions. |    282 \caption{A Scala implementation of NFAs using sets of partial functions. | 
|    267   Notice some subtleties: \texttt{deltas} implements the delta-hat |    283   Notice some subtleties: Since \texttt{delta} is given | 
|    268   construction by lifting the transition (partial) function to |    284   as a set of partial functions, each of them can obviously go ``wrong'' in which | 
|    269   lists of \texttt{C}haracters. Since \texttt{delta} is given |    285   case the \texttt{Try}. The function \texttt{accepts} implements the | 
|    270   as partial function, it can obviously go ``wrong'' in which |    286   acceptance of a string in a breath-first fashion. This can be costly | 
|    271   case the \texttt{Try} in \texttt{accepts} catches the error and |    287   way of deciding whether a string is accepted in practical contexts.\label{nfa}} | 
|    272   returns \texttt{false}---that means the string is not accepted. |         | 
|    273   The example \texttt{delta} implements the DFA example shown |         | 
|    274   earlier in the handout.\label{nfa}} |         | 
|    275 \end{figure} |    288 \end{figure} | 
|    276  |    289  | 
|    277  |    290 The reason for using sets of partial functions for specifying the | 
|         |    291 transitions in NFAs has to do with examples like this one: a | 
|         |    292 popular benchmark regular expression is $(.)^*\cdot a\cdot | 
|         |    293 (.)^{\{n\}}\cdot b\cdot c$. A NFA that accepts the same strings | 
|         |    294 (for $n=3$) is as follows: | 
|         |    295  | 
|         |    296 \begin{center} | 
|         |    297 \begin{tikzpicture}[>=stealth',very thick, auto, node distance=7mm, | 
|         |    298     every state/.style={minimum size=0pt,inner sep=1pt, | 
|         |    299       draw=blue!50,very thick,fill=blue!20},scale=0.5] | 
|         |    300 \node[state,initial]  (Q_0)  {$Q_0$}; | 
|         |    301 \node[state] (Q_1) [right=of Q_0] {$Q_1$}; | 
|         |    302 \node[state] (Q_2) [right=of Q_1] {$Q_2$}; | 
|         |    303 \node[state] (Q_3) [right=of Q_2] {$Q_3$}; | 
|         |    304 \node[state] (Q_4) [right=of Q_3] {$Q_4$}; | 
|         |    305 \node[state] (Q_5) [right=of Q_4] {$Q_5$}; | 
|         |    306 \node[state,accepting] (Q_6) [right=of Q_5] {$Q_6$}; | 
|         |    307 \path[->] (Q_0) edge [loop above] node  {$.$} (); | 
|         |    308 \path[->] (Q_0) edge node [above]  {$a$} (Q_1); | 
|         |    309 \path[->] (Q_1) edge node [above]  {$.$} (Q_2); | 
|         |    310 \path[->] (Q_2) edge node [above]  {$.$} (Q_3); | 
|         |    311 \path[->] (Q_3) edge node [above]  {$.$} (Q_4); | 
|         |    312 \path[->] (Q_4) edge node [above]  {$b$} (Q_5); | 
|         |    313 \path[->] (Q_5) edge node [above]  {$c$} (Q_6); | 
|         |    314 \end{tikzpicture} | 
|         |    315 \end{center} | 
|         |    316  | 
|         |    317 \noindent | 
|         |    318 The $.$ stands for accepting any single character: for example if we | 
|         |    319 are in $Q_0$ and read an $a$ we can either stay in $Q_0$ (since any | 
|         |    320 character will do for this) or advance to $Q_1$ (but only if it is an | 
|         |    321 $a$). Why this is a good benchmark regular expression is irrelevant | 
|         |    322 here. The point is that this NFA can be conveniently represented by | 
|         |    323 the code: | 
|         |    324  | 
|         |    325 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor= | 
|         |    326                   {\ifodd\value{lstnumber}\color{capri!3}\fi}] | 
|         |    327 val delta = Set[(State, Char) :=> State]( | 
|         |    328   { case (Q0, 'a') => Q1 }, | 
|         |    329   { case (Q0, _)   => Q0 }, | 
|         |    330   { case (Q1, _)   => Q2 }, | 
|         |    331   { case (Q2, _)   => Q3 }, | 
|         |    332   { case (Q3, _)   => Q4 }, | 
|         |    333   { case (Q4, 'b') => Q5 }, | 
|         |    334   { case (Q5, 'c') => Q6 } | 
|         |    335 ) | 
|         |    336  | 
|         |    337 NFA(Set[State](Q0), delta, Set[State](Q6)) | 
|         |    338 \end{lstlisting}} | 
|         |    339  | 
|         |    340 \noindent | 
|         |    341 where the $.$-transitions translate into a | 
|         |    342 underscore-pattern-matching. Recall that in $Q_0$ if we read an $a$ we | 
|         |    343 can go to $Q_1$ (by the first partial function in the set) and also | 
|         |    344 stay in $Q_0$ (by the second partial function). Representing such | 
|         |    345 transitions in any other way is somehow awkward; the set of partial | 
|         |    346 function representation makes them easy to implement. | 
|         |    347  | 
|         |    348 Look very careful again at the \texttt{accepts} and \texttt{deltas} | 
|         |    349 functions in NFAs and remember that when accepting a string by an NFA | 
|         |    350 we might have to explore all possible transitions (which state to go | 
|         |    351 to is not unique anymore). The shown implementation achieves this | 
|         |    352 exploration in a \emph{breath-first search}. This is fine for very | 
|         |    353 small NFAs, but can lead to problems when the NFAs are bigger. Take | 
|         |    354 for example the regular expression $(.)^*\cdot a\cdot (.)^{\{n\}}\cdot | 
|         |    355 b\cdot c$ from above. If $n$ is large, say 100 or 1000, then the | 
|         |    356 corresponding NFA will have 104, respectively 1004, nodes. The problem | 
|         |    357 is that with certain strings this can lead to 1000 ``active'' nodes | 
|         |    358 which we need to analyse when determining the next states. This can be | 
|         |    359 a real memory strain in practical applications. As a result, some | 
|         |    360 regular expression matching engines resort to a \emph{depth-first | 
|         |    361   search} with \emph{backtracking} in unsuccessful cases. In our | 
|         |    362 implementation we could implement a depth-first version of | 
|         |    363 \texttt{accepts} using Scala's \texttt{exists} as follows: | 
|         |    364  | 
|         |    365  | 
|         |    366 {\small\begin{lstlisting}[language=Scala,linebackgroundcolor= | 
|         |    367                   {\ifodd\value{lstnumber}\color{capri!3}\fi}] | 
|         |    368 def search(q: A, s: List[C]) : Boolean = s match { | 
|         |    369   case Nil => fins(q) | 
|         |    370   case c::cs => | 
|         |    371     delta.exists((d) => Try(search(d(q, c), cs)) getOrElse false) | 
|         |    372 } | 
|         |    373  | 
|         |    374 def accepts(s: List[C]) : Boolean =  | 
|         |    375   starts.exists(search(_, s)) | 
|         |    376 \end{lstlisting}} | 
|         |    377  | 
|         |    378 \noindent | 
|         |    379 This depth-first way of exploration seems to work efficiently in many | 
|         |    380 examples and is much less of strain on memory. The problem is that the | 
|         |    381 backtracking can get catastrophic in some examples---remember the | 
|         |    382 catastrophic backtracking from earlier lectures. This depth-first | 
|         |    383 search with backtracking is the reason for the abysmal performance of | 
|         |    384 regular expression macthing in Java, Ruby and Python. | 
|    278  |    385  | 
|    279 %This means if |    386 %This means if | 
|    280 %we need to decide whether a string is accepted by a NFA, we might have |    387 %we need to decide whether a string is accepted by a NFA, we might have | 
|    281 %to explore all possibilities. Also there is the special silent |    388 %to explore all possibilities. Also there is the special silent | 
|    282 %transition in NFAs. As mentioned already this transition means you do |    389 %transition in NFAs. As mentioned already this transition means you do |