% afl-material: handouts/ho03.tex@d94532448ec8
% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{../graphics}


\begin{document}
\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2020, 2022}

\section*{Handout 3 (Finite Automata)}


Every formal language and compiler course I know of bombards you first
with automata and then, to a much, much smaller extent, with regular
expressions. As you can see, this course is turned upside down:
regular expressions come first. The reason is that regular expressions
are easier to reason about and the notion of derivatives, although
already quite old, only became more widely known rather
recently. Still, let us in this lecture have a closer look at automata
and their relation to regular expressions. This will help us with
understanding why the regular expression matchers in Python, Ruby,
Java and so on are so slow with certain regular expressions. On the
way we will also see what the limitations of regular
expressions are. Unfortunately, they cannot be used for \emph{everything}.


\subsection*{Deterministic Finite Automata}

Let's start\ldots the central definition is:\medskip

\noindent
A \emph{deterministic finite automaton} (DFA), say $A$, is
given by a five-tuple written ${\cal A}(\varSigma, Qs, Q_0, F, \delta)$ where

\begin{itemize}
\item $\varSigma$ is an alphabet,
\item $Qs$ is a finite set of states,
\item $Q_0 \in Qs$ is the start state,
\item $F \subseteq Qs$ are the accepting states, and
\item $\delta$ is the transition function.
\end{itemize}

\noindent I am sure you have seen this definition
before. The transition function determines how to ``transition'' from
one state to the next state with respect to a character. We
assume that transition functions do not need to be defined
everywhere: it can be the case that, given a state and a character, there is no
next state, in which case we need to raise a kind of ``failure
exception''. That means we actually have \emph{partial} functions as
transitions (see the Scala implementation of DFAs later on). A
typical example of a DFA is

\begin{center}
\begin{tikzpicture}[>=stealth',very thick,auto,
                    every state/.style={minimum size=0pt,
                    inner sep=2pt,draw=blue!50,very thick,
                    fill=blue!20},scale=2]
\node[state,initial] (Q_0) {$Q_0$};
\node[state] (Q_1) [right=of Q_0] {$Q_1$};
\node[state] (Q_2) [below right=of Q_0] {$Q_2$};
\node[state] (Q_3) [right=of Q_2] {$Q_3$};
\node[state, accepting] (Q_4) [right=of Q_1] {$Q_4$};
\path[->] (Q_0) edge node [above] {$a$} (Q_1);
\path[->] (Q_1) edge node [above] {$a$} (Q_4);
\path[->] (Q_4) edge [loop right] node {$a, b$} ();
\path[->] (Q_3) edge node [right] {$a$} (Q_4);
\path[->] (Q_2) edge node [above] {$a$} (Q_3);
\path[->] (Q_1) edge node [right] {$b$} (Q_2);
\path[->] (Q_0) edge node [above] {$b$} (Q_2);
\path[->] (Q_2) edge [loop left] node {$b$} ();
\path[->] (Q_3) edge [bend left=95, looseness=1.3] node [below] {$b$} (Q_0);
\end{tikzpicture}
\end{center}

\noindent In this graphical notation, the accepting state $Q_4$ is
indicated by a double circle. Note that there can be more than one
accepting state. It is also possible that a DFA has no accepting
state at all, or that the starting state is also an accepting
state. In the case above the transition function is defined everywhere
and can also be given as a table as follows:

\[
\begin{array}{lcl}
(Q_0, a) &\rightarrow& Q_1\\
(Q_0, b) &\rightarrow& Q_2\\
(Q_1, a) &\rightarrow& Q_4\\
(Q_1, b) &\rightarrow& Q_2\\
(Q_2, a) &\rightarrow& Q_3\\
(Q_2, b) &\rightarrow& Q_2\\
(Q_3, a) &\rightarrow& Q_4\\
(Q_3, b) &\rightarrow& Q_0\\
(Q_4, a) &\rightarrow& Q_4\\
(Q_4, b) &\rightarrow& Q_4\\
\end{array}
\]

\noindent
Please check that this table represents the same transition function
as the graph above.

We need to define which language is accepted by
an automaton. For this we lift the transition function
$\delta$ from characters to strings as follows:

\[
\begin{array}{lcl}
\widehat{\delta}(q, []) & \dn & q\\
\widehat{\delta}(q, c\!::\!s) & \dn & \widehat{\delta}(\delta(q, c), s)\\
\end{array}
\]

\noindent This lifted transition function is often called
\emph{delta-hat}. Given a string, we start in the starting state and
take the first character of the string, follow to the next state, then
take the second character and so on. If, once the string is exhausted,
we end up in an accepting state, then this string is accepted by the
automaton. Otherwise it is not accepted. This also means that if along
the way we hit the case where the transition function $\delta$ is not
defined, we need to raise an error. In our implementation we will deal
with this case elegantly by using Scala's \texttt{Try}. Summing up: a
string $s$ is in the \emph{language accepted by the automaton}
${\cal A}(\varSigma, Qs, Q_0, F, \delta)$ iff

\[
\widehat{\delta}(Q_0, s) \in F
\]

\noindent I leave it to you to think about a definition that describes the set of
all strings accepted by a deterministic finite automaton.

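
\noindent To illustrate how $\widehat{\delta}$ works, we can unfold its two
clauses for the string $aa$ and the example DFA given earlier, reading
each $\delta$-step off the transition table:

\[
\begin{array}{lcl}
\widehat{\delta}(Q_0, a\!::\!a\!::\![]) & = & \widehat{\delta}(\delta(Q_0, a), a\!::\![])\\
 & = & \widehat{\delta}(Q_1, a\!::\![])\\
 & = & \widehat{\delta}(\delta(Q_1, a), [])\\
 & = & \widehat{\delta}(Q_4, [])\\
 & = & Q_4
\end{array}
\]

\noindent Since $Q_4 \in F$, the string $aa$ is accepted. In contrast,
for the string $ab$ the same unfolding ends in $Q_2$, and since
$Q_2 \not\in F$, the string $ab$ is not accepted.
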
\begin{figure}[p]
\small
\lstinputlisting[numbers=left,lastline=43]{../progs/automata/dfa.sc}
\caption{An implementation of DFAs in Scala using partial functions.
  Note some subtleties: \texttt{deltas} implements the delta-hat
  construction by lifting the (partial) transition function to lists
  of characters. Since \texttt{delta} is given as a partial function,
  it can obviously go ``wrong'', in which case the \texttt{Try} in
  \texttt{accepts} catches the error and returns \texttt{false}, which
  means the string is not accepted. The example \texttt{delta} in
  Lines 22--43 implements the DFA example shown earlier in the
  handout.\label{dfa}}
\end{figure}

My take on an implementation of DFAs in Scala is given in
Figure~\ref{dfa}. As you can see, there are many features of the
mathematical definition that are quite closely reflected in the
code. In the DFA class, there is a starting state, called
\texttt{start}, with the polymorphic type \texttt{A}. There is a
partial function \texttt{delta} for specifying the transitions. These
partial functions take a state (of polymorphic type \texttt{A}) and an
input (of polymorphic type \texttt{C}) and produce a new state (of
type \texttt{A}). For the moment it is OK to assume that \texttt{A} is
some arbitrary type for states and the input is just characters. (The
reason for not having concrete types, but polymorphic types for the
states and the input of DFAs will become clearer later on.)

The DFA class also has an argument for specifying final states. In the
implementation it is not a set of states, as in the mathematical
definition, but a function from states to booleans (this function is
supposed to return true whenever a state is final; false
otherwise). While this boolean function is different from a set of
states, Scala allows us to use sets for such functions (see Line 41 where
the DFA is initialised). Again it will become clear later on why I use
functions for final states, rather than sets.

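
\noindent Since Figure~\ref{dfa} includes its code from a separate file,
here is a minimal sketch of a DFA class along the lines just described.
It is reconstructed from the description only and need not be identical
to the code in the figure: the name \texttt{fins}, writing
\texttt{PartialFunction} out in full, and encoding the states
$Q_0,\ldots,Q_4$ as the integers 0 to 4 are assumptions made for
illustration. It is meant to be run as a Scala script or in the REPL.

{\small\begin{lstlisting}[language=Scala]
import scala.util.Try

// Sketch only: a start state, a partial transition function and
// a predicate for final states (the name fins is an assumption)
case class DFA[A, C](start: A,
                     delta: PartialFunction[(A, C), A],
                     fins: A => Boolean) {

  // delta-hat: lifts delta from single inputs to lists of inputs;
  // throws a MatchError if delta is undefined along the way
  def deltas(q: A, s: List[C]): A = s match {
    case Nil => q
    case c :: cs => deltas(delta((q, c)), cs)
  }

  // a string is accepted iff delta-hat ends in a final state;
  // Try turns an undefined transition into the answer false
  def accepts(s: List[C]): Boolean =
    Try(fins(deltas(start, s))).getOrElse(false)
}

// the example DFA from the transition table, with Q_i encoded as i
val delta: PartialFunction[(Int, Char), Int] = {
  case (0, 'a') => 1
  case (0, 'b') => 2
  case (1, 'a') => 4
  case (1, 'b') => 2
  case (2, 'a') => 3
  case (2, 'b') => 2
  case (3, 'a') => 4
  case (3, 'b') => 0
  case (4, 'a') => 4
  case (4, 'b') => 4
}

// a Set[Int] can be passed where a function Int => Boolean is expected
val dfa = DFA[Int, Char](0, delta, Set(4))

println(dfa.accepts("aa".toList))   // true:  Q0 -> Q1 -> Q4
println(dfa.accepts("ab".toList))   // false: ends in Q2
println(dfa.accepts("ac".toList))   // false: no transition for 'c'
\end{lstlisting}}

\noindent The last call shows the role of \texttt{Try}: the character
\texttt{c} has no transition, so \texttt{delta} fails and the string is
simply rejected.
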
The most important point in the implementation is that I use Scala's
partial functions for representing the transitions; alternatives would
have been \texttt{Maps} or even \texttt{Lists}. One of the main
advantages of using partial functions is that transitions can be quite
nicely defined by a series of \texttt{case} statements (see Lines
29--39 for an example). If you need to represent an automaton with a
sink state (catch-all state), you can use Scala's pattern matching and
write something like

{\small\begin{lstlisting}[language=Scala]
abstract class State
...
case object Sink extends State

// any (state, character) pair not matched by an earlier case
// is sent to the sink state
val delta : (State, Char) :=> State =
  { case (S0, 'a') => S1
    case (S1, 'a') => S2
    case _ => Sink
  }
\end{lstlisting}}

\noindent I leave it to you to work out what the corresponding DFA looks like in the
graph notation. Also, I suggest you tinker with the Scala code in
order to define the DFA that does not accept any string at all.

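
\noindent To see the effect of the catch-all case, here is a small
self-contained variant of the sink-state snippet. The case objects
\texttt{S0}, \texttt{S1} and \texttt{S2} stand in for the elided part of
the \texttt{State} hierarchy and are assumptions, and
\texttt{PartialFunction} is written out in full so that the sketch runs
on its own as a Scala script.

{\small\begin{lstlisting}[language=Scala]
// hypothetical states; only Sink appears in the original snippet
sealed abstract class State
case object S0 extends State
case object S1 extends State
case object S2 extends State
case object Sink extends State

val delta: PartialFunction[(State, Char), State] = {
  case (S0, 'a') => S1
  case (S1, 'a') => S2
  case _ => Sink
}

// because of the last case, delta is defined for every input
println(delta.isDefinedAt((S2, 'b')))   // true
println(delta((S0, 'b')))               // Sink
println(delta((Sink, 'a')))             // Sink
\end{lstlisting}}

\noindent Without the last \texttt{case}, the call to
\texttt{isDefinedAt} would return \texttt{false} and applying
\texttt{delta} to that input would raise an error.
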
Finally, I let you ponder whether this is a good implementation of
DFAs or not. In doing so I hope you notice that the $\varSigma$ and
$Qs$ components (the alphabet and the \emph{finite} set of states,
respectively) are missing from the class definition. This means that
the implementation allows you to do some ``fishy'' things you are not
meant to do with DFAs. Which fishy things could these be?



\subsection*{Non-Deterministic Finite Automata}

Remember we want to find out what the relation is between regular
expressions and automata. To do this with DFAs is a bit unwieldy.
While with DFAs it is always clear, given a state and a character,
what the next state is (potentially none), it will be convenient to
relax this restriction. That means we allow states to have several
potential successor states. We even allow more than one starting
state. The resulting construction is called a \emph{Non-Deterministic
  Finite Automaton} (NFA), given also as a five-tuple ${\cal
  A}(\varSigma, Qs, Q_{0s}, F, \rho)$ where

\begin{itemize}
\item $\varSigma$ is an alphabet,
\item $Qs$ is a finite set of states,
\item $Q_{0s}$ is a set of start states ($Q_{0s} \subseteq Qs$),
\item $F$ are some accepting states with $F \subseteq Qs$, and
\item $\rho$ is a transition relation.
\end{itemize}

\noindent
A typical example of an NFA is

% An NFA for (ab* + b)*a
\begin{center}
\begin{tikzpicture}[>=stealth',very thick, auto,
                    every state/.style={minimum size=0pt,inner sep=3pt,
                    draw=blue!50,very thick,fill=blue!20},scale=2]
\node[state,initial] (Q_0) {$Q_0$};
\node[state] (Q_1) [right=of Q_0] {$Q_1$};
\node[state, accepting] (Q_2) [right=of Q_1] {$Q_2$};
\path[->] (Q_0) edge [loop above] node {$b$} ();
\path[<-] (Q_0) edge node [below] {$b$} (Q_1);
\path[->] (Q_0) edge [bend left] node [above] {$a$} (Q_1);
\path[->] (Q_0) edge [bend right] node [below] {$a$} (Q_2);
\path[->] (Q_1) edge [loop above] node {$a,b$} ();
\path[->] (Q_1) edge node [above] {$a$} (Q_2);
\end{tikzpicture}
\end{center}

482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	242	\noindent
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	243	This NFA happens to have only one starting state, but in general there
495 acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	244	could be more than one. Notice that in state $Q_0$ we might go to
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	245	state $Q_1$ \emph{or} to state $Q_2$ when receiving an $a$. Similarly
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	246	in state $Q_1$ and receiving an $a$, we can stay in state $Q_1$
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	247	\emph{or} go to $Q_2$. This kind of choice is not allowed with
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	248	DFAs. The downside of this choice in NFAs is that when it comes to
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	249	deciding whether a string is accepted by a NFA we potentially have to
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	250	explore all possibilities. I leave it to you to work out which strings the above NFA
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	251	accepts.
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	252
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	253
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	254	There are a number of additional points you should note about
483 faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	255	NFAs. Every DFA is a NFA, but not vice versa. The $\rho$ in NFAs is a
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	256	transition \emph{relation} (DFAs have a transition function). The
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	257	difference between a function and a relation is that a function always
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	258	has a single output, while a relation gives, roughly speaking,
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	259	several outputs. Look again at the NFA above: if you are currently in
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	260	the state $Q_1$ and you read a character $b$, then you can transition
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	261	to either $Q_0$ \emph{or} $Q_1$. Which route, or output, you take is
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	262	not determined. This non-determinism can be represented by a
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	263	relation.
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	264
483 faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	265	My implementation of NFAs in Scala is shown in Figure~\ref{nfa}.
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	266	Perhaps interestingly, I do not actually use relations for my NFAs,
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	267	but use transition functions that return sets of states. DFAs have
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	268	partial transition functions that return a single state; my NFAs
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	269	return a set of states. I leave it to you to think about this representation for
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	270	NFA-transitions and how it corresponds to the relations used in the
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	271	mathematical definition of NFAs. An example of a transition function
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	272	in Scala for the NFA shown above is
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	273
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	274	{\small\begin{lstlisting}[language=Scala]
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	275	val nfa_delta : (State, Char) :=> Set[State] =
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	276	{ case (Q0, 'a') => Set(Q1, Q2)
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	277	case (Q0, 'b') => Set(Q0)
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	278	case (Q1, 'a') => Set(Q1, Q2)
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	279	case (Q1, 'b') => Set(Q0, Q1) }
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	280	\end{lstlisting}}
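
\noindent
As a sketch of how such a set-valued function corresponds to a
transition relation, assume that $\rho$ is given as a set of triples
$(q, c, q')$ and write $\delta$ for the set-valued function (both the
triple encoding and the name $\delta$ are assumptions made here for
illustration). Then the function collects all states related to $q$
and $c$, and the relation can be recovered from the function:

\[
\delta(q, c) = \{\,q' \mid (q, c, q') \in \rho\,\}
\qquad\qquad
\rho = \{\,(q, c, q') \mid q' \in \delta(q, c)\,\}
\]

\noindent
For the NFA above, $\delta(Q_1, b) = \{Q_0, Q_1\}$ corresponds to the
two triples $(Q_1, b, Q_0)$ and $(Q_1, b, Q_1)$ in $\rho$. A case where
the partial function is undefined corresponds to the empty set, that is
to no triple at all.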
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	281
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	282	As in the mathematical definition, \texttt{starts} is a set of
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	283	states in NFAs; \texttt{fins} is again a function from states to
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	284	booleans. The \texttt{next} function calculates the set of next states
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	285	reachable from a single state \texttt{q} by a character~\texttt{c}. In
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	286	case there is no such state---the partial transition function is
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	287	undefined---the empty set is returned (see function
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	288	\texttt{applyOrElse} in Lines 11 and 12). The function \texttt{nexts}
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	289	just lifts this function to sets of states.
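
\noindent
The following is a minimal, self-contained sketch of these functions
for the NFA shown above. It is not the code of Figure~\ref{nfa}: the
names \texttt{next}, \texttt{nexts}, \texttt{starts}, \texttt{fins}
and \texttt{accepts} follow the text, but the concrete definitions,
the type \texttt{State} and the name \texttt{delta} are assumptions
made for illustration.

{\small\begin{lstlisting}[language=Scala]
// states of the example NFA (assumed encoding)
sealed trait State
case object Q0 extends State
case object Q1 extends State
case object Q2 extends State

// set-valued partial transition function of the NFA shown above
val delta: PartialFunction[(State, Char), Set[State]] = {
  case (Q0, 'a') => Set(Q1, Q2)
  case (Q0, 'b') => Set(Q0)
  case (Q1, 'a') => Set(Q1, Q2)
  case (Q1, 'b') => Set(Q0, Q1)
}

val starts: Set[State] = Set(Q0)
val fins: State => Boolean = _ == Q2

// next states from a single state; if delta is undefined,
// the empty set is returned
def next(q: State, c: Char): Set[State] =
  delta.applyOrElse((q, c), (_: (State, Char)) => Set.empty[State])

// the same function lifted to sets of states
def nexts(qs: Set[State], c: Char): Set[State] =
  qs.flatMap(next(_, c))

// breadth-first acceptance: keep the set of all states the NFA
// can be in after each character
def accepts(s: String): Boolean =
  s.foldLeft(starts)((qs, c) => nexts(qs, c)).exists(fins)
\end{lstlisting}}

\noindent
With these definitions \texttt{accepts("ba")} returns \texttt{true},
while \texttt{accepts("ab")} returns \texttt{false}: after reading
\texttt{ab} the set of current states is $\{Q_0, Q_1\}$, which contains
no accepting state.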
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	290
484 8182eb3278e0 updated Christian Urban <urbanc@in.tum.de> parents: 483 diff changeset	291	\begin{figure}[p]
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	292	\small
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	293	\lstinputlisting[numbers=left,lastline=43]{../progs/automata/nfa.sc}
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	294	\caption{A Scala implementation of NFAs using partial functions.
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	295	Notice that the function \texttt{accepts} implements the
556 4b0fffaef849 updated Christian Urban <urbanc@in.tum.de> parents: 518 diff changeset	296	acceptance of a string in a breadth-first search fashion. This can be a costly
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	297	way of deciding whether a string is accepted or not in applications that need to handle
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	298	large NFAs and large inputs.\label{nfa}}
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	299	\end{figure}
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	300
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	301	Look very carefully at the \texttt{accepts} and \texttt{deltas}
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	302	functions in NFAs and remember that when accepting a string by a NFA
484 8182eb3278e0 updated Christian Urban <urbanc@in.tum.de> parents: 483 diff changeset	303	we might have to explore all possible transitions (recall which state
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	304	to go to is not unique any more with NFAs\ldots{}we need to explore
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	305	potentially all next states). The implementation achieves this
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	306	exploration through a \emph{breadth-first search}. This is fine for
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	307	small NFAs, but can lead to real memory problems when the NFAs are
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	308	bigger and longer strings need to be processed. As a result, some
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	309	regular expression matching engines resort to a \emph{depth-first
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	310	search} with \emph{backtracking} in unsuccessful cases. In our
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	311	implementation we can implement a depth-first version of
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	312	\texttt{accepts} using Scala's \texttt{exists}-function as follows:
483 faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	313
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	314
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	315	{\small\begin{lstlisting}[language=Scala]
483 faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	316	def search(q: A, s: List[C]) : Boolean = s match {
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	317	case Nil => fins(q)
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	318	case c::cs => next(q, c).exists(search(_, cs))
483 faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	319	}
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	320
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	321	def accepts2(s: List[C]) : Boolean =
483 faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	322	starts.exists(search(_, s))
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	323	\end{lstlisting}}
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	324
faba5360372c updated Christian Urban <urbanc@in.tum.de> parents: 482 diff changeset	325	\noindent
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	326	This depth-first way of exploration seems to work quite efficiently in
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	327	many examples and is much less of a strain on memory. The problem is
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	328	that the backtracking can get ``catastrophic'' in some
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	329	examples---remember the catastrophic backtracking from earlier
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	330	lectures. This depth-first search with backtracking is the reason for
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	331	the abysmal performance of some regular expression matchers in Java,
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	332	Ruby and Python. I would like to show you this in the next two sections.
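
\noindent
To get a feeling for how bad the backtracking can become, here is a
small sketch. The NFA in it is hypothetical and chosen only to provoke
the worst case; the counter \texttt{calls} is likewise an addition for
illustration. The functions \texttt{search} and \texttt{accepts2} are
the ones from above, specialised to integers as states.

{\small\begin{lstlisting}[language=Scala]
// Hypothetical NFA: on 'a' the states 0 and 1 can both move to
// 0 and 1; only a 'b' leads to the accepting state 2.
def next(q: Int, c: Char): Set[Int] = (q, c) match {
  case (0, 'a') | (1, 'a') => Set(0, 1)
  case (0, 'b') | (1, 'b') => Set(2)
  case _ => Set()
}
val starts = Set(0)
def fins(q: Int): Boolean = q == 2

var calls = 0   // counts how often search is entered

def search(q: Int, s: List[Char]): Boolean = {
  calls += 1
  s match {
    case Nil => fins(q)
    case c :: cs => next(q, c).exists(search(_, cs))
  }
}

def accepts2(s: List[Char]): Boolean =
  starts.exists(search(_, s))

// 15 a's and no b: the string is rejected, but only after
// every path through the NFA has been tried
accepts2(List.fill(15)('a'))
println(calls)   // 2^16 - 1 = 65535
\end{lstlisting}}

\noindent
For a rejected string of $n$ $a$s the depth-first search is entered
$2^{n+1} - 1$ times, because each call on a non-empty string spawns
two further calls. A breadth-first search on the same input never
tracks more than the two states $0$ and $1$ at a time.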
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	333
268 18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	334
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	335	\subsection*{Epsilon NFAs}
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	336
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	337	In order to get an idea of what calculations are performed by Java \&
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	338	friends, we need a method for transforming a regular expression into
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	339	a corresponding automaton. This automaton should accept exactly those strings that
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	340	are accepted by the regular expression. The simplest and most
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	341	well-known method for this is called the \emph{Thompson Construction},
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	342	after the Turing Award winner Ken Thompson. This method proceeds by
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	343	recursion over regular expressions and depends on the non-determinism
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	344	in NFAs described in the previous section. You will see shortly why
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	345	this construction works well with NFAs, but is not so straightforward
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	346	with DFAs.
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	347
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	348	Unfortunately we are still one step away from our intended target
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	349	though---because this construction uses a version of NFAs that allows
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	350	``silent transitions''. The idea behind silent transitions is that
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	351	they allow us to go from one state to the next without having to
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	352	consume a character. We label such silent transitions with the letter
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	353	$\epsilon$ and call the automata $\epsilon$NFAs. Two typical examples
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	354	of $\epsilon$NFAs are:
484 8182eb3278e0 updated Christian Urban <urbanc@in.tum.de> parents: 483 diff changeset	355
8182eb3278e0 updated Christian Urban <urbanc@in.tum.de> parents: 483 diff changeset	356
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	357	\begin{center}
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	358	\begin{tabular}[t]{c@{\hspace{9mm}}c}
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	359	\begin{tikzpicture}[>=stealth',very thick,
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	360	every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},]
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	361	\node[state,initial] (Q_0) {$Q_0$};
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	362	\node[state] (Q_1) [above=of Q_0] {$Q_1$};
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	363	\node[state, accepting] (Q_2) [below=of Q_0] {$Q_2$};
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	364	\path[->] (Q_0) edge node [left] {$\epsilon$} (Q_1);
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	365	\path[->] (Q_0) edge node [left] {$\epsilon$} (Q_2);
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	366	\path[->] (Q_0) edge [loop right] node {$a$} ();
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	367	\path[->] (Q_1) edge [loop right] node {$a$} ();
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	368	\path[->] (Q_2) edge [loop right] node {$b$} ();
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	369	\end{tikzpicture} &
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	370
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	371	\raisebox{20mm}{
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	372	\begin{tikzpicture}[>=stealth',very thick,
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	373	every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},]
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	374	\node[state,initial] (r_1) {$R_1$};
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	375	\node[state] (r_2) [above=of r_1] {$R_2$};
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	376	\node[state, accepting] (r_3) [right=of r_1] {$R_3$};
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	377	\path[->] (r_1) edge node [below] {$b$} (r_3);
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	378	\path[->] (r_2) edge [bend left] node [above] {$a$} (r_3);
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	379	\path[->] (r_1) edge [bend left] node [left] {$\epsilon$} (r_2);
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	380	\path[->] (r_2) edge [bend left] node [right] {$a$} (r_1);
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	381	\end{tikzpicture}}
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	382	\end{tabular}
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	383	\end{center}
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	384
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	385	\noindent
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	386	Consider the $\epsilon$NFA on the left-hand side: the
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	387	$\epsilon$-transitions mean you do not have to ``consume'' any part of
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	388	the input string, but ``silently'' change to a different state. In
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	389	this example, if you are in the starting state $Q_0$, you can silently
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	390	move either to $Q_1$ or $Q_2$. You can see that once you are in $Q_1$,
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	391	respectively $Q_2$, you cannot ``go back'' to the other states. So it
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	392	seems allowing $\epsilon$-transitions is a rather substantial
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	393	extension to NFAs. At first glance, $\epsilon$-transitions might
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	394	look rather strange, or even silly. To start with, silent
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	395	transitions make the decision whether a string is accepted by an
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	396	automaton even harder: with $\epsilon$NFAs we have to check whether we
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	397	can first do some $\epsilon$-transitions and then do a
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	398	``proper''-transition; and after any ``proper''-transition we again
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	399	have to check whether we can do some more silent transitions. Even
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	400	worse, if there is a silent transition pointing back to the same
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	401	state, then we have to be careful our decision procedure for strings
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	402	does not loop (remember the depth-first search for exploring all
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	403	states).
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	404
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	405	The obvious question is: Do we get anything in return for this hassle
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	406	with silent transitions? Well, we still have to work for it\ldots
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	407	unfortunately. If we were to follow the many textbooks on the
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	408	subject, we would now start with defining what $\epsilon$NFAs
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	409	are---that would require extending the transition relation of
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	410	NFAs. Next, we would show that the $\epsilon$NFAs are equivalent to
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	411	NFAs and so on. Once we have done all this on paper, we would need to
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	412	implement $\epsilon$NFAs. Let's try to take a shortcut instead. We are
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	413	not really interested in $\epsilon$NFAs; they are only a convenient
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	414	tool for translating regular expressions into automata. So we are not
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	415	going to implement them explicitly, but translate them immediately
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	416	into NFAs (in a sense $\epsilon$NFAs are just a convenient API for
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	417	lazy people ;o). How does this translation work? Well, we have to find
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	418	all transitions of the form
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	419
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	420	\[
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	421	q\stackrel{\epsilon}{\longrightarrow}\ldots\stackrel{\epsilon}{\longrightarrow}
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	422	\;\stackrel{a}{\longrightarrow}\;
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	423	\stackrel{\epsilon}{\longrightarrow}\ldots\stackrel{\epsilon}{\longrightarrow} q'
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	424	\]
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	425
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	426	\noindent where somewhere in the ``middle'' is an $a$-transition (for
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	427	a character, say, $a$). We replace them with
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	428	$q \stackrel{a}{\longrightarrow} q'$. Doing this to the $\epsilon$NFA
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	429	on the right-hand side above gives the NFA
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	430
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	431	\begin{center}
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	432	\begin{tikzpicture}[>=stealth',very thick,
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	433	every state/.style={minimum size=0pt,draw=blue!50,very thick,fill=blue!20},]
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	434	\node[state,initial] (r_1) {$R_1$};
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	435	\node[state] (r_2) [above=of r_1] {$R_2$};
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	436	\node[state, accepting] (r_3) [right=of r_1] {$R_3$};
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	437	\path[->] (r_1) edge node [above] {$b$} (r_3);
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	438	\path[->] (r_2) edge [bend left] node [above] {$a$} (r_3);
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	439	\path[->] (r_1) edge [bend left] node [left] {$a$} (r_2);
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	440	\path[->] (r_2) edge [bend left] node [right] {$a$} (r_1);
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	441	\path[->] (r_1) edge [loop below] node {$a$} ();
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	442	\path[->] (r_1) edge [bend right] node [below] {$a$} (r_3);
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	443	\end{tikzpicture}
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	444	\end{center}
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	445
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	446	\noindent where the single $\epsilon$-transition is replaced by
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	447	three additional $a$-transitions. Please do the calculations yourself
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	448	and verify that I did not forget any transition.
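
\noindent
As a sketch of this calculation, the paths in the $\epsilon$NFA on the
right-hand side that start in $R_1$, begin with the
$\epsilon$-transition and match the pattern are:

\[
\begin{array}{lcl}
R_1 \stackrel{\epsilon}{\longrightarrow} R_2 \stackrel{a}{\longrightarrow} R_3
  & \mbox{gives} & R_1 \stackrel{a}{\longrightarrow} R_3\\[1mm]
R_1 \stackrel{\epsilon}{\longrightarrow} R_2 \stackrel{a}{\longrightarrow} R_1
  & \mbox{gives} & R_1 \stackrel{a}{\longrightarrow} R_1\\[1mm]
R_1 \stackrel{\epsilon}{\longrightarrow} R_2 \stackrel{a}{\longrightarrow} R_1
    \stackrel{\epsilon}{\longrightarrow} R_2
  & \mbox{gives} & R_1 \stackrel{a}{\longrightarrow} R_2
\end{array}
\]

\noindent
These account for the three additional $a$-transitions. If the pattern
is read as allowing zero $\epsilon$-steps in front of the
$a$-transition, then it also matches
$R_2 \stackrel{a}{\longrightarrow} R_1 \stackrel{\epsilon}{\longrightarrow} R_2$,
which would give a loop $R_2 \stackrel{a}{\longrightarrow} R_2$ that is
not drawn in the picture. Including or omitting this loop should not
change which strings are accepted, because every transition available
in $R_2$ is also available in $R_1$, and $R_2$ can already move to
$R_1$ with an $a$.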
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	449
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	450	So in what follows, whenever we are given an $\epsilon$NFA we will
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	451	replace it by an equivalent NFA. The Scala code for this translation
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	452	is given in Figure~\ref{enfa}. The main workhorse in this code is a
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	453	function that calculates a fixpoint of a function (Lines 6--12). This
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	454	function is used for ``discovering'' which states are reachable by
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	455	$\epsilon$-transitions. Once no new state is discovered, a fixpoint is
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	456	reached. This is used for example when calculating the starting states
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	457	of an equivalent NFA (see Line 28): we start with all starting states
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	458	of the $\epsilon$NFA and then look for all additional states that can
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	459	be reached by $\epsilon$-transitions. We keep on doing this until no
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	460	new state can be reached. This is what the $\epsilon$-closure, named
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	461	in the code \texttt{ecl}, calculates. Similarly, a state of
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	462	the NFA is accepting when we can reach an accepting state of the $\epsilon$NFA
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	463	by $\epsilon$-transitions.
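
\noindent
The idea can be sketched in a few lines for the left-hand
$\epsilon$NFA from above. This is not the code of
Figure~\ref{enfa}: the names \texttt{fixpT} and \texttt{ecl} follow
the text, whereas \texttt{State}, \texttt{edelta}, \texttt{enexts} and
the concrete definitions are assumptions made for illustration.

{\small\begin{lstlisting}[language=Scala]
import scala.annotation.tailrec

// apply f repeatedly until the result does not change any more
@tailrec
def fixpT[A](f: A => A, x: A): A = {
  val fx = f(x)
  if (fx == x) x else fixpT(f, fx)
}

sealed trait State
case object Q0 extends State
case object Q1 extends State
case object Q2 extends State

// left-hand eNFA from above; None stands for an epsilon-transition
val edelta: PartialFunction[(State, Option[Char]), Set[State]] = {
  case (Q0, None)      => Set(Q1, Q2)
  case (Q0, Some('a')) => Set(Q0)
  case (Q1, Some('a')) => Set(Q1)
  case (Q2, Some('b')) => Set(Q2)
}

// one step (epsilon or proper) from a set of states
def enexts(qs: Set[State], c: Option[Char]): Set[State] =
  qs.flatMap(q =>
    edelta.applyOrElse((q, c),
      (_: (State, Option[Char])) => Set.empty[State]))

// epsilon-closure: keep adding epsilon-successors until
// no new state is discovered
def ecl(qs: Set[State]): Set[State] =
  fixpT((qs1: Set[State]) => enexts(qs1, None) ++ qs1, qs)

// components of an equivalent NFA
val starts: Set[State] = ecl(Set(Q0))   // Set(Q0, Q1, Q2)
def next(q: State, c: Char): Set[State] =
  ecl(enexts(ecl(Set(q)), Some(c)))
def fins(q: State): Boolean = ecl(Set(q)).contains(Q2)
\end{lstlisting}}

\noindent
In this sketch the fixpoint for \texttt{ecl(Set(Q0))} is reached after
one round: the first application adds $Q_1$ and $Q_2$, and the second
discovers nothing new. As a consequence $Q_0$ counts as accepting in
the NFA, since the accepting state $Q_2$ is in its $\epsilon$-closure.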
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	464
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	465
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	466	\begin{figure}[p]
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	467	\small
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	468	\lstinputlisting[numbers=left,lastline=43]{../progs/automata/enfa.sc}
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	469
\caption{A Scala function that translates $\epsilon$NFAs into NFAs. The
transition function of an $\epsilon$NFA takes as input an \texttt{Option[C]}.
\texttt{None} stands for an $\epsilon$-transition; \texttt{Some(c)}
for a ``proper'' transition consuming a character. The functions in
Lines 19--24 calculate
all states reachable by one or more $\epsilon$-transitions for a given
set of states. The NFA is constructed in Lines 30--34.
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	477	Note the interesting commands in Lines 7 and 8: their purpose is
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	478	to ensure that \texttt{fixpT} is the tail-recursive version of
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	479	the fixpoint construction; otherwise we would quickly get a
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	480	stack-overflow exception, even on small examples, due to limitations
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	481	of the JVM.
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	482	\label{enfa}}
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	483	\end{figure}
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	484
Also look carefully at how the transitions of $\epsilon$NFAs are
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	486	implemented. The additional possibility of performing silent
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	487	transitions is encoded by using \texttt{Option[C]} as the type for the
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	488	``input''. The \texttt{Some}s stand for ``proper'' transitions where
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	489	a character is consumed; \texttt{None} stands for
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	490	$\epsilon$-transitions. The transition functions for the two
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	491	$\epsilon$NFAs from the beginning of this section can be defined as
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	492
{\small\begin{lstlisting}[language=Scala]
val enfa_trans1 : (State, Option[Char]) :=> Set[State] =
  { case (Q0, Some('a')) => Set(Q0)
    case (Q0, None) => Set(Q1, Q2)
    case (Q1, Some('a')) => Set(Q1)
    case (Q2, Some('b')) => Set(Q2) }

val enfa_trans2 : (State, Option[Char]) :=> Set[State] =
  { case (R1, Some('b')) => Set(R3)
    case (R1, None) => Set(R2)
    case (R2, Some('a')) => Set(R1, R3) }
\end{lstlisting}}
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	505
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	506	\noindent
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	507	I hope you agree now with my earlier statement that the $\epsilon$NFAs
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	508	are just an API for NFAs.
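
To see the \texttt{Option} encoding at work, here is a small
self-contained sketch that queries the first transition function. The
definitions of \texttt{State}, \texttt{Q0}--\texttt{Q2} and the type
alias \texttt{:=>} are assumptions standing in for the definitions made
earlier in the handout code; the helper \texttt{next} is hypothetical
and only serves the illustration.

{\small\begin{lstlisting}[language=Scala]
// assumed stand-ins for earlier definitions
type :=>[A, B] = PartialFunction[A, B]

abstract class State
case object Q0 extends State
case object Q1 extends State
case object Q2 extends State

val enfa_trans1 : (State, Option[Char]) :=> Set[State] =
  { case (Q0, Some('a')) => Set(Q0)
    case (Q0, None) => Set(Q1, Q2)
    case (Q1, Some('a')) => Set(Q1)
    case (Q2, Some('b')) => Set(Q2) }

// hypothetical helper: undefined cases mean "no successor state"
def next(q: State, c: Option[Char]): Set[State] =
  enfa_trans1.lift((q, c)).getOrElse(Set.empty[State])

val silent = next(Q0, None)       // states reached by an epsilon-transition
val proper = next(Q0, Some('a'))  // states reached by consuming 'a'
val stuck  = next(Q1, Some('b'))  // no transition defined
\end{lstlisting}}

\noindent
Since the transition function is partial, \texttt{lift} is used to turn
a missing case into \texttt{None} instead of a \texttt{MatchError}.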
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	509
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	510	\subsection*{Thompson Construction}
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	511
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	512	Having the translation of $\epsilon$NFAs to NFAs in place, we can
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	513	finally return to the problem of translating regular expressions into
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	514	equivalent NFAs. Recall that by equivalent we mean that the NFAs
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	515	recognise the same language. Consider the simple regular expressions
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	516	$\ZERO$, $\ONE$ and $c$. They can be translated into equivalent NFAs
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	517	as follows:
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	518
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	519	\begin{equation}\mbox{
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	520	\begin{tabular}[t]{l@{\hspace{10mm}}l}
444 3056a4c071b0 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 349 diff changeset	521	\raisebox{1mm}{$\ZERO$} &
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	522	\begin{tikzpicture}[scale=0.7,>=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	523	\node[state, initial] (Q_0) {$\mbox{}$};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	524	\end{tikzpicture}\\\\
444 3056a4c071b0 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 349 diff changeset	525	\raisebox{1mm}{$\ONE$} &
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	526	\begin{tikzpicture}[scale=0.7,>=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	527	\node[state, initial, accepting] (Q_0) {$\mbox{}$};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	528	\end{tikzpicture}\\\\
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	529	\raisebox{3mm}{$c$} &
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	530	\begin{tikzpicture}[scale=0.7,>=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	531	\node[state, initial] (Q_0) {$\mbox{}$};
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	532	\node[state, accepting] (Q_1) [right=of Q_0] {$\mbox{}$};
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	533	\path[->] (Q_0) edge node [below] {$c$} (Q_1);
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	534	\end{tikzpicture}\\
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	535	\end{tabular}}\label{simplecases}
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	536	\end{equation}
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	537
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	538	\noindent
I leave it to you to check that the NFAs match exactly those strings the
regular expressions can match. To do this translation in code we need
a way to construct states ``programmatically''\ldots and as an additional
constraint Scala needs to recognise that these states are distinct.
For this I implemented in Figure~\ref{thompson1} a class
\texttt{TState} that includes a counter and a companion object that
increases this counter whenever a new state is created.\footnote{You might
have to read up on what \emph{companion objects} do in Scala.}
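
The counter idea can be sketched in a few lines. This is a simplified
illustration under my own assumptions (for example, it does not extend
any \texttt{State} class) and might differ in details from the code in
Figure~\ref{thompson1}.

{\small\begin{lstlisting}[language=Scala]
// a state carries a number; the number makes states distinct
case class TState(i: Int)

// companion object: hands out a fresh number for every new state
object TState {
  var counter = 0

  def apply(): TState = {
    counter += 1
    new TState(counter)
  }
}

val q1 = TState()   // a fresh state
val q2 = TState()   // another fresh state, different from q1
\end{lstlisting}}

\noindent
Because \texttt{TState} is a case class, Scala compares states by the
number they carry: two calls of \texttt{TState()} therefore always
produce states that Scala recognises as different.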
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	547
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	548	\begin{figure}[p]
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	549	\small
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	550	\lstinputlisting[numbers=left,linerange={1-20}]{../progs/automata/thompson.sc}
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	551	\hspace{5mm}\texttt{\dots}
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	552	\lstinputlisting[numbers=left,linerange={28-45},firstnumber=28]{../progs/automata/thompson.sc}
\caption{The first part of the Thompson Construction. Lines 10--19
implement a way to create new states that are all
distinct by virtue of a counter. This counter is
ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	556	increased in the companion object of \texttt{TState}
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	557	whenever a new state is created. The code in Lines 38--45
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	558	constructs NFAs for the simple regular expressions $\ZERO$, $\ONE$ and $c$.
495 acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	559	Compare this code with the pictures given in \eqref{simplecases} on
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	560	Page~\pageref{simplecases}.
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	561	\label{thompson1}}
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	562	\end{figure}
21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	563
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	564	\begin{figure}[p]
935 3fb9b05465dd updated Christian Urban <christian.urban@kcl.ac.uk> parents: 931 diff changeset	565	\small
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	566	\lstinputlisting[numbers=left,firstline=48,firstnumber=48,lastline=85]{../progs/automata/thompson.sc}
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	567	\caption{The second part of the Thompson Construction implementing
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	568	the composition of NFAs according to $\cdot$, $+$ and ${}^*$.
The extension for rich partial functions (Lines 48--54)
implements the infix operation \texttt{+++} which
combines an $\epsilon$NFA transition with an NFA transition
(both are given as partial functions---but with different types!).\label{thompson2}}
487 ffbc65112d48 updated Christian Urban <urbanc@in.tum.de> parents: 485 diff changeset	573	\end{figure}
485 21dec9df46ba updated Christian Urban <urbanc@in.tum.de> parents: 484 diff changeset	574
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	575	The case for the sequence regular expression $r_1 \cdot r_2$ is a bit more
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	576	complicated: Say, we are given by recursion two NFAs representing the regular
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	577	expressions $r_1$ and $r_2$ respectively.
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	578
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	579	\begin{center}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	580	\begin{tikzpicture}[node distance=3mm,
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	581	>=stealth',very thick,
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	582	every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	583	\node[state, initial] (Q_0) {$\mbox{}$};
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	584	\node[state, initial] (Q_01) [below=1mm of Q_0] {$\mbox{}$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	585	\node[state, initial] (Q_02) [above=1mm of Q_0] {$\mbox{}$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	586	\node (R_1) [right=of Q_0] {$\ldots$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	587	\node[state, accepting] (T_1) [right=of R_1] {$\mbox{}$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	588	\node[state, accepting] (T_2) [above=of T_1] {$\mbox{}$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	589	\node[state, accepting] (T_3) [below=of T_1] {$\mbox{}$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	590
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	591	\node (A_0) [right=2.5cm of T_1] {$\mbox{}$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	592	\node[state, initial] (A_01) [above=1mm of A_0] {$\mbox{}$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	593	\node[state, initial] (A_02) [below=1mm of A_0] {$\mbox{}$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	594
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	595	\node (b_1) [right=of A_0] {$\ldots$};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	596	\node[state, accepting] (c_1) [right=of b_1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	597	\node[state, accepting] (c_2) [above=of c_1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	598	\node[state, accepting] (c_3) [below=of c_1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	599	\begin{pgfonlayer}{background}
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	600	\node (1) [rounded corners, inner sep=1mm, thick,
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	601	draw=black!60, fill=black!20, fit= (Q_0) (R_1) (T_1) (T_2) (T_3)] {};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	602	\node (2) [rounded corners, inner sep=1mm, thick,
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	603	draw=black!60, fill=black!20, fit= (A_0) (b_1) (c_1) (c_2) (c_3)] {};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	604	\node [yshift=2mm] at (1.north) {$r_1$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	605	\node [yshift=2mm] at (2.north) {$r_2$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	606	\end{pgfonlayer}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	607	\end{tikzpicture}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	608	\end{center}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	609
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	610	\noindent The first NFA has some accepting states and the second some
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	611	starting states. We obtain an $\epsilon$NFA for $r_1\cdot r_2$ by
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	612	connecting the accepting states of the first NFA with
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	613	$\epsilon$-transitions to the starting states of the second
automaton. In doing so, the accepting states of the first NFA
become non-accepting, like so:
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	616
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	617	\begin{center}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	618	\begin{tikzpicture}[node distance=3mm,
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	619	>=stealth',very thick,
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	620	every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	621	\node[state, initial] (Q_0) {$\mbox{}$};
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	622	\node[state, initial] (Q_01) [below=1mm of Q_0] {$\mbox{}$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	623	\node[state, initial] (Q_02) [above=1mm of Q_0] {$\mbox{}$};
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	624	\node (r_1) [right=of Q_0] {$\ldots$};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	625	\node[state] (t_1) [right=of r_1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	626	\node[state] (t_2) [above=of t_1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	627	\node[state] (t_3) [below=of t_1] {$\mbox{}$};
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	628
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	629	\node (A_0) [right=2.5cm of t_1] {$\mbox{}$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	630	\node[state] (A_01) [above=1mm of A_0] {$\mbox{}$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	631	\node[state] (A_02) [below=1mm of A_0] {$\mbox{}$};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	632
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	633	\node (b_1) [right=of A_0] {$\ldots$};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	634	\node[state, accepting] (c_1) [right=of b_1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	635	\node[state, accepting] (c_2) [above=of c_1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	636	\node[state, accepting] (c_3) [below=of c_1] {$\mbox{}$};
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	637	\path[->] (t_1) edge (A_01);
492 882d5de18adc updated Christian Urban <urbanc@in.tum.de> parents: 491 diff changeset	638	\path[->] (t_2) edge node [above] {$\epsilon$s} (A_01);
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	639	\path[->] (t_3) edge (A_01);
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	640	\path[->] (t_1) edge (A_02);
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	641	\path[->] (t_2) edge (A_02);
492 882d5de18adc updated Christian Urban <urbanc@in.tum.de> parents: 491 diff changeset	642	\path[->] (t_3) edge node [below] {$\epsilon$s} (A_02);
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	643
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	644	\begin{pgfonlayer}{background}
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	645	\node (3) [rounded corners, inner sep=1mm, thick,
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	646	draw=black!60, fill=black!20, fit= (Q_0) (c_1) (c_2) (c_3)] {};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	647	\node [yshift=2mm] at (3.north) {$r_1\cdot r_2$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	648	\end{pgfonlayer}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	649	\end{tikzpicture}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	650	\end{center}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	651
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	652	\noindent The idea behind this construction is that the start of any
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	653	string is first recognised by the first NFA, then we silently change
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	654	to the second NFA; the ending of the string is recognised by the
second NFA\ldots just like the matching of a string by the regular expression
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	656	$r_1\cdot r_2$. The Scala code for this construction is given in
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	657	Figure~\ref{thompson2} in Lines 57--65. The starting states of the
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	658	$\epsilon$NFA are the starting states of the first NFA (corresponding
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	659	to $r_1$); the accepting function is the accepting function of the
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	660	second NFA (corresponding to $r_2$). The new transition function is
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	661	all the ``old'' transitions plus the $\epsilon$-transitions connecting
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	662	the accepting states of the first NFA to the starting states of the
second NFA (Lines 59 and 60). The $\epsilon$NFA is then immediately
translated into an NFA.
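
The sequence construction can also be tried out on a much simplified
model. The sketch below is not the code from Figure~\ref{thompson2}:
states are plain integers, the classes \texttt{NFA} and \texttt{ENFA}
and the functions \texttt{seqENFA}, \texttt{next}, \texttt{ecl} and
\texttt{accepts} are assumptions made for illustration, and instead of
translating the $\epsilon$NFA into an NFA it is simulated directly.

{\small\begin{lstlisting}[language=Scala]
type :=>[A, B] = PartialFunction[A, B]

// simplified automata over Int states (None = epsilon-transition)
case class NFA(starts: Set[Int],
               delta: (Int, Char) :=> Set[Int],
               fins: Int => Boolean)
case class ENFA(starts: Set[Int],
                delta: (Int, Option[Char]) :=> Set[Int],
                fins: Int => Boolean)

// connect the accepting states of n1 by epsilon-transitions to the
// starting states of n2; the states of n1 and n2 must be distinct
def seqENFA(n1: NFA, n2: NFA): ENFA = {
  val delta: (Int, Option[Char]) :=> Set[Int] = {
    case (q, Some(c)) if n1.delta.isDefinedAt((q, c)) => n1.delta((q, c))
    case (q, Some(c)) if n2.delta.isDefinedAt((q, c)) => n2.delta((q, c))
    case (q, None) if n1.fins(q) => n2.starts
  }
  ENFA(n1.starts, delta, n2.fins)
}

// all successor states of a set of states
def next(e: ENFA, qs: Set[Int], c: Option[Char]): Set[Int] =
  qs.flatMap(q => e.delta.lift((q, c)).getOrElse(Set.empty[Int]))

// epsilon-closure by iterating until nothing new is found
def ecl(e: ENFA, qs: Set[Int]): Set[Int] = {
  val qs2 = qs ++ next(e, qs, None)
  if (qs2 == qs) qs else ecl(e, qs2)
}

def accepts(e: ENFA, s: String): Boolean =
  s.foldLeft(ecl(e, e.starts))((qs, c) =>
    ecl(e, next(e, qs, Some(c)))).exists(e.fins)

// NFAs for the characters a and b, with distinct states
val a = NFA(Set(0), { case (0, 'a') => Set(1) }, _ == 1)
val b = NFA(Set(2), { case (2, 'b') => Set(3) }, _ == 3)
val ab = seqENFA(a, b)
\end{lstlisting}}

\noindent
With these definitions \texttt{accepts(ab, "ab")} holds, while the
strings \texttt{"a"} and \texttt{"b"} on their own are rejected: state
\texttt{1} is no longer accepting in the combined automaton.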
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	665
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	666
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	667	The case for the alternative regular expression $r_1 + r_2$ is
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	668	slightly different: We are given by recursion two NFAs representing
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	669	$r_1$ and $r_2$ respectively. Each NFA has some starting states and
some accepting states. We obtain an NFA for the regular expression
$r_1 + r_2$ by composing the transition functions (this crucially
depends on knowing that the states of each component NFA are
distinct---recall that we ensured this with some bespoke code
for \texttt{TState}s). We also need to combine the starting states and
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	675	accepting functions appropriately.
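
On a simplified model this composition needs only a few lines. The
sketch below is an illustration under my own assumptions (integer
states, a class \texttt{NFA} and functions \texttt{alt} and
\texttt{accepts} that are not taken from the handout code); it relies
on the two NFAs having distinct states.

{\small\begin{lstlisting}[language=Scala]
type :=>[A, B] = PartialFunction[A, B]

// simplified NFA over Int states
case class NFA(starts: Set[Int],
               delta: (Int, Char) :=> Set[Int],
               fins: Int => Boolean)

// alternative: union of the starting states, composed transition
// functions, and "or" of the accepting functions
def alt(n1: NFA, n2: NFA): NFA =
  NFA(n1.starts ++ n2.starts,
      n1.delta orElse n2.delta,
      q => n1.fins(q) || n2.fins(q))

def accepts(n: NFA, s: String): Boolean =
  s.foldLeft(n.starts)((qs, c) =>
    qs.flatMap(q => n.delta.lift((q, c)).getOrElse(Set.empty[Int]))
  ).exists(n.fins)

// NFAs for the characters a and b, with distinct states
val a = NFA(Set(0), { case (0, 'a') => Set(1) }, _ == 1)
val b = NFA(Set(2), { case (2, 'b') => Set(3) }, _ == 3)
val aOrB = alt(a, b)
\end{lstlisting}}

\noindent
The combined automaton accepts \texttt{"a"} and \texttt{"b"}, but not
\texttt{"ab"}. Pictorially, the composition looks as follows: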
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	676
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	677	\begin{center}
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	678	\begin{tabular}[t]{ccc}
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	679	\begin{tikzpicture}[node distance=3mm,
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	680	>=stealth',very thick,
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	681	every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	682	baseline=(current bounding box.center)]
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	683	\node at (0,0) (1) {$\mbox{}$};
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	684	\node (2) [above=10mm of 1] {};
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	685	\node[state, initial] (4) [above=1mm of 2] {$\mbox{}$};
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	686	\node[state, initial] (5) [below=1mm of 2] {$\mbox{}$};
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	687	\node[state, initial] (3) [below=10mm of 1] {$\mbox{}$};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	688
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	689	\node (a) [right=of 2] {$\ldots\,$};
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	690	\node (a1) [right=of a] {$$};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	691	\node[state, accepting] (a2) [above=of a1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	692	\node[state, accepting] (a3) [below=of a1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	693
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	694	\node (b) [right=of 3] {$\ldots$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	695	\node[state, accepting] (b1) [right=of b] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	696	\node[state, accepting] (b2) [above=of b1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	697	\node[state, accepting] (b3) [below=of b1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	698	\begin{pgfonlayer}{background}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	699	\node (1) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (2) (a1) (a2) (a3)] {};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	700	\node (2) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (3) (b1) (b2) (b3)] {};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	701	\node [yshift=3mm] at (1.north) {$r_1$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	702	\node [yshift=3mm] at (2.north) {$r_2$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	703	\end{pgfonlayer}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	704	\end{tikzpicture}
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	705	&
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	706	\mbox{}\qquad\tikz{\draw[>=stealth,line width=2mm,->] (0,0) -- (1, 0)}\quad\mbox{}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	707	&
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	708	\begin{tikzpicture}[node distance=3mm,
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	709	>=stealth',very thick,
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	710	every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	711	baseline=(current bounding box.center)]
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	712	\node at (0,0) (1) {$\mbox{}$};
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	713	\node (2) [above=10mm of 1] {$$};
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	714	\node[state, initial] (4) [above=1mm of 2] {$\mbox{}$};
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	715	\node[state, initial] (5) [below=1mm of 2] {$\mbox{}$};
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	716	\node[state, initial] (3) [below=10mm of 1] {$\mbox{}$};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	717
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	718	\node (a) [right=of 2] {$\ldots\,$};
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	719	\node (a1) [right=of a] {$$};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	720	\node[state, accepting] (a2) [above=of a1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	721	\node[state, accepting] (a3) [below=of a1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	722
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	723	\node (b) [right=of 3] {$\ldots$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	724	\node[state, accepting] (b1) [right=of b] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	725	\node[state, accepting] (b2) [above=of b1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	726	\node[state, accepting] (b3) [below=of b1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	727
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	728	%\path[->] (1) edge node [above] {$\epsilon$} (2);
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	729	%\path[->] (1) edge node [below] {$\epsilon$} (3);
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	730
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	731	\begin{pgfonlayer}{background}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	732	\node (3) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (1) (a2) (a3) (b2) (b3)] {};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	733	\node [yshift=3mm] at (3.north) {$r_1+ r_2$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	734	\end{pgfonlayer}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	735	\end{tikzpicture}
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	736	\end{tabular}
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	737	\end{center}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	738
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	739	\noindent The code for this construction is in Figure~\ref{thompson2}
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	740	in Lines 67--75.
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	741
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	742	Finally, for the $*$-case we have a NFA for $r$ and connect its
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	743	accepting states to a new starting state via $\epsilon$-transitions; the new
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	744	starting state in turn has $\epsilon$-transitions to the old starting states. It is also an accepting
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	745	state, because $r^*$ can recognise the empty string.
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	746
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	747	\begin{center}
495 acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	748	\begin{tabular}[b]{@{}ccc@{}}
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	749	\begin{tikzpicture}[node distance=3mm,
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	750	>=stealth',very thick,
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	751	every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	752	baseline=(current bounding box.north)]
495 acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	753	\node (2) {$\mbox{}$};
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	754	\node[state, initial] (4) [above=1mm of 2] {$\mbox{}$};
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	755	\node[state, initial] (5) [below=1mm of 2] {$\mbox{}$};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	756	\node (a) [right=of 2] {$\ldots$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	757	\node[state, accepting] (a1) [right=of a] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	758	\node[state, accepting] (a2) [above=of a1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	759	\node[state, accepting] (a3) [below=of a1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	760	\begin{pgfonlayer}{background}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	761	\node (1) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (2) (a1) (a2) (a3)] {};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	762	\node [yshift=3mm] at (1.north) {$r$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	763	\end{pgfonlayer}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	764	\end{tikzpicture}
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	765	&
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	766	\raisebox{-16mm}{\;\tikz{\draw[>=stealth,line width=2mm,->] (0,0) -- (1, 0)}}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	767	&
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	768	\begin{tikzpicture}[node distance=3mm,
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	769	>=stealth',very thick,
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	770	every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	771	baseline=(current bounding box.north)]
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	772	\node at (0,0) [state, initial,accepting] (1) {$\mbox{}$};
495 acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	773	\node (2) [right=16mm of 1] {$\mbox{}$};
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	774	\node[state] (4) [above=1mm of 2] {$\mbox{}$};
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	775	\node[state] (5) [below=1mm of 2] {$\mbox{}$};
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	776	\node (a) [right=of 2] {$\ldots$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	777	\node[state] (a1) [right=of a] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	778	\node[state] (a2) [above=of a1] {$\mbox{}$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	779	\node[state] (a3) [below=of a1] {$\mbox{}$};
495 acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	780	\path[->] (1) edge node [below] {$\epsilon$} (4);
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	781	\path[->] (1) edge node [below] {$\epsilon$} (5);
acd4567735ce updated Christian Urban <urbanc@in.tum.de> parents: 492 diff changeset	782	\path[->] (a1) edge [bend left=45] node [below] {$\epsilon$} (1);
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	783	\path[->] (a2) edge [bend right] node [below] {$\epsilon$} (1);
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	784	\path[->] (a3) edge [bend left=45] node [below] {$\epsilon$} (1);
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	785	\begin{pgfonlayer}{background}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	786	\node (2) [rounded corners, inner sep=1mm, thick, draw=black!60, fill=black!20, fit= (1) (a2) (a3)] {};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	787	\node [yshift=3mm] at (2.north) {$r^*$};
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	788	\end{pgfonlayer}
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	789	\end{tikzpicture}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	790	\end{tabular}
143 e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	791	\end{center}
e3fd4c5995ef added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 142 diff changeset	792
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	793	\noindent
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	794	The corresponding code is in Figure~\ref{thompson2} in Lines 77--85.
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	795
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	796	To sum up, you can see in the sequence and star cases the need for
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	797	silent $\epsilon$-transitions. Otherwise this construction just
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	798	becomes awkward. Similarly, the alternative case shows the need for
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	799	NFA-nondeterminism. It is not obvious how to form the `alternative'
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	800	composition of two DFAs, because DFAs do not allow several starting and
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	801	successor states. All these constructions can now be put together in
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	802	the following recursive function:
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	803
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	804
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	805	{\small\begin{lstlisting}[language=Scala]
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	806	def thompson(r: Rexp) : NFAt = r match {
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	807	case ZERO => NFA_ZERO()
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	808	case ONE => NFA_ONE()
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	809	case CHAR(c) => NFA_CHAR(c)
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	810	case ALT(r1, r2) => NFA_ALT(thompson(r1), thompson(r2))
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	811	case SEQ(r1, r2) => NFA_SEQ(thompson(r1), thompson(r2))
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	812	case STAR(r1) => NFA_STAR(thompson(r1))
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	813	}
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	814	\end{lstlisting}}
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	815
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	816	\noindent
966 d82c91f85391 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 935 diff changeset	817	It calculates a NFA from a regular expression. At last we can run
d82c91f85391 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 935 diff changeset	818	NFAs for our evil regular expression examples. The graph on the
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	819	left shows that when translating a regular expression such as
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	820	$a^{?\{n\}} \cdot a^{\{n\}}$ into a NFA, the size can blow up and then
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	821	even the relatively fast (on small examples) breadth-first search can be
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	822	slow\ldots the red line maxes out at about 15 $n$s.
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	823
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	824
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	825	The graph on the right shows that with `evil' regular expressions also
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	826	the depth-first search can be abysmally slow. Even if the graphs do not
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	827	completely overlap with the curves of Python, Ruby and Java, they are
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	828	similar enough.
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	829
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	830
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	831	\begin{center}
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	832	\begin{tabular}{@{\hspace{-1mm}}c@{\hspace{1mm}}c@{}}
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	833	\begin{tikzpicture}
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	834	\begin{axis}[
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	835	title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	836	$\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	837	title style={yshift=-2ex},
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	838	xlabel={$n$},
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	839	x label style={at={(1.05,0.0)}},
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	840	ylabel={time in secs},
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	841	enlargelimits=false,
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	842	xtick={0,5,...,30},
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	843	xmax=33,
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	844	ymax=35,
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	845	ytick={0,5,...,30},
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	846	scaled ticks=false,
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	847	axis lines=left,
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	848	width=5.5cm,
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	849	height=4cm,
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	850	legend entries={Python,Ruby, breadth-first NFA},
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	851	legend style={at={(0.5,-0.25)},anchor=north,font=\small},
489 4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	852	legend cell align=left]
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	853	\addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	854	\addplot[brown,mark=triangle*, mark options={fill=white}] table {re-ruby.data};
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	855	% breadth-first search in NFAs
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	856	\addplot[red,mark=*, mark options={fill=white}] table {
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	857	1 0.00586
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	858	2 0.01209
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	859	3 0.03076
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	860	4 0.08269
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	861	5 0.12881
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	862	6 0.25146
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	863	7 0.51377
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	864	8 0.89079
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	865	9 1.62802
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	866	10 3.05326
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	867	11 5.92437
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	868	12 11.67863
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	869	13 24.00568
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	870	};
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	871	\end{axis}
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	872	\end{tikzpicture}
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	873	&
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	874	\begin{tikzpicture}
4430477595ec updated Christian Urban <urbanc@in.tum.de> parents: 488 diff changeset	875	\begin{axis}[
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	876	title={Graph: $(a^*)^* \cdot b$ and strings
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	877	$\underbrace{\texttt{a}\ldots \texttt{a}}_{n}$},
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	878	title style={yshift=-2ex},
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	879	xlabel={$n$},
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	880	x label style={at={(1.05,0.0)}},
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	881	ylabel={time in secs},
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	882	enlargelimits=false,
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	883	xtick={0,5,...,30},
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	884	xmax=33,
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	885	ymax=35,
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	886	ytick={0,5,...,30},
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	887	scaled ticks=false,
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	888	axis lines=left,
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	889	width=5.5cm,
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	890	height=4cm,
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	891	legend entries={Python, Java 8, depth-first NFA},
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	892	legend style={at={(0.5,-0.25)},anchor=north,font=\small},
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	893	legend cell align=left]
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	894	\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	895	\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	896	% depth-first search in NFAs
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	897	\addplot[red,mark=*, mark options={fill=white}] table {
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	898	1 0.00605
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	899	2 0.03086
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	900	3 0.11994
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	901	4 0.45389
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	902	5 2.06192
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	903	6 8.04894
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	904	7 32.63549
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	905	};
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	906	\end{axis}
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	907	\end{tikzpicture}
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	908	\end{tabular}
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	909	\end{center}
057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	910
753 30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	911	\noindent
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	912	OK\ldots now you know why regular expression matchers in those
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	913	languages are sometimes so slow. A bit surprising, don't you agree?
925 ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	914	Also it is still a mystery why Rust, which for the reasons
935 3fb9b05465dd updated Christian Urban <christian.urban@kcl.ac.uk> parents: 931 diff changeset	915	above avoids NFAs and uses DFAs instead, cannot compete in all cases
925 ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	916	with our simple derivative-based regular expression matcher in Scala.
935 3fb9b05465dd updated Christian Urban <christian.urban@kcl.ac.uk> parents: 931 diff changeset	917	There is an explanation for this as well\ldots{} remember that there the
3fb9b05465dd updated Christian Urban <christian.urban@kcl.ac.uk> parents: 931 diff changeset	918	offending examples are of the form $r^{\{n\}}$. Why could they be
3fb9b05465dd updated Christian Urban <christian.urban@kcl.ac.uk> parents: 931 diff changeset	919	a problem in Rust?
268 18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	920
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	921	\subsection*{Subset Construction}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	922
925 ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	923	So of course, some clever developers of regular expression matchers are aware of
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	924	these problems with sluggish NFAs and try to address them. One common
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	925	technique for alleviating the problem I would like to show you in this
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	926	section. This will also explain why we insisted on polymorphic types in
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	927	our DFA code (remember I used \texttt{A} and \texttt{C} for the types
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	928	of states and the input, see Figure~\ref{dfa} on
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	929	Page~\pageref{dfa}).\bigskip
268 18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	930
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	931	\noindent
662 7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	932	To start, remember that we did not bother with defining and implementing
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	933	$\epsilon$NFAs: we immediately translated them into equivalent NFAs.
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	934	Equivalent in the sense of accepting the same language (though we only
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	935	claimed this and did not prove it rigorously). Remember also that NFAs
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	936	have non-deterministic transitions defined as a relation, or
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	937	alternatively my Scala implementation used transition functions returning sets of
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	938	states. This non-determinism is crucial for the Thompson Construction
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	939	to work (recall the cases for $\cdot$, $+$ and ${}^*$). But this
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	940	non-determinism makes it harder with NFAs to decide whether a string is
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	941	accepted or not; whereas such a decision is rather straightforward with
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	942	DFAs: recall their transition function is a ``real'' function that returns
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	943	a single state. So with DFAs we do not have to search at all. What is
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	944	perhaps interesting is the fact that for every NFA we can find a DFA
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	945	that also recognises the same language. This might sound a bit
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	946	paradoxical: NFA $\rightarrow$ decision of acceptance hard; DFA
7f7098f0b5f0 updated Christian Urban <urbanc@in.tum.de> parents: 578 diff changeset	947	$\rightarrow$ decision easy. But this \emph{is} true\ldots but of course
925 ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	948	there is always a caveat---nothing ever is for free in life. Let's see
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	949	what this caveat is.
488 057b4603b940 updated Christian Urban <urbanc@in.tum.de> parents: 487 diff changeset	950
There are actually a number of methods for transforming an NFA into
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	952	an equivalent DFA, but the most famous one is the \emph{subset
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	953	construction}. Consider the following NFA where the states are
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	954	labelled with $0$, $1$ and $2$.
268 18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	955
18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	956	\begin{center}
18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	957	\begin{tabular}{c@{\hspace{10mm}}c}
18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	958	\begin{tikzpicture}[scale=0.7,>=stealth',very thick,
18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	959	every state/.style={minimum size=0pt,
18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	960	draw=blue!50,very thick,fill=blue!20},
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	961	baseline=(current bounding box.center)]
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	962	\node[state,initial] (Q_0) {$0$};
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	963	\node[state] (Q_1) [below=of Q_0] {$1$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	964	\node[state, accepting] (Q_2) [below=of Q_1] {$2$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	965
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	966	\path[->] (Q_0) edge node [right] {$b$} (Q_1);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	967	\path[->] (Q_1) edge node [right] {$a,b$} (Q_2);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	968	\path[->] (Q_0) edge [loop above] node {$a, b$} ();
268 18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	969	\end{tikzpicture}
18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	970	&
\begin{tabular}{r|ll}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	972	states & $a$ & $b$\\
268 18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	973	\hline
344 408fd5994288 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 333 diff changeset	974	$\{\}\phantom{\star}$ & $\{\}$ & $\{\}$\\
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	975	start: $\{0\}\phantom{\star}$ & $\{0\}$ & $\{0,1\}$\\
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	976	$\{1\}\phantom{\star}$ & $\{2\}$ & $\{2\}$\\
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	977	$\{2\}\star$ & $\{\}$ & $\{\}$\\
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	978	$\{0,1\}\phantom{\star}$ & $\{0,2\}$ & $\{0,1,2\}$\\
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	979	$\{0,2\}\star$ & $\{0\}$ & $\{0,1\}$\\
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	980	$\{1,2\}\star$ & $\{2\}$ & $\{2\}$\\
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	981	$\{0,1,2\}\star$ & $\{0,2\}$ & $\{0,1,2\}$\\
268 18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	982	\end{tabular}
18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	983	\end{tabular}
18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	984	\end{center}
18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	985
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	986	\noindent The states of the corresponding DFA are given by generating
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	987	all subsets of the set $\{0,1,2\}$ (seen in the states column
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	988	in the table on the right). The other columns define the transition
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	989	function for the DFA for inputs $a$ and $b$. The first row states that
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	990	$\{\}$ is the sink state which has transitions for $a$ and $b$ to
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	991	itself. The next three lines are calculated as follows:
268 18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	992
18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	993	\begin{itemize}
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	994	\item Suppose you calculate the entry for the $a$-transition for state
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	995	$\{0\}$. Look for all states in the NFA that can be reached by such
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	996	a transition from this state; this is only state $0$; therefore from
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	997	state $\{0\}$ we can go to state $\{0\}$ via an $a$-transition.
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	998	\item Do the same for the $b$-transition; you can reach states $0$ and
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	999	$1$ in the NFA; therefore in the DFA we can go from state $\{0\}$ to
state $\{0,1\}$ via a $b$-transition.
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1001	\item Continue with the states $\{1\}$ and $\{2\}$.
268 18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	1002	\end{itemize}
18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	1003
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1004	\noindent
Once you have filled in the transitions for the `simple' states $\{0\}$,
$\{1\}$ and $\{2\}$, you only have to build the union for the compound states
$\{0,1\}$, $\{0,2\}$ and so on. For example for $\{0,1\}$ you take the
union of Line $\{0\}$ and Line $\{1\}$, which gives $\{0,2\}$ for $a$,
and $\{0,1,2\}$ for $b$. And so on.
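
\noindent
As a worked instance of this union step, here are the two entries for
$\{0,1\}$ spelled out from the rows for $\{0\}$ and $\{1\}$ in the
table. The name $\delta$ for the transition function of the DFA is
notation assumed here for illustration.

\begin{center}
\begin{tabular}{lcl}
$\delta(\{0,1\}, a)$ & $=$ & $\delta(\{0\}, a) \cup \delta(\{1\}, a)
  \;=\; \{0\} \cup \{2\} \;=\; \{0,2\}$\\
$\delta(\{0,1\}, b)$ & $=$ & $\delta(\{0\}, b) \cup \delta(\{1\}, b)
  \;=\; \{0,1\} \cup \{2\} \;=\; \{0,1,2\}$\\
\end{tabular}
\end{center}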
344 408fd5994288 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 333 diff changeset	1010
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1011	The starting state of the DFA can be calculated from the starting
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1012	states of the NFA, that is in this case $\{0\}$. But in general there
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1013	can of course be many starting states in the NFA and you would take
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1014	the corresponding subset as \emph{the} starting state of the DFA.
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1015
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1016	The accepting states in the DFA are given by all sets that contain a
667 6127e8992a5c updated Christian Urban <urbanc@in.tum.de> parents: 662 diff changeset	1017	$2$, which is the only accepting state in this NFA. But again in
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1018	general if the subset contains any accepting state from the NFA, then
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1019	the corresponding state in the DFA is accepting as well. This
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1020	completes the subset construction. The corresponding DFA for the NFA
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1021	shown above is:
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1022
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1023	\begin{equation}
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1024	\begin{tikzpicture}[scale=0.8,>=stealth',very thick,
344 408fd5994288 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 333 diff changeset	1025	every state/.style={minimum size=0pt,
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1026	draw=blue!50,very thick,fill=blue!20},
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1027	baseline=(current bounding box.center)]
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1028	\node[state,initial] (q0) {$0$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1029	\node[state] (q01) [right=of q0] {$0,1$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1030	\node[state,accepting] (q02) [below=of q01] {$0,2$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1031	\node[state,accepting] (q012) [right=of q02] {$0,1,2$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1032	\node[state] (q1) [below=0.5cm of q0] {$1$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1033	\node[state,accepting] (q2) [below=1cm of q1] {$2$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1034	\node[state] (qn) [below left=1cm of q2] {$\{\}$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1035	\node[state,accepting] (q12) [below right=1cm of q2] {$1,2$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1036
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1037	\path[->] (q0) edge node [above] {$b$} (q01);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1038	\path[->] (q01) edge node [above] {$b$} (q012);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1039	\path[->] (q0) edge [loop above] node {$a$} ();
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1040	\path[->] (q012) edge [loop right] node {$b$} ();
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1041	\path[->] (q012) edge node [below] {$a$} (q02);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1042	\path[->] (q02) edge node [below] {$a$} (q0);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1043	\path[->] (q01) edge [bend left] node [left] {$a$} (q02);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1044	\path[->] (q02) edge [bend left] node [right] {$b$} (q01);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1045	\path[->] (q1) edge node [left] {$a,b$} (q2);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1046	\path[->] (q12) edge node [right] {$a, b$} (q2);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1047	\path[->] (q2) edge node [right] {$a, b$} (qn);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1048	\path[->] (qn) edge [loop left] node {$a,b$} ();
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1049	\end{tikzpicture}\label{subsetdfa}
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1050	\end{equation}
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1051
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1052	\noindent
Please check that this is indeed a DFA. The big question is whether
this DFA recognises the same language as the NFA we started with.
I will let you ponder this question.
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1056
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1057
There are also two points to note: One is that very often in the
subset construction the resulting DFA contains a number of ``dead''
states that are never reachable from the starting state. This is
obvious in the example, where the states $\{1\}$, $\{2\}$, $\{1,2\}$ and
$\{\}$ can never be reached from the starting state. But this might
not always be as obvious as that. In effect the DFA in this example is
not a \emph{minimal} DFA (more about this in a minute). Such dead
states can be safely removed without changing the language that is
recognised by the DFA. The other point is that in some cases, however,
the subset construction produces a DFA that does \emph{not} contain
any dead states and in which every state is needed\ldots{}this means it
calculates a minimal DFA. Which in turn means that in some cases the
number of states can increase exponentially when going from NFAs to
DFAs, namely up to $2^n$ (which is the number of subsets you can form
for a set of $n$ states). This blow up in the number of states in the
DFA is again bad news for how quickly you can decide whether a string
is accepted by a DFA or not. So the caveat with DFAs is that they
might make the task of finding the next state trivial, but might
require up to $2^n$ states for an NFA with $n$ states.\bigskip
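
\noindent
One way to find out which subset states are not dead is to explore the
DFA from its starting state and collect everything that can be
reached. The following is only a minimal sketch of this idea for the
example NFA from above. The encoding of the NFA as a \texttt{Map} and
the names \texttt{delta}, \texttt{nexts} and \texttt{reachable} are
assumptions made for this illustration; they are not the classes used
elsewhere in this handout.

{\small\begin{lstlisting}[language=Scala]
// Assumed encoding of the example NFA: states are Ints and
// delta maps (state, character) to the set of successor states.
val delta: Map[(Int, Char), Set[Int]] = Map(
  (0, 'a') -> Set(0),
  (0, 'b') -> Set(0, 1),
  (1, 'a') -> Set(2),
  (1, 'b') -> Set(2))

// one step on sets of states: the union over all states in qs
def nexts(qs: Set[Int], c: Char): Set[Int] =
  qs.flatMap(q => delta.getOrElse((q, c), Set.empty[Int]))

// collect all subset states that are reachable from start
def reachable(start: Set[Int], alphabet: Set[Char]): Set[Set[Int]] = {
  @annotation.tailrec
  def go(todo: List[Set[Int]], seen: Set[Set[Int]]): Set[Set[Int]] =
    todo match {
      case Nil => seen
      case qs :: rest =>
        // successors of qs that have not been seen before
        val succs = alphabet.map(c => nexts(qs, c)).filterNot(seen)
        go(succs.toList ::: rest, seen ++ succs)
    }
  go(List(start), Set(start))
}

val live = reachable(Set(0), Set('a', 'b'))
// live contains only {0}, {0,1}, {0,2} and {0,1,2}
\end{lstlisting}}

\noindent
For the example this yields four of the eight subsets, which agrees
with the four dead states listed above. In the worst case, however,
such an exploration still has to visit all $2^n$ subsets.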
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1077
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1078	\noindent
To conclude this section, how conveniently can we
implement the subset construction with our versions of NFAs and
DFAs? Very conveniently. The code is just:
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1082
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1083	{\small\begin{lstlisting}[language=Scala]
def subset[A, C](nfa: NFA[A, C]) : DFA[Set[A], C] = {
  DFA(nfa.starts,                            // start: the set of NFA starts
      { case (qs, c) => nfa.nexts(qs, c) },  // transition: one step on sets
      _.exists(nfa.fins))                    // accepting: some NFA state accepts
}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1089	\end{lstlisting}}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1090
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1091	\noindent
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1092	The interesting point in this code is that the state type of the
calculated DFA is \texttt{Set[A]}. Think carefully about why this works out
correctly.
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1095
The DFA is then given by three components: the starting state, the
transition function and the accepting-states function. The starting
states are a set in the given NFA, but this set is a single state in the
DFA. The transition function, given the state \texttt{qs} and input
\texttt{c}, needs to produce the next state: this is the set of all NFA
states that are reachable via \texttt{c} from some state in \texttt{qs}.
The function \texttt{nexts} from the NFA class already calculates this
for us. The accepting-states function for the DFA is true whenever at
least one state in the subset is accepting (that is true) in the
NFA.\medskip
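
\noindent
To see the three components at work, here is a small self-contained
sketch that applies the subset construction to the example NFA from
above. Be careful: the classes below are simplified stand-ins with
total transition functions, and the method \texttt{accepts} is an
assumption made for this illustration, so they need not agree in every
detail with the NFA and DFA classes defined earlier.

{\small\begin{lstlisting}[language=Scala]
// Simplified stand-ins for the DFA and NFA classes (assumed shapes)
case class DFA[A, C](start: A, delta: (A, C) => A, fins: A => Boolean) {
  // follow the single next state for each character
  def accepts(cs: List[C]): Boolean =
    fins(cs.foldLeft(start)((q, c) => delta(q, c)))
}

case class NFA[A, C](starts: Set[A], delta: (A, C) => Set[A],
                     fins: A => Boolean) {
  // one step on sets of states
  def nexts(qs: Set[A], c: C): Set[A] = qs.flatMap(q => delta(q, c))
}

def subset[A, C](nfa: NFA[A, C]): DFA[Set[A], C] =
  DFA[Set[A], C](nfa.starts,
                 (qs, c) => nfa.nexts(qs, c),
                 qs => qs.exists(nfa.fins))

// the example NFA with states 0, 1 and 2, where 2 is accepting
val nfa = NFA[Int, Char](Set(0), {
  case (0, 'a') => Set(0)
  case (0, 'b') => Set(0, 1)
  case (1, 'a') | (1, 'b') => Set(2)
  case _ => Set.empty[Int]
}, _ == 2)

val dfa = subset(nfa)
val yes = dfa.accepts("ba".toList)  // {0} -b-> {0,1} -a-> {0,2}
val no  = dfa.accepts("ab".toList)  // {0} -a-> {0}   -b-> {0,1}
\end{lstlisting}}

\noindent
Here \texttt{yes} is true because $\{0,2\}$ contains the accepting NFA
state $2$, while \texttt{no} is false because $\{0,1\}$ does not.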
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1105
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1106	\noindent
You might want to spend some quality time tinkering with this code
and pondering it. Then you will probably notice that it is
actually a bit silly. The whole point of translating the NFA into a
DFA via the subset construction is to make the decision of whether a
string is accepted or not faster. Given the code above, the generated
DFA will be exactly as fast, or as slow, as the NFA we started with
(actually it will even be a tiny bit slower). The reason is that we
just re-use the \texttt{nexts} function from the NFA. This function
implements the non-deterministic breadth-first search. You might be
thinking: This is cheating! \ldots{} Well, not quite as you will see
later, but in terms of speed we still need to work a bit in order to
get a DFA that is sometimes(!) faster. Let's do this next.
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1119
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1120	\subsection*{DFA Minimisation}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1121
As seen in \eqref{subsetdfa}, the subset construction from an NFA to a
DFA can result in a rather ``inefficient'' DFA, meaning there are
states that are not needed. There are two kinds of such unneeded
states: \emph{unreachable} states and \emph{non-distinguishable}
states. The first kind of states can just be removed without affecting
the language that can be recognised (after all they are
unreachable). The second kind can also be detected and thus a DFA
can be \emph{minimised} by the following algorithm:
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1130
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1131	\begin{enumerate}
\item Take all pairs $(q, p)$ with $q \not= p$.
\item Mark all pairs that consist of an accepting and a non-accepting state.
\item For all unmarked pairs $(q, p)$ and all characters $c$
test whether

\begin{center}
$(\delta(q, c), \delta(p,c))$
\end{center}

is marked. If this is the case for some $c$, then also mark $(q, p)$.
\item Repeat the last step until nothing changes.
\item All unmarked pairs can be merged.
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1144	\end{enumerate}
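
\noindent
The five steps above can be sketched in a few lines of Scala. This is
only a minimal sketch under stated assumptions: the DFA is a
hypothetical one with states $0$ to $4$ given as a plain
\texttt{Map}, and the names \texttt{pairs}, \texttt{norm},
\texttt{mark} and \texttt{mergeable} are made up for this
illustration.

{\small\begin{lstlisting}[language=Scala]
// Hypothetical DFA: states 0..4 over {a, b}, only state 4 accepting
val states = (0 to 4).toList
val alphabet = List('a', 'b')
val accepting = Set(4)
val delta: Map[(Int, Char), Int] = Map(
  (0, 'a') -> 1, (0, 'b') -> 2,
  (1, 'a') -> 4, (1, 'b') -> 2,
  (2, 'a') -> 3, (2, 'b') -> 2,
  (3, 'a') -> 4, (3, 'b') -> 0,
  (4, 'a') -> 4, (4, 'b') -> 4)

// Step 1: all pairs (q, p) with q < p, so each pair occurs once
val pairs = for (q <- states; p <- states if q < p) yield (q, p)

// order a pair such that the smaller state comes first
def norm(q: Int, p: Int): (Int, Int) = if (q < p) (q, p) else (p, q)

// Step 2: mark pairs of an accepting and a non-accepting state
val initial =
  pairs.filter { case (q, p) => accepting(q) != accepting(p) }.toSet

// Steps 3 and 4: mark unmarked pairs whose successors are marked,
// and repeat until nothing changes
def mark(marked: Set[(Int, Int)]): Set[(Int, Int)] = {
  val more = pairs.filterNot(marked).filter { case (q, p) =>
    alphabet.exists(c => marked(norm(delta((q, c)), delta((p, c)))))
  }
  if (more.isEmpty) marked else mark(marked ++ more)
}

// Step 5: the unmarked pairs can be merged
val mergeable = pairs.filterNot(mark(initial))
// mergeable is List((0, 2), (1, 3))
\end{lstlisting}}

\noindent
In this hypothetical DFA the states $0$ and $2$ can be merged, and so
can the states $1$ and $3$. Listing each pair only once with $q < p$
is a design choice of the sketch; it corresponds to looking at only
one half of the table of all pairs.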
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1145
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1146	\noindent Unfortunately, once we throw away all unreachable states in
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1147	\eqref{subsetdfa}, all remaining states are needed. In order to
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1148	illustrate the minimisation algorithm, consider the following DFA.
344 408fd5994288 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 333 diff changeset	1149
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1150	\begin{center}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1151	\begin{tikzpicture}[>=stealth',very thick,auto,
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1152	every state/.style={minimum size=0pt,
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1153	inner sep=2pt,draw=blue!50,very thick,
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1154	fill=blue!20}]
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1155	\node[state,initial] (Q_0) {$Q_0$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1156	\node[state] (Q_1) [right=of Q_0] {$Q_1$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1157	\node[state] (Q_2) [below right=of Q_0] {$Q_2$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1158	\node[state] (Q_3) [right=of Q_2] {$Q_3$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1159	\node[state, accepting] (Q_4) [right=of Q_1] {$Q_4$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1160	\path[->] (Q_0) edge node [above] {$a$} (Q_1);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1161	\path[->] (Q_1) edge node [above] {$a$} (Q_4);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1162	\path[->] (Q_4) edge [loop right] node {$a, b$} ();
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1163	\path[->] (Q_3) edge node [right] {$a$} (Q_4);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1164	\path[->] (Q_2) edge node [above] {$a$} (Q_3);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1165	\path[->] (Q_1) edge node [right] {$b$} (Q_2);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1166	\path[->] (Q_0) edge node [above] {$b$} (Q_2);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1167	\path[->] (Q_2) edge [loop left] node {$b$} ();
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1168	\path[->] (Q_3) edge [bend left=95, looseness=1.3] node
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1169	[below] {$b$} (Q_0);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1170	\end{tikzpicture}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1171	\end{center}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1172
\noindent In Steps 1 and 2 we essentially consider a triangle
of the form
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1175
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1176	\begin{center}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1177	\begin{tikzpicture}[scale=0.6,line width=0.8mm]
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1178	\draw (0,0) -- (4,0);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1179	\draw (0,1) -- (4,1);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1180	\draw (0,2) -- (3,2);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1181	\draw (0,3) -- (2,3);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1182	\draw (0,4) -- (1,4);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1183
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1184	\draw (0,0) -- (0, 4);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1185	\draw (1,0) -- (1, 4);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1186	\draw (2,0) -- (2, 3);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1187	\draw (3,0) -- (3, 2);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1188	\draw (4,0) -- (4, 1);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1189
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1190	\draw (0.5,-0.5) node {$Q_0$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1191	\draw (1.5,-0.5) node {$Q_1$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1192	\draw (2.5,-0.5) node {$Q_2$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1193	\draw (3.5,-0.5) node {$Q_3$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1194
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1195	\draw (-0.5, 3.5) node {$Q_1$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1196	\draw (-0.5, 2.5) node {$Q_2$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1197	\draw (-0.5, 1.5) node {$Q_3$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1198	\draw (-0.5, 0.5) node {$Q_4$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1199
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1200	\draw (0.5,0.5) node {\large$\star$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1201	\draw (1.5,0.5) node {\large$\star$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1202	\draw (2.5,0.5) node {\large$\star$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1203	\draw (3.5,0.5) node {\large$\star$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1204	\end{tikzpicture}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1205	\end{center}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1206
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1207	\noindent where the lower row is filled with stars, because in
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1208	the corresponding pairs there is always one state that is
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1209	accepting ($Q_4$) and a state that is non-accepting (the other
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1210	states).
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1211
In Step 3 we need to fill in more stars according to whether
one of the next-state pairs is marked. We have to do this
for every unmarked field until there is no change any more.
This gives the triangle
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1216
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1217	\begin{center}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1218	\begin{tikzpicture}[scale=0.6,line width=0.8mm]
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1219	\draw (0,0) -- (4,0);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1220	\draw (0,1) -- (4,1);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1221	\draw (0,2) -- (3,2);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1222	\draw (0,3) -- (2,3);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1223	\draw (0,4) -- (1,4);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1224
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1225	\draw (0,0) -- (0, 4);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1226	\draw (1,0) -- (1, 4);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1227	\draw (2,0) -- (2, 3);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1228	\draw (3,0) -- (3, 2);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1229	\draw (4,0) -- (4, 1);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1230
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1231	\draw (0.5,-0.5) node {$Q_0$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1232	\draw (1.5,-0.5) node {$Q_1$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1233	\draw (2.5,-0.5) node {$Q_2$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1234	\draw (3.5,-0.5) node {$Q_3$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1235
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1236	\draw (-0.5, 3.5) node {$Q_1$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1237	\draw (-0.5, 2.5) node {$Q_2$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1238	\draw (-0.5, 1.5) node {$Q_3$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1239	\draw (-0.5, 0.5) node {$Q_4$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1240
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1241	\draw (0.5,0.5) node {\large$\star$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1242	\draw (1.5,0.5) node {\large$\star$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1243	\draw (2.5,0.5) node {\large$\star$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1244	\draw (3.5,0.5) node {\large$\star$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1245	\draw (0.5,1.5) node {\large$\star$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1246	\draw (2.5,1.5) node {\large$\star$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1247	\draw (0.5,3.5) node {\large$\star$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1248	\draw (1.5,2.5) node {\large$\star$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1249	\end{tikzpicture}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1250	\end{center}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1251
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1252	\noindent which means states $Q_0$ and $Q_2$, as well as $Q_1$
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1253	and $Q_3$ can be merged. This gives the following minimal DFA
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1254
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1255	\begin{center}
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1256	\begin{tikzpicture}[>=stealth',very thick,auto,
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1257	every state/.style={minimum size=0pt,
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1258	inner sep=2pt,draw=blue!50,very thick,
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1259	fill=blue!20}]
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1260	\node[state,initial] (Q_02) {$Q_{0, 2}$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1261	\node[state] (Q_13) [right=of Q_02] {$Q_{1, 3}$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1262	\node[state, accepting] (Q_4) [right=of Q_13]
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1263	{$Q_{4\phantom{,0}}$};
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1264	\path[->] (Q_02) edge [bend left] node [above] {$a$} (Q_13);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1265	\path[->] (Q_13) edge [bend left] node [below] {$b$} (Q_02);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1266	\path[->] (Q_02) edge [loop below] node {$b$} ();
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1267	\path[->] (Q_13) edge node [above] {$a$} (Q_4);
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1268	\path[->] (Q_4) edge [loop above] node {$a, b$} ();
344 408fd5994288 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 333 diff changeset	1269	\end{tikzpicture}
408fd5994288 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 333 diff changeset	1270	\end{center}
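\noindent To make Step 3 more concrete, here is a hand-worked sketch of how
two fields of the triangle are decided. It only reads transitions off
the two automata shown above; the remaining fields are checked in the
same way.

\begin{center}
\begin{tabular}{lp{7cm}l}
pair & next-state pairs & outcome\\\hline
$(Q_2, Q_3)$ & on $a$ we get $(Q_3, Q_4)$, which was already marked in
Step 2 & star\\
$(Q_1, Q_3)$ & on $b$ we get $(Q_2, Q_0)$, which is never marked; on $a$
both states move to $Q_4$ (compare the $a$-edge leaving $Q_{1,3}$
in the minimal DFA) & no star\\
\end{tabular}
\end{center}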
408fd5994288 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 333 diff changeset	1271
408fd5994288 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 333 diff changeset	1272
\noindent The minimised DFA above is certainly fast when it comes to deciding
whether a string is accepted or not. But this is not universally the
case. Suppose you count the nodes in a regular expression (when
represented as a tree). If you look carefully at the Thompson
Construction you can see that the number of states of the constructed
NFA grows linearly in the size of the regular expression. This is good,
but as we have seen earlier deciding whether a string is matched by an
NFA is hard. Translating an NFA into a DFA means that deciding whether a
string is matched becomes easy, but the number of states can grow
exponentially, even after minimisation. Say an NFA has $n$ states, then
in the worst case the corresponding minimal DFA that can match the
same language as the NFA might contain $2^n$ states. Unfortunately,
in many interesting cases this worst-case bound is the dominant
factor.
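
\noindent A standard textbook illustration of this blow-up (added here
as an example; it is not taken from the rest of this handout) is the
language of all strings over $\{a, b\}$ whose $n$-th character from the
end is an $a$:

\[
(a + b)^* \cdot a \cdot \underbrace{(a + b) \cdots (a + b)}_{n - 1\;\mbox{\scriptsize times}}
\]

\noindent An NFA can recognise this language with $n + 1$ states: it
guesses which $a$ is the distinguished one and then counts the
remaining $n - 1$ characters. A DFA cannot guess. It has to remember
which of the last $n$ characters were $a$s, and since there are $2^n$
such combinations, even the minimal DFA needs $2^n$ states.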
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	1287
30ea6b01db46 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 698 diff changeset	1288
By the way, we are not bothering with implementing the above
minimisation algorithm: while up to now all the transformations used
some clever composition of functions, the minimisation algorithm
cannot be implemented by just composing some functions. For this we
would require a more concrete representation of the transition
function (like maps). If we did this, however, then many advantages of
the functions would be thrown away. So the compromise is that we are not
able to minimise (easily) our DFAs. We want to use regular expressions
directly anyway.
492 882d5de18adc updated Christian Urban <urbanc@in.tum.de> parents: 491 diff changeset	1298
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1299	\subsection*{Brzozowski's Method}
269 83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1300
I know this handout is already a long, long rant: but after all it is
a topic that has been researched for more than 60 years. If you
reflect on what you have read so far, the story is that you can take a
regular expression, translate it via the Thompson Construction into an
$\epsilon$NFA, then translate it into an NFA by removing all
$\epsilon$-transitions, and then via the subset construction obtain a
DFA. In all steps we made sure the language, that is which strings can be
recognised, stays the same. Of course we should have proved this in
each step, but let us cut corners here. After the last section, we
can even minimise the DFA (maybe not in code). But again we made sure
the same language is recognised. You might be wondering: Can we go
in the other direction? Can we start from a DFA and obtain a regular
expression that can recognise the same language as the DFA?\medskip
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1314
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1315	\noindent
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1316	The answer is yes. Again there are several methods for calculating a
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1317	regular expression for a DFA. I will show you Brzozowski's method
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1318	because it calculates a regular expression using quite familiar
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1319	transformations for solving equational systems. Consider the DFA:
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1320
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1321	\begin{center}
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1322	\begin{tikzpicture}[scale=1.5,>=stealth',very thick,auto,
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1323	every state/.style={minimum size=0pt,
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1324	inner sep=2pt,draw=blue!50,very thick,
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1325	fill=blue!20}]
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1326	\node[state, initial] (q0) at ( 0,1) {$Q_0$};
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1327	\node[state] (q1) at ( 1,1) {$Q_1$};
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1328	\node[state, accepting] (q2) at ( 2,1) {$Q_2$};
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1329	\path[->] (q0) edge[bend left] node[above] {$a$} (q1)
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1330	(q1) edge[bend left] node[above] {$b$} (q0)
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1331	(q2) edge[bend left=50] node[below] {$b$} (q0)
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1332	(q1) edge node[above] {$a$} (q2)
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1333	(q2) edge [loop right] node {$a$} ()
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1334	(q0) edge [loop below] node {$b$} ();
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1335	\end{tikzpicture}
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1336	\end{center}
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1337
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1338	\noindent for which we can set up the following equational
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1339	system
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1340
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1341	\begin{eqnarray}
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1342	Q_0 & = & \ONE + Q_0\,b + Q_1\,b + Q_2\,b\\
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1343	Q_1 & = & Q_0\,a\\
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1344	Q_2 & = & Q_1\,a + Q_2\,a
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1345	\end{eqnarray}
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1346
\noindent There is an equation for each node in the DFA. Let
us have a look at how the right-hand sides of the equations are
constructed. First have a look at the second equation: the
left-hand side is $Q_1$ and the right-hand side is $Q_0\,a$. The
right-hand side is essentially all possible ways to end up
in node $Q_1$. There is only one incoming edge, which comes from $Q_0$
and consumes an $a$. Therefore the right-hand side is this
state followed by the character, in this case $Q_0\,a$. Now let us
have a look at the third equation: there are two incoming
edges for $Q_2$. Therefore we have two terms, namely $Q_1\,a$ and
$Q_2\,a$. These terms are separated by $+$. The first states
that being in state $Q_1$ and consuming an $a$ will bring you to
$Q_2$, and the second that being in $Q_2$ and consuming an $a$
will make you stay in $Q_2$. The right-hand side of the
first equation is constructed similarly: there are three
incoming edges, therefore there are three terms. There is
one exception in that we also ``add'' $\ONE$ to the
first equation, because it corresponds to the starting state
in the DFA.
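
\noindent The construction just described can be summarised as a
general recipe. This is only a sketch: the arrow notation
$Q_i \stackrel{c}{\longrightarrow} Q_j$ for ``there is an edge from
$Q_i$ to $Q_j$ consuming the character $c$'' is introduced here for
illustration, and the sum stands for terms separated by $+$.

\[
Q_j \;=\; \underbrace{\ONE}_{\mbox{\scriptsize only if $Q_j$ is the starting state}}
\;+\; \sum_{Q_i \stackrel{c}{\longrightarrow} Q_j} Q_i\,c
\]

\noindent For $Q_2$, for instance, the incoming edges are
$Q_1 \stackrel{a}{\longrightarrow} Q_2$ and
$Q_2 \stackrel{a}{\longrightarrow} Q_2$, and $Q_2$ is not the starting
state, which gives exactly $Q_1\,a + Q_2\,a$.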
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1366
Having constructed the equational system, the question is
how to solve it. Remarkably, the rules are very similar to
those for solving the usual systems of linear equations. For example the
second equation does not contain the variable $Q_1$ on the
right-hand side of the equation. We can therefore eliminate
$Q_1$ from the system by just substituting this equation
into the other two. This gives
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1374
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1375	\begin{eqnarray}
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1376	Q_0 & = & \ONE + Q_0\,b + Q_0\,a\,b + Q_2\,b\\
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1377	Q_2 & = & Q_0\,a\,a + Q_2\,a
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1378	\end{eqnarray}
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1379
\noindent where in Equation (6) we have two occurrences
of $Q_0$. As with the usual laws about $+$ and $\cdot$, we can factor
out $Q_0$ and simplify Equation (6) to obtain the following two equations:
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1383
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1384	\begin{eqnarray}
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1385	Q_0 & = & \ONE + Q_0\,(b + a\,b) + Q_2\,b\\
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1386	Q_2 & = & Q_0\,a\,a + Q_2\,a
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1387	\end{eqnarray}
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1388
\noindent Unfortunately we cannot make any more progress with
substituting equations, because both (8) and (9) contain the
variable from the left-hand side also on the right-hand side.
Here we need to use a law that is different from the usual
laws about linear equations. It is called \emph{Arden's rule}.
It states that if an equation is of the form $q = q\,r + s$
then it can be transformed to $q = s\, r^*$. Since we can
assume $+$ is commutative, Equation (9) is of that form: $s$ is
$Q_0\,a\,a$ and $r$ is $a$. That means we can transform
(9) to obtain the two new equations
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1399
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1400	\begin{eqnarray}
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1401	Q_0 & = & \ONE + Q_0\,(b + a\,b) + Q_2\,b\\
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1402	Q_2 & = & Q_0\,a\,a\,(a^*)
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1403	\end{eqnarray}
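
\noindent To see why Arden's rule is plausible, one can plug the claimed
solution $q = s\,r^*$ back into the right-hand side of
$q = q\,r + s$. The following check is a sketch, not a proof: it
assumes the standard identity $r^*\,r + \ONE = r^*$ and only shows that
$s\,r^*$ is \emph{a} solution.

\[
(s\,r^*)\,r + s \;=\; s\,(r^*\,r) + s\,\ONE \;=\; s\,(r^*\,r + \ONE) \;=\; s\,r^*
\]

\noindent With $s = Q_0\,a\,a$ and $r = a$ this reads
$Q_0\,a\,a\,(a^*)\,a + Q_0\,a\,a = Q_0\,a\,a\,(a^*)$.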
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1404
\noindent Now we can again substitute the equation for $Q_2$ into
the one for $Q_0$ in order to eliminate the variable $Q_2$.
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1407
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1408	\begin{eqnarray}
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1409	Q_0 & = & \ONE + Q_0\,(b + a\,b) + Q_0\,a\,a\,(a^*)\,b
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1410	\end{eqnarray}
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1411
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1412	\noindent Pulling $Q_0$ out as a single factor gives:
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1413
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1414	\begin{eqnarray}
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1415	Q_0 & = & \ONE + Q_0\,(b + a\,b + a\,a\,(a^*)\,b)
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1416	\end{eqnarray}
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1417
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1418	\noindent This equation is again of a form to which we can
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1419	apply Arden's rule ($r$ is $b + a\,b + a\,a\,(a^*)\,b$ and $s$
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1420	is $\ONE$). This gives as solution for $Q_0$ the following
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1421	regular expression:
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1422
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1423	\begin{eqnarray}
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1424	Q_0 & = & \ONE\,(b + a\,b + a\,a\,(a^*)\,b)^*
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1425	\end{eqnarray}
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1426
349 434891622131 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 344 diff changeset	1427	\noindent Since this is a regular expression, we can simplify
444 3056a4c071b0 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 349 diff changeset	1428	away the $\ONE$ to obtain the slightly simpler regular
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1429	expression
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1430
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1431	\begin{eqnarray}
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1432	Q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^*
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1433	\end{eqnarray}
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1434
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1435	\noindent
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1436	Now we can unwind this process and obtain the solutions
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1437	for the other equations. This gives:
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1438
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1439	\begin{eqnarray}
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1440	Q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\\
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1441	Q_1 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\,a\\
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1442	Q_2 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\,a\,a\,(a)^*
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1443	\end{eqnarray}
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1444
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1445	\noindent Finally, we only need to ``add'' up the equations
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1446	which correspond to a terminal state. In our running example,
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1447	this is just $Q_2$. Consequently, a regular expression
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1448	that recognises the same language as the DFA is
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1449
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1450	\[
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1451	(b + a\,b + a\,a\,(a^*)\,b)^*\,a\,a\,(a)^*
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1452	\]
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1453
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1454	\noindent You can somewhat crosscheck your solution by taking a string
931 c9d6b50345d7 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 925 diff changeset	1455	the regular expression can match and seeing whether it can be matched
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1456	by the DFA. One such string is $aaa$ and, \emph{voil\`a}, this
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1457	string is also matched by the automaton.
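
\noindent To spell this crosscheck out for $aaa$ (a sketch only, assuming
the transitions of the DFA are the ones that can be read off from the
equations, that is $Q_0$ is the starting state and $a$ leads from $Q_0$
to $Q_1$, from $Q_1$ to $Q_2$ and from $Q_2$ back to $Q_2$): on the side
of the regular expression, the starred part matches the empty string,
the middle part $a\,a$ matches the first two $a$s and $(a)^*$ matches the
remaining $a$. On the side of the automaton we get the run

\[
Q_0 \stackrel{a}{\longrightarrow} Q_1 \stackrel{a}{\longrightarrow} Q_2 \stackrel{a}{\longrightarrow} Q_2
\]

\noindent which ends in the accepting state $Q_2$.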
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1458
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1459	We should prove that Brzozowski's method really produces an equivalent
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1460	regular expression. But for the purposes of this module, we omit
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1461	this. I guess you are relieved.
269 83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1462
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1463
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1464	\subsection*{Regular Languages}
269 83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1465
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1466	Given the constructions in the previous sections we obtain
349 434891622131 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 344 diff changeset	1467	the following overall picture:
269 83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1468
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1469	\begin{center}
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1470	\begin{tikzpicture}
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1471	\node (rexp) {\bf Regexps};
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1472	\node (nfa) [right=of rexp] {\bf NFAs};
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1473	\node (dfa) [right=of nfa] {\bf DFAs};
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1474	\node (mdfa) [right=of dfa] {\bf\begin{tabular}{c}minimal\\ DFAs\end{tabular}};
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1475	\path[->,line width=1mm] (rexp) edge node [above=4mm, black] {\begin{tabular}{c@{\hspace{9mm}}}Thompson's\\[-1mm] construction\end{tabular}} (nfa);
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1476	\path[->,line width=1mm] (nfa) edge node [above=4mm, black] {\begin{tabular}{c}subset\\[-1mm] construction\end{tabular}}(dfa);
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1477	\path[->,line width=1mm] (dfa) edge node [below=5mm, black] {minimisation} (mdfa);
344 408fd5994288 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 333 diff changeset	1478	\path[->,line width=1mm] (dfa) edge [bend left=45] node [below] {\begin{tabular}{l}Brzozowski's\\ method\end{tabular}} (rexp);
269 83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1479	\end{tikzpicture}
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1480	\end{center}
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1481
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1482	\noindent By going from regular expressions over NFAs to DFAs,
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1483	we can always ensure that for every regular expression there
349 434891622131 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 344 diff changeset	1484	exists an NFA and a DFA that can recognise the same language,
269 83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1485	although we did not prove this fact. Similarly, by going from
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1486	DFAs to regular expressions, we can make sure that for every DFA
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1487	there exists a regular expression that can recognise the same
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1488	language. Again we did not prove this fact.
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1489
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1490	The fundamental conclusion we can draw is that automata and regular
269 83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1491	expressions can recognise the same set of languages:
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1492
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1493	\begin{quote} A language is \emph{regular} iff there exists a
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1494	regular expression that recognises exactly its strings.
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1495	\end{quote}
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1496
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1497	\noindent or equivalently
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1498
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1499	\begin{quote} A language is \emph{regular} iff there exists an
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1500	automaton that recognises exactly its strings.
83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1501	\end{quote}
268 18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	1502
698 eed94d5780c5 updated Christian Urban <urbanc@in.tum.de> parents: 667 diff changeset	1503	\noindent Note that this is not a statement about a particular language
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1504	(that is a particular set of strings), but about a large class of
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1505	languages, namely the regular ones.
268 18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	1506
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1507	As a consequence, for deciding whether a string is recognised by a
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1508	regular expression, we could use our algorithm based on derivatives or
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1509	NFAs or DFAs. But let us quickly look at what the differences mean in
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1510	computational terms. Translating a regular expression into an NFA gives
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1511	us an automaton that has $O(n)$ states---that means the size of the
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1512	NFA grows linearly with the size of the regular expression. The
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1513	trouble with NFAs is that deciding whether a string is
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1514	accepted or not is computationally not cheap. Remember, with NFAs we
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1515	have potentially many next states even for the same input and also
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1516	have the silent $\epsilon$-transitions. If we want to find a path from
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1517	the starting state of an NFA to an accepting state, we need to consider
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1518	all possibilities. In Ruby, Python and Java this is done by a
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1519	depth-first search, which in turn means that if a ``wrong'' choice is
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1520	made, the algorithm has to backtrack and thus explore all potential
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1521	candidates. This is exactly the reason why Ruby, Python and Java are
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1522	so slow for evil regular expressions. An alternative to the
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1523	potentially slow depth-first search is to explore the search space in
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1524	a breadth-first fashion, but this might incur a big memory penalty.
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1525
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1526	To avoid the problems with NFAs, we can translate them into DFAs. With
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1527	DFAs the problem of deciding whether a string is recognised or not is
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1528	much simpler, because in each state it is completely determined what
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1529	the next state will be for a given input. So no search is needed. The
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1530	problem with this is that the translation to DFAs can blow up
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1531	the number of states exponentially. Therefore, when this route is
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1532	taken, we definitely need to minimise the resulting DFAs in order to
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1533	have acceptable memory and runtime behaviour. But remember, the
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1534	subset construction in the worst case blows up the number of states to
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1535	$2^n$. So the translation to DFAs itself can also incur a big
925 ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1536	runtime penalty.\footnote{Therefore the clever people in Rust try to
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1537	\emph{not} do such calculations upfront, but rather delay them and
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1538	in this way can avoid much of the penalty\ldots{}in many practically
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1539	relevant places. As a result, they make the extraordinary claim that
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1540	their time complexity is in the worst case $O(m \times n)$ where $m$
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1541	is proportional to the size of the regex and $n$ is proportional to
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1542	the size of strings. Does this claim hold water?}
269 83e6cb90216d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 268 diff changeset	1543
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1544	But this does not mean that everything is bad with automata. Recall
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1545	the problem of finding a regular expression for the language that is
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1546	\emph{not} recognised by a regular expression. In our implementation
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1547	we explicitly added such regular expressions because they are useful
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1548	for recognising comments. But in principle we did not need to. The
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1549	argument for this is as follows: take a regular expression, translate
349 434891622131 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 344 diff changeset	1550	it into an NFA and then a DFA that both recognise the same
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1551	language. Once you have the DFA it is very easy to construct the
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1552	automaton for the language not recognised by a DFA. If the DFA is
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1553	completed (this is important!), then you just need to exchange the
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1554	accepting and non-accepting states. You can then translate this DFA
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1555	back into a regular expression and that will be the regular expression
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1556	that can match all strings the original regular expression could
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1557	\emph{not} match.
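
\noindent As a small worked example (a sketch, assuming the DFA from the
running example is already completed over the alphabet consisting of
just $a$ and $b$): exchanging accepting and non-accepting states makes
$Q_0$ and $Q_1$ accepting instead of $Q_2$. Adding up the solutions we
calculated for these two states gives

\[
(b + a\,b + a\,a\,(a^*)\,b)^* \;+\; (b + a\,b + a\,a\,(a^*)\,b)^*\,a
\]

\noindent which should match exactly the strings the regular expression
for $Q_2$ cannot match, for example $b$, $ba$ and the empty string, but
not $aaa$.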
268 18bef085a7ca updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 251 diff changeset	1558
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1559	It is also interesting that not all languages are regular. The most
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1560	well-known example of a language that is not regular consists of all
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1561	the strings of the form
292 7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1562
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1563	\[a^n\,b^n\]
7ed2a25dd115 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 270 diff changeset	1564
491 7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1565	\noindent meaning strings that consist of some $a$s followed by exactly as many
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1566	$b$s. You can try, but you cannot find a regular expression for this
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1567	language and also not an automaton. One can actually prove that there
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1568	is no regular expression nor automaton for this language, but again
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1569	that would lead us too far afield for what we want to do in this
7a0182c66403 updated Christian Urban <urbanc@in.tum.de> parents: 490 diff changeset	1570	module.
270 4dbeaf43031d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 269 diff changeset	1571
492 882d5de18adc updated Christian Urban <urbanc@in.tum.de> parents: 491 diff changeset	1572
882d5de18adc updated Christian Urban <urbanc@in.tum.de> parents: 491 diff changeset	1573	\subsection*{Where Have Derivatives Gone?}
882d5de18adc updated Christian Urban <urbanc@in.tum.de> parents: 491 diff changeset	1574
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1575	%%Still to be done\bigskip
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1576
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1577	\noindent
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1578	By now you are probably fed up with this text. It is now way too long
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1579	for one lecture, but there is still one aspect of the
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1580	automata-regular-expression-connection I would like to describe:\medskip
518 54632be1b873 updated cu parents: 497 diff changeset	1581
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1582	\noindent
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1583	Where have the derivatives gone? Did we just forget them? Well, they
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1584	actually do play a role in generating a DFA from a regular expression.
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1585	And we can also see this in our implementation\ldots{}because there is
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1586	one flaw in our representation of automata and transitions as partial
874 c3d78e7b731c updated Christian Urban <christian.urban@kcl.ac.uk> parents: 764 diff changeset	1587	functions\ldots{}remember I said something about fishy things.
c3d78e7b731c updated Christian Urban <christian.urban@kcl.ac.uk> parents: 764 diff changeset	1588	Namely, we can represent automata with infinitely many states, which is
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1589	actually forbidden by the definition of what an automaton is. We can
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1590	exploit this flaw as follows: Suppose our alphabet consists of the
925 ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1591	characters $a$ to $z$. Then we can generate an ``automaton''
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1592	(it is not really one because it has infinitely many states) by taking
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1593	as starting state the regular expression $r$ for which we
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1594	want to generate an automaton. There are $26$ next-states, which
925 ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1595	correspond to the derivatives of $r$ according to $a$ to $z$.
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1596	Implementing this in our slightly ``flawed'' representation is
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1597	not too difficult. This will give a picture for the ``automaton''
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1598	looking something like this, except that it extends infinitely
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1599	far to the right:
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1600
492 882d5de18adc updated Christian Urban <urbanc@in.tum.de> parents: 491 diff changeset	1601
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1602	\begin{center}
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1603	\begin{tikzpicture}
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1604	[level distance=25mm,very thick,auto,
925 ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1605	level 1/.style={sibling distance=10mm},
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1606	level 2/.style={sibling distance=15mm},
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1607	every node/.style={minimum size=30pt,
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1608	inner sep=0pt,circle,draw=blue!50,very thick,
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1609	fill=blue!20}]
925 ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1610	\node {$r$} [grow=right] {
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1611	child[->] {node (c1) {$d_{z}$}
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1612	child { node {$dd_{zz}$}}
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1613	child { node {$dd_{za}$}}
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1614	}
925 ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1615	child[->] {}
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1616	child[->] {}
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1617	child[->] {}
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1618	child[->] {node (cn) {$d_{a}$}
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1619	child { node {$dd_{az}$}}
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1620	child { node {$dd_{aa}$}}
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1621	}
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1622	};
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1623	\node[draw=none,fill=none] at (3,0.1) {\vdots};
ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1624	\node[draw=none,fill=none] at (7,0.1) {\Large\ldots};
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1625	\end{tikzpicture}
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1626	\end{center}
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1627
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1628	\noindent
935 3fb9b05465dd updated Christian Urban <christian.urban@kcl.ac.uk> parents: 931 diff changeset	1629	You might want to implement this ``automaton''. What do you get?
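
\noindent As a hint, the construction can be written down in symbols as
follows (a sketch only: $\textit{der}$ and $\textit{nullable}$ are meant
to be the functions from the derivative-based matcher, while the names
$q_0$, $\delta$ and $F$ for the starting state, the transition function
and the accepting states are just chosen here for illustration):

\[
\begin{array}{lcl}
q_0 & = & r\\
\delta(r', c) & = & \textit{der}\;c\;r'\\
F & = & \{\,r' \mid \textit{nullable}(r')\,\}
\end{array}
\]

\noindent Here $r'$ ranges over all regular expressions, which is where
the infinitely many states come from.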
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1630
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1631	While this all makes sense (modulo the flaw with the infinitely many states),
874 c3d78e7b731c updated Christian Urban <christian.urban@kcl.ac.uk> parents: 764 diff changeset	1632	does this automaton teach us anything new? The answer is no, because it
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1633	boils down to just another implementation of the Brzozowski
874 c3d78e7b731c updated Christian Urban <christian.urban@kcl.ac.uk> parents: 764 diff changeset	1634	algorithm from Lecture 2. There \emph{is} however something interesting
c3d78e7b731c updated Christian Urban <christian.urban@kcl.ac.uk> parents: 764 diff changeset	1635	in this construction
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1636	which Brzozowski already cleverly found out, because there is
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1637	a way to restrict the number of states to something finite.
874 c3d78e7b731c updated Christian Urban <christian.urban@kcl.ac.uk> parents: 764 diff changeset	1638	Meaning it would give us a real automaton.
925 ff202426ec47 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 874 diff changeset	1639	However, this would lead us far, far away from what we want
935 3fb9b05465dd updated Christian Urban <christian.urban@kcl.ac.uk> parents: 931 diff changeset	1640	to discuss here. The end.
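
\noindent
For the curious, here is a rough sketch of why the construction
boils down to the Brzozowski algorithm. It assumes that the
``automaton'' takes regular expressions as states, the given $r$ as
starting state, the derivative as transition function and the
nullable regular expressions as accepting states; the names
$\textit{der}$, $\textit{ders}$ and $\textit{nullable}$ are assumed
to be the ones from Lecture 2. Under these assumptions the
transition function and its lifting to strings would be

\[
\begin{array}{lcl}
\delta(r, c) & = & \textit{der}\;c\;r\\
\widehat{\delta}(r, []) & = & r\\
\widehat{\delta}(r, c\!::\!s) & = & \widehat{\delta}(\delta(r, c), s)\\
\end{array}
\]

\noindent
Comparing the last two clauses with the definition of
$\textit{ders}$ suggests that $\widehat{\delta}(r, s) =
\textit{ders}\;s\;r$. A string $s$ would then be accepted precisely
when $\textit{nullable}(\textit{ders}\;s\;r)$ holds, which is the
test the Brzozowski matcher performs.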
764 9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1641
9d40619bc503 updated Christian Urban <christian.urban@kcl.ac.uk> parents: 753 diff changeset	1642
492 882d5de18adc updated Christian Urban <urbanc@in.tum.de> parents: 491 diff changeset	1643
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1644	%\section*{Further Reading}
270 4dbeaf43031d updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 269 diff changeset	1645
490 8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1646	%Compare what a ``human expert'' would create as an automaton for the
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1647	%regular expression $a\cdot (b + c)^*$ and what the Thompson
8a07f7256f2a updated Christian Urban <urbanc@in.tum.de> parents: 489 diff changeset	1648	%algorithm generates.
325 794c599cee53 updated Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 324 diff changeset	1649
140 1be892087df2 added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	1650	\end{document}
1be892087df2 added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	1651
1be892087df2 added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	1652	%%% Local Variables:
1be892087df2 added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	1653	%%% mode: latex
1be892087df2 added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	1654	%%% TeX-master: t
1be892087df2 added Christian Urban <christian dot urban at kcl dot ac dot uk> parents: diff changeset	1655	%%% End:
482 74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1656
74149519e436 updated Christian Urban <urbanc@in.tum.de> parents: 480 diff changeset	1657

author	Christian Urban <christian.urban@kcl.ac.uk>
	Sun, 14 Sep 2025 12:59:23 +0100
changeset 983	d94532448ec8
parent 966	d82c91f85391
permissions	-rw-r--r--