Binary file handouts/ho03.pdf has changed
--- a/handouts/ho03.tex Tue Apr 25 12:33:16 2017 +0100
+++ b/handouts/ho03.tex Fri Apr 28 11:01:25 2017 +0100
@@ -3,11 +3,6 @@
\usepackage{../langs}
\usepackage{../graphics}
-%We can even allow ``silent
-%transitions'', also called epsilon-transitions. They allow us
-%to go from one state to the next without having a character
-%consumed. We label such silent transition with the letter
-%$\epsilon$.
\begin{document}
\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017}
@@ -130,10 +125,10 @@
\small
\lstinputlisting[numbers=left,linebackgroundcolor=
{\ifodd\value{lstnumber}\color{capri!3}\fi}]
- {../progs/dfa.scala}
+ {../progs/display/dfa.scala}
\caption{A Scala implementation of DFAs using partial functions.
Notice some subtleties: \texttt{deltas} implements the delta-hat
- construction by lifting the transition (partial) function to lists
+ construction by lifting the (partial) transition function to lists
of characters. Since \texttt{delta} is given as a partial function,
it can obviously go ``wrong'' in which case the \texttt{Try} in
\texttt{accepts} catches the error and returns \texttt{false}---that
@@ -266,10 +261,20 @@
partial transition functions that return a single state; my NFAs
return a set. I let you think about this representation for
NFA-transitions and how it corresponds to the relations used in the
-mathematical definition of NFAs.
+mathematical definition of NFAs. An example of a transition function
+in Scala for the NFA above is
-Like in the mathematical definition, \texttt{starts} is in NFAs a set
-of states; \texttt{fins} is again a function from states to
+{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
+ {\ifodd\value{lstnumber}\color{capri!3}\fi}]
+val nfa_delta : (State, Char) :=> Set[State] =
+ { case (Q0, 'a') => Set(Q1, Q2)
+ case (Q0, 'b') => Set(Q0)
+ case (Q1, 'a') => Set(Q1, Q2)
+ case (Q1, 'b') => Set(Q0, Q1) }
+\end{lstlisting}}
+
+\noindent Like in the mathematical definition, \texttt{starts} is in
+NFAs a set of states; \texttt{fins} is again a function from states to
booleans. The \texttt{next} function calculates the set of next states
reachable from a single state \texttt{q} by a character~\texttt{c}. In
case there is no such state---the partial transition function is
@@ -281,7 +286,7 @@
\small
\lstinputlisting[numbers=left,linebackgroundcolor=
{\ifodd\value{lstnumber}\color{capri!3}\fi}]
- {../progs/nfa.scala}
+ {../progs/display/nfa.scala}
\caption{A Scala implementation of NFAs using partial functions.
Notice that the function \texttt{accepts} implements the
acceptance of a string in a breath-first search fashion. This can be a costly
@@ -294,7 +299,7 @@
we might have to explore all possible transitions (recall which state
to go to is not unique anymore with NFAs\ldots{}we need to explore
potentially all next states). The implementation achieves this
-exploration in a \emph{breadth-first search} manner. This is fine for
+exploration through a \emph{breadth-first search}. This is fine for
small NFAs, but can lead to real memory problems when the NFAs are
bigger and larger strings need to be processed. As result, some
regular expression matching engines resort to a \emph{depth-first
@@ -315,16 +320,16 @@
\end{lstlisting}}
\noindent
-This depth-first way of exploration seems to work efficiently in many
-examples and is much less of a strain on memory. The problem is that
-the backtracking can get ``catastrophic'' in some examples---remember
-the catastrophic backtracking from earlier lectures. This depth-first
-search with backtracking is the reason for the abysmal performance of
-some regular expression matchings in Java, Ruby and Python. I like to
-show you this next.
+This depth-first way of exploration seems to work quite efficiently in
+many examples and is much less of a strain on memory. The problem is
+that the backtracking can get ``catastrophic'' in some
+examples---remember the catastrophic backtracking from earlier
+lectures. This depth-first search with backtracking is the reason for
+the abysmal performance of some regular expression matchings in Java,
+Ruby and Python. I like to show you this in the next two sections.
-\subsubsection*{Thompson Construction}
+\subsubsection*{Epsilon NFAs}
In order to get an idea what calculations are performed by Java \&
friends, we need a method for transforming a regular expression into
@@ -332,16 +337,18 @@
are accepted by the regular expression. The simplest and most
well-known method for this is called \emph{Thompson Construction},
after the Turing Award winner Ken Thompson. This method is by
-recursion over regular expressions and uses the non-determinism in
-NFAs. You will see shortly why this construction works well with NFAs,
-but is not so straightforward with DFAs. Unfortunately we are still
-one step away from our intended target though---because this
-construction uses a version of NFAs that allows ``silent
-transitions''. The idea behind silent transitions is that they
-allow us to go from one state to the next without having to consume a
-character. We label such silent transition with the letter
-$\epsilon$ and call the automata $\epsilon$NFAs. Two
-typical examples of $\epsilon$NFAs are
+recursion over regular expressions and depends on the non-determinism
+in NFAs described in the earlier section. You will see shortly why
+this construction works well with NFAs, but is not so straightforward
+with DFAs.
+
+Unfortunately we are still one step away from our intended target
+though---because this construction uses a version of NFAs that allows
+``silent transitions''. The idea behind silent transitions is that
+they allow us to go from one state to the next without having to
+consume a character. We label such silent transition with the letter
+$\epsilon$ and call the automata $\epsilon$NFAs. Two typical examples
+of $\epsilon$NFAs are:
\begin{center}
@@ -373,24 +380,24 @@
\end{center}
\noindent
-Consider the $\epsilon$NFA on the left-hand side: an
-$\epsilon$-transition means you do not have to ``consume'' any part of
-the input string, but ``silently'' change to a different state. For
-example, if you are in the starting state $Q_0$, you can silently move
-either to $Q_1$ or $Q_2$. In this example you can see that once you
-are in $Q_1$, respectively $Q_2$, you cannot ``go back'' to the other
-states.
-
-On first appearances, $\epsilon$-transitions might look rather
-strange, or even silly. To start with, silent transitions make the
-decision whether a string is accepted by an automaton even harder:
-with $\epsilon$NFAs we have to look whether we can do first some
-$\epsilon$-transitions and then do a ``proper''-transition; and after
-any ``proper''-transition we again have to check whether we can do
-again some silent transitions. Even worse, if there is a silent
-transition pointing back to the same state, then we have to be careful
-our decision procedure for strings does not loop (remember the
-depth-first search for exploring all states).
+Consider the $\epsilon$NFA on the left-hand side: the
+$\epsilon$-transitions mean you do not have to ``consume'' any part of
+the input string, but ``silently'' change to a different state. In
+this example, if you are in the starting state $Q_0$, you can silently
+move either to $Q_1$ or $Q_2$. You can see that once you are in $Q_1$,
+respectively $Q_2$, you cannot ``go back'' to the other states. So it
+seems allowing $\epsilon$-transitions is a rather substancial
+extension to NFAs. On first appearances, $\epsilon$-transitions might
+even look rather strange, or even silly. To start with, silent
+transitions make the decision whether a string is accepted by an
+automaton even harder: with $\epsilon$NFAs we have to look whether we
+can do first some $\epsilon$-transitions and then do a
+``proper''-transition; and after any ``proper''-transition we again
+have to check whether we can do again some silent transitions. Even
+worse, if there is a silent transition pointing back to the same
+state, then we have to be careful our decision procedure for strings
+does not loop (remember the depth-first search for exploring all
+states).
The obvious question is: Do we get anything in return for this hassle
with silent transitions? Well, we still have to work for it\ldots
@@ -399,12 +406,12 @@
are---that would require extending the transition relation of
NFAs. Next, show that the $\epsilon$NFAs are equivalent to NFAs and so
on. Once we have done all this on paper, we would need to implement
-$\epsilon$NFAs. Lets try to take a shortcut---we are not really
+$\epsilon$NFAs. Lets try to take a shortcut instead. We are not really
interested in $\epsilon$NFAs; they are only a convenient tool for
-translating regular expressions into automata. We are not going to
-implementing them explicitly, but translate them directly into NFAs
-(in a sense $\epsilon$NFAs are just a convenient API for lazy people).
-How does the translation work: well we have to find all transitions of
+translating regular expressions into automata. So we are not going to
+implementing them explicitly, but translate them immediately into NFAs
+(in a sense $\epsilon$NFAs are just a convenient API for lazy people ;o).
+How does the translation work? Well we have to find all transitions of
the form
\[
@@ -414,7 +421,7 @@
\]
\noindent and replace them with $q \stackrel{a}{\longrightarrow}
-q'$. Doing this to the $\epsilon$NFA on the left-hand side above gives
+q'$. Doing this to the $\epsilon$NFA on the right-hand side above gives
the NFA
\begin{center}
@@ -432,24 +439,31 @@
\end{tikzpicture}
\end{center}
-\noindent where the single the $\epsilon$-transition is replaced by
-three more $a$-transitions. So whenever we are given $\epsilon$NFA we
-will, similarly, replace it by an equivalent NFA. The code for this is
-given in Figure~\ref{enfa}. At the core is a function that calculates
-a fixpoint of function (Lines 5--10). This is used for ``discovering''
-states reachable by $\epsilon$-transitions. Once no new state is
-reachable state, a fixpoint is reached. This is used for example when
-calculating the starting states of an equivalent NFA---see Line 36.
-We starts with all starting states of the $\epsilon$NFA and then look
-for all other states that can be reached by
-$\epsilon$-transition. This is what the $\epsilon$-closure, called
-\texttt{ecl}, calculates.
+\noindent where the single $\epsilon$-transition is replaced by
+three additional $a$-transitions. Please do the calculations yourself
+and verify that I did not forget any transition.
+
+So in what follows, whenever we are given an $\epsilon$NFA we will
+replace it by an equivalent NFA. The code for this is given in
+Figure~\ref{enfa}. The main workhorse in this code is a function that
+calculates a fixpoint of function (Lines 5--10). This function is used
+for ``discovering'' which states are reachable by
+$\epsilon$-transitions. Once no new state is discovered, a fixpoint is
+reached. This is used for example when calculating the starting states
+of an equivalent NFA (see Line 36): we start with all starting states
+of the $\epsilon$NFA and then look for all additional states that can
+be reached by $\epsilon$-transitions. We keep on doing this until no
+new state can be reached. This is what the $\epsilon$-closure, named
+in the code \texttt{ecl}, calculates. Similarly, an accepting state of
+the NFA is when we can reach an accepting state of the $\epsilon$NFA
+by $\epsilon$-transitions.
+
\begin{figure}[p]
\small
\lstinputlisting[numbers=left,linebackgroundcolor=
{\ifodd\value{lstnumber}\color{capri!3}\fi}]
- {../progs/enfa.scala}
+ {../progs/display/enfa.scala}
\caption{A Scala function that translates $\epsilon$NFA into NFAs. The
transtions of $\epsilon$NFA take as input an \texttt{Option[C]}.
@@ -459,10 +473,37 @@
set of states. The NFA is constructed in in Lines 36--38.\label{enfa}}
\end{figure}
+Also look carefully how the transitions of $\epsilon$NFAs are
+implemented. The additional possibility of performing silent
+transitions is encoded by using \texttt{Option[C]} as the type for the
+``input''. The \texttt{Some}s stand for ``propper'' transitions where
+a character is consumed; \texttt{None} stands for
+$\epsilon$-transitions. The transition functions for the two
+$\epsilon$NFAs from the beginning of this section can be defined as
-Now having a translation of $\epsilon$NFAs to NFAs in place, we can
-finally make headway with the problem of translating regular
-expressions into equivalent NFAs. By equivalent we mean that the NFAs
+{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
+ {\ifodd\value{lstnumber}\color{capri!3}\fi}]
+val enfa_trans1 : (State, Option[Char]) :=> Set[State] =
+ { case (Q0, Some('a')) => Set(Q0)
+ case (Q0, None) => Set(Q1, Q2)
+ case (Q1, Some('a')) => Set(Q1)
+ case (Q2, Some('b')) => Set(Q2) }
+
+val enfa_trans2 : (State, Option[Char]) :=> Set[State] =
+ { case (R1, Some('b')) => Set(R3)
+ case (R1, None) => Set(R2)
+ case (R2, Some('a')) => Set(R1, R3) }
+\end{lstlisting}}
+
+\noindent
+I hope you agree now with my earlier statement that the $\epsilon$NFAs
+are just an API for NFAs.
+
+\subsubsection*{Thompson Construction}
+
+Having the translation of $\epsilon$NFAs to NFAs in place, we can
+finally return to the problem of translating regular expressions into
+equivalent NFAs. Recall that by equivalent we mean that the NFAs
recognise the same language. Consider the simple regular expressions
$\ZERO$, $\ONE$ and $c$. They can be translated into equivalent NFAs
as follows:
@@ -477,27 +518,55 @@
\begin{tikzpicture}[scale=0.7,>=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
\node[state, initial, accepting] (Q_0) {$\mbox{}$};
\end{tikzpicture}\\\\
-\raisebox{2mm}{$c$} &
+\raisebox{3mm}{$c$} &
\begin{tikzpicture}[scale=0.7,>=stealth',very thick, every state/.style={minimum size=3pt,draw=blue!50,very thick,fill=blue!20},]
\node[state, initial] (Q_0) {$\mbox{}$};
\node[state, accepting] (Q_1) [right=of Q_0] {$\mbox{}$};
\path[->] (Q_0) edge node [below] {$c$} (Q_1);
-\end{tikzpicture}\\\\
+\end{tikzpicture}\\
\end{tabular}
\end{center}
+\noindent
+I let you think whether the NFAs can match exactly those strings the
+regular expressions can match. To do this translation in code we need
+a way to construct states programatically...and as an additional
+constrain Scala needs to recognise these states as being distinct.
+For this I implemented in Figure~\ref{thompson1} a class
+\texttt{TState} that includes a counter and a companion object that
+increases this counter whenever a state is created.\footnote{You might
+ have to read up what \emph{companion objects} are in Scala.}
+
\begin{figure}[p]
\small
\lstinputlisting[numbers=left,linebackgroundcolor=
{\ifodd\value{lstnumber}\color{capri!3}\fi}]
- {../progs/thompson.scala}
-\caption{A Scala XXX\label{thompson}}
+ {../progs/display/thompson1.scala}
+\caption{The first part of the Thompson Construction. Lines 7--16
+ implement a way how to create states that are all
+ distinct by virtue of a counter. This counter is
+ increased in the companion object of \texttt{TState}
+ whenever a new state is created. The code in Lines 24--40
+ constructs NFAs for the simple regular expressions.
+ \label{thompson1}}
\end{figure}
+\begin{figure}[p]
+\small
+\lstinputlisting[numbers=left,linebackgroundcolor=
+ {\ifodd\value{lstnumber}\color{capri!3}\fi}]
+ {../progs/display/thompson2.scala}
+\caption{The second part of the Thompson Construction implementing
+ the composition of NFAs according to $\cdot$, $+$ and $\_^*$.
+ The implicit class about rich partial functions
+ implements the infix operation \texttt{+++} which
+ combines an $\epsilon$NFA transition with a NFA transition
+ (both given as partial functions).\label{thompson2}}
+\end{figure}
-\noindent The case for the sequence regular expression $r_1
-\cdot r_2$ is as follows: We are given by recursion two
-automata representing $r_1$ and $r_2$ respectively.
+The case for the sequence regular expression $r_1 \cdot r_2$ is as
+follows: We are given by recursion two automata representing $r_1$ and
+$r_2$ respectively.
\begin{center}
\begin{tikzpicture}[node distance=3mm,
--- a/progs/dfa.scala Tue Apr 25 12:33:16 2017 +0100
+++ b/progs/dfa.scala Fri Apr 28 11:01:25 2017 +0100
@@ -17,7 +17,7 @@
Try(fins(deltas(start, s))) getOrElse false
}
-// the example shown earlier in the handout
+// the example shown in the handout
abstract class State
case object Q0 extends State
case object Q1 extends State
@@ -41,3 +41,25 @@
dfa.accepts("bbabaab".toList) // true
dfa.accepts("baba".toList) // false
+
+
+// another DFA test with a Sink state
+abstract class S
+case object S0 extends S
+case object S1 extends S
+case object S2 extends S
+case object Sink extends S
+
+// transition function with a sink state
+val sigma : (S, Char) :=> S =
+ { case (S0, 'a') => S1
+ case (S1, 'a') => S2
+ case _ => Sink
+ }
+
+val dfa1a = DFA(S0, sigma, Set[S](S2))
+
+dfa1a.accepts("aa".toList) // true
+dfa1a.accepts("".toList) // false
+dfa1a.accepts("ab".toList) // false
+
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/progs/display/dfa.scala Fri Apr 28 11:01:25 2017 +0100
@@ -0,0 +1,43 @@
+// DFAs in Scala using partial functions
+import scala.util.Try
+
+// type abbreviation for partial functions
+type :=>[A, B] = PartialFunction[A, B]
+
+case class DFA[A, C](start: A, // starting state
+ delta: (A, C) :=> A, // transition (partial fun)
+ fins: A => Boolean) { // final states
+
+ def deltas(q: A, s: List[C]) : A = s match {
+ case Nil => q
+ case c::cs => deltas(delta(q, c), cs)
+ }
+
+ def accepts(s: List[C]) : Boolean =
+ Try(fins(deltas(start, s))) getOrElse false
+}
+
+// the example shown earlier in the handout
+abstract class State
+case object Q0 extends State
+case object Q1 extends State
+case object Q2 extends State
+case object Q3 extends State
+case object Q4 extends State
+
+val delta : (State, Char) :=> State =
+ { case (Q0, 'a') => Q1
+ case (Q0, 'b') => Q2
+ case (Q1, 'a') => Q4
+ case (Q1, 'b') => Q2
+ case (Q2, 'a') => Q3
+ case (Q2, 'b') => Q2
+ case (Q3, 'a') => Q4
+ case (Q3, 'b') => Q0
+ case (Q4, 'a') => Q4
+ case (Q4, 'b') => Q4 }
+
+val dfa = DFA(Q0, delta, Set[State](Q4))
+
+dfa.accepts("bbabaab".toList) // true
+dfa.accepts("baba".toList) // false
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/progs/display/enfa.scala Fri Apr 28 11:01:25 2017 +0100
@@ -0,0 +1,39 @@
+// epsilon NFAs...immediately translated into NFAs
+// (needs nfa.scala)
+
+// fixpoint construction
+import scala.annotation.tailrec
+@tailrec
+def fixpT[A](f: A => A, x: A): A = {
+ val fx = f(x)
+ if (fx == x) x else fixpT(f, fx)
+}
+
+// translates eNFAs directly into NFAs
+def eNFA[A, C](starts: Set[A], // starting states
+ delta: (A, Option[C]) :=> Set[A], // epsilon-transitions
+ fins: A => Boolean) : NFA[A, C] = { // final states
+
+ // epsilon transitions
+ def enext(q: A) : Set[A] =
+ applyOrElse(delta, (q, None))
+
+ def enexts(qs: Set[A]) : Set[A] =
+ qs | qs.flatMap(enext(_))
+
+ // epsilon closure
+ def ecl(qs: Set[A]) : Set[A] =
+ fixpT(enexts, qs)
+
+ // "normal" transitions
+ def next(q: A, c: C) : Set[A] =
+ applyOrElse(delta, (q, Some(c)))
+
+ def nexts(qs: Set[A], c: C) : Set[A] =
+ ecl(ecl(qs).flatMap(next(_, c)))
+
+ // result NFA
+ NFA(ecl(starts),
+ { case (q, c) => nexts(Set(q), c) },
+ q => ecl(Set(q)) exists fins)
+}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/progs/display/nfa.scala Fri Apr 28 11:01:25 2017 +0100
@@ -0,0 +1,36 @@
+// NFAs in Scala using partial functions (returning
+// sets of states)
+import scala.util.Try
+
+// type abbreviation for partial functions
+type :=>[A, B] = PartialFunction[A, B]
+
+// return an empty set when not defined
+def applyOrElse[A, B](f: A :=> Set[B], x: A) : Set[B] =
+ Try(f(x)) getOrElse Set[B]()
+
+
+// NFAs
+case class NFA[A, C](starts: Set[A], // starting states
+ delta: (A, C) :=> Set[A], // transition function
+ fins: A => Boolean) { // final states
+
+ // given a state and a character, what is the set of
+ // next states? if there is none => empty set
+ def next(q: A, c: C) : Set[A] =
+ applyOrElse(delta, (q, c))
+
+ def nexts(qs: Set[A], c: C) : Set[A] =
+ qs.flatMap(next(_, c))
+
+ // given some states and a string, what is the set
+ // of next states?
+ def deltas(qs: Set[A], s: List[C]) : Set[A] = s match {
+ case Nil => qs
+ case c::cs => deltas(nexts(qs, c), cs)
+ }
+
+ // is a string accepted by an NFA?
+ def accepts(s: List[C]) : Boolean =
+ deltas(starts, s).exists(fins)
+}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/progs/display/thompson1.scala Fri Apr 28 11:01:25 2017 +0100
@@ -0,0 +1,40 @@
+// Thompson Construction (Part 1)
+// (needs :load nfa.scala
+// :load enfa.scala)
+
+
+// states for Thompson construction
+case class TState(i: Int) extends State
+
+object TState {
+ var counter = 0
+
+ def apply() : TState = {
+ counter += 1;
+ new TState(counter - 1)
+ }
+}
+
+
+// some types abbreviations
+type NFAt = NFA[TState, Char]
+
+
+// NFA that does not accept any string
+def NFA_ZERO(): NFAt = {
+ val Q = TState()
+ NFA(Set(Q), { case _ => Set() }, Set())
+}
+
+// NFA that accepts the empty string
+def NFA_ONE() : NFAt = {
+ val Q = TState()
+ NFA(Set(Q), { case _ => Set() }, Set(Q))
+}
+
+// NFA that accepts the string "c"
+def NFA_CHAR(c: Char) : NFAt = {
+ val Q1 = TState()
+ val Q2 = TState()
+ NFA(Set(Q1), { case (Q1, d) if (c == d) => Set(Q2) }, Set(Q2))
+}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/progs/display/thompson2.scala Fri Apr 28 11:01:25 2017 +0100
@@ -0,0 +1,44 @@
+// Thompson Construction (Part 2)
+
+// some more types abbreviations
+type NFAtrans = (TState, Char) :=> Set[TState]
+type eNFAtrans = (TState, Option[Char]) :=> Set[TState]
+
+
+// for composing an eNFA transition with a NFA transition
+implicit class RichPF(val f: eNFAtrans) extends AnyVal {
+ def +++(g: NFAtrans) : eNFAtrans =
+ { case (q, None) => applyOrElse(f, (q, None))
+ case (q, Some(c)) =>
+ applyOrElse(f, (q, Some(c))) | applyOrElse(g, (q, c)) }
+}
+
+// sequence of two NFAs
+def NFA_SEQ(enfa1: NFAt, enfa2: NFAt) : NFAt = {
+ val new_delta : eNFAtrans =
+ { case (q, None) if enfa1.fins(q) => enfa2.starts }
+
+ eNFA(enfa1.starts, new_delta +++ enfa1.delta +++ enfa2.delta,
+ enfa2.fins)
+}
+
+// alternative of two NFAs
+def NFA_ALT(enfa1: NFAt, enfa2: NFAt) : NFAt = {
+ val Q = TState()
+ val new_delta : eNFAtrans =
+ { case (Q, None) => enfa1.starts | enfa2.starts }
+ val new_fins = (q: TState) => enfa1.fins(q) || enfa2.fins(q)
+
+ eNFA(Set(Q), new_delta +++ enfa1.delta +++ enfa2.delta,
+ new_fins)
+}
+
+// star of a NFA
+def NFA_STAR(enfa: NFAt) : NFAt = {
+ val Q = TState()
+ val new_delta : eNFAtrans =
+ { case (Q, None) => enfa.starts
+ case (q, None) if enfa.fins(q) => Set(Q) }
+
+ eNFA(Set(Q), new_delta +++ enfa.delta, Set(Q))
+}
--- a/progs/enfa.scala Tue Apr 25 12:33:16 2017 +0100
+++ b/progs/enfa.scala Fri Apr 28 11:01:25 2017 +0100
@@ -1,5 +1,5 @@
// epsilon NFAs...immediately translated into NFAs
-// (needs nfa.scala)
+// (needs :load nfa.scala in REPL)
// fixpoint construction
import scala.annotation.tailrec
@@ -37,3 +37,29 @@
{ case (q, c) => nexts(Set(q), c) },
q => ecl(Set(q)) exists fins)
}
+
+
+// eNFA examples
+val enfa_trans1 : (State, Option[Char]) :=> Set[State] =
+ { case (Q0, Some('a')) => Set(Q0)
+ case (Q0, None) => Set(Q1, Q2)
+ case (Q1, Some('a')) => Set(Q1)
+ case (Q2, Some('b')) => Set(Q2)
+ }
+
+val enfa1 = eNFA(Set[State](Q0), enfa_trans1, Set[State](Q2))
+
+
+//
+case object R1 extends State
+case object R2 extends State
+case object R3 extends State
+
+val enfa_trans2 : (State, Option[Char]) :=> Set[State] =
+ { case (R1, Some('b')) => Set(R3)
+ case (R1, None) => Set(R2)
+ case (R2, Some('a')) => Set(R1, R3)
+ }
+
+
+val enfa2 = eNFA(Set[State](R1), enfa_trans1, Set[State](R3))
--- a/progs/nfa.scala Tue Apr 25 12:33:16 2017 +0100
+++ b/progs/nfa.scala Fri Apr 28 11:01:25 2017 +0100
@@ -2,6 +2,16 @@
// sets of states)
import scala.util.Try
+// states class
+abstract class State
+case object Q0 extends State
+case object Q1 extends State
+case object Q2 extends State
+case object Q3 extends State
+case object Q4 extends State
+
+
+
// type abbreviation for partial functions
type :=>[A, B] = PartialFunction[A, B]
@@ -11,9 +21,9 @@
// NFAs
-case class NFA[A, C](starts: Set[A], // starting states
- delta: (A, C) :=> Set[A], // transition function
- fins: A => Boolean) { // final states
+case class NFA[A, C](starts: Set[A], // starting states
+ delta: (A, C) :=> Set[A], // transition function
+ fins: A => Boolean) { // final states
// given a state and a character, what is the set of
// next states? if there is none => empty set
@@ -33,4 +43,39 @@
// is a string accepted by an NFA?
def accepts(s: List[C]) : Boolean =
deltas(starts, s).exists(fins)
+
+ // depth-first version of accepts
+ def search(q: A, s: List[C]) : Boolean = s match {
+ case Nil => fins(q)
+ case c::cs => next(q, c).exists(search(_, cs))
+ }
+
+ def accepts2(s: List[C]) : Boolean =
+ starts.exists(search(_, s))
}
+
+
+
+// NFA examples
+
+val nfa_trans1 : (State, Char) :=> Set[State] =
+ { case (Q0, 'a') => Set(Q0, Q1)
+ case (Q0, 'b') => Set(Q2)
+ case (Q1, 'a') => Set(Q1)
+ case (Q2, 'b') => Set(Q2) }
+
+val nfa1 = NFA(Set[State](Q0), nfa_trans1, Set[State](Q2))
+
+nfa1.accepts("aa".toList) // false
+nfa1.accepts("aaaaa".toList) // false
+nfa1.accepts("aaaaab".toList) // true
+nfa1.accepts("aaaaabbb".toList) // true
+nfa1.accepts("aaaaabbbaaa".toList) // false
+nfa1.accepts("ac".toList) // false
+
+nfa1.accepts2("aa".toList) // false
+nfa1.accepts2("aaaaa".toList) // false
+nfa1.accepts2("aaaaab".toList) // true
+nfa1.accepts2("aaaaabbb".toList) // true
+nfa1.accepts2("aaaaabbbaaa".toList) // false
+nfa1.accepts2("ac".toList) // false
--- a/progs/thompson.scala Tue Apr 25 12:33:16 2017 +0100
+++ b/progs/thompson.scala Fri Apr 28 11:01:25 2017 +0100
@@ -1,51 +1,157 @@
-// eNFA that does not accept any string
-def eNFA_ZERO(): NFA[TState, Char] = {
+// Thompson Construction
+// (needs :load nfa.scala
+// :load enfa.scala)
+
+
+// states for Thompson construction
+case class TState(i: Int) extends State
+
+object TState {
+ var counter = 0
+
+ def apply() : TState = {
+ counter += 1;
+ new TState(counter - 1)
+ }
+}
+
+
+// some types abbreviations
+type NFAt = NFA[TState, Char]
+type NFAtrans = (TState, Char) :=> Set[TState]
+type eNFAtrans = (TState, Option[Char]) :=> Set[TState]
+
+
+// for composing an eNFA transition with a NFA transition
+implicit class RichPF(val f: eNFAtrans) extends AnyVal {
+ def +++(g: NFAtrans) : eNFAtrans =
+ { case (q, None) => applyOrElse(f, (q, None))
+ case (q, Some(c)) => applyOrElse(f, (q, Some(c))) | applyOrElse(g, (q, c)) }
+}
+
+
+// NFA that does not accept any string
+def NFA_ZERO(): NFAt = {
val Q = TState()
NFA(Set(Q), { case _ => Set() }, Set())
}
-// eNFA that accepts the empty string
-def eNFA_ONE() : NFA[TState, Char] = {
+// NFA that accepts the empty string
+def NFA_ONE() : NFAt = {
val Q = TState()
NFA(Set(Q), { case _ => Set() }, Set(Q))
}
-// eNFA that accepts the string "c"
-def eNFA_CHAR(c: Char) : NFA[TState, Char] = {
+// NFA that accepts the string "c"
+def NFA_CHAR(c: Char) : NFAt = {
val Q1 = TState()
val Q2 = TState()
NFA(Set(Q1), { case (Q1, d) if (c == d) => Set(Q2) }, Set(Q2))
}
-// alternative of two eNFAs
-def eNFA_ALT(enfa1: NFA[TState, Char], enfa2: NFA[TState, Char]) : NFA[TState, Char] = {
+// sequence of two NFAs
+def NFA_SEQ(enfa1: NFAt, enfa2: NFAt) : NFAt = {
+ val new_delta : eNFAtrans =
+ { case (q, None) if enfa1.fins(q) => enfa2.starts }
+
+ eNFA(enfa1.starts, new_delta +++ enfa1.delta +++ enfa2.delta,
+ enfa2.fins)
+}
+
+// alternative of two NFAs
+def NFA_ALT(enfa1: NFAt, enfa2: NFAt) : NFAt = {
val Q = TState()
val new_delta : eNFAtrans =
{ case (Q, None) => enfa1.starts | enfa2.starts }
-
val new_fins = (q: TState) => enfa1.fins(q) || enfa2.fins(q)
- eNFA(Set(Q),
- new_delta andAlso enfa1.delta andAlso enfa2.delta,
- new_fins)
+ eNFA(Set(Q), new_delta +++ enfa1.delta +++ enfa2.delta, new_fins)
}
-// sequence of two eNFAs
-def eNFA_SEQ(enfa1: NFA[TState, Char], enfa2: NFA[TState, Char]) : NFA[TState, Char] = {
- val new_delta : eNFAtrans =
- { case (q, None) if enfa1.fins(q) => enfa2.starts }
-
- eNFA(enfa1.starts,
- new_delta andAlso enfa1.delta andAlso enfa2.delta,
- enfa2.fins)
-}
-
-// star of an eNFA
-def eNFA_STAR(enfa: NFA[TState, Char]) : NFA[TState, Char] = {
+// star of a NFA
+def NFA_STAR(enfa: NFAt) : NFAt = {
val Q = TState()
val new_delta : eNFAtrans =
{ case (Q, None) => enfa.starts
case (q, None) if enfa.fins(q) => Set(Q) }
- eNFA(Set(Q), new_delta andAlso enfa.delta, Set(Q))
+ eNFA(Set(Q), new_delta +++ enfa.delta, Set(Q))
+}
+
+
+
+// regular expressions
+abstract class Rexp
+case object ZERO extends Rexp // matches nothing
+case object ONE extends Rexp // matches the empty string
+case class CHAR(c: Char) extends Rexp // matches a character c
+case class ALT(r1: Rexp, r2: Rexp) extends Rexp // alternative
+case class SEQ(r1: Rexp, r2: Rexp) extends Rexp // sequence
+case class STAR(r: Rexp) extends Rexp // star
+
+
+
+
+// thompson construction
+def thompson (r: Rexp) : NFAt = r match {
+ case ZERO => NFA_ZERO()
+ case ONE => NFA_ONE()
+ case CHAR(c) => NFA_CHAR(c)
+ case ALT(r1, r2) => NFA_ALT(thompson(r1), thompson(r2))
+ case SEQ(r1, r2) => NFA_SEQ(thompson(r1), thompson(r2))
+ case STAR(r1) => NFA_STAR(thompson(r1))
}
+
+
+def tmatches(r: Rexp, s: String) : Boolean =
+ thompson(r).accepts(s.toList)
+
+def tmatches2(r: Rexp, s: String) : Boolean =
+ thompson(r).accepts2(s.toList)
+
+
+//optional regular expression (one or zero times)
+def OPT(r: Rexp) = ALT(r, ONE)
+
+//n-times regular expression (explicitly expanded)
+def NTIMES(r: Rexp, n: Int) : Rexp = n match {
+ case 0 => ONE
+ case 1 => r
+ case n => SEQ(r, NTIMES(r, n - 1))
+}
+
+
+// Test Cases
+
+// the evil regular expression a?{n} a{n}
+def EVIL1(n: Int) = SEQ(NTIMES(OPT(CHAR('a')), n), NTIMES(CHAR('a'), n))
+
+// the evil regular expression (a*)*b
+val EVIL2 = SEQ(STAR(STAR(CHAR('a'))), CHAR('b'))
+
+//for measuring time
+def time_needed[T](i: Int, code: => T) = {
+ val start = System.nanoTime()
+ for (j <- 1 to i) code
+ val end = System.nanoTime()
+ (end - start)/(i * 1.0e9)
+
+
+// the size of the NFA can be large,
+// thus slowing down the breadth-first search
+
+for (i <- 1 to 10) {
+ println(i + ": " + "%.5f".format(time_needed(2, tmatches(EVIL1(i), "a" * i))))
+}
+
+for (i <- 1 to 10) {
+ println(i + " " + "%.5f".format(time_needed(2, tmatches(EVIL2, "a" * i))))
+}
+
+
+// the backtracking needed in depth-first search
+// can be painfully slow
+
+for (i <- 1 to 8) {
+ println(i + " " + "%.5f".format(time_needed(2, tmatches2(EVIL2, "a" * i))))
+}