updated
authorChristian Urban <urbanc@in.tum.de>
Wed, 22 Mar 2017 14:10:01 +0000
changeset 480 9e42ccbbd1e6
parent 478 48b842c997c7
child 481 acd8780bfc8b
updated
handouts/ho02.pdf
handouts/ho02.tex
handouts/ho03.pdf
handouts/ho03.tex
langs.sty
progs/dfa.scala
progs/re1.scala
Binary file handouts/ho02.pdf has changed
--- a/handouts/ho02.tex	Wed Mar 15 14:34:10 2017 +0000
+++ b/handouts/ho02.tex	Wed Mar 22 14:10:01 2017 +0000
@@ -3,8 +3,7 @@
 \usepackage{../langs}
 \usepackage{../graphics}
 \usepackage{../data}
-\usepackage{lstlinebgrd}
-\definecolor{capri}{rgb}{0.0, 0.75, 1.0}
+
 
 \begin{document}
 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017}
@@ -265,6 +264,22 @@
 $\ZERO$s, therefore simplifying them away will make the
 algorithm quite a bit faster.
 
+Finally here are three equivalences between regulare expressions which are
+not so obvious:
+
+\begin{center}
+\begin{tabular}{rcl}
+$r^*$  & $\equiv$ & $1 + r\cdot r^*$\\
+$(r_1 + r_2)^*$  & $\equiv$ & $r_1^* \cdot (r_2\cdot r_1^*)^*$\\
+$(r_1 \cdot r_2)^*$ & $\equiv$ & $1 + r_1\cdot (r_2 \cdot r_1)^* \cdot r_2$\\
+\end{tabular}
+\end{center}
+
+\noindent
+You can try to establish them. As an aside, there has been a lot of research
+in questions like: Can one always decide when two regular expressions are
+equivalent or not? What does an algorithm look like to decide this?
+
 \subsection*{The Matching Algorithm}
 
 The algorithm we will define below consists of two parts. One
Binary file handouts/ho03.pdf has changed
--- a/handouts/ho03.tex	Wed Mar 15 14:34:10 2017 +0000
+++ b/handouts/ho03.tex	Wed Mar 22 14:10:01 2017 +0000
@@ -4,26 +4,32 @@
 \usepackage{../graphics}
 
 
+%We can even allow ``silent
+%transitions'', also called epsilon-transitions. They allow us
+%to go from one state to the next without having a character
+%consumed. We label such silent transition with the letter
+%$\epsilon$.
+
 \begin{document}
+\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017}
 
 \section*{Handout 3 (Automata)}
 
-Every formal language course I know of bombards you first with
-automata and then to a much, much smaller extend with regular
-expressions. As you can see, this course is turned upside
-down: regular expressions come first. The reason is that
-regular expressions are easier to reason about and the notion
-of derivatives, although already quite old, only became more
-widely known rather recently. Still let us in this lecture
-have a closer look at automata and their relation to regular
-expressions. This will help us with understanding why the
-regular expression matchers in Python, Ruby and Java are so slow
-with certain regular expressions. The central definition
-is:\medskip 
+Every formal language and compiler course I know of bombards you first
+with automata and then to a much, much smaller extend with regular
+expressions. As you can see, this course is turned upside down:
+regular expressions come first. The reason is that regular expressions
+are easier to reason about and the notion of derivatives, although
+already quite old, only became more widely known rather
+recently. Still let us in this lecture have a closer look at automata
+and their relation to regular expressions. This will help us with
+understanding why the regular expression matchers in Python, Ruby and
+Java are so slow with certain regular expressions. The central
+definition is:\medskip
 
 \noindent 
 A \emph{deterministic finite automaton} (DFA), say $A$, is
-defined by a four-tuple written $A(Q, q_0, F, \delta)$ where
+given by a four-tuple written $A(Q, q_0, F, \delta)$ where
 
 \begin{itemize}
 \item $Q$ is a finite set of states,
@@ -97,36 +103,86 @@
 \]
 
 \noindent This lifted transition function is often called
-``delta-hat''. Given a string, we start in the starting state
-and take the first character of the string, follow to the next
-sate, then take the second character and so on. Once the
-string is exhausted and we end up in an accepting state, then
-this string is accepted by the automaton. Otherwise it is not
-accepted. So $s$ is in the \emph{language accepted by the
-automaton} $A(Q, q_0, F, \delta)$ iff
+``delta-hat''. Given a string, we start in the starting state and take
+the first character of the string, follow to the next sate, then take
+the second character and so on. Once the string is exhausted and we
+end up in an accepting state, then this string is accepted by the
+automaton. Otherwise it is not accepted. This also means that if along
+the way we hit the case where the transition function $\delta$ is not
+defined, we need to raise an error. In our implementation we will deal
+with this case elegantly by using Scala's \texttt{Try}. So $s$ is in
+the \emph{language accepted by the automaton} $A(Q, q_0, F, \delta)$
+iff
 
 \[
 \hat{\delta}(q_0, s) \in F 
 \]
 
 \noindent I let you think about a definition that describes
-the set of strings accepted by an automaton.
-  
+the set of strings accepted by an automaton. 
+
+\begin{figure}[p]
+\small
+\lstinputlisting[numbers=left,linebackgroundcolor=
+                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
+                  {../progs/dfa.scala}
+\caption{Scala implementation of DFAs using partial functions.
+  Notice some subtleties: \texttt{deltas} implements the delta-hat
+  construction by lifting the transition (partial) function to
+  lists of \texttt{C}haracters. Since \texttt{delta} is given
+  as partial function, it can obviously go ``wrong'' in which
+  case the \texttt{Try} in \texttt{accepts} catches the error and
+  returns \texttt{false}, that is the string is not accepted.
+  The example implements the DFA example shown above.\label{dfa}}
+\end{figure}
+
+A simple Scala implementation of DFAs is given in Figure~\ref{dfa}. As
+you can see, there many features of the mathematical definition that
+are quite closely reflected in the code. In the DFA-class, there is a
+starting state, called \texttt{start}, with the polymorphic type
+\texttt{A}. There is a partial function \texttt{delta} for specifying
+the transitions. I used partial functions for transitions in Scala;
+alternatives would have been \texttt{Maps} or even \texttt{Lists}. One
+of the main advantages of using partial functions is that transitions
+can be quite nicely defined by a series of \texttt{case} statements
+(see Lines 28 -- 38). If you need to represent an automaton
+with a sink state (catch-all-state), you can write something like
 
-While with DFAs it will always be clear that given a character
-what the next state is (potentially none), it will be useful
-to relax this restriction. That means we have several
-potential successor states. We even allow ``silent
-transitions'', also called epsilon-transitions. They allow us
-to go from one state to the next without having a character
-consumed. We label such silent transition with the letter
-$\epsilon$. The resulting construction is called a
-\emph{non-deterministic finite automaton} (NFA) given also as
-a four-tuple $A(Q, q_0, F, \rho)$ where
+{\small\begin{lstlisting}[language=Scala,linebackgroundcolor=
+                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
+abstract class State
+...
+case object Sink extends State
+
+val delta : (State, Char) :=> State = 
+  { case (S0, 'a') => S1
+    case (S1, 'a') => S2
+    case _ => Sink
+  }
+\end{lstlisting}}  
+
+\noindent The DFA-class has also an argument for final states. It is a
+function from states to booleans (returns true whenever a state is
+meant to be finbal; false otherwise). While this is a boolean
+function, Scala allows me to use sets of states for such functions
+(see Line 40 where I initialise the DFA). 
+
+I let you ponder whether this is a good implementation of DFAs. In
+doing so I hope you notice that the $Q$ component (the set of finite
+states) is missing from the class definition. This means that the
+implementation allows you to do some fishy things you are not meant to
+do with DFAs. Which fishy things could that be?
+
+While with DFAs it will always be clear that given a character what
+the next state is (potentially none), it will be useful to relax this
+restriction. That means we have several potential successor states. We
+even allow more than one starting state. The resulting construction is
+called a \emph{non-deterministic finite automaton} (NFA) given also as
+a four-tuple $A(Q, Q_{0s}, F, \rho)$ where
 
 \begin{itemize}
 \item $Q$ is a finite set of states
-\item $q_0$ is a start state
+\item $Q_{0s}$ are the start states ($Q_{0s} \subseteq Q$)
 \item $F$ are some accepting states with $F \subseteq Q$, and
 \item $\rho$ is a transition relation.
 \end{itemize}
--- a/langs.sty	Wed Mar 15 14:34:10 2017 +0000
+++ b/langs.sty	Wed Mar 22 14:10:01 2017 +0000
@@ -68,3 +68,9 @@
 
 %%\lstset{escapeinside={(*@}{@*)}}
 \lstset{escapeinside={/*@}{@*/}}
+
+
+
+%% stripy code
+\usepackage{lstlinebgrd}
+\definecolor{capri}{rgb}{0.0, 0.75, 1.0}
--- a/progs/dfa.scala	Wed Mar 15 14:34:10 2017 +0000
+++ b/progs/dfa.scala	Wed Mar 22 14:10:01 2017 +0000
@@ -1,64 +1,43 @@
-// DFAs
-import scala.util._
+// DFAs in Scala based on partial functions
+import scala.util.Try
 
-abstract class State
-type States = Set[State]
-
-case class IntState(i: Int) extends State
+// type abbreviation for partial functions
+type :=>[A, B] = PartialFunction[A, B]
 
-object NewState {
-  var counter = 0
-  
-  def apply() : IntState = {
-    counter += 1;
-    new IntState(counter - 1)
+case class DFA[A, C](start: A,                // starting state
+                     delta: (A, C) :=> A,     // transition partial fun
+                     fins:  A => Boolean) {   // final states
+
+  def deltas(q: A, s: List[C]) : A = s match {
+    case Nil => q
+    case c::cs => deltas(delta(q, c), cs)
   }
+
+  def accepts(s: List[C]) : Boolean = 
+    Try(fins(deltas(start, s))) getOrElse false
 }
 
-
-// DFA class
-case class DFA(states: States, 
-               start: State, 
-               delta: (State, Char) => State, 
-               fins: States) {
-  
-  def deltas(q: State, s: List[Char]) : State = s match {
-    case Nil => q
-    case c::cs => deltas(delta(q, c), cs) 
-  }
-  
-  // wether a string is accepted by the automaton or not
-  def accepts(s: String) : Boolean = 
-    Try(fins contains 
-      (deltas(start, s.toList))) getOrElse false
-}
-
+// the example shown earlier in the handout 
+abstract class State
+case object Q0 extends State
+case object Q1 extends State
+case object Q2 extends State
+case object Q3 extends State
+case object Q4 extends State
 
-// example DFA from the lectures
-val Q0 = NewState()
-val Q1 = NewState()
-val Q2 = NewState()
-val Q3 = NewState()
-val Q4 = NewState()
-
+val delta : (State, Char) :=> State = 
+  { case (Q0, 'a') => Q1
+    case (Q0, 'b') => Q2
+    case (Q1, 'a') => Q4
+    case (Q1, 'b') => Q2
+    case (Q2, 'a') => Q3
+    case (Q2, 'b') => Q2
+    case (Q3, 'a') => Q4
+    case (Q3, 'b') => Q0
+    case (Q4, 'a') => Q4
+    case (Q4, 'b') => Q4 }
 
-val delta : (State, Char) => State = {
-  case (Q0, 'a') => Q1
-  case (Q0, 'b') => Q2
-  case (Q1, 'a') => Q4
-  case (Q1, 'b') => Q2
-  case (Q2, 'a') => Q3
-  case (Q2, 'b') => Q2
-  case (Q3, 'a') => Q4
-  case (Q3, 'b') => Q0
-  case (Q4, 'a') => Q4
-  case (Q4, 'b') => Q4
-}
+val dfa = DFA(Q0, delta, Set[State](Q4))
 
-val DFA1 = DFA(Set(Q0, Q1, Q2, Q3, Q4), Q0, delta, Set(Q4))
-
-println(DFA1.accepts("aaa"))
-println(DFA1.accepts("bb"))
-println(DFA1.accepts("aaac"))
-
-
+dfa.accepts("bbabaab".toList)   // true
+dfa.accepts("baba".toList)      // false
--- a/progs/re1.scala	Wed Mar 15 14:34:10 2017 +0000
+++ b/progs/re1.scala	Wed Mar 22 14:10:01 2017 +0000
@@ -77,6 +77,7 @@
   (end - start)/(i * 1.0e9)
 }
 
+
 //test: (a?{n}) (a{n})
 for (i <- 1 to 20) {
   println(i + ": " + "%.5f".format(time_needed(2, matches(EVIL1(i), "a" * i))))
@@ -115,3 +116,23 @@
 size(EVIL1(3))  // 17
 size(EVIL1(5))  // 29
 size(EVIL1(7))  // 41
+
+
+// given a regular expression and building successive
+// derivatives might result into bigger and bigger
+// regular expressions...here is an example for this:
+
+val BIG_aux = STAR(ALT(CHAR('a'), CHAR('b')))
+val BIG = SEQ(BIG_aux, SEQ(CHAR('a'),SEQ(CHAR('b'), BIG_aux)))
+
+size(ders("".toList, BIG))              // 13
+size(ders("ab".toList, BIG))            // 51
+size(ders("abab".toList, BIG))          // 112
+size(ders("ababab".toList, BIG))        // 191
+size(ders("abababab".toList, BIG))      // 288
+size(ders("ababababab".toList, BIG))    // 403
+size(ders("abababababab".toList, BIG))  // 536
+
+
+size(ders(("ab" * 200).toList, BIG))    // 366808
+