# HG changeset patch # User Christian Urban # Date 1411736815 -3600 # Node ID 1e4da6d2490c5ff87a5326d6dcadaf26be802f3a # Parent 70c307641d05b7ff39f31263e94ac951c7db5ebb updated programs diff -r 70c307641d05 -r 1e4da6d2490c handouts/ho01.pdf Binary file handouts/ho01.pdf has changed diff -r 70c307641d05 -r 1e4da6d2490c handouts/ho01.tex --- a/handouts/ho01.tex Mon Sep 22 13:42:14 2014 +0100 +++ b/handouts/ho01.tex Fri Sep 26 14:06:55 2014 +0100 @@ -160,7 +160,7 @@ \draw (1pt,\y) -- (-3pt,\y) node[anchor=east] {\y}; %labels \node[below=0.6cm] at (x axis mid) {number of \texttt{a}s}; - \node[rotate=90,above=0.8cm] at (y axis mid) {time in secs}; + \node[rotate=90,left=0.9cm] at (y axis mid) {time in secs}; %plots \draw[color=blue] plot[mark=*] file {re-python.data}; @@ -222,7 +222,7 @@ \draw (1pt,\y) -- (-3pt,\y) node[anchor=east] {\y}; %labels \node[below=0.6cm] at (x axis mid) {number of \texttt{a}s}; - \node[rotate=90, above=0.8cm] at (y axis mid) {time in secs}; + \node[rotate=90,left=0.9cm] at (y axis mid) {time in secs}; %plots \draw[color=green] plot[mark=square*, mark options={fill=white} ] @@ -247,7 +247,7 @@ \begin{center} \begin{tabular}{r@{\hspace{1mm}}r@{\hspace{1mm}}l@{\hspace{13mm}}l} $r$ & $::=$ & $\varnothing$ & null\\ - & $\mid$ & $\epsilon$ & empty string / "" / []\\ + & $\mid$ & $\epsilon$ & empty string / \texttt{""} / []\\ & $\mid$ & $c$ & single character\\ & $\mid$ & $r_1 + r_2$ & alternative / choice\\ & $\mid$ & $r_1 \cdot r_2$ & sequence\\ @@ -444,7 +444,7 @@ \varnothing^* \equiv \epsilon^* \] -\noindent holds. Such equivalences will be important for our +\noindent holds? Such equivalences will be important for our matching algorithm, because we can use them to simplify regular expressions. diff -r 70c307641d05 -r 1e4da6d2490c handouts/ho02.pdf Binary file handouts/ho02.pdf has changed diff -r 70c307641d05 -r 1e4da6d2490c handouts/ho02.tex --- a/handouts/ho02.tex Mon Sep 22 13:42:14 2014 +0100 +++ b/handouts/ho02.tex Fri Sep 26 14:06:55 2014 +0100 @@ -7,10 +7,9 @@ \section*{Handout 2} -Having specified what problem our matching algorithm, -\pcode{match}, is supposed to solve, namely for a given -regular expression $r$ and string $s$ answer \textit{true} if -and only if +Having specified what problem our matching algorithm, \pcode{match}, +is supposed to solve, namely for a given regular expression $r$ and +string $s$ answer \textit{true} if and only if \[ s \in L(r) @@ -21,10 +20,18 @@ because in general the set of strings $L$ returns is infinite (recall what $L(a^*)$ is). In such cases there is no way we can implement an exhaustive test for whether a string is -member of this set or not. +member of this set or not. Before we come to the matching +algorithm, lets have a closer look at what it means when +two regular expressions are equivalent. + +\subsection*{Regular Expression Equivalences} -The algorithm we will define below consists of two parts. One is the function $nullable$ which takes a -regular expression as argument and decides whether it can match the empty string (this means it returns a + +\subsection*{Matching Algorithm} + +The algorithm we will define below consists of two parts. One is the +function $nullable$ which takes a regular expression as argument and +decides whether it can match the empty string (this means it returns a boolean). This can be easily defined recursively as follows: \begin{center} @@ -46,13 +53,17 @@ \] \noindent -Note on the left-hand side we have a function we can implement; on the right we have its specification. +Note on the left-hand side we have a function we can implement; on the +right we have its specification. -The other function of our matching algorithm calculates a \emph{derivative} of a regular expression. This is a function -which will take a regular expression, say $r$, and a character, say $c$, as argument and return -a new regular expression. Be careful that the intuition behind this function is not so easy to grasp on first -reading. Essentially this function solves the following problem: if $r$ can match a string of the form -$c\!::\!s$, what does the regular expression look like that can match just $s$. The definition of this +The other function of our matching algorithm calculates a +\emph{derivative} of a regular expression. This is a function which +will take a regular expression, say $r$, and a character, say $c$, as +argument and return a new regular expression. Be careful that the +intuition behind this function is not so easy to grasp on first +reading. Essentially this function solves the following problem: if +$r$ can match a string of the form $c\!::\!s$, what does the regular +expression look like that can match just $s$. The definition of this function is as follows: \begin{center} @@ -69,33 +80,45 @@ \end{center} \noindent -The first two clauses can be rationalised as follows: recall that $der$ should calculate a regular -expression, if the ``input'' regular expression can match a string of the form $c\!::\!s$. Since neither -$\varnothing$ nor $\epsilon$ can match such a string we return $\varnothing$. In the third case -we have to make a case-distinction: In case the regular expression is $c$, then clearly it can recognise -a string of the form $c\!::\!s$, just that $s$ is the empty string. Therefore we return the $\epsilon$-regular -expression. In the other case we again return $\varnothing$ since no string of the $c\!::\!s$ can be matched. -The $+$-case is relatively straightforward: all strings of the form $c\!::\!s$ are either matched by the -regular expression $r_1$ or $r_2$. So we just have to recursively call $der$ with these two regular -expressions and compose the results again with $+$. The $\cdot$-case is more complicated: -if $r_1\cdot r_2$ matches a string of the form $c\!::\!s$, then the first part must be matched by $r_1$. -Consequently, it makes sense to construct the regular expression for $s$ by calling $der$ with $r_1$ and -``appending'' $r_2$. There is however one exception to this simple rule: if $r_1$ can match the empty -string, then all of $c\!::\!s$ is matched by $r_2$. So in case $r_1$ is nullable (that is can match the -empty string) we have to allow the choice $der\,c\,r_2$ for calculating the regular expression that can match -$s$. The $*$-case is again simple: if $r^*$ matches a string of the form $c\!::\!s$, then the first part must be -``matched'' by a single copy of $r$. Therefore we call recursively $der\,c\,r$ and ``append'' $r^*$ in order to -match the rest of $s$. +The first two clauses can be rationalised as follows: recall that +$der$ should calculate a regular expression, if the ``input'' regular +expression can match a string of the form $c\!::\!s$. Since neither +$\varnothing$ nor $\epsilon$ can match such a string we return +$\varnothing$. In the third case we have to make a case-distinction: +In case the regular expression is $c$, then clearly it can recognise a +string of the form $c\!::\!s$, just that $s$ is the empty +string. Therefore we return the $\epsilon$-regular expression. In the +other case we again return $\varnothing$ since no string of the +$c\!::\!s$ can be matched. The $+$-case is relatively +straightforward: all strings of the form $c\!::\!s$ are either matched +by the regular expression $r_1$ or $r_2$. So we just have to +recursively call $der$ with these two regular expressions and compose +the results again with $+$. The $\cdot$-case is more complicated: if +$r_1\cdot r_2$ matches a string of the form $c\!::\!s$, then the first +part must be matched by $r_1$. Consequently, it makes sense to +construct the regular expression for $s$ by calling $der$ with $r_1$ +and ``appending'' $r_2$. There is however one exception to this simple +rule: if $r_1$ can match the empty string, then all of $c\!::\!s$ is +matched by $r_2$. So in case $r_1$ is nullable (that is can match the +empty string) we have to allow the choice $der\,c\,r_2$ for +calculating the regular expression that can match $s$. The $*$-case is +again simple: if $r^*$ matches a string of the form $c\!::\!s$, then +the first part must be ``matched'' by a single copy of $r$. Therefore +we call recursively $der\,c\,r$ and ``append'' $r^*$ in order to match +the rest of $s$. -Another way to rationalise the definition of $der$ is to consider the following operation on sets: +Another way to rationalise the definition of $der$ is to consider the +following operation on sets: \[ Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\} \] \noindent -which essentially transforms a set of strings $A$ by filtering out all strings that do not start with $c$ and then -strips off the $c$ from all the remaining strings. For example suppose $A = \{"f\!oo", "bar", "f\!rak"\}$ then +which essentially transforms a set of strings $A$ by filtering out all +strings that do not start with $c$ and then strips off the $c$ from +all the remaining strings. For example suppose $A = \{"f\!oo", "bar", +"f\!rak"\}$ then \[ Der\,f\,A = \{"oo", "rak"\}\quad,\quad Der\,b\,A = \{"ar"\} \quad \text{and} \quad @@ -103,20 +126,23 @@ \] \noindent -Note that in the last case $Der$ is empty, because no string in $A$ starts with $a$. With this operation we can -state the following property about $der$: +Note that in the last case $Der$ is empty, because no string in $A$ +starts with $a$. With this operation we can state the following +property about $der$: \[ L(der\,c\,r) = Der\,c\,(L(r)) \] \noindent -This property clarifies what regular expression $der$ calculates, namely take the set of strings -that $r$ can match (that is $L(r)$), filter out all strings not starting with $c$ and strip off the $c$ from the -remaining strings---this is exactly the language that $der\,c\,r$ can match. +This property clarifies what regular expression $der$ calculates, +namely take the set of strings that $r$ can match (that is $L(r)$), +filter out all strings not starting with $c$ and strip off the $c$ +from the remaining strings---this is exactly the language that +$der\,c\,r$ can match. -If we want to find out whether the string $"abc"$ is matched by the regular expression $r$ -then we can iteratively apply $Der$ as follows +If we want to find out whether the string $"abc"$ is matched by the +regular expression $r$ then we can iteratively apply $Der$ as follows \begin{enumerate} \item $Der\,a\,(L(r))$ @@ -125,10 +151,12 @@ \end{enumerate} \noindent -In the last step we need to test whether the empty string is in the set. Our matching algorithm will work similarly, -just using regular expression instead of sets. For this we need to lift the notion of derivatives from characters to strings. This can be -done using the following function, taking a string and regular expression as input and a regular expression -as output. +In the last step we need to test whether the empty string is in the +set. Our matching algorithm will work similarly, just using regular +expression instead of sets. For this we need to lift the notion of +derivatives from characters to strings. This can be done using the +following function, taking a string and regular expression as input +and a regular expression as output. \begin{center} \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}} @@ -152,9 +180,15 @@ \] \noindent -holds, which means our algorithm satisfies the specification. This algorithm was introduced by -Janus Brzozowski in 1964. Its main attractions are simplicity and being fast, as well as -being easily extendable for other regular expressions such as $r^{\{n\}}$, $r^?$, $\sim{}r$ and so on. +holds, which means our algorithm satisfies the specification. This +algorithm was introduced by Janus Brzozowski in 1964. Its main +attractions are simplicity and being fast, as well as being easily +extendable for other regular expressions such as $r^{\{n\}}$, $r^?$, +$\sim{}r$ and so on. + +\subsection*{The Matching Algorithm in Scala} + + \begin{figure}[p] {\lstset{language=Scala}\texttt{\lstinputlisting{../progs/app5.scala}}} diff -r 70c307641d05 -r 1e4da6d2490c hws/hw01.pdf Binary file hws/hw01.pdf has changed diff -r 70c307641d05 -r 1e4da6d2490c hws/hw01.tex --- a/hws/hw01.tex Mon Sep 22 13:42:14 2014 +0100 +++ b/hws/hw01.tex Fri Sep 26 14:06:55 2014 +0100 @@ -37,7 +37,8 @@ \item Assume the concatenation operation of two strings is written as $s_1 @ s_2$. Define the operation of - \emph{concatenating} two sets of strings. + \emph{concatenating}, written $\_ \,@\, \_$ two sets of + strings. \item Assume a set $A$ contains 4 strings and a set $B$ 7 strings, how many strings are in $A @ B$? @@ -47,14 +48,14 @@ $\_^{n+1}$.) \item How many regular expressions are there to match the - string $abc$? (How many if they cannot include + string $abc$? How many if they cannot include $\epsilon$ and $\varnothing$? How many if they are also not allowed to contain stars? How many if they are also - not allowed to contain $\_ + \_$?) + not allowed to contain $\_ + \_$? \item When are two regular expressions equivalent? Can you think of instances where two regular expressions match - teh same strings, but it is not so obvious that they do? For + the same strings, but it is not so obvious that they do? For example $a + b$ and $b + a$ do not count\ldots they obviously match the same strings, namely $[a]$ and $[b]$. diff -r 70c307641d05 -r 1e4da6d2490c hws/hw02.pdf Binary file hws/hw02.pdf has changed diff -r 70c307641d05 -r 1e4da6d2490c hws/hw02.tex --- a/hws/hw02.tex Mon Sep 22 13:42:14 2014 +0100 +++ b/hws/hw02.tex Fri Sep 26 14:06:55 2014 +0100 @@ -9,40 +9,53 @@ \section*{Homework 2} \begin{enumerate} -\item Review the first handout about sets of strings and read the second handout. -Assuming the alphabet is $\{a, b\}$, decide which of the following equations are true -in general for arbitrary languages $A$, $B$ and $C$: +\item Review the first handout about sets of strings and read + the second handout. Assuming the alphabet is $\{a, b\}$, + decide which of the following equations are true in + general for arbitrary languages $A$, $B$ and $C$: + \begin{eqnarray} -(A \cup B) @ C & = & A @ C \cup B @ C\nonumber\\ -A^* \cup B^* & = & (A \cup B)^*\nonumber\\ -A^* @ A^* & = & A^*\nonumber\\ -(A \cap B)@ C & = & (A@C) \cap (B@C)\nonumber +(A \cup B) @ C & =^? & A @ C \cup B @ C\nonumber\\ +A^* \cup B^* & =^? & (A \cup B)^*\nonumber\\ +A^* @ A^* & =^? & A^*\nonumber\\ +(A \cap B)@ C & =^? & (A@C) \cap (B@C)\nonumber \end{eqnarray} -\noindent -In case an equation is true, give an explanation; otherwise give a counter-example. +\noindent In case an equation is true, give an explanation; +otherwise give a counter-example. -\item What is the meaning of a regular expression? Give an inductive definition. +\item What is the meaning of a regular expression? Give an + inductive definition. -\item Given the regular expressions $r_1 = \epsilon$ and $r_2 = \varnothing$ and $r_3 = a$. -How many strings can the regular expressions $r_1^*$, $r_2^*$ and $r_3^*$ each match? +\item Given the regular expressions $r_1 = \epsilon$ and $r_2 + = \varnothing$ and $r_3 = a$. How many strings can the + regular expressions $r_1^*$, $r_2^*$ and $r_3^*$ each + match? -\item Give regular expressions for (a) decimal numbers and for (b) binary numbers. -(Hint: Observe that the empty string is not a number. Also observe that leading 0s -are normally not written.) +\item Give regular expressions for (a) decimal numbers and for + (b) binary numbers. (Hint: Observe that the empty string + is not a number. Also observe that leading 0s are + normally not written.) -\item Decide whether the following two regular expressions are equivalent $(\epsilon + a)^* \equiv^? a^*$ and -$(a \cdot b)^* \cdot a \equiv^? a \cdot (b \cdot a)^*$. +\item Decide whether the following two regular expressions are + equivalent $(\epsilon + a)^* \equiv^? a^*$ and $(a \cdot + b)^* \cdot a \equiv^? a \cdot (b \cdot a)^*$. -\item Given the regular expression $r = (a \cdot b + b)^*$. Compute what the derivative of $r$ is with respect to -$a$ and $b$. Is $r$ nullable? +\item Given the regular expression $r = (a \cdot b + b)^*$. + Compute what the derivative of $r$ is with respect to + $a$, $b$ and $c$. Is $r$ nullable? \item Prove that for all regular expressions $r$ we have -\begin{center} -$\text{nullable}(r)$ \quad if and only if \quad $\texttt{""} \in L(r)$ + +\begin{center} + $\textit{nullable}(r) \quad \text{if and only if} + \quad [] \in L(r)$ \end{center} + Write down clearly in each case what you need to prove and + what are the assumptions. + \end{enumerate} \end{document} diff -r 70c307641d05 -r 1e4da6d2490c hws/hw03.pdf Binary file hws/hw03.pdf has changed diff -r 70c307641d05 -r 1e4da6d2490c hws/hw03.tex --- a/hws/hw03.tex Mon Sep 22 13:42:14 2014 +0100 +++ b/hws/hw03.tex Fri Sep 26 14:06:55 2014 +0100 @@ -11,34 +11,41 @@ \begin{enumerate} \item What is a regular language? -\item Assume you have an alphabet consisting of the letters $a$, $b$ and $c$ only. -(1) Find a regular expression that recognises the two strings $ab$ and $ac$. (2) -Find a regular expression that matches all strings \emph{except} these two strings. -Note, you can only use regular expressions of the form +\item Assume you have an alphabet consisting of the letters + $a$, $b$ and $c$ only. (1) Find a regular expression + that recognises the two strings $ab$ and $ac$. (2) Find + a regular expression that matches all strings + \emph{except} these two strings. Note, you can only use + regular expressions of the form + + \begin{center} $r ::= + \varnothing \;|\; \epsilon \;|\; c \;|\; r_1 + r_2 \;|\; + r_1 \cdot r_2 \;|\; r^*$ + \end{center} + +\item Define the function \textit{zeroable} which takes a + regular expression as argument and returns a boolean. + The function should satisfy the following property: + \begin{center} -$r ::= \varnothing \;|\; \epsilon \;|\; c \;|\; r_1 + r_2 \;|\; r_1 \cdot r_2 \;|\; r^*$ -\end{center} - -\item Define the function $zeroable$ which takes a regular expression as argument -and returns a boolean. The -function should satisfy the following property: -\begin{center} -$zeroable(r)$ \;if and only if\; $L(r) = \varnothing$ +$\textit{zeroable(r)} \;\text{if and only if}\; L(r) = \varnothing$ \end{center} \item Define the tokens and regular expressions for a language -consisting of numbers, left-parenthesis (, right-parenthesis ), -identifiers and the operations $+$, $-$ and $*$. Can the following strings -in this language be lexed? + consisting of numbers, left-parenthesis $($, + right-parenthesis $)$, identifiers and the operations $+$, + $-$ and $*$. Can the following strings in this language + be lexed? \begin{itemize} -\item \texttt{"}$(a + 3) * b$\texttt{"} -\item \texttt{"}$)()++ -33$\texttt{"} -\item \texttt{"}$(a / 3) * 3$\texttt{"} + \item $(a + 3) * b$ + \item $)()++ -33$ + \item $(a / 3) * 3$ \end{itemize} +In case they can, can you give the corresponding token +sequences. -In case they can, can you give the corresponding token sequences. \end{enumerate} diff -r 70c307641d05 -r 1e4da6d2490c progs/re1.scala --- a/progs/re1.scala Mon Sep 22 13:42:14 2014 +0100 +++ b/progs/re1.scala Fri Sep 26 14:06:55 2014 +0100 @@ -1,7 +1,4 @@ -import scala.language.implicitConversions - abstract class Rexp - case object NULL extends Rexp case object EMPTY extends Rexp case class CHAR(c: Char) extends Rexp @@ -9,15 +6,6 @@ case class SEQ(r1: Rexp, r2: Rexp) extends Rexp case class STAR(r: Rexp) extends Rexp -// some convenience for typing in regular expressions -def charlist2rexp(s : List[Char]) : Rexp = s match { - case Nil => EMPTY - case c::Nil => CHAR(c) - case c::s => SEQ(CHAR(c), charlist2rexp(s)) -} -implicit def string2rexp(s : String) : Rexp = charlist2rexp(s.toList) - - // nullable function: tests whether the regular // expression can recognise the empty string def nullable (r: Rexp) : Boolean = r match { @@ -50,12 +38,13 @@ // main matcher function def matcher(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r)) -//example +//example from the homework //val r = STAR(ALT(SEQ(CHAR('a'), CHAR('b')), CHAR('b'))) //der('a', r) //der('b', r) +//der('c', r) -//one or zero +//optional: one or zero times def OPT(r: Rexp) = ALT(r, EMPTY) //n-times @@ -65,8 +54,10 @@ case n => SEQ(r, NTIMES(r, n - 1)) } -def EVIL(n: Int) = SEQ(NTIMES(OPT("a"), n), NTIMES("a", n)) +// the evil regular expression a?{n} a{n} +def EVIL(n: Int) = SEQ(NTIMES(OPT(CHAR('a')), n), NTIMES(CHAR('a'), n)) +//for measuring time def time_needed[T](i: Int, code: => T) = { val start = System.nanoTime() for (j <- 1 to i) code @@ -77,5 +68,3 @@ for (i <- 1 to 29) { println(i + ": " + "%.5f".format(time_needed(1, matcher(EVIL(i), "a" * i)))) } - - diff -r 70c307641d05 -r 1e4da6d2490c progs/re2.scala --- a/progs/re2.scala Mon Sep 22 13:42:14 2014 +0100 +++ b/progs/re2.scala Fri Sep 26 14:06:55 2014 +0100 @@ -1,26 +1,12 @@ -import scala.language.implicitConversions - abstract class Rexp - case object NULL extends Rexp case object EMPTY extends Rexp case class CHAR(c: Char) extends Rexp case class ALT(r1: Rexp, r2: Rexp) extends Rexp case class SEQ(r1: Rexp, r2: Rexp) extends Rexp case class STAR(r: Rexp) extends Rexp -case class NTIMES(r: Rexp, n: Int) extends Rexp +case class NTIMES(r: Rexp, n: Int) extends Rexp //explicit constructor -// some convenience for typing in regular expressions -def charlist2rexp(s : List[Char]) : Rexp = s match { - case Nil => EMPTY - case c::Nil => CHAR(c) - case c::s => SEQ(CHAR(c), charlist2rexp(s)) -} -implicit def string2rexp(s : String) : Rexp = charlist2rexp(s.toList) - - -// nullable function: tests whether the regular -// expression can recognise the empty string def nullable (r: Rexp) : Boolean = r match { case NULL => false case EMPTY => true @@ -31,7 +17,6 @@ case NTIMES(r, i) => if (i == 0) true else nullable(r) } -// derivative of a regular expression w.r.t. a character def der (c: Char, r: Rexp) : Rexp = r match { case NULL => NULL case EMPTY => NULL @@ -45,20 +30,18 @@ if (i == 0) NULL else SEQ(der(c, r), NTIMES(r, i - 1)) } -// derivative w.r.t. a string (iterates der) def ders (s: List[Char], r: Rexp) : Rexp = s match { case Nil => r case c::s => ders(s, der(c, r)) } -// main matcher function def matcher(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r)) -//one or zero +//optional: one or zero times def OPT(r: Rexp) = ALT(r, EMPTY) -def EVIL(n: Int) = SEQ(NTIMES(OPT("a"), n), NTIMES("a", n)) +def EVIL(n: Int) = SEQ(NTIMES(OPT(CHAR('a')), n), NTIMES(CHAR('a'), n)) def time_needed[T](i: Int, code: => T) = { val start = System.nanoTime() @@ -71,7 +54,7 @@ println(i + ": " + "%.5f".format(time_needed(1, matcher(EVIL(i), "a" * i)))) } - +//a bit bolder test for (i <- 1 to 1000 by 50) { println(i + " " + "%.5f".format(time_needed(1, matcher(EVIL(i), "a" * i)))) } diff -r 70c307641d05 -r 1e4da6d2490c progs/re3.scala --- a/progs/re3.scala Mon Sep 22 13:42:14 2014 +0100 +++ b/progs/re3.scala Fri Sep 26 14:06:55 2014 +0100 @@ -1,57 +1,37 @@ -import scala.language.implicitConversions - -abstract class Rexp { - def simp : Rexp = this -} - +abstract class Rexp case object NULL extends Rexp case object EMPTY extends Rexp case class CHAR(c: Char) extends Rexp -case class ALT(r1: Rexp, r2: Rexp) extends Rexp { - override def simp = (r1.simp, r2.simp) match { - case (NULL, r) => r - case (r, NULL) => r - case (r, EMPTY) => if (nullable(r)) r else ALT(r, EMPTY) - case (EMPTY, r) => if (nullable(r)) r else ALT(r, EMPTY) - case (r1, r2) => if (r1 == r2) r1 else ALT(r1, r2) +case class ALT(r1: Rexp, r2: Rexp) extends Rexp +case class SEQ(r1: Rexp, r2: Rexp) extends Rexp +case class STAR(r: Rexp) extends Rexp +case class NTIMES(r: Rexp, n: Int) extends Rexp + +def simp(r: Rexp): Rexp = r match { + case ALT(r1, r2) => { + val r1s = simp(r1) + val r2s = simp(r2) + (r1s, r2s) match { + case (NULL, _) => r2s + case (_, NULL) => r1s + case _ => if (r1s == r2s) r1s else ALT(r1s, r2s) + } } -} -case class SEQ(r1: Rexp, r2: Rexp) extends Rexp { - override def simp = (r1.simp, r2.simp) match { - case (NULL, _) => NULL - case (_, NULL) => NULL - case (EMPTY, r) => r - case (r, EMPTY) => r - case (r1, r2) => SEQ(r1, r2) + case SEQ(r1, r2) => { + val r1s = simp(r1) + val r2s = simp(r2) + (r1s, r2s) match { + case (NULL, _) => NULL + case (_, NULL) => NULL + case (EMPTY, _) => r2s + case (_, EMPTY) => r1s + case _ => SEQ(r1s, r2s) + } } -} -case class STAR(r: Rexp) extends Rexp { - override def simp = r.simp match { - case NULL => EMPTY - case EMPTY => EMPTY - case r => STAR(r) - } -} -case class NTIMES(r: Rexp, n: Int) extends Rexp { - override def simp = if (n == 0) EMPTY else - r.simp match { - case NULL => NULL - case EMPTY => EMPTY - case r => NTIMES(r, n) - } + case NTIMES(r, n) => NTIMES(simp(r), n) + case r => r } -// some convenience for typing in regular expressions -def charlist2rexp(s : List[Char]) : Rexp = s match { - case Nil => EMPTY - case c::Nil => CHAR(c) - case c::s => SEQ(CHAR(c), charlist2rexp(s)) -} -implicit def string2rexp(s : String) : Rexp = charlist2rexp(s.toList) - - -// nullable function: tests whether the regular -// expression can recognise the empty string def nullable (r: Rexp) : Boolean = r match { case NULL => false case EMPTY => true @@ -62,7 +42,6 @@ case NTIMES(r, i) => if (i == 0) true else nullable(r) } -// derivative of a regular expression w.r.t. a character def der (c: Char, r: Rexp) : Rexp = r match { case NULL => NULL case EMPTY => NULL @@ -76,20 +55,18 @@ if (i == 0) NULL else SEQ(der(c, r), NTIMES(r, i - 1)) } -// derivative w.r.t. a string (iterates der) def ders (s: List[Char], r: Rexp) : Rexp = s match { case Nil => r - case c::s => ders(s, der(c, r).simp) + case c::s => ders(s, simp(der(c, r))) } -// main matcher function def matcher(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r)) //one or zero def OPT(r: Rexp) = ALT(r, EMPTY) -def EVIL(n: Int) = SEQ(NTIMES(OPT("a"), n), NTIMES("a", n)) +def EVIL(n: Int) = SEQ(NTIMES(OPT(CHAR('a')), n), NTIMES(CHAR('a'), n)) def time_needed[T](i: Int, code: => T) = { val start = System.nanoTime() diff -r 70c307641d05 -r 1e4da6d2490c progs/scraper.scala --- a/progs/scraper.scala Mon Sep 22 13:42:14 2014 +0100 +++ b/progs/scraper.scala Fri Sep 26 14:06:55 2014 +0100 @@ -24,7 +24,7 @@ val wr = new OutputStreamWriter(conn.getOutputStream()) //possible date ranges -wr.write("Fdate=2012-8-24&Tdate=2012-09-25") +wr.write("Fdate=2011-6-24&Tdate=2011-09-25") //wr.write("Fdate=2011-8-24&Tdate=2011-09-25") //wr.write("Fdate=2001-9-18&Tdate=2012-09-25") wr.flush diff -r 70c307641d05 -r 1e4da6d2490c slides/slides02.pdf Binary file slides/slides02.pdf has changed diff -r 70c307641d05 -r 1e4da6d2490c slides/slides02.tex --- a/slides/slides02.tex Mon Sep 22 13:42:14 2014 +0100 +++ b/slides/slides02.tex Fri Sep 26 14:06:55 2014 +0100 @@ -1,34 +1,48 @@ \documentclass[dvipsnames,14pt,t]{beamer} -\usepackage{beamerthemeplaincu} -\usepackage[absolute,overlay]{textpos} -\usepackage{ifthen} -\usepackage{tikz} -\usepackage{pgf} -\usepackage{calc} -\usepackage{ulem} -\usepackage{courier} -\usepackage{listings} -\renewcommand{\uline}[1]{#1} -\usetikzlibrary{arrows} -\usetikzlibrary{automata} -\usetikzlibrary{shapes} -\usetikzlibrary{shadows} -\usetikzlibrary{positioning} -\usetikzlibrary{plotmarks} -\usetikzlibrary{calc} -\usepackage{graphicx} +\usepackage{../slides} +\usepackage{../graphics} \usepackage{../langs} \usepackage{../data} -\makeatletter -\lst@CCPutMacro\lst@ProcessOther {"2D}{\lst@ttfamily{-{}}{-{}}} -\@empty\z@\@empty -\makeatother +%\usepackage{beamerthemeplaincu} +%\usepackage[absolute,overlay]{textpos} +%\usepackage{ifthen} +%\usepackage{tikz} +%\usepackage{pgf} +%\usepackage{calc} +%\usepackage{ulem} +%\usepackage{courier} +%\usepackage{listings} +%\renewcommand{\uline}[1]{#1} +%\usetikzlibrary{arrows} +%\usetikzlibrary{automata} +%\usetikzlibrary{shapes} +%\usetikzlibrary{shadows} +%\usetikzlibrary{positioning} +%\usetikzlibrary{plotmarks} +%\usetikzlibrary{calc} +%\usepackage{graphicx} +%\usepackage{../langs} +%\usepackage{../data} + +%\makeatletter +%\lst@CCPutMacro\lst@ProcessOther {"2D}{\lst@ttfamily{-{}}{-{}}} +%\@empty\z@\@empty +%\makeatother + +\hfuzz=220pt + +\lstset{language=Scala, + style=mystyle, + numbersep=0pt, + numbers=none, + xleftmargin=0mm} + +\newcommand{\bl}[1]{\textcolor{blue}{#1}} % beamer stuff -\renewcommand{\slidecaption}{AFL 02, King's College London, 2.~October 2013} -\newcommand{\bl}[1]{\textcolor{blue}{#1}} -\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% for definitions +\renewcommand{\slidecaption}{AFL 02, King's College London} + \begin{document}