Binary file handouts/ho04.pdf has changed
--- a/handouts/ho04.tex Tue Sep 20 12:17:01 2016 +0100
+++ b/handouts/ho04.tex Tue Sep 20 12:25:09 2016 +0100
@@ -10,20 +10,34 @@
\section*{Handout 4 (Sulzmann \& Lu Algorithm)}
So far our algorithm based on derivatives was only able to say
-yes or no depending on whether a string was matched by regular
+yes or no depending on whether a string was matched by a regular
expression or not. Often a more interesting question is to
find out \emph{how} a regular expression matched a string?
Answering this question will also help us with the problem we
are after, namely tokenising an input string.
-The algorithm we will be looking at was designed by Sulzmann
-\& Lu in a rather recent paper (from 2014). A link to it is
+The algorithm we will be looking at in this lecture was designed by Sulzmann
+\& Lu in a rather recent research paper (from 2014). A link to it is
provided on KEATS, in case you are interested.\footnote{In my
humble opinion this is an interesting instance of the research
literature: it contains a very neat idea, but its presentation
-is rather sloppy. In earlier versions of their paper, a King's
-student and I found several rather annoying typos in their
-examples and definitions.} In order to give an answer for
+is rather sloppy. In earlier versions of this paper, a King's
+undergraduate student and I found several rather annoying typos in the
+examples and definitions.} My PhD student Fahad Ausaf and I even more recently
+wrote a paper where we build on their result: we provided a
+mathematical proof that their algorithm is really correct---the proof
+Sulzmann \& Lu had originally given contained major flaws. Such correctness
+proofs are important: Kuklewicz maintains a unit-test library
+for the kind of algorithma we are interested in here and he showed
+that many implementations in the ``wild'' are buggy, that is not
+satisfy his unit tests:
+
+\begin{center}
+\url{http://www.haskell.org/haskellwiki/Regex_Posix}
+\end{center}
+
+
+In order to give an answer for
\emph{how} a regular expression matches a string, Sulzmann and
Lu use \emph{values}. A value will be the output of the
algorithm whenever the regular expression matches the string.
@@ -54,7 +68,7 @@
& $\mid$ & $Seq(v_1,v_2)$\\
& $\mid$ & $\Left(v)$ \\
& $\mid$ & $Right(v)$ \\
- & $\mid$ & $Stars [v_1,\ldots\,v_n]$ \\
+ & $\mid$ & $Stars\,[v_1,\ldots\,v_n]$ \\
\end{tabular}
\end{tabular}
\end{center}
@@ -71,7 +85,7 @@
of $r^*$. For sequence, there is exactly one value, composed
of two component values ($v_1$ and $v_2$).
-My definition of regular expressions and values in Scala is
+My implementation of regular expressions and values in Scala is
shown below. I have in my implementation the convention that
regular expressions are written entirely with upper-case
letters, while values just start with a single upper-case
@@ -87,14 +101,16 @@
Graphically the algorithm by Sulzmann \& Lu can be illustrated
by the picture in Figure~\ref{Sulz} where the path from the
-left to the right involving $der/nullable$ is the first phase
-of the algorithm and $mkeps/inj$, the path from right to left,
+left to the right involving $\textit{der}/\textit{nullable}$ is the first phase
+of the algorithm and $\textit{mkeps}/\textit{inj}$, the path from right to left,
the second phase. This picture shows the steps required when a
regular expression, say $r_1$, matches the string $abc$. We
first build the three derivatives (according to $a$, $b$ and
-$c$). We then use $nullable$ to find out whether the resulting
+$c$). We then use $\textit{nullable}$ to find out whether the resulting
regular expression can match the empty string. If yes, we call
-the function $mkeps$.
+the function $\textit{mkeps}$. The $\textit{mkeps}$ function calculates a value for how a regular
+expression has matched the empty string. Its definition
+is as follows:
\begin{figure}[t]
\begin{center}
@@ -102,43 +118,41 @@
every node/.style={minimum size=7mm}]
\node (r1) {$r_1$};
\node (r2) [right=of r1]{$r_2$};
-\draw[->,line width=1mm](r1)--(r2) node[above,midway] {$der\,a$};
+\draw[->,line width=1mm](r1)--(r2) node[above,midway] {$\textit{der}\,a$};
\node (r3) [right=of r2]{$r_3$};
-\draw[->,line width=1mm](r2)--(r3) node[above,midway] {$der\,b$};
+\draw[->,line width=1mm](r2)--(r3) node[above,midway] {$\textit{der}\,b$};
\node (r4) [right=of r3]{$r_4$};
-\draw[->,line width=1mm](r3)--(r4) node[above,midway] {$der\,c$};
-\draw (r4) node[anchor=west] {\;\raisebox{3mm}{$nullable$}};
+\draw[->,line width=1mm](r3)--(r4) node[above,midway] {$\textit{der}\,c$};
+\draw (r4) node[anchor=west] {\;\raisebox{3mm}{$\textit{nullable}$}};
\node (v4) [below=of r4]{$v_4$};
\draw[->,line width=1mm](r4) -- (v4);
\node (v3) [left=of v4] {$v_3$};
-\draw[->,line width=1mm](v4)--(v3) node[below,midway] {$inj\,c$};
+\draw[->,line width=1mm](v4)--(v3) node[below,midway] {$\textit{inj}\,c$};
\node (v2) [left=of v3]{$v_2$};
-\draw[->,line width=1mm](v3)--(v2) node[below,midway] {$inj\,b$};
+\draw[->,line width=1mm](v3)--(v2) node[below,midway] {$\textit{inj}\,b$};
\node (v1) [left=of v2] {$v_1$};
-\draw[->,line width=1mm](v2)--(v1) node[below,midway] {$inj\,a$};
-\draw (r4) node[anchor=north west] {\;\raisebox{-8mm}{$mkeps$}};
+\draw[->,line width=1mm](v2)--(v1) node[below,midway] {$\textit{inj}\,a$};
+\draw (r4) node[anchor=north west] {\;\raisebox{-8mm}{$\textit{mkeps}$}};
\end{tikzpicture}
\end{center}
\caption{The two phases of the algorithm by Sulzmann \& Lu.
\label{Sulz}}
\end{figure}
-The $mkeps$ function calculates a value for how a regular
-expression has matched the empty string. Its definition
-is as follows:
+
\begin{center}
\begin{tabular}{lcl}
- $mkeps(\ONE)$ & $\dn$ & $Empty$\\
- $mkeps(r_1 + r_2)$ & $\dn$ & if $nullable(r_1)$ \\
- & & then $\Left(mkeps(r_1))$\\
- & & else $Right(mkeps(r_2))$\\
- $mkeps(r_1 \cdot r_2)$ & $\dn$ & $Seq(mkeps(r_1),mkeps(r_2))$\\
- $mkeps(r^*)$ & $\dn$ & $[]$ \\
+ $\textit{mkeps}(\ONE)$ & $\dn$ & $Empty$\\
+ $\textit{mkeps}(r_1 + r_2)$ & $\dn$ & if $\textit{nullable}(r_1)$ \\
+ & & then $\Left(\textit{mkeps}(r_1))$\\
+ & & else $Right(\textit{mkeps}(r_2))$\\
+ $\textit{mkeps}(r_1 \cdot r_2)$ & $\dn$ & $Seq(\textit{mkeps}(r_1),\textit{mkeps}(r_2))$\\
+ $\textit{mkeps}(r^*)$ & $\dn$ & $Stars\,[]$ \\
\end{tabular}
\end{center}
-\noindent There are no cases for $\ZERO$ and $c$, since
+\noindent There is are no cases for $\ZERO$ and $c$, since
these regular expression cannot match the empty string. Note
also that in case of alternatives we give preference to the
regular expression on the left-hand side. This will become
@@ -149,31 +163,31 @@
has matched a string. For this we need a function that
reverses this ``chopping off'' for values which we did in the
first phase for derivatives. The corresponding function is
-called $inj$ for injection. This function takes three
+called $\textit{inj}$ (short for injection). This function takes three
arguments: the first one is a regular expression for which we
want to calculate the value, the second is the character we
want to inject and the third argument is the value where we
will inject the character into. The result of this function is a
-new value. The definition of $inj$ is as follows:
+new value. The definition of $\textit{inj}$ is as follows:
\begin{center}
\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
- $inj\,(c)\,c\,Empty$ & $\dn$ & $Char\,c$\\
- $inj\,(r_1 + r_2)\,c\,\Left(v)$ & $\dn$ & $\Left(inj\,r_1\,c\,v)$\\
- $inj\,(r_1 + r_2)\,c\,Right(v)$ & $\dn$ & $Right(inj\,r_2\,c\,v)$\\
- $inj\,(r_1 \cdot r_2)\,c\,Seq(v_1,v_2)$ & $\dn$ & $Seq(inj\,r_1\,c\,v_1,v_2)$\\
- $inj\,(r_1 \cdot r_2)\,c\,\Left(Seq(v_1,v_2))$ & $\dn$ & $Seq(inj\,r_1\,c\,v_1,v_2)$\\
- $inj\,(r_1 \cdot r_2)\,c\,Right(v)$ & $\dn$ & $Seq(mkeps(r_1),inj\,r_2\,c\,v)$\\
- $inj\,(r^*)\,c\,Seq(v,vs)$ & $\dn$ & $inj\,r\,c\,v\,::\,vs$\\
+ $\textit{inj}\,(c)\,c\,Empty$ & $\dn$ & $Char\,c$\\
+ $\textit{inj}\,(r_1 + r_2)\,c\,\Left(v)$ & $\dn$ & $\Left(\textit{inj}\,r_1\,c\,v)$\\
+ $\textit{inj}\,(r_1 + r_2)\,c\,Right(v)$ & $\dn$ & $Right(\textit{inj}\,r_2\,c\,v)$\\
+ $\textit{inj}\,(r_1 \cdot r_2)\,c\,Seq(v_1,v_2)$ & $\dn$ & $Seq(\textit{inj}\,r_1\,c\,v_1,v_2)$\\
+ $\textit{inj}\,(r_1 \cdot r_2)\,c\,\Left(Seq(v_1,v_2))$ & $\dn$ & $Seq(\textit{inj}\,r_1\,c\,v_1,v_2)$\\
+ $\textit{inj}\,(r_1 \cdot r_2)\,c\,Right(v)$ & $\dn$ & $Seq(\textit{mkeps}(r_1),\textit{inj}\,r_2\,c\,v)$\\
+ $\textit{inj}\,(r^*)\,c\,Seq(v,Stars\,vs)$ & $\dn$ & $Stars(\textit{inj}\,r\,c\,v\,::\,vs)$\\
\end{tabular}
\end{center}
\noindent This definition is by recursion on the regular
expression and by analysing the shape of the values. Therefore
-there are, for example, three cases for sequence regular
-expressions (for all possible shapes of the value). The last
+there are three cases for sequence regular
+expressions (for all possible shapes of the value we can encounter). The last
clause for the star regular expression returns a list where
-the first element is $inj\,r\,c\,v$ and the other elements are
+the first element is $\textit{inj}\,r\,c\,v$ and the other elements are
$vs$. That means $\_\,::\,\_$ should be read as list cons.
To understand what is going on, it might be best to do some
@@ -195,36 +209,36 @@
\noindent According to the simple algorithm, we would test
whether $r_4$ is nullable, which in this case it indeed is.
-This means we can use the function $mkeps$ to calculate a
+This means we can use the function $\textit{mkeps}$ to calculate a
value for how $r_4$ was able to match the empty string.
Remember that this function gives preference for alternatives
on the left-hand side. However there is only $\ONE$ on the
-very right-hand side of $r_4$ that matches the empty string.
-Therefore $mkeps$ returns the value
+very right-hand side of $r_4$ (underlined)
+
+\begin{center}
+$r_4$:\;$(\ZERO \cdot (b \cdot c)) +
+ ((\ZERO \cdot c) + \underline{\ONE})$\\
+\end{center}
+
+\noindent
+that matches the empty string.
+Therefore $\textit{mkeps}$ returns the value
\begin{center}
$v_4:\;Right(Right(Empty))$
\end{center}
\noindent If there had been a $\ONE$ on the left, then
-$mkeps$ would have returned something of the form
+$\textit{mkeps}$ would have returned something of the form
$\Left(\ldots)$. The point is that from this value we can
directly read off which part of $r_4$ matched the empty
string: take the right-alternative first, and then the
-right-alternative again. Remember $r_4$ is of the form
-
-\begin{center}
-$r_4$:\;$(\ZERO \cdot (b \cdot c)) +
- ((\ZERO \cdot c) + \underline{\ONE})$\\
-\end{center}
-
-\noindent the value tells us that the underlined $\ONE$
-is responsible for matching the empty string.
+right-alternative again.
Next we have to ``inject'' the last character, that is $c$ in
the running example, into this value $v_4$ in order to
calculate how $r_3$ could have matched the string $c$.
-According to the definition of $inj$ we obtain
+According to the definition of $\textit{inj}$ we obtain
\begin{center}
$v_3:\;Right(Seq(Empty, Char(c)))$
@@ -263,12 +277,12 @@
$|\Left(v)|$ & $\dn$ & $|v|$\\
$|Right(v)|$ & $\dn$ & $|v|$\\
$|Seq(v_1,v_2)|$& $\dn$ & $|v_1| \,@\, |v_2|$\\
- $|[v_1,\ldots ,v_n]|$ & $\dn$ & $|v_1| \,@\ldots @\, |v_n|$\\
+ $|Stars\,[v_1,\ldots ,v_n]|$ & $\dn$ & $|v_1| \,@\ldots @\, |v_n|$\\
\end{tabular}
\end{center}
\noindent Using flatten we can see what are the strings behind
-the values calculated by $mkeps$ and $inj$. In our running
+the values calculated by $\textit{mkeps}$ and $\textit{inj}$. In our running
example:
\begin{center}
@@ -280,10 +294,8 @@
\end{tabular}
\end{center}
-\noindent This indicates that $inj$ indeed is injecting, or
-adding, back a character into the value. If we continue until
-all characters are injected back, we have a value that can
-indeed say how the string $abc$ was matched.
+\noindent This indicates that $\textit{inj}$ indeed is injecting, or
+adding, back a character into the value.
There is a problem, however, with the described algorithm
so far: it is very slow. We need to include the simplification
@@ -301,23 +313,29 @@
first example consider the last derivation step in our earlier
example:
-\begin{center}
-$r_4 = der\,c\,r_3 =
-(\ZERO \cdot (b \cdot c)) + ((\ZERO \cdot c) + \ONE)$
-\end{center}
+\begin{equation}\label{regexfour}
+r_4 = \textit{der}\,c\,r_3 =
+(\ZERO \cdot (b \cdot c)) + ((\ZERO \cdot c) + \ONE)
+\end{equation}
\noindent Simplifying this regular expression would just give
-us $\ONE$. Running $mkeps$ with this regular expression as
-input, however, would then provide us with $Empty$ instead of
-$Right(Right(Empty))$ that was obtained without the
-simplification. The problem is we need to recreate this more
-complicated value, rather than just return $Empty$.
+us $\ONE$. Running $\textit{mkeps}$ with this $\ONE$ as
+input, however, would give us with the value $Empty$ instead of
+
+\[Right(Right(Empty))\]
-This will require what I call \emph{rectification functions}.
-They need to be calculated whenever a regular expression gets
-simplified. Rectification functions take a value as argument
-and return a (rectified) value. Let us first take a look again
-at our simplification rules:
+\noindent
+that was obtained without the simplification. The problem is we need
+to recreate this more complicated value, rather than just return
+$Empty$. This will require what I call \emph{rectification functions}.
+They need to be calculated whenever a regular expression gets
+simplified.
+
+Rectification functions take a value as argument and return a
+(rectified) value. In the example above the argument would be $Empty$
+and the output $Right(Right(Empty))$. Before we define these
+rectification functions in general, let us first take a look again at
+our simplification rules:
\begin{center}
\begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}l}
@@ -331,7 +349,7 @@
\end{tabular}
\end{center}
-\noindent Applying them to $r_4$ will require several nested
+\noindent Applying them to $r_4$ in \eqref{regexfour} will require several nested
simplifications in order end up with just $\ONE$. However,
it is possible to apply them in a depth-first, or inside-out,
manner in order to calculate this simplified regular
@@ -391,28 +409,28 @@
case):
\begin{center}
-\begin{tabular}{l}
-$simp(r)$:\\
-\quad case $r = r_1 + r_2$\\
-\qquad let $(r_{1s}, f_{1s}) = simp(r_1)$\\
-\qquad \phantom{let} $(r_{2s}, f_{2s}) = simp(r_2)$\smallskip\\
-\qquad case $r_{1s} = \ZERO$:
+\begin{tabular}{ll}
+$_1$ & $simp(r)$:\\
+$_2$ & \quad case $r = r_1 + r_2$\\
+$_3$ & \qquad let $(r_{1s}, f_{1s}) = simp(r_1)$\\
+$_4$ & \qquad \phantom{let} $(r_{2s}, f_{2s}) = simp(r_2)$\smallskip\\
+$_5$ & \qquad case $r_{1s} = \ZERO$:
return $(r_{2s}, \lambda v. \,Right(f_{2s}(v)))$\\
-\qquad case $r_{2s} = \ZERO$:
+$_6$ & \qquad case $r_{2s} = \ZERO$:
return $(r_{1s}, \lambda v. \,\Left(f_{1s}(v)))$\\
-\qquad case $r_{1s} = r_{2s}$:
+$_7$ & \qquad case $r_{1s} = r_{2s}$:
return $(r_{1s}, \lambda v. \,\Left(f_{1s}(v)))$\\
-\qquad otherwise:
+$_8$ & \qquad otherwise:
return $(r_{1s} + r_{2s}, f_{alt}(f_{1s}, f_{2s}))$\\
\end{tabular}
\end{center}
\noindent We first recursively call the simplification with
-$r_1$ and $r_2$. This gives simplified regular expressions,
+$r_1$ and $r_2$ (Lines 3 and 4). This gives simplified regular expressions,
$r_{1s}$ and $r_{2s}$, as well as two rectification functions
$f_{1s}$ and $f_{2s}$. We next need to test whether the
simplified regular expressions are $\ZERO$ so as to make
-further simplifications. In case $r_{1s}$ is $\ZERO$,
+further simplifications. In case $r_{1s}$ is $\ZERO$ (Line 5),
then we can return $r_{2s}$ (the other alternative). However
for this we need to build a corresponding rectification
function, which as said above is
@@ -465,7 +483,7 @@
\begin{center}
\begin{tabular}{l}
-$simp(r)$:\qquad\qquad (continued)\\
+$simp(r)$:\hspace{5cm} (continued)\\
\quad case $r = r_1 \cdot r_2$\\
\qquad let $(r_{1s}, f_{1s}) = simp(r_1)$\\
\qquad \phantom{let} $(r_{2s}, f_{2s}) = simp(r_2)$\smallskip\\
@@ -517,10 +535,9 @@
$\lambda v.\,v$
\end{center}
-\noindent This completes the high-level version of the
-simplification function, which is also shown again in
-Figure~\ref{simp}. This can now be used in a \emph{lexing
-function} as follows:
+\noindent This completes the high-level version of the simplification
+function, which is shown again in Figure~\ref{simp}. The Scala code
+for the simplifaction function is in Figure~\ref{simprect}.
\begin{figure}[t]
\begin{center}
@@ -562,13 +579,25 @@
regular expression and a rectification function.\label{simp}}
\end{figure}
+\begin{figure}[p]
+\lstinputlisting{../progs/app61.scala}
+
+\caption{The Scala code for the simplification function. The
+first part defines some auxillary functions for the rectification.
+The second part give the simplification function.
+\label{simprect}}
+\end{figure}
+
+
+We are now in the position to define a \emph{lexing function} as follows:
+
\begin{center}
\begin{tabular}{lcl}
-$lex\,r\,[]$ & $\dn$ & if $nullable(r)$ then $mkeps(r)$\\
+$lex\,r\,[]$ & $\dn$ & if $\textit{nullable}(r)$ then $\textit{mkeps}(r)$\\
& & else $error$\\
$lex\,r\,c\!::\!s$ & $\dn$ & let
- $(r_{simp}, f_{rect}) = simp(der(c, r))$\\
-& & $inj\,r\,c\,f_{rect}(lex\,r_{simp}\,s)$
+ $(r_{simp}, f_{rect}) = simp(\textit{der}(c, r))$\\
+& & $\textit{inj}\,r\,c\,f_{rect}(lex\,r_{simp}\,s)$
\end{tabular}
\end{center}
@@ -576,7 +605,7 @@
seen in earlier lectures. In the first clause we are given an
empty string, $[]$, and need to test wether the regular
expression is $nullable$. If yes, we can proceed normally and
-just return the value calculated by $mkeps$. The second clause
+just return the value calculated by $\textit{mkeps}$. The second clause
is for strings where the first character is $c$, say, and the
rest of the string is $s$. We first build the derivative of
$r$ with respect to $c$; simplify the resulting regular
@@ -614,8 +643,8 @@
{../progs/app03.scala}}
\noindent Since we regard records as regular expressions we
-need to extend the functions $nullable$ and $der$. Similarly
-$mkeps$ and $inj$ need to be extended. This means we also need
+need to extend the functions $\textit{nullable}$ and $\textit{der}$. Similarly
+$\textit{mkeps}$ and $\textit{inj}$ need to be extended. This means we also need
to extend the definition of values, which in Scala looks as
follows:
@@ -646,7 +675,7 @@
$env(\Left(v))$ & $\dn$ & $env(v)$\\
$env(Right(v))$ & $\dn$ & $env(v)$\\
$env(Seq(v_1,v_2))$& $\dn$ & $env(v_1) \,@\, env(v_2)$\\
- $env([v_1,\ldots ,v_n])$ & $\dn$ &
+ $env(Stars\,[v_1,\ldots ,v_n])$ & $\dn$ &
$env(v_1) \,@\ldots @\, env(v_n)$\\
$env(Rec(x:v))$ & $\dn$ & $(x:|v|) :: env(v)$\\
\end{tabular}
--- a/progs/app6.scala Tue Sep 20 12:17:01 2016 +0100
+++ b/progs/app6.scala Tue Sep 20 12:25:09 2016 +0100
@@ -1,4 +1,4 @@
-def simp(r: Rexp): Rexp = r match {
+def simp(r: Rexp) : Rexp = r match {
case ALT(r1, r2) => {
(simp(r1), simp(r2)) match {
case (ZERO, r2s) => r2s
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/progs/app61.scala Tue Sep 20 12:25:09 2016 +0100
@@ -0,0 +1,42 @@
+def F_ID(v: Val): Val = v
+def F_RIGHT(f: Val => Val) = (v:Val) => Right(f(v))
+def F_LEFT(f: Val => Val) = (v:Val) => Left(f(v))
+def F_ALT(f1: Val => Val, f2: Val => Val) = (v:Val) => v match {
+ case Right(v) => Right(f2(v))
+ case Left(v) => Left(f1(v))
+}
+def F_SEQ(f1: Val => Val, f2: Val => Val) = (v:Val) => v match {
+ case Seq(v1, v2) => Seq(f1(v1), f2(v2))
+}
+def F_SEQ_Empty1(f1: Val => Val, f2: Val => Val) =
+ (v:Val) => Seq(f1(Empty), f2(v))
+def F_SEQ_Empty2(f1: Val => Val, f2: Val => Val) =
+ (v:Val) => Seq(f1(v), f2(Empty))
+def F_ERROR(v: Val): Val = throw new Exception("error")
+
+// simplification of regular expressions returning also a
+// rectification function; no simplification under STAR
+def simp(r: Rexp): (Rexp, Val => Val) = r match {
+ case ALT(r1, r2) => {
+ val (r1s, f1s) = simp(r1)
+ val (r2s, f2s) = simp(r2)
+ (r1s, r2s) match {
+ case (ZERO, _) => (r2s, F_RIGHT(f2s))
+ case (_, ZERO) => (r1s, F_LEFT(f1s))
+ case _ => if (r1s == r2s) (r1s, F_LEFT(f1s))
+ else (ALT (r1s, r2s), F_ALT(f1s, f2s))
+ }
+ }
+ case SEQ(r1, r2) => {
+ val (r1s, f1s) = simp(r1)
+ val (r2s, f2s) = simp(r2)
+ (r1s, r2s) match {
+ case (ZERO, _) => (ZERO, F_ERROR)
+ case (_, ZERO) => (ZERO, F_ERROR)
+ case (ONE, _) => (r2s, F_SEQ_Empty1(f1s, f2s))
+ case (_, ONE) => (r1s, F_SEQ_Empty2(f1s, f2s))
+ case _ => (SEQ(r1s,r2s), F_SEQ(f1s, f2s))
+ }
+ }
+ case r => (r, F_ID)
+}
--- a/progs/re1.scala Tue Sep 20 12:17:01 2016 +0100
+++ b/progs/re1.scala Tue Sep 20 12:25:09 2016 +0100
@@ -56,6 +56,8 @@
// the evil regular expression a?{n} a{n}
def EVIL1(n: Int) = SEQ(NTIMES(OPT(CHAR('a')), n), NTIMES(CHAR('a'), n))
+
+// the evil regular expression (a*)*b
val EVIL2 = SEQ(STAR(CHAR('a')), CHAR('b'))
//for measuring time
--- a/progs/re2.scala Tue Sep 20 12:17:01 2016 +0100
+++ b/progs/re2.scala Tue Sep 20 12:25:09 2016 +0100
@@ -1,3 +1,5 @@
+// version with explicit n-times regular expression
+
abstract class Rexp
case object NULL extends Rexp
case object EMPTY extends Rexp
@@ -41,6 +43,7 @@
//optional: one or zero times
def OPT(r: Rexp) = ALT(r, EMPTY)
+//evil regular expressions
def EVIL1(n: Int) = SEQ(NTIMES(OPT(CHAR('a')), n), NTIMES(CHAR('a'), n))
val EVIL2 = SEQ(STAR(CHAR('a')), CHAR('b'))
@@ -55,7 +58,7 @@
println(i + ": " + "%.5f".format(time_needed(1, matches(EVIL1(i), "a" * i))))
}
-//a bit bolder test
+//a bit bolder tests
for (i <- 1 to 1001 by 50) {
println(i + " " + "%.5f".format(time_needed(2, matches(EVIL1(i), "a" * i))))
}
--- a/progs/re3.scala Tue Sep 20 12:17:01 2016 +0100
+++ b/progs/re3.scala Tue Sep 20 12:25:09 2016 +0100
@@ -1,36 +1,18 @@
-abstract class Rexp
-case object NULL extends Rexp
-case object EMPTY extends Rexp
+
+abstract class Rexp
+case object ZERO extends Rexp
+case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
-case class ALT(r1: Rexp, r2: Rexp) extends Rexp
+case class ALT(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp
case class NTIMES(r: Rexp, n: Int) extends Rexp
-def simp(r: Rexp): Rexp = r match {
- case ALT(r1, r2) => {
- (simp(r1), simp(r2)) match {
- case (NULL, r2s) => r2s
- case (r1s, NULL) => r1s
- case (r1s, r2s) => if (r1s == r2s) r1s else ALT(r1s, r2s)
- }
- }
- case SEQ(r1, r2) => {
- (simp(r1), simp(r2)) match {
- case (NULL, _) => NULL
- case (_, NULL) => NULL
- case (EMPTY, r2s) => r2s
- case (r1s, EMPTY) => r1s
- case (r1s, r2s) => SEQ(r1s, r2s)
- }
- }
- case NTIMES(r, n) => NTIMES(simp(r), n)
- case r => r
-}
-
+// nullable function: tests whether the regular
+// expression can recognise the empty string
def nullable (r: Rexp) : Boolean = r match {
- case NULL => false
- case EMPTY => true
+ case ZERO => false
+ case ONE => true
case CHAR(_) => false
case ALT(r1, r2) => nullable(r1) || nullable(r2)
case SEQ(r1, r2) => nullable(r1) && nullable(r2)
@@ -38,31 +20,56 @@
case NTIMES(r, i) => if (i == 0) true else nullable(r)
}
+// derivative of a regular expression w.r.t. a character
def der (c: Char, r: Rexp) : Rexp = r match {
- case NULL => NULL
- case EMPTY => NULL
- case CHAR(d) => if (c == d) EMPTY else NULL
+ case ZERO => ZERO
+ case ONE => ZERO
+ case CHAR(d) => if (c == d) ONE else ZERO
case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
case SEQ(r1, r2) =>
if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
else SEQ(der(c, r1), r2)
case STAR(r) => SEQ(der(c, r), STAR(r))
case NTIMES(r, i) =>
- if (i == 0) NULL else SEQ(der(c, r), NTIMES(r, i - 1))
+ if (i == 0) ZERO else SEQ(der(c, r), NTIMES(r, i - 1))
}
+def simp(r: Rexp) : Rexp = r match {
+ case ALT(r1, r2) => (simp(r1), simp(r2)) match {
+ case (ZERO, r2s) => r2s
+ case (r1s, ZERO) => r1s
+ case (r1s, r2s) => if (r1s == r2s) r1s else ALT (r1s, r2s)
+ }
+ case SEQ(r1, r2) => (simp(r1), simp(r2)) match {
+ case (ZERO, _) => ZERO
+ case (_, ZERO) => ZERO
+ case (ONE, r2s) => r2s
+ case (r1s, ONE) => r1s
+ case (r1s, r2s) => SEQ(r1s, r2s)
+ }
+ case NTIMES(r1, n) => NTIMES(simp(r), n)
+ case r => r
+}
+
+
+// derivative w.r.t. a string (iterates der)
def ders (s: List[Char], r: Rexp) : Rexp = s match {
case Nil => r
case c::s => ders(s, simp(der(c, r)))
}
-def matches(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
+
+// main matcher function
+def matcher(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
//one or zero
-def OPT(r: Rexp) = ALT(r, EMPTY)
+def OPT(r: Rexp) = ALT(r, ONE)
-def EVIL(n: Int) = SEQ(NTIMES(OPT(CHAR('a')), n), NTIMES(CHAR('a'), n))
+//evil regular expressions
+def EVIL1(n: Int) = SEQ(NTIMES(OPT(CHAR('a')), n), NTIMES(CHAR('a'), n))
+val EVIL2 = SEQ(STAR(CHAR('a')), CHAR('b'))
+
def time_needed[T](i: Int, code: => T) = {
val start = System.nanoTime()
@@ -71,70 +78,16 @@
(end - start)/(i * 1.0e9)
}
+val i = 5000
+println(i + " " + "%.5f".format(time_needed(10, matcher(EVIL1(i), "a" * i))))
-for (i <- 1 to 12001 by 500) {
- println(i + " " + "%.5f".format(time_needed(1, matches(EVIL(i), "a" * i))))
+for (i <- 1 to 9001 by 1000) {
+ println(i + " " + "%.5f".format(time_needed(2, matcher(EVIL1(i), "a" * i))))
}
-
-// some convenience for typing in regular expressions
-import scala.language.implicitConversions
-import scala.language.reflectiveCalls
-def charlist2rexp(s : List[Char]) : Rexp = s match {
- case Nil => EMPTY
- case c::Nil => CHAR(c)
- case c::s => SEQ(CHAR(c), charlist2rexp(s))
-}
-implicit def string2rexp(s : String) : Rexp = charlist2rexp(s.toList)
-
-implicit def RexpOps (r: Rexp) = new {
- def | (s: Rexp) = ALT(r, s)
- def % = STAR(r)
- def ~ (s: Rexp) = SEQ(r, s)
-}
-
-implicit def stringOps (s: String) = new {
- def | (r: Rexp) = ALT(s, r)
- def | (r: String) = ALT(s, r)
- def % = STAR(s)
- def ~ (r: Rexp) = SEQ(s, r)
- def ~ (r: String) = SEQ(s, r)
+for (i <- 1 to 7500001 by 500000) {
+ println(i + " " + "%.5f".format(time_needed(2, matcher(EVIL2, "a" * i))))
}
-def PLUS(r: Rexp) = SEQ(r, STAR(r))
-def RANGE(s: List[Char]) : Rexp = s match {
- case Nil => NULL
- case c::s => ALT(CHAR(c), RANGE(s))
-}
-
-println("EVILS:")
-val EVIL1 = PLUS(PLUS("a" ~ "a" ~ "a"))
-val EVIL2 = PLUS(PLUS("aaaaaaaaaaaaaaaaaaa" ~ OPT("a"))) // 19 as ~ a?
-
-
-//40 * aaa matches
-//43 * aaa + aa does not
-//45 * aaa + a
-
-println("EVIL1:")
-println(matches(EVIL1, "aaa" * 40))
-println(matches(EVIL1, "aaa" * 43 + "aa"))
-println(matches(EVIL1, "aaa" * 45 + "a"))
-println("EVIL2:")
-println(matches(EVIL2, "aaa" * 40))
-println(matches(EVIL2, "aaa" * 43 + "aa"))
-println(matches(EVIL2, "aaa" * 45 + "a"))
-
-
-
-
-println("EMAIL:")
-val LOWERCASE = "abcdefghijklmnopqrstuvwxyz"
-val DIGITS = "0123456789"
-val EMAIL = PLUS(RANGE((LOWERCASE + DIGITS + "_" + "." + "-").toList)) ~ "@" ~
- PLUS(RANGE((LOWERCASE + DIGITS + "." + "-").toList)) ~ "." ~
- NMTIMES(RANGE((LOWERCASE + ".").toList), 2, 6)
-
-val my_email = "christian.urban@kcl.ac.uk"
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/progs/re3ext.scala Tue Sep 20 12:25:09 2016 +0100
@@ -0,0 +1,151 @@
+abstract class Rexp
+case object ZERO extends Rexp
+case object ONE extends Rexp
+case class CHAR(c: Char) extends Rexp
+case class ALT(r1: Rexp, r2: Rexp) extends Rexp
+case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
+case class STAR(r: Rexp) extends Rexp
+case class NTIMES(r: Rexp, n: Int) extends Rexp
+
+def simp(r: Rexp): Rexp = r match {
+ case ALT(r1, r2) => {
+ (simp(r1), simp(r2)) match {
+ case (ZERO, r2s) => r2s
+ case (r1s, ZERO) => r1s
+ case (r1s, r2s) => if (r1s == r2s) r1s else ALT(r1s, r2s)
+ }
+ }
+ case SEQ(r1, r2) => {
+ (simp(r1), simp(r2)) match {
+ case (ZERO, _) => ZERO
+ case (_, ZERO) => ZERO
+ case (ONE, r2s) => r2s
+ case (r1s, ONE) => r1s
+ case (r1s, r2s) => SEQ(r1s, r2s)
+ }
+ }
+ case NTIMES(r, n) => NTIMES(simp(r), n)
+ case r => r
+}
+
+def nullable (r: Rexp) : Boolean = r match {
+ case ZERO => false
+ case ONE => true
+ case CHAR(_) => false
+ case ALT(r1, r2) => nullable(r1) || nullable(r2)
+ case SEQ(r1, r2) => nullable(r1) && nullable(r2)
+ case STAR(_) => true
+ case NTIMES(r, i) => if (i == 0) true else nullable(r)
+}
+
+def der (c: Char, r: Rexp) : Rexp = r match {
+ case ZERO => ZERO
+ case ONE => ZERO
+ case CHAR(d) => if (c == d) ONE else ZERO
+ case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
+ case SEQ(r1, r2) =>
+ if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
+ else SEQ(der(c, r1), r2)
+ case STAR(r) => SEQ(der(c, r), STAR(r))
+ case NTIMES(r, i) =>
+ if (i == 0) ZERO else SEQ(der(c, r), NTIMES(r, i - 1))
+}
+
+def ders (s: List[Char], r: Rexp) : Rexp = s match {
+ case Nil => r
+ case c::s => ders(s, simp(der(c, r)))
+}
+
+def matches(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
+
+
+//one or zero
+def OPT(r: Rexp) = ALT(r, ONE)
+
+def EVIL(n: Int) = SEQ(NTIMES(OPT(CHAR('a')), n), NTIMES(CHAR('a'), n))
+
+def time_needed[T](i: Int, code: => T) = {
+ val start = System.nanoTime()
+ for (j <- 1 to i) code
+ val end = System.nanoTime()
+ (end - start)/(i * 1.0e9)
+}
+
+/*
+for (i <- 1 to 9001 by 1000) {
+ println(i + " " + "%.5f".format(time_needed(1, matches(EVIL(i), "a" * i))))
+}
+*/
+
+// some convenience for typing in regular expressions
+import scala.language.implicitConversions
+import scala.language.reflectiveCalls
+
+def charlist2rexp(s : List[Char]) : Rexp = s match {
+ case Nil => ONE
+ case c::Nil => CHAR(c)
+ case c::s => SEQ(CHAR(c), charlist2rexp(s))
+}
+implicit def string2rexp(s : String) : Rexp = charlist2rexp(s.toList)
+
+implicit def RexpOps (r: Rexp) = new {
+ def | (s: Rexp) = ALT(r, s)
+ def % = STAR(r)
+ def ~ (s: Rexp) = SEQ(r, s)
+}
+
+implicit def stringOps (s: String) = new {
+ def | (r: Rexp) = ALT(s, r)
+ def | (r: String) = ALT(s, r)
+ def % = STAR(s)
+ def ~ (r: Rexp) = SEQ(s, r)
+ def ~ (r: String) = SEQ(s, r)
+}
+
+
+def PLUS(r: Rexp) = SEQ(r, STAR(r))
+def RANGE(s: List[Char]) : Rexp = s match {
+ case Nil => ZERO
+ case c::s => ALT(CHAR(c), RANGE(s))
+}
+def NMTIMES(r: Rexp, n: Int, m: Int) : Rexp =
+ if (n >= m) NTIMES(r, n) else ALT(NTIMES(r, m), NMTIMES(r, n, m - 1))
+
+
+println("Coursework:")
+val REG1 = PLUS(PLUS("a" ~ "a" ~ "a"))
+val REG2 = PLUS(PLUS("aaaaaaaaaaaaaaaaaaa" ~ OPT("a"))) // 19 as ~ a?
+
+
+//40 * aaa matches
+//43 * aaa + aa does not
+//45 * aaa + a
+
+println("EVIL1:")
+println(matches(REG1, "aaa" * 40))
+println(matches(REG1, "aaa" * 43 + "aa"))
+println(matches(REG1, "aaa" * 45 + "a"))
+println("EVIL2:")
+println(matches(REG2, "aaa" * 40))
+println(matches(REG2, "aaa" * 43 + "aa"))
+println(matches(REG2, "aaa" * 45 + "a"))
+
+println("EMAIL:")
+val LOWERCASE = "abcdefghijklmnopqrstuvwxyz"
+val DIGITS = "0123456789"
+val EMAIL = PLUS(RANGE((LOWERCASE + DIGITS + "_" + "." + "-").toList)) ~ "@" ~
+ PLUS(RANGE((LOWERCASE + DIGITS + "." + "-").toList)) ~ "." ~
+ NMTIMES(RANGE((LOWERCASE + ".").toList), 2, 6)
+
+val my_email = "christian.urban@kcl.ac.uk"
+
+println(matches(EMAIL, my_email))
+
+println(matches(NTIMES("a", 2), "a"))
+println(matches(NTIMES("a", 2), "aa"))
+println(matches(NTIMES("a", 2), "aaa"))
+
+println(matches(NMTIMES("a", 2, 3), "a"))
+println(matches(NMTIMES("a", 2, 3), "aa"))
+println(matches(NMTIMES("a", 2, 3), "aaa"))
+println(matches(NMTIMES("a", 2, 3), "aaaa"))
--- a/progs/re4.scala Tue Sep 20 12:17:01 2016 +0100
+++ b/progs/re4.scala Tue Sep 20 12:25:09 2016 +0100
@@ -1,55 +1,12 @@
-import scala.annotation.tailrec
-import scala.language.implicitConversions
-abstract class Rexp {
- def simp : Rexp = this
-}
-
+abstract class Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
-case class ALT(r1: Rexp, r2: Rexp) extends Rexp {
- override def simp = (r1.simp, r2.simp) match {
- case (ZERO, r) => r
- case (r, ZERO) => r
- case (r, ONE) => if (nullable(r)) r else ALT(r, ONE)
- case (ONE, r) => if (nullable(r)) r else ALT(r, ONE)
- case (r1, r2) => if (r1 == r2) r1 else ALT(r1, r2)
- }
-}
-case class SEQ(r1: Rexp, r2: Rexp) extends Rexp {
- override def simp = (r1.simp, r2.simp) match {
- case (ZERO, _) => ZERO
- case (_, ZERO) => ZERO
- case (ONE, r) => r
- case (r, ONE) => r
- case (r1, r2) => SEQ(r1, r2)
- }
-}
-case class STAR(r: Rexp) extends Rexp {
- override def simp = r.simp match {
- case ZERO => ONE
- case ONE => ONE
- case r => STAR(r)
- }
-}
-case class NTIMES(r: Rexp, n: Int) extends Rexp {
- override def simp = if (n == 0) ONE else
- r.simp match {
- case ZERO => ZERO
- case ONE => ONE
- case r => NTIMES(r, n)
- }
-}
-
-// some convenience for typing in regular expressions
-def charlist2rexp(s : List[Char]) : Rexp = s match {
- case Nil => ONE
- case c::Nil => CHAR(c)
- case c::s => SEQ(CHAR(c), charlist2rexp(s))
-}
-implicit def string2rexp(s : String) : Rexp = charlist2rexp(s.toList)
-
+case class ALT(r1: Rexp, r2: Rexp) extends Rexp
+case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
+case class STAR(r: Rexp) extends Rexp
+case class NTIMES(r: Rexp, n: Int) extends Rexp
// nullable function: tests whether the regular
// expression can recognise the empty string
@@ -77,24 +34,48 @@
if (i == 0) ZERO else SEQ(der(c, r), NTIMES(r, i - 1))
}
-// derivative w.r.t. a string (iterates der)
-@tailrec
-def ders (s: List[Char], r: Rexp) : Rexp = s match {
- case Nil => r
- case c::s => ders(s, der(c, r).simp)
+def simp(r: Rexp) : Rexp = r match {
+ case ALT(r1, r2) => (simp(r1), simp(r2)) match {
+ case (ZERO, r2s) => r2s
+ case (r1s, ZERO) => r1s
+ case (r1s, r2s) => if (r1s == r2s) r1s else ALT (r1s, r2s)
+ }
+ case SEQ(r1, r2) => (simp(r1), simp(r2)) match {
+ case (ZERO, _) => ZERO
+ case (_, ZERO) => ZERO
+ case (ONE, r2s) => r2s
+ case (r1s, ONE) => r1s
+ case (r1s, r2s) => SEQ(r1s, r2s)
+ }
+ case NTIMES(r1, n) => simp(r1) match {
+ case ZERO => ZERO
+ case ONE => ONE
+ case r1s => NTIMES(r1s, n)
+ }
+ case r => r
}
-
+// derivative w.r.t. a string (iterates der)
+def ders2(s: List[Char], r: Rexp) : Rexp = (s, r) match {
+ case (Nil, r) => r
+ case (s, ZERO) => ZERO
+ case (s, ONE) => if (s == Nil) ONE else ZERO
+ case (s, CHAR(c)) => if (s == List(c)) ONE else
+ if (s == Nil) CHAR(c) else ZERO
+ case (s, ALT(r1, r2)) => ALT(ders2(s, r2), ders2(s, r2))
+ case (c::s, r) => ders2(s, simp(der(c, r)))
+}
// main matcher function
-def matcher(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
+def matcher(r: Rexp, s: String) : Boolean = nullable(ders2(s.toList, r))
//one or zero
def OPT(r: Rexp) = ALT(r, ONE)
-def EVIL1(n: Int) = SEQ(NTIMES(OPT("a"), n), NTIMES("a", n))
-val EVIL2 = SEQ(STAR("a"), "b")
+def EVIL1(n: Int) = SEQ(NTIMES(OPT(CHAR('a')), n), NTIMES(CHAR('a'), n))
+val EVIL2 = SEQ(STAR(CHAR('a')), CHAR('b'))
+
def time_needed[T](i: Int, code: => T) = {
val start = System.nanoTime()
@@ -106,7 +87,7 @@
val i = 5000
println(i + " " + "%.5f".format(time_needed(10, matcher(EVIL1(i), "a" * i))))
-for (i <- 1 to 9001 by 1000) {
+for (i <- 1 to 7000001 by 1000000) {
println(i + " " + "%.5f".format(time_needed(2, matcher(EVIL1(i), "a" * i))))
}
--- a/progs/re5.scala Tue Sep 20 12:17:01 2016 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,118 +0,0 @@
-import scala.annotation.tailrec
-import scala.language.implicitConversions
-
-abstract class Rexp {
- def simp : Rexp = this
-}
-
-case object ZERO extends Rexp
-case object ONE extends Rexp
-case class CHAR(c: Char) extends Rexp
-case class ALT(r1: Rexp, r2: Rexp) extends Rexp {
- override def simp = (r1.simp, r2.simp) match {
- case (ZERO, r) => r
- case (r, ZERO) => r
- case (r, ONE) => if (nullable(r)) r else ALT(r, ONE)
- case (ONE, r) => if (nullable(r)) r else ALT(r, ONE)
- case (r1, r2) => if (r1 == r2) r1 else ALT(r1, r2)
- }
-}
-case class SEQ(r1: Rexp, r2: Rexp) extends Rexp {
- override def simp = (r1.simp, r2.simp) match {
- case (ZERO, _) => ZERO
- case (_, ZERO) => ZERO
- case (ONE, r) => r
- case (r, ONE) => r
- case (r1, r2) => SEQ(r1, r2)
- }
-}
-case class STAR(r: Rexp) extends Rexp {
- override def simp = r.simp match {
- case ZERO => ONE
- case ONE => ONE
- case r => STAR(r)
- }
-}
-case class NTIMES(r: Rexp, n: Int) extends Rexp {
- override def simp = if (n == 0) ONE else
- r.simp match {
- case ZERO => ZERO
- case ONE => ONE
- case r => NTIMES(r, n)
- }
-}
-
-// some convenience for typing in regular expressions
-def charlist2rexp(s : List[Char]) : Rexp = s match {
- case Nil => ONE
- case c::Nil => CHAR(c)
- case c::s => SEQ(CHAR(c), charlist2rexp(s))
-}
-implicit def string2rexp(s : String) : Rexp = charlist2rexp(s.toList)
-
-
-// nullable function: tests whether the regular
-// expression can recognise the empty string
-def nullable (r: Rexp) : Boolean = r match {
- case ZERO => false
- case ONE => true
- case CHAR(_) => false
- case ALT(r1, r2) => nullable(r1) || nullable(r2)
- case SEQ(r1, r2) => nullable(r1) && nullable(r2)
- case STAR(_) => true
- case NTIMES(r, i) => if (i == 0) true else nullable(r)
-}
-
-// derivative of a regular expression w.r.t. a character
-def der (c: Char, r: Rexp) : Rexp = r match {
- case ZERO => ZERO
- case ONE => ZERO
- case CHAR(d) => if (c == d) ONE else ZERO
- case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
- case SEQ(r1, r2) =>
- if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
- else SEQ(der(c, r1), r2)
- case STAR(r) => SEQ(der(c, r), STAR(r))
- case NTIMES(r, i) =>
- if (i == 0) ZERO else SEQ(der(c, r), NTIMES(r, i - 1))
-}
-
-// derivative w.r.t. a string (iterates der)
-def ders2(s: List[Char], r: Rexp) : Rexp = (s, r) match {
- case (Nil, r) => r
- case (s, ZERO) => ZERO
- case (s, ONE) => if (s == Nil) ONE else ZERO
- case (s, CHAR(c)) => if (s == List(c)) ONE else
- if (s == Nil) CHAR(c) else ZERO
- case (s, ALT(r1, r2)) => ALT(ders2(s, r2), ders2(s, r2))
- case (c::s, r) => ders2(s, der(c, r).simp)
-}
-
-// main matcher function
-def matcher(r: Rexp, s: String) : Boolean = nullable(ders2(s.toList, r))
-
-
-//one or zero
-def OPT(r: Rexp) = ALT(r, ONE)
-
-def EVIL1(n: Int) = SEQ(NTIMES(OPT("a"), n), NTIMES("a", n))
-val EVIL2 = SEQ(STAR("a"), "b")
-
-def time_needed[T](i: Int, code: => T) = {
- val start = System.nanoTime()
- for (j <- 1 to i) code
- val end = System.nanoTime()
- (end - start)/(i * 1.0e9)
-}
-
-val i = 5000
-println(i + " " + "%.5f".format(time_needed(10, matcher(EVIL1(i), "a" * i))))
-
-for (i <- 1 to 7000001 by 1000000) {
- println(i + " " + "%.5f".format(time_needed(2, matcher(EVIL1(i), "a" * i))))
-}
-
-for (i <- 1 to 7500001 by 500000) {
- println(i + " " + "%.5f".format(time_needed(2, matcher(EVIL2, "a" * i))))
-}
-
--- a/progs/regexp.scala Tue Sep 20 12:17:01 2016 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,106 +0,0 @@
-// regular expressions
-abstract class Rexp
-
-case object NULL extends Rexp
-case object EMPTY extends Rexp
-case class CHAR(c: Char) extends Rexp
-case class ALT(r1: Rexp, r2: Rexp) extends Rexp
-case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
-case class STAR(r: Rexp) extends Rexp
-
-// some convenience for typing in regular expressions
-def charlist2rexp(s : List[Char]) : Rexp = s match {
- case Nil => EMPTY
- case c::Nil => CHAR(c)
- case c::s => SEQ(CHAR(c), charlist2rexp(s))
-}
-implicit def string2rexp(s : String) : Rexp = charlist2rexp(s.toList)
-
-
-// for example
-println(STAR("abc"))
-
-// produces STAR(SEQ(CHAR(a),SEQ(CHAR(b),SEQ(CHAR(c),EMPTY))))
-
-
-
-// a simple-minded regular expression matcher:
-// it loops for examples like STAR(EMPTY) with
-// strings this regular expression does not match
-
-def smatchers(rs: List[Rexp], s: List[Char]) : Boolean = (rs, s) match {
- case (NULL::rs, s) => false
- case (EMPTY::rs, s) => smatchers(rs, s)
- case (CHAR(c)::rs, Nil) => false
- case (CHAR(c)::rs, d::s) => (c ==d) && smatchers(rs, s)
- case (ALT(r1, r2)::rs, s) => smatchers(r1::rs, s) || smatchers(r2::rs, s)
- case (SEQ(r1, r2)::rs, s) => smatchers(r1::r2::rs, s)
- case (STAR(r)::rs, s) => smatchers(rs, s) || smatchers(r::STAR(r)::rs, s)
- case (Nil, s) => s == Nil
-}
-
-def smatcher(r: Rexp, s: String) = smatchers(List(r), s.toList)
-
-// regular expression: a
-println(smatcher(CHAR('a'), "ab"))
-
-// regular expression: a + (b o c)
-println(smatcher(ALT(CHAR('a'), SEQ(CHAR('b'), CHAR('c'))), "ab"))
-
-// regular expression: a + (b o c)
-println(smatcher(ALT(CHAR('a'), SEQ(CHAR('b'), CHAR('c'))), "bc"))
-
-// loops for regular expression epsilon*
-//println(smatcher(STAR(EMPTY), "a"))
-
-
-
-// Regular expression matcher that works properly
-//================================================
-
-// nullable function: tests whether the regular
-// expression can recognise the empty string
-def nullable (r: Rexp) : Boolean = r match {
- case NULL => false
- case EMPTY => true
- case CHAR(_) => false
- case ALT(r1, r2) => nullable(r1) || nullable(r2)
- case SEQ(r1, r2) => nullable(r1) && nullable(r2)
- case STAR(_) => true
-}
-
-// derivative of a regular expression w.r.t. a character
-def der (c: Char, r: Rexp) : Rexp = r match {
- case NULL => NULL
- case EMPTY => NULL
- case CHAR(d) => if (c == d) EMPTY else NULL
- case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
- case SEQ(r1, r2) =>
- if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
- else SEQ(der(c, r1), r2)
- case STAR(r) => SEQ(der(c, r), STAR(r))
-}
-
-// derivative w.r.t. a string (iterates der)
-def ders (s: List[Char], r: Rexp) : Rexp = s match {
- case Nil => r
- case c::s => ders(s, der(c, r))
-}
-
-// main matcher function
-def matcher(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
-
-
-//examples
-
-println(matcher(SEQ(STAR("a"), STAR("b")), "bbaaa"))
-println(matcher(ALT(STAR("a"), STAR("b")), ""))
-println(matcher("abc", ""))
-println(matcher(STAR(ALT(EMPTY, "a")), ""))
-println(matcher(STAR(EMPTY), "a"))
-println(matcher("cab","cab"))
-println(matcher(STAR("a"),"aaa"))
-println(matcher("cab" ,"cab"))
-println(matcher(STAR("a"),"aaa"))
-
-
--- a/progs/regexp3.scala Tue Sep 20 12:17:01 2016 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,141 +0,0 @@
-
-// regular expressions including NOT
-abstract class Rexp
-
-case object NULL extends Rexp
-case object EMPTY extends Rexp
-case class CHAR(c: Char) extends Rexp
-case class ALT(r1: Rexp, r2: Rexp) extends Rexp
-case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
-case class STAR(r: Rexp) extends Rexp
-case class NOT(r: Rexp) extends Rexp
-
-
-// some convenience for typing in regular expressions
-def charlist2rexp(s : List[Char]) : Rexp = s match {
- case Nil => EMPTY
- case c::Nil => CHAR(c)
- case c::s => SEQ(CHAR(c), charlist2rexp(s))
-}
-implicit def string2rexp(s : String) : Rexp = charlist2rexp(s.toList)
-
-
-// nullable function: tests whether the regular
-// expression can recognise the empty string
-def nullable (r: Rexp) : Boolean = r match {
- case NULL => false
- case EMPTY => true
- case CHAR(_) => false
- case ALT(r1, r2) => nullable(r1) || nullable(r2)
- case SEQ(r1, r2) => nullable(r1) && nullable(r2)
- case STAR(_) => true
- case NOT(r) => !(nullable(r))
-}
-
-// tests whether a regular expression
-// cannot recognise more
-def no_more (r: Rexp) : Boolean = r match {
- case NULL => true
- case EMPTY => false
- case CHAR(_) => false
- case ALT(r1, r2) => no_more(r1) && no_more(r2)
- case SEQ(r1, r2) => if (nullable(r1)) (no_more(r1) && no_more(r2)) else no_more(r1)
- case STAR(_) => false
- case NOT(r) => !(no_more(r))
-}
-
-
-// derivative of a regular expression w.r.t. a character
-def der (c: Char, r: Rexp) : Rexp = r match {
- case NULL => NULL
- case EMPTY => NULL case CHAR(d) => if (c == d) EMPTY else NULL
- case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
- case SEQ(r1, r2) =>
- if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
- else SEQ(der(c, r1), r2)
- case STAR(r) => SEQ(der(c, r), STAR(r))
- case NOT(r) => NOT(der (c, r))
-}
-
-// regular expression for specifying
-// ranges of characters
-def RANGE(s : List[Char]) : Rexp = s match {
- case Nil => NULL
- case c::Nil => CHAR(c)
- case c::s => ALT(CHAR(c), RANGE(s))
-}
-
-// one or more
-def PLUS(r: Rexp) = SEQ(r, STAR(r))
-
-// some regular expressions
-val LOWERCASE = RANGE("abcdefghijklmnopqrstuvwxyz".toList)
-val UPPERCASE = RANGE("ABCDEFGHIJKLMNOPQRSTUVWXYZ".toList)
-val LETTER = ALT(LOWERCASE, UPPERCASE)
-val DIGIT = RANGE("0123456789".toList)
-val NONZERODIGIT = RANGE("123456789".toList)
-
-val IDENT = SEQ(LETTER, STAR(ALT(LETTER,DIGIT)))
-val NUMBER = ALT(SEQ(NONZERODIGIT, STAR(DIGIT)), "0")
-val WHITESPACE = RANGE(" \n".toList)
-val WHITESPACES = PLUS(WHITESPACE)
-
-val ALL = ALT(ALT(LETTER, DIGIT), WHITESPACE)
-val COMMENT = SEQ(SEQ("/*", NOT(SEQ(SEQ(STAR(ALL), "*/"), STAR(ALL)))), "*/")
-
-
-// for classifying the strings that have been recognised
-abstract class Token
-
-case object T_WHITESPACE extends Token
-case object T_COMMENT extends Token
-case class T_IDENT(s: String) extends Token
-case class T_OP(s: String) extends Token
-case class T_NUM(n: Int) extends Token
-case class T_KEYWORD(s: String) extends Token
-
-
-// an example list of syntactic rules
-type Rule = (Rexp, List[Char] => Token)
-
-val rules: List[Rule]=
- List(("if", (s) => T_KEYWORD(s.mkString)),
- ("then", (s) => T_KEYWORD(s.mkString)),
- ("else", (s) => T_KEYWORD(s.mkString)),
- ("+", (s) => T_OP(s.mkString)),
- (IDENT, (s) => T_IDENT(s.mkString)),
- (NUMBER, (s) => T_NUM(s.mkString.toInt)),
- (WHITESPACES, (s) => T_WHITESPACE),
- (COMMENT, (s) => T_COMMENT))
-
-
-def error (s: String) = throw new IllegalArgumentException ("Cannot tokenize: " + s)
-
-def munch(r: Rexp, action: List[Char] => Token, s: List[Char], t: List[Char]) : Option[(List[Char], Token)] =
- s match {
- case Nil if (nullable(r)) => Some(Nil, action(t))
- case Nil => None
- case c::s if (no_more(der (c, r)) && nullable(r)) => Some(c::s, action(t))
- case c::s if (no_more(der (c, r))) => None
- case c::s => munch(der (c, r), action, s, t ::: List(c))
- }
-
-def one_token (rs: List[Rule], s: List[Char]) : (List[Char], Token) = {
- val somes = rs.map { (r) => munch(r._1, r._2, s, Nil) } .flatten
- if (somes == Nil) error(s.mkString) else (somes sortBy (_._1.length) head)
-}
-
-def tokenize (rs: List[Rule], s: List[Char]) : List[Token] = s match {
- case Nil => Nil
- case _ => one_token(rs, s) match {
- case (rest, token) => token :: tokenize(rs, rest)
- }
-}
-
-//examples
-println(tokenize(rules, "if true then then 42 else +".toList))
-println(tokenize(rules, "if+true+then+then+42+else +".toList))
-println(tokenize(rules, "ifff if 34 34".toList))
-println(tokenize(rules, "/*ifff if */ hhjj /*34 */".toList))
-println(tokenize(rules, "/* if true then */ then 42 else +".toList))
-//println(tokenize(rules, "ifff $ if 34".toList)) // causes an error because of the symbol $
--- a/progs/regexp4.scala Tue Sep 20 12:17:01 2016 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,168 +0,0 @@
-// regular expressions
-abstract class Rexp
-
-case object NULL extends Rexp
-case object EMPTY extends Rexp
-case class CHAR(c: Char) extends Rexp
-case class ALT(r1: Rexp, r2: Rexp) extends Rexp
-case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
-case class STAR(r: Rexp) extends Rexp
-case class NOT(r: Rexp) extends Rexp
-
-
-// some convenience for typing in regular expressions
-def charlist2rexp(s : List[Char]) : Rexp = s match {
- case Nil => EMPTY
- case c::Nil => CHAR(c)
- case c::s => SEQ(CHAR(c), charlist2rexp(s))
-}
-implicit def string2rexp(s : String) : Rexp = charlist2rexp(s.toList)
-
-
-// nullable function: tests whether the regular
-// expression can recognise the empty string
-def nullable (r: Rexp) : Boolean = r match {
- case NULL => false
- case EMPTY => true
- case CHAR(_) => false
- case ALT(r1, r2) => nullable(r1) || nullable(r2)
- case SEQ(r1, r2) => nullable(r1) && nullable(r2)
- case STAR(_) => true
- case NOT(r) => !(nullable(r))
-}
-
-// tests whether a regular expression
-// recognises nothing
-def zeroable (r: Rexp) : Boolean = r match {
- case NULL => true
- case EMPTY => false
- case CHAR(_) => false
- case ALT(r1, r2) => zeroable(r1) && zeroable(r2)
- case SEQ(r1, r2) => zeroable(r1) || zeroable(r2)
- case STAR(_) => false
- case NOT(r) => !(zeroable(r))
-}
-
-def starts_with (r: Rexp, c: Char) : Boolean = r match {
- case NULL => false
- case EMPTY => false
- case CHAR(d) => (c == d)
- case ALT(r1, r2) => starts_with(r1, c) || starts_with(r2, c)
- case SEQ(r1, r2) => if (nullable(r1)) (starts_with(r1, c) || starts_with(r2, c))
- else starts_with(r1, c)
- case STAR(r) => starts_with(r, c)
- case NOT(r) => !(starts_with(r, c))
-}
-
-// derivative of a regular expression w.r.t. a character
-def der (c: Char, r: Rexp) : Rexp = r match {
- case NULL => NULL
- case EMPTY => NULL
- case CHAR(d) => if (c == d) EMPTY else NULL
- case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
- case SEQ(r1, r2) =>
- if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
- else SEQ(der(c, r1), r2)
- case STAR(r) => SEQ(der(c, r), STAR(r))
- case NOT(r) => NOT(der (c, r))
-}
-
-// derivative w.r.t. a string (iterates der)
-def ders (s: List[Char], r: Rexp) : Rexp = s match {
- case Nil => r
- case c::s => ders(s, der(c, r))
-}
-
-// main matcher function
-def matcher(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
-
-
-// regular expression for specifying
-// ranges of characters
-def RANGE(s : List[Char]) : Rexp = s match {
- case Nil => NULL
- case c::Nil => CHAR(c)
- case c::s => ALT(CHAR(c), RANGE(s))
-}
-
-//one or more
-def PLUS(r: Rexp) = SEQ(r, STAR(r))
-
-
-//some regular expressions
-val LOWERCASE = RANGE("abcdefghijklmnopqrstuvwxyz".toList)
-val UPPERCASE = RANGE("ABCDEFGHIJKLMNOPQRSTUVWXYZ".toList)
-val LETTER = ALT(LOWERCASE, UPPERCASE)
-val DIGITS = RANGE("0123456789".toList)
-val NONZERODIGITS = RANGE("123456789".toList)
-
-val IDENT = SEQ(LETTER, STAR(ALT(LETTER,DIGITS)))
-val NUMBER = ALT(SEQ(NONZERODIGITS, STAR(DIGITS)), "0")
-val WHITESPACE = RANGE(" \n".toList)
-val SYMBOLS = RANGE("/*".toList)
-
-val ALL = ALT(ALT(ALT(LETTER, DIGITS), WHITESPACE), SYMBOLS)
-
-val COMMENT = SEQ(SEQ("/*", NOT(SEQ(SEQ(STAR(ALL), "*/"), STAR(ALL)))), "*/")
-
-println(matcher(NUMBER, "0"))
-println(matcher(NUMBER, "01"))
-println(matcher(NUMBER, "123450"))
-
-println(matcher(SEQ(STAR("a"), STAR("b")), "bbaaa"))
-println(matcher(ALT(STAR("a"), STAR("b")), ""))
-println(matcher("abc", ""))
-println(matcher(STAR(ALT(EMPTY, "a")), ""))
-println(matcher(STAR(EMPTY), "a"))
-println(matcher("cab","cab"))
-println(matcher(STAR("a"),"aaa"))
-println(matcher("cab" ,"cab"))
-println(matcher(STAR("a"),"aaa"))
-
-println(matcher(COMMENT, "/* */"))
-println(matcher(COMMENT, "/* foobar comment */"))
-println(matcher(COMMENT, "/* test */ test */"))
-
-// an example list of regular expressions
-val regs: List[Rexp]= List("if", "then", "else", "+", IDENT, NUMBER, COMMENT, WHITESPACE)
-
-
-def error (s: String) = throw new IllegalArgumentException ("Could not lex " + s)
-
-def munch(r: Rexp, s: List[Char], t: List[Char]) : Option[(List[Char], List[Char])] =
- if (zeroable(r)) None else s match {
- case Nil => if (nullable(r)) Some(Nil, t) else None
- case c::s if (zeroable(der (c, r)) && nullable(r)) => Some(c::s, t)
- //case c::s if (zeroable(der (c, r))) => None
- case c::s => munch(der (c, r), s, t ::: List(c))
-}
-
-
-def lex_one (regs: List[Rexp], s: List[Char]) : (List[Char], List[Char]) = {
- val somes = regs.map { munch(_, s, Nil) } .flatten
- if (somes == Nil) error(s.mkString) else (somes sortBy (_._1.length) head)
-}
-
-def lex_all (regs: List[Rexp], s: List[Char]) : List[String] = s match {
- case Nil => Nil
- case _ => lex_one(regs, s) match {
- case (rest, s) => s.mkString :: lex_all(regs, rest)
- }
-}
-
-
-
-starts_with(der('/', COMMENT), '*')
-
-munch(COMMENT, "/*ifff if 34 */".toList, Nil)
-val COMMENT2 = NOT(SEQ(SEQ(STAR(ALL), "*/"), STAR(ALL)))
-
-der('a', COMMENT2)
-zeroable(der('a', COMMENT2))
-
-matcher(COMMENT2, "ifff if 34")
-munch(COMMENT2, "ifff if 34".toList, Nil)
-starts_with(COMMENT2, 'i')
-lex_all(regs, "ifff if 34".toList)
-lex_all(regs, "ifff $ if 34".toList)
-
--- a/progs/regexp5.scala Tue Sep 20 12:17:01 2016 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,177 +0,0 @@
-// regular expressions
-abstract class Rexp
-
-case object NULL extends Rexp
-case object EMPTY extends Rexp
-case class CHAR(c: Char) extends Rexp
-case class ALT(r1: Rexp, r2: Rexp) extends Rexp
-case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
-case class STAR(r: Rexp) extends Rexp
-case class NOT(r: Rexp) extends Rexp
-
-
-// some convenience for typing in regular expressions
-def charlist2rexp(s : List[Char]) : Rexp = s match {
- case Nil => EMPTY
- case c::Nil => CHAR(c)
- case c::s => SEQ(CHAR(c), charlist2rexp(s))
-}
-implicit def string2rexp(s : String) : Rexp = charlist2rexp(s.toList)
-
-
-// nullable function: tests whether the regular
-// expression can recognise the empty string
-def nullable (r: Rexp) : Boolean = r match {
- case NULL => false
- case EMPTY => true
- case CHAR(_) => false
- case ALT(r1, r2) => nullable(r1) || nullable(r2)
- case SEQ(r1, r2) => nullable(r1) && nullable(r2)
- case STAR(_) => true
- case NOT(r) => !(nullable(r))
-}
-
-// tests whether a regular expression
-// recognises nothing
-def zeroable (r: Rexp) : Boolean = r match {
- case NULL => true
- case EMPTY => false
- case CHAR(_) => false
- case ALT(r1, r2) => zeroable(r1) && zeroable(r2)
- case SEQ(r1, r2) => if (nullable(r1)) (zeroable(r1) && zeroable(r2)) else zeroable(r1)
- //zeroable(r1) || zeroable(r2)
- case STAR(_) => false
- case NOT(r) => !(zeroable(r))
-}
-
-
-// derivative of a regular expression w.r.t. a character
-def der (c: Char, r: Rexp) : Rexp = r match {
- case NULL => NULL
- case EMPTY => NULL
- case CHAR(d) => if (c == d) EMPTY else NULL
- case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
- case SEQ(r1, r2) =>
- if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
- else SEQ(der(c, r1), r2)
- case STAR(r) => SEQ(der(c, r), STAR(r))
- case NOT(r) => NOT(der (c, r))
-}
-
-// derivative w.r.t. a string (iterates der)
-def ders (s: List[Char], r: Rexp) : Rexp = s match {
- case Nil => r
- case c::s => ders(s, der(c, r))
-}
-
-// main matcher function
-def matcher(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
-
-
-// regular expression for specifying
-// ranges of characters
-def RANGE(s : List[Char]) : Rexp = s match {
- case Nil => NULL
- case c::Nil => CHAR(c)
- case c::s => ALT(CHAR(c), RANGE(s))
-}
-
-//one or more
-def PLUS(r: Rexp) = SEQ(r, STAR(r))
-
-
-//some regular expressions
-val LOWERCASE = RANGE("abcdefghijklmnopqrstuvwxyz".toList)
-val UPPERCASE = RANGE("ABCDEFGHIJKLMNOPQRSTUVWXYZ".toList)
-val LETTER = ALT(LOWERCASE, UPPERCASE)
-val DIGITS = RANGE("0123456789".toList)
-val NONZERODIGITS = RANGE("123456789".toList)
-
-val IDENT = SEQ(LETTER, STAR(ALT(LETTER,DIGITS)))
-val NUMBER = ALT(SEQ(NONZERODIGITS, STAR(DIGITS)), "0")
-val WHITESPACE = RANGE(" \n".toList)
-
-val ALL = ALT(ALT(LETTER, DIGITS), WHITESPACE)
-
-val COMMENT = SEQ(SEQ("/*", NOT(SEQ(SEQ(STAR(ALL), "*/"), STAR(ALL)))), "*/")
-
-println(matcher(NUMBER, "0"))
-println(matcher(NUMBER, "01"))
-println(matcher(NUMBER, "123450"))
-
-println(matcher(SEQ(STAR("a"), STAR("b")), "bbaaa"))
-println(matcher(ALT(STAR("a"), STAR("b")), ""))
-println(matcher("abc", ""))
-println(matcher(STAR(ALT(EMPTY, "a")), ""))
-println(matcher(STAR(EMPTY), "a"))
-println(matcher("cab","cab"))
-println(matcher(STAR("a"),"aaa"))
-println(matcher("cab" ,"cab"))
-println(matcher(STAR("a"),"aaa"))
-
-println(matcher(COMMENT, "/* */"))
-println(matcher(COMMENT, "/* 34 */"))
-println(matcher(COMMENT, "/* foobar comment */"))
-println(matcher(COMMENT, "/* test */ test */"))
-
-// an example list of regular expressions
-
-abstract class Token
-
-case object T_WHITESPACE extends Token
-case object T_COMMENT extends Token
-case class T_IDENT(s: String) extends Token
-case class T_OP(s: String) extends Token
-case class T_NUM(n: Int) extends Token
-case class T_KEYWORD(s: String) extends Token
-
-val regs: List[Rexp]= List("if", "then", "else", "+", IDENT, NUMBER, WHITESPACE)
-
-type Rule = (Rexp, List[Char] => Token)
-
-val rules: List[Rule]=
- List(("if", (s) => T_KEYWORD(s.mkString)),
- ("then", (s) => T_KEYWORD(s.mkString)),
- ("else", (s) => T_KEYWORD(s.mkString)),
- ("+", (s) => T_OP(s.mkString)),
- (IDENT, (s) => T_IDENT(s.mkString)),
- (NUMBER, (s) => T_NUM(s.mkString.toInt)),
- (WHITESPACE, (s) => T_WHITESPACE),
- (COMMENT, (s) => T_COMMENT))
-
-
-def error (s: String) = throw new IllegalArgumentException ("Could not lex " + s)
-
-def munch(r: Rexp, action: List[Char] => Token, s: List[Char], t: List[Char]) : Option[(List[Char], Token)] =
-{ println("string " + s)
- println(" rexp " + r)
- s match {
- case Nil if (nullable(r)) => Some(Nil, action(t))
- case Nil => { println("1"); None }
- case c::s if (zeroable(der (c, r)) && nullable(r)) => Some(c::s, action(t))
- case c::s if (zeroable(der (c, r))) => { println("2"); None }
- case c::s => munch(der (c, r), action, s, t ::: List(c))
- }
-}
-
-def lex_one (rs: List[Rule], s: List[Char]) : (List[Char], Token) = {
- val somes = rs.map { (r) => munch(r._1, r._2, s, Nil) } .flatten
- if (somes == Nil) error(s.mkString) else (somes sortBy (_._1.length) head)
-}
-
-def lex_all (rs: List[Rule], s: List[Char]) : List[Token] = s match {
- case Nil => Nil
- case _ => lex_one(rs, s) match {
- case (rest, t) => t :: lex_all(rs, rest)
- }
-}
-
-
-
-println(matcher(COMMENT, "/*ifff if 34 34*/"))
-rules.map { (r) => munch(r._1, r._2, "/*ifff if 34 34*/ ".toList, Nil) }
-println(lex_all(rules, "ifff if 34 34".toList))
-println(lex_all(rules, " /*ifff if 34 34*/ ".toList))
-println(lex_all(rules, "ifff $ if 34".toList))
-
-