handouts/ho02.tex
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Fri, 26 Sep 2014 14:06:55 +0100
changeset 258 1e4da6d2490c
parent 251 5b5a68df6d16
child 259 e5f4b8ff23b8
permissions -rw-r--r--
updated programs
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     1
\documentclass{article}
251
5b5a68df6d16 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 217
diff changeset
     2
\usepackage{../style}
217
cd6066f1056a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 140
diff changeset
     3
\usepackage{../langs}
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     4
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     5
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     6
\begin{document}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     7
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     8
\section*{Handout 2}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     9
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    10
Having specified what problem our matching algorithm, \pcode{match},
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    11
is supposed to solve, namely for a given regular expression $r$ and
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    12
string $s$ answer \textit{true} if and only if
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    13
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    14
\[
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    15
s \in L(r)
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    16
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    17
251
5b5a68df6d16 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 217
diff changeset
    18
\noindent we can look at an algorithm to solve this problem.
5b5a68df6d16 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 217
diff changeset
    19
Clearly we cannot use the function $L$ directly for this,
5b5a68df6d16 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 217
diff changeset
    20
because in general the set of strings $L$ returns is infinite
5b5a68df6d16 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 217
diff changeset
    21
(recall what $L(a^*)$ is). In such cases there is no way we
5b5a68df6d16 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 217
diff changeset
    22
can implement an exhaustive test for whether a string is
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    23
member of this set or not. Before we come to the matching 
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    24
algorithm, lets have a closer look at what it means when
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    25
two regular expressions are equivalent.
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    26
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    27
\subsection*{Regular Expression Equivalences}
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    28
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    29
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    30
\subsection*{Matching Algorithm}
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    31
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    32
The algorithm we will define below consists of two parts. One is the
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    33
function $nullable$ which takes a regular expression as argument and
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    34
decides whether it can match the empty string (this means it returns a
124
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    35
boolean). This can be easily defined recursively as follows:
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    36
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    37
\begin{center}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    38
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    39
$nullable(\varnothing)$      & $\dn$ & $f\!\/alse$\\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    40
$nullable(\epsilon)$           & $\dn$ &  $true$\\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    41
$nullable (c)$                    & $\dn$ &  $f\!alse$\\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    42
$nullable (r_1 + r_2)$       & $\dn$ &  $nullable(r_1) \vee nullable(r_2)$\\ 
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    43
$nullable (r_1 \cdot r_2)$ & $\dn$ &  $nullable(r_1) \wedge nullable(r_2)$\\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    44
$nullable (r^*)$                & $\dn$ & $true$ \\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    45
\end{tabular}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    46
\end{center}
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    47
124
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    48
\noindent
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    49
The idea behind this function is that the following property holds:
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    50
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    51
\[
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    52
nullable(r) \;\;\text{if and only if}\;\; ""\in L(r)
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    53
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    54
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    55
\noindent
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    56
Note on the left-hand side we have a function we can implement; on the
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    57
right we have its specification.
124
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
    58
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    59
The other function of our matching algorithm calculates a
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    60
\emph{derivative} of a regular expression. This is a function which
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    61
will take a regular expression, say $r$, and a character, say $c$, as
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    62
argument and return a new regular expression. Be careful that the
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    63
intuition behind this function is not so easy to grasp on first
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    64
reading. Essentially this function solves the following problem: if
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    65
$r$ can match a string of the form $c\!::\!s$, what does the regular
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    66
expression look like that can match just $s$. The definition of this
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    67
function is as follows:
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    68
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    69
\begin{center}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    70
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    71
  $der\, c\, (\varnothing)$      & $\dn$ & $\varnothing$ & \\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    72
  $der\, c\, (\epsilon)$           & $\dn$ & $\varnothing$ & \\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    73
  $der\, c\, (d)$                     & $\dn$ & if $c = d$ then $\epsilon$ else $\varnothing$ & \\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    74
  $der\, c\, (r_1 + r_2)$        & $\dn$ & $der\, c\, r_1 + der\, c\, r_2$ & \\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    75
  $der\, c\, (r_1 \cdot r_2)$  & $\dn$  & if $nullable (r_1)$\\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    76
  & & then $(der\,c\,r_1) \cdot r_2 + der\, c\, r_2$\\ 
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    77
  & & else $(der\, c\, r_1) \cdot r_2$\\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    78
  $der\, c\, (r^*)$          & $\dn$ & $(der\,c\,r) \cdot (r^*)$ &
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    79
  \end{tabular}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    80
\end{center}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    81
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
    82
\noindent
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    83
The first two clauses can be rationalised as follows: recall that
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    84
$der$ should calculate a regular expression, if the ``input'' regular
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    85
expression can match a string of the form $c\!::\!s$. Since neither
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    86
$\varnothing$ nor $\epsilon$ can match such a string we return
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    87
$\varnothing$. In the third case we have to make a case-distinction:
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    88
In case the regular expression is $c$, then clearly it can recognise a
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    89
string of the form $c\!::\!s$, just that $s$ is the empty
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    90
string. Therefore we return the $\epsilon$-regular expression. In the
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    91
other case we again return $\varnothing$ since no string of the
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    92
$c\!::\!s$ can be matched.  The $+$-case is relatively
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    93
straightforward: all strings of the form $c\!::\!s$ are either matched
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    94
by the regular expression $r_1$ or $r_2$. So we just have to
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    95
recursively call $der$ with these two regular expressions and compose
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    96
the results again with $+$. The $\cdot$-case is more complicated: if
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    97
$r_1\cdot r_2$ matches a string of the form $c\!::\!s$, then the first
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    98
part must be matched by $r_1$.  Consequently, it makes sense to
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
    99
construct the regular expression for $s$ by calling $der$ with $r_1$
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   100
and ``appending'' $r_2$. There is however one exception to this simple
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   101
rule: if $r_1$ can match the empty string, then all of $c\!::\!s$ is
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   102
matched by $r_2$. So in case $r_1$ is nullable (that is can match the
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   103
empty string) we have to allow the choice $der\,c\,r_2$ for
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   104
calculating the regular expression that can match $s$. The $*$-case is
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   105
again simple: if $r^*$ matches a string of the form $c\!::\!s$, then
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   106
the first part must be ``matched'' by a single copy of $r$. Therefore
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   107
we call recursively $der\,c\,r$ and ``append'' $r^*$ in order to match
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   108
the rest of $s$.
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   109
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   110
Another way to rationalise the definition of $der$ is to consider the
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   111
following operation on sets:
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   112
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   113
\[
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   114
Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   115
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   116
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   117
\noindent
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   118
which essentially transforms a set of strings $A$ by filtering out all
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   119
strings that do not start with $c$ and then strips off the $c$ from
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   120
all the remaining strings. For example suppose $A = \{"f\!oo", "bar",
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   121
"f\!rak"\}$ then
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   122
\[
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   123
Der\,f\,A = \{"oo", "rak"\}\quad,\quad
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   124
Der\,b\,A = \{"ar"\}  \quad \text{and} \quad
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   125
Der\,a\,A = \varnothing
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   126
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   127
 
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   128
\noindent
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   129
Note that in the last case $Der$ is empty, because no string in $A$
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   130
starts with $a$. With this operation we can state the following
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   131
property about $der$:
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   132
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   133
\[
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   134
L(der\,c\,r) = Der\,c\,(L(r))
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   135
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   136
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   137
\noindent
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   138
This property clarifies what regular expression $der$ calculates,
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   139
namely take the set of strings that $r$ can match (that is $L(r)$),
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   140
filter out all strings not starting with $c$ and strip off the $c$
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   141
from the remaining strings---this is exactly the language that
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   142
$der\,c\,r$ can match.
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   143
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   144
If we want to find out whether the string $"abc"$ is matched by the
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   145
regular expression $r$ then we can iteratively apply $Der$ as follows
140
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 133
diff changeset
   146
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 133
diff changeset
   147
\begin{enumerate}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 133
diff changeset
   148
\item $Der\,a\,(L(r))$
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 133
diff changeset
   149
\item $Der\,b\,(Der\,a\,(L(r)))$
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 133
diff changeset
   150
\item $Der\,c\,(Der\,b\,(Der\,a\,(L(r))))$
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 133
diff changeset
   151
\end{enumerate}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 133
diff changeset
   152
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 133
diff changeset
   153
\noindent
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   154
In the last step we need to test whether the empty string is in the
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   155
set. Our matching algorithm will work similarly, just using regular
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   156
expression instead of sets. For this we need to lift the notion of
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   157
derivatives from characters to strings. This can be done using the
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   158
following function, taking a string and regular expression as input
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   159
and a regular expression as output.
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   160
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   161
\begin{center}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   162
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   163
  $der\!s\, []\, r$     & $\dn$ & $r$ & \\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   164
  $der\!s\, (c\!::\!s)\, r$ & $\dn$ & $der\!s\,s\,(der\,c\,r)$ & \\
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   165
  \end{tabular}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   166
\end{center}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   167
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   168
\noindent
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   169
Having $ders$ in place, we can finally define our matching algorithm:
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   170
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   171
\[
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   172
match\,s\,r = nullable(ders\,s\,r)
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   173
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   174
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   175
\noindent
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   176
We claim that
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   177
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   178
\[
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   179
match\,s\,r\quad\text{if and only if}\quad s\in L(r)
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   180
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   181
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   182
\noindent
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   183
holds, which means our algorithm satisfies the specification. This
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   184
algorithm was introduced by Janus Brzozowski in 1964. Its main
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   185
attractions are simplicity and being fast, as well as being easily
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   186
extendable for other regular expressions such as $r^{\{n\}}$, $r^?$,
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   187
$\sim{}r$ and so on.
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   188
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   189
\subsection*{The Matching Algorithm in Scala}
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   190
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   191
126
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 125
diff changeset
   192
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 125
diff changeset
   193
\begin{figure}[p]
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 125
diff changeset
   194
{\lstset{language=Scala}\texttt{\lstinputlisting{../progs/app5.scala}}}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 125
diff changeset
   195
{\lstset{language=Scala}\texttt{\lstinputlisting{../progs/app6.scala}}}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 125
diff changeset
   196
\caption{Scala implementation of the nullable and derivatives functions.}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 125
diff changeset
   197
\end{figure}
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   198
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   199
\end{document}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   200
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   201
%%% Local Variables: 
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   202
%%% mode: latex
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   203
%%% TeX-master: t
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   204
%%% End: