handouts/ho03.tex
changeset 140 1be892087df2
child 141 665087dcf7d2
equal deleted inserted replaced
139:6e7c3db9023d 140:1be892087df2
       
     1 \documentclass{article}
       
     2 \usepackage{charter}
       
     3 \usepackage{hyperref}
       
     4 \usepackage{amssymb}
       
     5 \usepackage{amsmath}
       
     6 \usepackage[T1]{fontenc}
       
     7 \usepackage{listings}
       
     8 \usepackage{xcolor}
       
     9 
       
    10 \newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
       
    11 
       
    12 \definecolor{javared}{rgb}{0.6,0,0} % for strings
       
    13 \definecolor{javagreen}{rgb}{0.25,0.5,0.35} % comments
       
    14 \definecolor{javapurple}{rgb}{0.5,0,0.35} % keywords
       
    15 \definecolor{javadocblue}{rgb}{0.25,0.35,0.75} % javadoc
       
    16 
       
    17 \lstdefinelanguage{scala}{
       
    18   morekeywords={abstract,case,catch,class,def,%
       
    19     do,else,extends,false,final,finally,%
       
    20     for,if,implicit,import,match,mixin,%
       
    21     new,null,object,override,package,%
       
    22     private,protected,requires,return,sealed,%
       
    23     super,this,throw,trait,true,try,%
       
    24     type,val,var,while,with,yield},
       
    25   otherkeywords={=>,<-,<\%,<:,>:,\#,@},
       
    26   sensitive=true,
       
    27   morecomment=[l]{//},
       
    28   morecomment=[n]{/*}{*/},
       
    29   morestring=[b]",
       
    30   morestring=[b]',
       
    31   morestring=[b]"""
       
    32 }
       
    33 
       
    34 \lstset{language=Scala,
       
    35 	basicstyle=\ttfamily,
       
    36 	keywordstyle=\color{javapurple}\bfseries,
       
    37 	stringstyle=\color{javagreen},
       
    38 	commentstyle=\color{javagreen},
       
    39 	morecomment=[s][\color{javadocblue}]{/**}{*/},
       
    40 	numbers=left,
       
    41 	numberstyle=\tiny\color{black},
       
    42 	stepnumber=1,
       
    43 	numbersep=10pt,
       
    44 	tabsize=2,
       
    45 	showspaces=false,
       
    46 	showstringspaces=false}
       
    47 	
       
    48 \begin{document}
       
    49 
       
    50 \section*{Handout 2}
       
    51 
       
    52 Having specified what problem our matching algorithm, $match$, is supposed to solve, namely
       
    53 for a given regular expression $r$ and string $s$ answer $true$ if and only if
       
    54 
       
    55 \[
       
    56 s \in L(r)
       
    57 \]
       
    58 
       
    59 \noindent
       
    60 Clearly we cannot use the function $L$ directly in order to solve this problem, because in general
       
    61 the set of strings $L$ returns is infinite (recall what $L(a^*)$ is). In such cases there is no algorithm
       
    62 then can test exhaustively, whether a string is member of this set.
       
    63 
       
    64 The algorithm we define below consists of two parts. One is the function $nullable$ which takes a
       
    65 regular expression as argument and decides whether it can match the empty string (this means it returns a 
       
    66 boolean). This can be easily defined recursively as follows:
       
    67 
       
    68 \begin{center}
       
    69 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
       
    70 $nullable(\varnothing)$      & $\dn$ & $f\!\/alse$\\
       
    71 $nullable(\epsilon)$           & $\dn$ &  $true$\\
       
    72 $nullable (c)$                    & $\dn$ &  $f\!alse$\\
       
    73 $nullable (r_1 + r_2)$       & $\dn$ &  $nullable(r_1) \vee nullable(r_2)$\\ 
       
    74 $nullable (r_1 \cdot r_2)$ & $\dn$ &  $nullable(r_1) \wedge nullable(r_2)$\\
       
    75 $nullable (r^*)$                & $\dn$ & $true$ \\
       
    76 \end{tabular}
       
    77 \end{center}
       
    78 
       
    79 \noindent
       
    80 The idea behind this function is that the following property holds:
       
    81 
       
    82 \[
       
    83 nullable(r) \;\;\text{if and only if}\;\; ""\in L(r)
       
    84 \]
       
    85 
       
    86 \noindent
       
    87 On the left-hand side we have a function we can implement; on the right we have its specification. 
       
    88 
       
    89 The other function is calculating a \emph{derivative} of a regular expression. This is a function
       
    90 which will take a regular expression, say $r$, and a character, say $c$, as argument and return 
       
    91 a new regular expression. Beware that the intuition behind this function is not so easy to grasp on first
       
    92 reading. Essentially this function solves the following problem: if $r$ can match a string of the form
       
    93 $c\!::\!s$, what does the regular expression look like that can match just $s$. The definition of this
       
    94 function is as follows:
       
    95 
       
    96 \begin{center}
       
    97 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
       
    98   $der\, c\, (\varnothing)$      & $\dn$ & $\varnothing$ & \\
       
    99   $der\, c\, (\epsilon)$           & $\dn$ & $\varnothing$ & \\
       
   100   $der\, c\, (d)$                     & $\dn$ & if $c = d$ then $\epsilon$ else $\varnothing$ & \\
       
   101   $der\, c\, (r_1 + r_2)$        & $\dn$ & $der\, c\, r_1 + der\, c\, r_2$ & \\
       
   102   $der\, c\, (r_1 \cdot r_2)$  & $\dn$  & if $nullable (r_1)$\\
       
   103   & & then $(der\,c\,r_1) \cdot r_2 + der\, c\, r_2$\\ 
       
   104   & & else $(der\, c\, r_1) \cdot r_2$\\
       
   105   $der\, c\, (r^*)$          & $\dn$ & $(der\,c\,r) \cdot (r^*)$ &
       
   106   \end{tabular}
       
   107 \end{center}
       
   108 
       
   109 \noindent
       
   110 The first two clauses can be rationalised as follows: recall that $der$ should calculate a regular
       
   111 expression, if the ``input'' regular expression can match a string of the form $c\!::\!s$. Since neither
       
   112 $\varnothing$ nor $\epsilon$ can match such a string we return $\varnothing$. In the third case
       
   113 we have to make a case-distinction: In case the regular expression is $c$, then clearly it can recognise
       
   114 a string of the form $c\!::\!s$, just that $s$ is the empty string. Therefore we return the $\epsilon$-regular 
       
   115 expression. In the other case we again return $\varnothing$ since no string of the $c\!::\!s$ can be matched.
       
   116 The $+$-case is relatively straightforward: all strings of the form $c\!::\!s$ are either matched by the
       
   117 regular expression $r_1$ or $r_2$. So we just have to recursively call $der$ with these two regular
       
   118 expressions and compose the results again with $+$. The $\cdot$-case is more complicated:
       
   119 if $r_1\cdot r_2$ matches a string of the form $c\!::\!s$, then the first part must be matched by $r_1$.
       
   120 Consequently, it makes sense to construct the regular expression for $s$ by calling $der$ with $r_1$ and
       
   121 ``appending'' $r_2$. There is however one exception to this simple rule: if $r_1$ can match the empty
       
   122 string, then all of $c\!::\!s$ is matched by $r_2$. So in case $r_1$ is nullable (that is can match the
       
   123 empty string) we have to allow the choice $der\,c\,r_2$ for calculating the regular expression that can match 
       
   124 $s$. The $*$-case is again simple: if $r^*$ matches a string of the form $c\!::\!s$, then the first part must be
       
   125 ``matched'' by a single copy of $r$. Therefore we call recursively $der\,c\,r$ and ``append'' $r^*$ in order to 
       
   126 match the rest of $s$.
       
   127 
       
   128 Another way to rationalise the definition of $der$ is to consider the following operation on sets:
       
   129 
       
   130 \[
       
   131 Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
       
   132 \]
       
   133 
       
   134 \noindent
       
   135 which essentially transforms a set of strings $A$ by filtering out all strings that do not start with $c$ and then
       
   136 strip off the $c$ from all the remaining strings. For example suppose $A = \{"f\!oo", "bar", "f\!rak"\}$ then
       
   137 \[
       
   138 Der\,f\,A = \{"oo", "rak"\}\quad,\quad
       
   139 Der\,b\,A = \{"ar"\}  \quad \text{and} \quad
       
   140 Der\,a\,A = \varnothing
       
   141 \]
       
   142  
       
   143 \noindent
       
   144 Note that in the last case $Der$ is empty, because no string in $A$ starts with $a$. With this operation we can
       
   145 state the following property about $der$:
       
   146 
       
   147 \[
       
   148 L(der\,c\,r) = Der\,c\,(L(r))
       
   149 \]
       
   150 
       
   151 \noindent
       
   152 This property clarifies what regular expression $der$ calculates, namely take the set of strings
       
   153 that $r$ can match ($L(r)$), filter out all strings not starting with $c$ and strip off the $c$ from the
       
   154 remaining strings---this is exactly the language that $der\,c\,r$ can match.
       
   155 
       
   156 For our matching algorithm we need to lift the notion of derivatives from characters to strings. This can be
       
   157 done using the following function, taking a string and regular expression as input and a regular expression 
       
   158 as output.
       
   159 
       
   160 \begin{center}
       
   161 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
       
   162   $der\!s\, []\, r$     & $\dn$ & $r$ & \\
       
   163   $der\!s\, (c\!::\!s)\, r$ & $\dn$ & $der\!s\,s\,(der\,c\,r)$ & \\
       
   164   \end{tabular}
       
   165 \end{center}
       
   166 
       
   167 \noindent
       
   168 Having $ders$ in place, we can finally define our matching algorithm:
       
   169 
       
   170 \[
       
   171 match\,s\,r = nullable(ders\,s\,r)
       
   172 \]
       
   173 
       
   174 \noindent
       
   175 We claim that
       
   176 
       
   177 \[
       
   178 match\,s\,r\quad\text{if and only if}\quad s\in L(r)
       
   179 \]
       
   180 
       
   181 \noindent
       
   182 holds, which means our algorithm satisfies the specification. This algorithm was introduced by
       
   183 Janus Brzozowski in 1964. Its main attractions are simplicity and being fast, as well as
       
   184 being easily extendable for other regular expressions such as $r^{\{n\}}$, $r^?$, $\sim{}r$ and so on.
       
   185 
       
   186 \begin{figure}[p]
       
   187 {\lstset{language=Scala}\texttt{\lstinputlisting{../progs/app5.scala}}}
       
   188 {\lstset{language=Scala}\texttt{\lstinputlisting{../progs/app6.scala}}}
       
   189 \caption{Scala implementation of the nullable and derivatives functions.}
       
   190 \end{figure}
       
   191 
       
   192 \end{document}
       
   193 
       
   194 %%% Local Variables: 
       
   195 %%% mode: latex
       
   196 %%% TeX-master: t
       
   197 %%% End: