handouts/ho02.tex
changeset 125 39c75cf4e079
parent 124 dd8b5a3dac0a
child 126 7c7185cb4f2b
equal deleted inserted replaced
124:dd8b5a3dac0a 125:39c75cf4e079
    88 
    88 
    89 The other function is calculating a \emph{derivative} of a regular expression. This is a function
    89 The other function is calculating a \emph{derivative} of a regular expression. This is a function
    90 which will take a regular expression, say $r$, and a character, say $c$, as argument and return 
    90 which will take a regular expression, say $r$, and a character, say $c$, as argument and return 
    91 a new regular expression. Beware that the intuition behind this function is not so easy to grasp on first
    91 a new regular expression. Beware that the intuition behind this function is not so easy to grasp on first
    92 reading. Essentially this function solves the following problem: if $r$ can match a string of the form
    92 reading. Essentially this function solves the following problem: if $r$ can match a string of the form
    93 $c\!::\!s$, what does the regular expression look like that can match just $s$.
    93 $c\!::\!s$, what does the regular expression look like that can match just $s$. The definition of this
       
    94 function is as follows:
       
    95 
       
    96 \begin{center}
       
    97 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
       
    98   $der\, c\, (\varnothing)$      & $\dn$ & $\varnothing$ & \\
       
    99   $der\, c\, (\epsilon)$           & $\dn$ & $\varnothing$ & \\
       
   100   $der\, c\, (d)$                     & $\dn$ & if $c = d$ then $\epsilon$ else $\varnothing$ & \\
       
   101   $der\, c\, (r_1 + r_2)$        & $\dn$ & $der\, c\, r_1 + der\, c\, r_2$ & \\
       
   102   $der\, c\, (r_1 \cdot r_2)$  & $\dn$  & if $nullable (r_1)$\\
       
   103   & & then $(der\,c\,r_1) \cdot r_2 + der\, c\, r_2$\\ 
       
   104   & & else $(der\, c\, r_1) \cdot r_2$\\
       
   105   $der\, c\, (r^*)$          & $\dn$ & $(der\,c\,r) \cdot (r^*)$ &
       
   106   \end{tabular}
       
   107 \end{center}
       
   108 
       
   109 \noindent
       
   110 The first two clauses can be rationalised as follows: recall that $der$ should calculate a regular
       
   111 expression, if the ``input'' regular expression can match a string of the form $c\!::\!s$. Since neither
       
   112 $\varnothing$ nor $\epsilon$ can match such a string we return $\varnothing$. In the third case
       
   113 we have to make a case-distinction: In case the regular expression is $c$, then clearly it can recognise
       
   114 a string of the form $c\!::\!s$, just that $s$ is the empty string. Therefore we return the $\epsilon$-regular 
       
   115 expression. In the other case we again return $\varnothing$ since no string of the $c\!::\!s$ can be matched.
       
   116 The $+$-case is relatively straightforward: all strings of the form $c\!::\!s$ are either matched by the
       
   117 regular expression $r_1$ or $r_2$. So we just have to recursively call $der$ with these two regular
       
   118 expressions and compose the results again with $+$. The $\cdot$-case is more complicated:
       
   119 if $r_1\cdot r_2$ matches a string of the form $c\!::\!s$, then the first part must be matched by $r_1$.
       
   120 Consequently, it makes sense to construct the regular expression for $s$ by calling $der$ with $r_1$ and
       
   121 ``appending'' $r_2$. There is however one exception to this simple rule: if $r_1$ can match the empty
       
   122 string, then all of $c\!::\!s$ is matched by $r_2$. So in case $r_1$ is nullable (that is can match the
       
   123 empty string) we have to allow the choice $der\,c\,r_2$ for calculating the regular expression that can match 
       
   124 $s$. The $*$-case is again simple: if $r^*$ matches a string of the form $c\!::\!s$, then the first part must be
       
   125 ``matched'' by a single copy of $r$. Therefore we call recursively $der\,c\,r$ and ``append'' $r^*$ in order to 
       
   126 match the rest of $s$.
       
   127 
       
   128 Another way to rationalise the definition of $der$ is to consider the following operation on sets:
       
   129 
       
   130 \[
       
   131 Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
       
   132 \]
       
   133 
       
   134 \noindent
       
   135 which essentially transforms a set of strings $A$ by filtering out all strings that do not start with $c$ and then
       
   136 strip off the $c$ from all the remaining strings. For example suppose $A = \{"f\!oo", "bar", "f\!rak"\}$ then
       
   137 \[
       
   138 Der\,f\,A = \{"oo", "rak"\}\quad,\quad
       
   139 Der\,b\,A = \{"ar"\}  \quad \text{and} \quad
       
   140 Der\,a\,A = \varnothing
       
   141 \]
       
   142  
       
   143 \noindent
       
   144 Note that in the last case $Der$ is empty, because no string in $A$ starts with $a$. With this operation we can
       
   145 state the following property about $der$:
       
   146 
       
   147 \[
       
   148 L(der\,c\,r) = Der\,c\,(L(r))
       
   149 \]
       
   150 
       
   151 \noindent
       
   152 This property clarifies what regular expression $der$ calculates, namely take the set of strings
       
   153 that $r$ can match ($L(r)$), filter out all strings not starting with $c$ and strip off the $c$ from the
       
   154 remaining strings---this is exactly the language that $der\,c\,r$ can match.
       
   155 
       
   156 For our matching algorithm we need to lift the notion of derivatives from characters to strings. This can be
       
   157 done using the following function, taking a string and regular expression as input and a regular expression 
       
   158 as output.
       
   159 
       
   160 \begin{center}
       
   161 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
       
   162   $der\!s\, []\, r$     & $\dn$ & $r$ & \\
       
   163   $der\!s\, (c\!::\!s)\, r$ & $\dn$ & $der\!s\,s\,(der\,c\,r)$ & \\
       
   164   \end{tabular}
       
   165 \end{center}
       
   166 
       
   167 \noindent
       
   168 Having $ders$ in place, we can finally define our matching algorithm:
       
   169 
       
   170 \[
       
   171 match\,s\,r = nullable(ders\,s\,r)
       
   172 \]
       
   173 
       
   174 \noindent
       
   175 We claim that
       
   176 
       
   177 \[
       
   178 match\,s\,r\quad\text{if and only if}\quad s\in L(r)
       
   179 \]
       
   180 
       
   181 \noindent
       
   182 holds, which means our algorithm satisfies the specification.
    94 
   183 
    95 \end{document}
   184 \end{document}
    96 
   185 
    97 %%% Local Variables: 
   186 %%% Local Variables: 
    98 %%% mode: latex
   187 %%% mode: latex