handouts/ho02.tex
changeset 140 1be892087df2
parent 133 09efdf5cf07c
child 217 cd6066f1056a
equal deleted inserted replaced
139:6e7c3db9023d 140:1be892087df2
    55 \[
    55 \[
    56 s \in L(r)
    56 s \in L(r)
    57 \]
    57 \]
    58 
    58 
    59 \noindent
    59 \noindent
    60 Clearly we cannot use the function $L$ directly in order to solve this problem, because in general
    60 we can look at an algorithm to solve this problem.
    61 the set of strings $L$ returns is infinite (recall what $L(a^*)$ is). In such cases there is no algorithm
    61 Clearly we cannot use the function $L$ directly for this, because in general
    62 then can test exhaustively, whether a string is member of this set.
    62 the set of strings $L$ returns is infinite (recall what $L(a^*)$ is). In such cases there is no way we can implement
    63 
    63 an exhaustive test for whether a string is member of this set or not.
    64 The algorithm we define below consists of two parts. One is the function $nullable$ which takes a
    64 
       
    65 The algorithm we will define below consists of two parts. One is the function $nullable$ which takes a
    65 regular expression as argument and decides whether it can match the empty string (this means it returns a 
    66 regular expression as argument and decides whether it can match the empty string (this means it returns a 
    66 boolean). This can be easily defined recursively as follows:
    67 boolean). This can be easily defined recursively as follows:
    67 
    68 
    68 \begin{center}
    69 \begin{center}
    69 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
    70 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
    82 \[
    83 \[
    83 nullable(r) \;\;\text{if and only if}\;\; ""\in L(r)
    84 nullable(r) \;\;\text{if and only if}\;\; ""\in L(r)
    84 \]
    85 \]
    85 
    86 
    86 \noindent
    87 \noindent
    87 On the left-hand side we have a function we can implement; on the right we have its specification. 
    88 Note on the left-hand side we have a function we can implement; on the right we have its specification. 
    88 
    89 
    89 The other function is calculating a \emph{derivative} of a regular expression. This is a function
    90 The other function of our matching algorithm calculates a \emph{derivative} of a regular expression. This is a function
    90 which will take a regular expression, say $r$, and a character, say $c$, as argument and return 
    91 which will take a regular expression, say $r$, and a character, say $c$, as argument and return 
    91 a new regular expression. Beware that the intuition behind this function is not so easy to grasp on first
    92 a new regular expression. Be careful that the intuition behind this function is not so easy to grasp on first
    92 reading. Essentially this function solves the following problem: if $r$ can match a string of the form
    93 reading. Essentially this function solves the following problem: if $r$ can match a string of the form
    93 $c\!::\!s$, what does the regular expression look like that can match just $s$. The definition of this
    94 $c\!::\!s$, what does the regular expression look like that can match just $s$. The definition of this
    94 function is as follows:
    95 function is as follows:
    95 
    96 
    96 \begin{center}
    97 \begin{center}
   131 Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
   132 Der\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
   132 \]
   133 \]
   133 
   134 
   134 \noindent
   135 \noindent
   135 which essentially transforms a set of strings $A$ by filtering out all strings that do not start with $c$ and then
   136 which essentially transforms a set of strings $A$ by filtering out all strings that do not start with $c$ and then
   136 strip off the $c$ from all the remaining strings. For example suppose $A = \{"f\!oo", "bar", "f\!rak"\}$ then
   137 strips off the $c$ from all the remaining strings. For example suppose $A = \{"f\!oo", "bar", "f\!rak"\}$ then
   137 \[
   138 \[
   138 Der\,f\,A = \{"oo", "rak"\}\quad,\quad
   139 Der\,f\,A = \{"oo", "rak"\}\quad,\quad
   139 Der\,b\,A = \{"ar"\}  \quad \text{and} \quad
   140 Der\,b\,A = \{"ar"\}  \quad \text{and} \quad
   140 Der\,a\,A = \varnothing
   141 Der\,a\,A = \varnothing
   141 \]
   142 \]
   148 L(der\,c\,r) = Der\,c\,(L(r))
   149 L(der\,c\,r) = Der\,c\,(L(r))
   149 \]
   150 \]
   150 
   151 
   151 \noindent
   152 \noindent
   152 This property clarifies what regular expression $der$ calculates, namely take the set of strings
   153 This property clarifies what regular expression $der$ calculates, namely take the set of strings
   153 that $r$ can match ($L(r)$), filter out all strings not starting with $c$ and strip off the $c$ from the
   154 that $r$ can match (that is $L(r)$), filter out all strings not starting with $c$ and strip off the $c$ from the
   154 remaining strings---this is exactly the language that $der\,c\,r$ can match.
   155 remaining strings---this is exactly the language that $der\,c\,r$ can match.
   155 
   156 
   156 For our matching algorithm we need to lift the notion of derivatives from characters to strings. This can be
   157 If we want to find out whether the string $"abc"$ is matched by the regular expression $r$
       
   158 then we can iteratively apply $Der$ as follows
       
   159 
       
   160 \begin{enumerate}
       
   161 \item $Der\,a\,(L(r))$
       
   162 \item $Der\,b\,(Der\,a\,(L(r)))$
       
   163 \item $Der\,c\,(Der\,b\,(Der\,a\,(L(r))))$
       
   164 \end{enumerate}
       
   165 
       
   166 \noindent
       
   167 In the last step we need to test whether the empty string is in the set. Our matching algorithm will work similarly, 
       
   168 just using regular expression instead of sets. For this we need to lift the notion of derivatives from characters to strings. This can be
   157 done using the following function, taking a string and regular expression as input and a regular expression 
   169 done using the following function, taking a string and regular expression as input and a regular expression 
   158 as output.
   170 as output.
   159 
   171 
   160 \begin{center}
   172 \begin{center}
   161 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
   173 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}