handouts/ho02.tex
changeset 296 796b9b81ac8d
parent 291 201c2c6d8696
child 318 7975e4f0d4de
equal deleted inserted replaced
295:19f23c4c2167 296:796b9b81ac8d
   147 $r + r$ & $\equiv$ & $r$
   147 $r + r$ & $\equiv$ & $r$
   148 \end{tabular}
   148 \end{tabular}
   149 \end{center}
   149 \end{center}
   150 
   150 
   151 \noindent which always hold no matter what the regular
   151 \noindent which always hold no matter what the regular
   152 expression $r$ looks like. The first are easy to verify since
   152 expression $r$ looks like. The first two are easy to verify since
   153 $L(\varnothing)$ is the empty set. The next two are also easy 
   153 $L(\varnothing)$ is the empty set. The next two are also easy 
   154 to verify since $L(\epsilon) = \{[]\}$ and appending the empty 
   154 to verify since $L(\epsilon) = \{[]\}$ and appending the empty 
   155 string to every string of another set, leaves the set 
   155 string to every string of another set, leaves the set 
   156 unchanged. Be careful to fully comprehend the fifth and 
   156 unchanged. Be careful to fully comprehend the fifth and 
   157 sixth equivalence: if you concatenate two sets of strings
   157 sixth equivalence: if you concatenate two sets of strings
   158 and one is the empty set, then the concatenation will also be
   158 and one is the empty set, then the concatenation will also be
   159 the empty set. Check the definition of \pcode{_ @ _}.
   159 the empty set. Check the definition of \pcode{_ @ _}.
   160 The last equivalence is again trivial.
   160 The last equivalence is again trivial.
   161 
   161 
   162 
       
   163 What will be important later on is that we can orient these
   162 What will be important later on is that we can orient these
   164 equivalences and read them from left to right. In this way we
   163 equivalences and read them from left to right. In this way we
   165 can view them as \emph{simplification rules}. Suppose for 
   164 can view them as \emph{simplification rules}. Suppose for 
   166 example the regular expression 
   165 example the regular expression 
   167 
   166 
   170 \label{big}
   169 \label{big}
   171 \end{equation}
   170 \end{equation}
   172 
   171 
   173 \noindent If we can find an equivalent regular expression that
   172 \noindent If we can find an equivalent regular expression that
   174 is simpler (smaller for example), then this might potentially 
   173 is simpler (smaller for example), then this might potentially 
   175 make our matching algorithm is faster. The reason is that 
   174 make our matching algorithm run faster. The reason is that 
   176 whether a string $s$ is in $L(r)$ or in $L(r')$ with $r\equiv r'$
   175 whether a string $s$ is in $L(r)$ or in $L(r')$ with $r\equiv r'$
   177 will always give the same answer. In the example above you 
   176 will always give the same answer. In the example above you 
   178 will see that the regular expression is equivalent to $r_1$
   177 will see that the regular expression is equivalent to $r_1$
   179 if you iteratively apply the simplification rules from above:
   178 if you iteratively apply the simplification rules from above:
   180 
   179 
   189 $\equiv$ & $\underline{r_1 + \varnothing}$\smallskip\\
   188 $\equiv$ & $\underline{r_1 + \varnothing}$\smallskip\\
   190 $\equiv$ & $r_1$\
   189 $\equiv$ & $r_1$\
   191 \end{tabular}
   190 \end{tabular}
   192 \end{center}
   191 \end{center}
   193 
   192 
   194 \noindent In each step I underlined where a simplification
   193 \noindent In each step, I underlined where a simplification
   195 rule is applied. Our matching algorithm in the next section
   194 rule is applied. Our matching algorithm in the next section
   196 will often generate such ``useless'' $\epsilon$s and
   195 will often generate such ``useless'' $\epsilon$s and
   197 $\varnothing$s, therefore simplifying them away will make the
   196 $\varnothing$s, therefore simplifying them away will make the
   198 algorithm quite a bit faster.
   197 algorithm quite a bit faster.
   199 
   198 
   390 
   389 
   391 \subsection*{The Matching Algorithm in Scala}
   390 \subsection*{The Matching Algorithm in Scala}
   392 
   391 
   393 Another attraction of the algorithm is that it can be easily
   392 Another attraction of the algorithm is that it can be easily
   394 implemented in a functional programming language, like Scala.
   393 implemented in a functional programming language, like Scala.
   395 Given the implementation of regular expressions in Scala given
   394 Given the implementation of regular expressions in Scala shown
   396 in the first lecture and handout, the functions for
   395 in the first lecture and handout, the functions and subfunctions
   397 \pcode{matches} are shown in Figure~\ref{scala1}.
   396 for \pcode{matches} are shown in Figure~\ref{scala1}.
   398 
   397 
   399 \begin{figure}[p]
   398 \begin{figure}[p]
   400 \lstinputlisting{../progs/app5.scala}
   399 \lstinputlisting{../progs/app5.scala}
   401 \caption{Scala implementation of the nullable and 
   400 \caption{Scala implementation of the nullable and 
   402   derivatives functions.\label{scala1}}
   401   derivatives functions.\label{scala1}}
   535 \end{center}
   534 \end{center}
   536 
   535 
   537 \noindent I leave it to you to contemplate whether such a
   536 \noindent I leave it to you to contemplate whether such a
   538 simplification can have any impact on the correctness of our
   537 simplification can have any impact on the correctness of our
   539 algorithm (will it change any answers?). Figure~\ref{scala2}
   538 algorithm (will it change any answers?). Figure~\ref{scala2}
   540 give a simplification function that recursively traverses a
   539 gives a simplification function that recursively traverses a
   541 regular expression and simplifies it according to the rules
   540 regular expression and simplifies it according to the rules
   542 given at the beginning. There are only rules for $+$, $\cdot$
   541 given at the beginning. There are only rules for $+$, $\cdot$
   543 and $n$-times (the latter because we added it in the second
   542 and $n$-times (the latter because we added it in the second
   544 version of our matcher). There is no rule for a star, because
   543 version of our matcher). There is no rule for a star, because
   545 empirical data and also a little thought showed that
   544 empirical data and also a little thought showed that