handouts/ho02.tex
changeset 510 25580bf89ac0
parent 492 39b7ff2cf1bc
child 511 1af5ec39d006
equal deleted inserted replaced
509:c78bb592a034 510:25580bf89ac0
    21 the regular expressions $a^?{}^{\{n\}}\cdot a^{\{n\}}$ and strings
    21 the regular expressions $a^?{}^{\{n\}}\cdot a^{\{n\}}$ and strings
    22 also composed of $n$ \pcode{a}s (this time the regular expressions
    22 also composed of $n$ \pcode{a}s (this time the regular expressions
    23 match the strings).  To see the substantial differences in the left
    23 match the strings).  To see the substantial differences in the left
    24 and right plots below, note the different scales of the $x$-axes.
    24 and right plots below, note the different scales of the $x$-axes.
    25 
    25 
    26 
    26   
    27 \begin{center}
    27 \begin{center}
    28 Graphs: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$
    28 Graphs: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$
    29 \begin{tabular}{@{}cc@{}}
    29 \begin{tabular}{@{}cc@{}}
    30 \begin{tikzpicture}
    30 \begin{tikzpicture}
    31 \begin{axis}[
    31 \begin{axis}[
   280 \noindent
   280 \noindent
   281 We will not use them in our algorithm, but feel free to convince you
   281 We will not use them in our algorithm, but feel free to convince you
   282 that they hold. As an aside, there has been a lot of research about
   282 that they hold. As an aside, there has been a lot of research about
   283 questions like: Can one always decide when two regular expressions are
   283 questions like: Can one always decide when two regular expressions are
   284 equivalent or not? What does an algorithm look like to decide this
   284 equivalent or not? What does an algorithm look like to decide this
   285 efficiently?
   285 efficiently? So in general it is not a trivial problem.
   286 
   286 
   287 \subsection*{The Matching Algorithm}
   287 \subsection*{The Matching Algorithm}
   288 
   288 
   289 The algorithm we will define below consists of two parts. One
   289 The algorithm we will define below consists of two parts. One
   290 is the function $\textit{nullable}$ which takes a regular expression as
   290 is the function $\textit{nullable}$ which takes a regular expression as
   724 \end{axis}
   724 \end{axis}
   725 \end{tikzpicture}
   725 \end{tikzpicture}
   726 \end{center}
   726 \end{center}
   727 
   727 
   728 \noindent
   728 \noindent
   729 To reacap, Python and Ruby needed approximately 30 seconds to match a
   729 To recap, Python and Ruby needed approximately 30 seconds to match a
   730 string of 28 \texttt{a}s and the regular expression $a^{?\{n\}} \cdot
   730 string of 28 \texttt{a}s and the regular expression $a^{?\{n\}} \cdot
   731 a^{\{n\}}$.  We need a third of this time to do the same with strings
   731 a^{\{n\}}$.  We need a third of this time to do the same with strings
   732 up to 11,000 \texttt{a}s.  Similarly, Java and Python needed 30
   732 up to 11,000 \texttt{a}s.  Similarly, Java and Python needed 30
   733 seconds to find out the regular expression $(a^*)^* \cdot b$ does not
   733 seconds to find out the regular expression $(a^*)^* \cdot b$ does not
   734 match the string of 28 \texttt{a}s. We can do the same in
   734 match the string of 28 \texttt{a}s. We can do the same in
   772 \noindent
   772 \noindent
   773 I have not fully understood why this version is much faster,
   773 I have not fully understood why this version is much faster,
   774 but it seems it is a combination of the clauses for \texttt{ALT}
   774 but it seems it is a combination of the clauses for \texttt{ALT}
   775 and \texttt{SEQ}. In the latter case we call \texttt{der} with 
   775 and \texttt{SEQ}. In the latter case we call \texttt{der} with 
   776 a single character and this potentially produces an alternative.
   776 a single character and this potentially produces an alternative.
   777 The derivative of such an alternative can then be more effeciently  
   777 The derivative of such an alternative can then be more efficiently  
   778 calculated by \texttt{ders2} since it pushes a whole string
   778 calculated by \texttt{ders2} since it pushes a whole string
   779 under an \texttt{ALT}. The numbers are that in the second case  
   779 under an \texttt{ALT}. The numbers are that in the second case  
   780 $(a^*)^* \cdot b$ both versions are pretty much the same, but in the 
   780 $(a^*)^* \cdot b$ both versions are pretty much the same, but in the 
   781 first case $a^{?\{n\}} \cdot a^{\{n\}}$ the improvement gives 
   781 first case $a^{?\{n\}} \cdot a^{\{n\}}$ the improvement gives 
   782 another factor of 100 speedup. Nice!
   782 another factor of 100 speedup. Nice!
  1044 meets its specification. To have done so, requires a few induction
  1044 meets its specification. To have done so, requires a few induction
  1045 proofs about strings and regular expressions. Following the \emph{induction
  1045 proofs about strings and regular expressions. Following the \emph{induction
  1046 recipes} is already a big step in actually performing these proofs.
  1046 recipes} is already a big step in actually performing these proofs.
  1047 If you do not believe it, proofs have helped me to make sure my code
  1047 If you do not believe it, proofs have helped me to make sure my code
  1048 is correct and in several instances prevented me of letting slip
  1048 is correct and in several instances prevented me of letting slip
  1049 embarassing mistakes into the `wild'.
  1049 embarrassing mistakes into the `wild'.
  1050 
  1050 
  1051 \end{document}
  1051 \end{document}
  1052 
  1052 
  1053 
  1053 
  1054 
  1054