handouts/ho03.tex
changeset 936 0b5f06539a84
parent 932 5678414a3898
child 967 ce5de01b9632
equal deleted inserted replaced
935:4e221cf587fa 936:0b5f06539a84
   560   Page~\pageref{simplecases}.
   560   Page~\pageref{simplecases}.
   561   \label{thompson1}}
   561   \label{thompson1}}
   562 \end{figure}
   562 \end{figure}
   563 
   563 
   564 \begin{figure}[p]
   564 \begin{figure}[p]
   565 \small
   565 \small 
   566 \lstinputlisting[numbers=left,firstline=48,firstnumber=48,lastline=85]{../progs/automata/thompson.sc}
   566 \lstinputlisting[numbers=left,firstline=48,firstnumber=48,lastline=85]{../progs/automata/thompson.sc}
   567 \caption{The second part of the Thompson Construction implementing
   567 \caption{The second part of the Thompson Construction implementing
   568   the composition of NFAs according to $\cdot$, $+$ and ${}^*$.
   568   the composition of NFAs according to $\cdot$, $+$ and ${}^*$.
   569   The implicit class (Lines 48--54) about rich partial functions
   569   The extension (Lines 48--54) about rich partial functions
   570   implements the infix operation \texttt{+++} which
   570   implements the infix operation \texttt{+++} which
   571   combines an $\epsilon$NFA transition with an NFA transition
   571   combines an $\epsilon$NFA transition with an NFA transition
   572   (both are given as partial functions---but with different type!).\label{thompson2}}
   572   (both are given as partial functions---but with different type!).\label{thompson2}}
   573 \end{figure}
   573 \end{figure}
   574 
   574 
   910 
   910 
   911 \noindent
   911 \noindent
   912 OK\ldots now you know why regular expression matchers in those
   912 OK\ldots now you know why regular expression matchers in those
   913 languages are sometimes so slow. A bit surprising, don't you agree?
   913 languages are sometimes so slow. A bit surprising, don't you agree?
   914 Also it is still a mystery why Rust, which because of the reasons
   914 Also it is still a mystery why Rust, which because of the reasons
   915 above avoids NFAs and uses DFAs instead cannot compete in all cases
   915 above avoids NFAs and uses DFAs instead, cannot compete in all cases
   916 with our simple derivative-based regular expression matcher in Scala.
   916 with our simple derivative-based regular expression matcher in Scala.
       
   917 There is an explanation for this as well\ldots{}remember there the
       
   918 offending examples are of the form $r^{\{n\}}$. Why could they be
       
   919 a problem in Rust?
   917 
   920 
   918 \subsection*{Subset Construction}
   921 \subsection*{Subset Construction}
   919 
   922 
   920 So of course, some clever developers of regular expression matchers are aware of
   923 So of course, some clever developers of regular expression matchers are aware of
   921 these problems with sluggish NFAs and try to address them. One common
   924 these problems with sluggish NFAs and try to address them. One common
  1621 \node[draw=none,fill=none] at (7,0.1) {\Large\ldots};
  1624 \node[draw=none,fill=none] at (7,0.1) {\Large\ldots};
  1622 \end{tikzpicture}  
  1625 \end{tikzpicture}  
  1623 \end{center}
  1626 \end{center}
  1624 
  1627 
  1625 \noindent
  1628 \noindent
  1626 I let you implement this ``automaton''.
  1629 You might want to implement this ``automaton''. What do you get?
  1627 
  1630 
  1628 While this makes all sense (modulo the flaw with the infinite states),
  1631 While this makes all sense (modulo the flaw with the infinite states),
  1629 does this automaton teach us anything new? The answer is no, because it
  1632 does this automaton teach us anything new? The answer is no, because it
  1630 boils down to just another implementation of the Brzozowski
  1633 boils down to just another implementation of the Brzozowski
  1631 algorithm from Lecture 2. There \emph{is} however something interesting
  1634 algorithm from Lecture 2. There \emph{is} however something interesting
  1632 in this construction
  1635 in this construction
  1633 which Brzozowski already cleverly found out, because there is
  1636 which Brzozowski already cleverly found out, because there is
  1634 a way to restrict the number of states to something finite.
  1637 a way to restrict the number of states to something finite.
  1635 Meaning it would give us a real automaton.
  1638 Meaning it would give us a real automaton.
  1636 However, this would lead us far, far away from what we want
  1639 However, this would lead us far, far away from what we want
  1637 discuss here.
  1640 discuss here. The end.
  1638 
  1641 
  1639 
  1642 
  1640 
  1643 
  1641 %\section*{Further Reading}
  1644 %\section*{Further Reading}
  1642 
  1645