handouts/ho02.tex
changeset 831 d499da29894c
parent 764 6718ef6143b8
child 873 a25da86f7c8c
equal deleted inserted replaced
830:dbf9d710ce65 831:d499da29894c
     6 \usepackage{../data}
     6 \usepackage{../data}
     7 
     7 
     8 
     8 
     9 \begin{document}
     9 \begin{document}
    10 \fnote{\copyright{} Christian Urban, King's College London, 
    10 \fnote{\copyright{} Christian Urban, King's College London, 
    11   2014, 2015, 2016, 2017, 2018, 2019, 2020}
    11   2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021}
    12 
    12 
    13 
    13 
    14 \section*{Handout 2 (Regular Expression Matching)}
    14 \section*{Handout 2 (Regular Expression Matching)}
    15 
    15 
    16 %\noindent
    16 %\noindent
    25 
    25 
    26 \noindent
    26 \noindent
    27 This lecture is about implementing a more efficient regular expression
    27 This lecture is about implementing a more efficient regular expression
    28 matcher (the plots on the right below)---more efficient than the
    28 matcher (the plots on the right below)---more efficient than the
    29 matchers from regular expression libraries in Ruby, Python, JavaScript
    29 matchers from regular expression libraries in Ruby, Python, JavaScript
    30 and Java (the plots on the left). For this consider some experimental
    30 and Java (the plots on the left).\footnote{Have a look at KEATS: students
    31 data: The first pair of plots shows the running time for the
    31 last year contributed also date for the Swift and Dart languages.}\medskip
    32 regular expression $(a^*)^*\cdot b$ and strings composed of $n$
    32 
    33 \pcode{a}s, like
    33 \noindent
       
    34 To start with let us look more closely at the experimental data: The
       
    35 first pair of plots shows the running time for the regular expression
       
    36 $(a^*)^*\cdot b$ and strings composed of $n$ \pcode{a}s, like
    34 \[
    37 \[
    35 \underbrace{\pcode{a}\ldots\pcode{a}}_{n} 
    38 \underbrace{\pcode{a}\ldots\pcode{a}}_{n} 
    36 \]
    39 \]
    37 
    40 
    38 \noindent
    41 \noindent
   145 
   148 
   146 \noindent
   149 \noindent
   147 In what follows we will use these regular expressions and strings as
   150 In what follows we will use these regular expressions and strings as
   148 running examples. There will be several versions (V1, V2, V3,\ldots)
   151 running examples. There will be several versions (V1, V2, V3,\ldots)
   149 of our matcher.\footnote{The corresponding files are
   152 of our matcher.\footnote{The corresponding files are
   150   \texttt{re1.scala}, \texttt{re2.scala} and so on. As usual, you can
   153   \texttt{re1.sc}, \texttt{re2.sc} and so on. As usual, you can
   151   find the code on KEATS.}
   154   find the code on KEATS.}
   152 
   155 
   153 Having specified in the previous lecture what
   156 Having specified in the previous lecture what
   154 problem our regular expression matcher is supposed to solve,
   157 problem our regular expression matcher is supposed to solve,
   155 namely for any given regular expression $r$ and string $s$
   158 namely for any given regular expression $r$ and string $s$
   300 \noindent
   303 \noindent
   301 We will not use them in our algorithm, but feel free to convince
   304 We will not use them in our algorithm, but feel free to convince
   302 yourself that they actually hold. As an aside, there has been a lot of
   305 yourself that they actually hold. As an aside, there has been a lot of
   303 research about questions like: Can one always decide when two regular
   306 research about questions like: Can one always decide when two regular
   304 expressions are equivalent or not? What does an algorithm look like to
   307 expressions are equivalent or not? What does an algorithm look like to
   305 decide this efficiently? So in general it is not a trivial problem.
   308 decide this efficiently? Surprisingly, many of such questions
       
   309 turn out to be non-trivial problems.
       
   310 
   306 
   311 
   307 \subsection*{The Matching Algorithm}
   312 \subsection*{The Matching Algorithm}
   308 
   313 
   309 The algorithm we will introduce below consists of two parts. One is the
   314 The algorithm we will introduce below consists of two parts. One is the
   310 function $\textit{nullable}$ which takes a regular expression as an
   315 function $\textit{nullable}$ which takes a regular expression as an
   490 \textit{matcher}\,r\,s\quad\text{if and only if}\quad s\in L(r)
   495 \textit{matcher}\,r\,s\quad\text{if and only if}\quad s\in L(r)
   491 \]
   496 \]
   492 
   497 
   493 \noindent holds, which means our algorithm satisfies the
   498 \noindent holds, which means our algorithm satisfies the
   494 specification. Of course we can claim many things\ldots
   499 specification. Of course we can claim many things\ldots
   495 whether the claim holds any water is a different question,
   500 whether the claim holds any water is a different question.
   496 which for example is the point of the Strand-2 Coursework.
       
   497 
   501 
   498 This algorithm was introduced by Janusz Brzozowski in 1964, but 
   502 This algorithm was introduced by Janusz Brzozowski in 1964, but 
   499 is more widely known only in the last 10 or so years. Its
   503 is more widely known only in the last 10 or so years. Its
   500 main attractions are simplicity and being fast, as well as
   504 main attractions are simplicity and being fast, as well as
   501 being easily extendible for other regular expressions such as
   505 being easily extendible for other regular expressions such as
   502 $r^{\{n\}}$, $r^?$, $\sim{}r$ and so on (this is subject of
   506 $r^{\{n\}}$, $r^?$, $\sim{}r$ and so on (this is subject of
   503 Strand-1 Coursework 1). 
   507 Coursework 1). 
   504 
   508 
   505 \subsection*{The Matching Algorithm in Scala}
   509 \subsection*{The Matching Algorithm in Scala}
   506 
   510 
   507 Another attraction of the algorithm is that it can be easily
   511 Another attraction of the algorithm is that it can be easily
   508 implemented in a functional programming language, like Scala.
   512 implemented in a functional programming language, like Scala.