handouts/ho02.tex
author Christian Urban <christian.urban@kcl.ac.uk>
Sat, 23 Sep 2023 22:26:52 +0100
changeset 926 42ecc3186944
parent 874 ffe02fd574a5
child 932 5678414a3898
permissions -rw-r--r--
updated
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
646
2abd285c66d1 updated
Christian Urban <urbanc@in.tum.de>
parents: 618
diff changeset
     1
% !TEX program = xelatex
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     2
\documentclass{article}
251
5b5a68df6d16 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 217
diff changeset
     3
\usepackage{../style}
217
cd6066f1056a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 140
diff changeset
     4
\usepackage{../langs}
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
     5
\usepackage{../graphics}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
     6
\usepackage{../data}
480
9e42ccbbd1e6 updated
Christian Urban <urbanc@in.tum.de>
parents: 478
diff changeset
     7
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
     8
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     9
\begin{document}
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
    10
\fnote{\copyright{} Christian Urban, King's College London, 
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
    11
  2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023}
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
    12
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    13
272
1446bc47a294 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 268
diff changeset
    14
\section*{Handout 2 (Regular Expression Matching)}
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    15
757
ea0be0662be0 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 727
diff changeset
    16
%\noindent
ea0be0662be0 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 727
diff changeset
    17
%{\bf Checklist}
ea0be0662be0 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 727
diff changeset
    18
%
ea0be0662be0 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 727
diff changeset
    19
%\begin{itemize}
ea0be0662be0 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 727
diff changeset
    20
%  \item You have understood the derivative-based matching algorithm.
ea0be0662be0 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 727
diff changeset
    21
%  \item You know how the derivative is related to the meaning of regular
ea0be0662be0 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 727
diff changeset
    22
%  expressions.
ea0be0662be0 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 727
diff changeset
    23
%  \item You can extend the algorithm to non-basic regular expressions.
ea0be0662be0 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 727
diff changeset
    24
%\end{itemize}\bigskip\bigskip\bigskip
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
    25
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
    26
\noindent
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
    27
This lecture is about implementing a more efficient regular expression
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    28
matcher (the plots on the right below)---more efficient than the
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
    29
matchers from regular expression libraries in Ruby, Python, JavaScript, Swift
831
d499da29894c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 764
diff changeset
    30
and Java (the plots on the left).\footnote{Have a look at KEATS: students
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
    31
last year contributed also data for the Dart language.}\medskip
831
d499da29894c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 764
diff changeset
    32
d499da29894c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 764
diff changeset
    33
\noindent
d499da29894c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 764
diff changeset
    34
To start with let us look more closely at the experimental data: The
d499da29894c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 764
diff changeset
    35
first pair of plots shows the running time for the regular expression
d499da29894c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 764
diff changeset
    36
$(a^*)^*\cdot b$ and strings composed of $n$ \pcode{a}s, like
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
    37
\[
764
6718ef6143b8 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 757
diff changeset
    38
\underbrace{\pcode{a}\ldots\pcode{a}}_{n} 
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
    39
\]
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
    40
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
    41
\noindent
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
    42
This means the regular expression actually does not match the strings.
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
    43
The second pair of plots shows the running time for the regular
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
    44
expressions of the form $a^?{}^{\{n\}}\cdot a^{\{n\}}$ and corresponding
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
    45
strings composed of $n$ \pcode{a}s (this time the regular expressions
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
    46
match the strings).  To see the substantial differences in the left and
618
f4818c95a32e updated
Christian Urban <urbanc@in.tum.de>
parents: 571
diff changeset
    47
right plots below, note the different scales of the $x$-axes.
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    48
510
cu
parents: 492
diff changeset
    49
  
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    50
\begin{center}
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    51
Graphs: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    52
\begin{tabular}{@{}cc@{}}
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 512
diff changeset
    53
\begin{tikzpicture}[baseline=(current bounding box.north)]
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 512
diff changeset
    54
  \begin{axis}[
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    55
    xlabel={$n$},
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    56
    x label style={at={(1.05,0.0)}},
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    57
    ylabel={time in secs},
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    58
    enlargelimits=false,
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    59
    xtick={0,5,...,30},
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    60
    xmax=33,
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    61
    ymax=35,
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    62
    ytick={0,5,...,30},
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    63
    scaled ticks=false,
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    64
    axis lines=left,
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    65
    width=5cm,
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
    66
    height=4.5cm, 
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
    67
    legend entries={Java 8, Python, JavaScript, Swift},  
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    68
    legend pos=north west,
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    69
    legend cell align=left]
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    70
\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    71
\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
618
f4818c95a32e updated
Christian Urban <urbanc@in.tum.de>
parents: 571
diff changeset
    72
\addplot[red,mark=*, mark options={fill=white}] table {re-js.data};
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
    73
\addplot[magenta,mark=*, mark options={fill=white}] table {re-swift.data};
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    74
\end{axis}
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    75
\end{tikzpicture}
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    76
&
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 512
diff changeset
    77
\begin{tikzpicture}[baseline=(current bounding box.north)]
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    78
  \begin{axis}[
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    79
    xlabel={$n$},
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
    80
    x label style={at={(1.1,0.0)}},
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
    81
    %%xtick={0,1000000,...,5000000}, 
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    82
    ylabel={time in secs},
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    83
    enlargelimits=false,
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    84
    ymax=35,
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    85
    ytick={0,5,...,30},
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    86
    axis lines=left,
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
    87
    %scaled ticks=false,
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    88
    width=6.5cm,
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
    89
    height=4.5cm,
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
    90
    legend entries={Our matcher},  
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    91
    legend pos=north east,
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    92
    legend cell align=left]
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    93
%\addplot[green,mark=square*,mark options={fill=white}] table {re2a.data};    
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    94
\addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    95
\end{axis}
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    96
\end{tikzpicture}
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
    97
\end{tabular}
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
    98
\end{center}\bigskip
263
92e6985018ae updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 262
diff changeset
    99
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   100
\begin{center}
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   101
Graphs: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$\\
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   102
\begin{tabular}{@{}cc@{}}
268
18bef085a7ca updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 263
diff changeset
   103
\begin{tikzpicture}
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   104
\begin{axis}[
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   105
    xlabel={$n$},
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   106
    x label style={at={(1.05,0.0)}},
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   107
    ylabel={\small time in secs},
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   108
    enlargelimits=false,
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   109
    xtick={0,5,...,30},
291
201c2c6d8696 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 272
diff changeset
   110
    xmax=33,
268
18bef085a7ca updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 263
diff changeset
   111
    ymax=35,
18bef085a7ca updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 263
diff changeset
   112
    ytick={0,5,...,30},
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   113
    scaled ticks=false,
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   114
    axis lines=left,
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   115
    width=5cm,
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   116
    height=4.55cm, 
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   117
    legend entries={Python,Ruby},  
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   118
    legend pos=north west,
268
18bef085a7ca updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 263
diff changeset
   119
    legend cell align=left]
434
8664ff87cd77 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 433
diff changeset
   120
\addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
8664ff87cd77 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 433
diff changeset
   121
\addplot[brown,mark=triangle*, mark options={fill=white}] table {re-ruby.data};  
268
18bef085a7ca updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 263
diff changeset
   122
\end{axis}
18bef085a7ca updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 263
diff changeset
   123
\end{tikzpicture}
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   124
&
268
18bef085a7ca updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 263
diff changeset
   125
\begin{tikzpicture}
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   126
  \begin{axis}[
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   127
    xlabel={$n$},
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   128
    x label style={at={(1.1,0.05)}},
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   129
    ylabel={\small time in secs},
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   130
    enlargelimits=false,
477
b78664a24f5d updated
Christian Urban <urbanc@in.tum.de>
parents: 471
diff changeset
   131
    xtick={0,2500,...,11000},
b78664a24f5d updated
Christian Urban <urbanc@in.tum.de>
parents: 471
diff changeset
   132
    xmax=12000,
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   133
    ymax=35,
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   134
    ytick={0,5,...,30},
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   135
    scaled ticks=false,
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   136
    axis lines=left,
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   137
    width=6.5cm,
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   138
    height=4.5cm,
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   139
    legend entries={Our matcher},  
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   140
    legend pos=north east,
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   141
    legend cell align=left]
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   142
%\addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   143
\addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   144
\end{axis}
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   145
\end{tikzpicture}
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   146
\end{tabular}
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   147
\end{center}
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   148
\bigskip
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   149
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   150
\noindent
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   151
In what follows we will use these regular expressions and strings as
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   152
running examples. There will be several versions (V1, V2, V3,\ldots)
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   153
of our matcher.\footnote{The corresponding files are
831
d499da29894c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 764
diff changeset
   154
  \texttt{re1.sc}, \texttt{re2.sc} and so on. As usual, you can
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   155
  find the code on KEATS.}
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   156
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   157
Having specified in the previous lecture what
325
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   158
problem our regular expression matcher is supposed to solve,
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   159
namely for any given regular expression $r$ and string $s$
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   160
answer \textit{true} if and only if
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   161
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   162
\[
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   163
s \in L(r)
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   164
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   165
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   166
\noindent we can look for an algorithm to solve this problem. Clearly
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   167
we cannot use the function $L$ directly for this, because in general
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   168
the set of strings $L$ returns is infinite (recall what $L(a^*)$ is).
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   169
In such cases there is no way we can implement an exhaustive test for
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   170
whether a string is member of this set or not. In contrast our
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   171
matching algorithm will operate on the regular expression $r$ and
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   172
string $s$, only, which are both finite objects. Before we explain 
646
2abd285c66d1 updated
Christian Urban <urbanc@in.tum.de>
parents: 618
diff changeset
   173
the matching algorithm, let us have a closer look at what it
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   174
means when two regular expressions are equivalent.
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   175
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   176
\subsection*{Regular Expression Equivalences}
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   177
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   178
We already defined in Handout 1 what it means for two regular
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   179
expressions to be equivalent, namely whether their 
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   180
\emph{meaning} is the same language:
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   181
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   182
\[
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   183
r_1 \equiv r_2 \;\dn\; L(r_1) = L(r_2)
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   184
\]
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   185
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   186
\noindent
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   187
It is relatively easy to verify that some concrete equivalences
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   188
hold, for example
124
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
   189
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
   190
\begin{center}
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   191
\begin{tabular}{rcl}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   192
$(a + b)  + c$ & $\equiv$ & $a + (b + c)$\\
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   193
$a + a$        & $\equiv$ & $a$\\
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   194
$a + b$        & $\equiv$ & $b + a$\\
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   195
$(a \cdot b) \cdot c$ & $\equiv$ & $a \cdot (b \cdot c)$\\
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   196
$c \cdot (a + b)$ & $\equiv$ & $(c \cdot a) + (c \cdot b)$\\
124
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
   197
\end{tabular}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
   198
\end{center}
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   199
124
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
   200
\noindent
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   201
but also easy to verify that the following regular expressions
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   202
are \emph{not} equivalent
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   203
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   204
\begin{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   205
\begin{tabular}{rcl}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   206
$a \cdot a$ & $\not\equiv$ & $a$\\
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   207
$a + (b \cdot c)$ & $\not\equiv$ & $(a + b) \cdot (a + c)$\\
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   208
\end{tabular}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   209
\end{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   210
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   211
\noindent I leave it to you to verify these equivalences and
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   212
non-equivalences. It is also interesting to look at some
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   213
corner cases involving $\ONE$ and $\ZERO$:
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   214
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   215
\begin{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   216
\begin{tabular}{rcl}
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   217
$a \cdot \ZERO$ & $\not\equiv$ & $a$\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   218
$a + \ONE$ & $\not\equiv$ & $a$\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   219
$\ONE$ & $\equiv$ & $\ZERO^*$\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   220
$\ONE^*$ & $\equiv$ & $\ONE$\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   221
$\ZERO^*$ & $\not\equiv$ & $\ZERO$\\
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   222
\end{tabular}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   223
\end{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   224
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   225
\noindent Again I leave it to you to make sure you agree
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   226
with these equivalences and non-equivalences. 
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   227
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   228
318
7975e4f0d4de updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 296
diff changeset
   229
For our matching algorithm however the following seven
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   230
equivalences will play an important role:
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   231
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   232
\begin{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   233
\begin{tabular}{rcl}
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   234
$r + \ZERO$  & $\equiv$ & $r$\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   235
$\ZERO + r$  & $\equiv$ & $r$\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   236
$r \cdot \ONE$ & $\equiv$ & $r$\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   237
$\ONE \cdot r$     & $\equiv$ & $r$\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   238
$r \cdot \ZERO$ & $\equiv$ & $\ZERO$\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   239
$\ZERO \cdot r$ & $\equiv$ & $\ZERO$\\
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   240
$r + r$ & $\equiv$ & $r$
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   241
\end{tabular}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   242
\end{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   243
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   244
\noindent They always hold no matter what the regular expression $r$
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   245
looks like. The first two are easy to verify since $L(\ZERO)$ is the
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   246
empty set. The next two are also easy to verify since $L(\ONE) =
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   247
\{[]\}$ and appending the empty string to every string of another set,
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   248
leaves the set unchanged. Be careful to fully comprehend the fifth and
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   249
sixth equivalence: if you concatenate two sets of strings and one is
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   250
the empty set, then the concatenation will also be the empty set. To
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   251
see this, check the definition of $\_ @ \_$ for sets. The last
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   252
equivalence is again trivial.
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   253
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   254
What will be critical later on is that we can orient these
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   255
equivalences and read them from left to right. In this way we
325
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   256
can view them as \emph{simplification rules}. Consider for 
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   257
example the regular expression 
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   258
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   259
\begin{equation}
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   260
(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   261
\label{big}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   262
\end{equation}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   263
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   264
\noindent If we can find an equivalent regular expression that is
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   265
simpler (that usually means smaller), then this might potentially make
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   266
our matching algorithm run faster. We can look for such a simpler, but
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   267
equivalent, regular expression $r'$ because whether a string $s$ is in
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   268
$L(r)$ or in $L(r')$ does not matter as long as $r\equiv r'$. Yes? \footnote{You have checked this for yourself? Your friendly lecturer might talk rubbish\ldots{}one never knows.}
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   269
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   270
In the example above you will see that the regular expression in
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   271
\eqref{big} is equivalent to just $r_1$. You can verify this by
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   272
iteratively applying the simplification rules from above:
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   273
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   274
\begin{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   275
\begin{tabular}{ll}
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   276
 & $(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot 
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   277
(\underline{r_4 \cdot \ZERO})$\smallskip\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   278
$\equiv$ & $(r_1 + \ZERO) \cdot \ONE + \underline{((\ONE + r_2) + r_3) \cdot 
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   279
\ZERO}$\smallskip\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   280
$\equiv$ & $\underline{(r_1 + \ZERO) \cdot \ONE} + \ZERO$\smallskip\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   281
$\equiv$ & $(\underline{r_1 + \ZERO}) + \ZERO$\smallskip\\
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   282
$\equiv$ & $\underline{r_1 + \ZERO}$\smallskip\\
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   283
$\equiv$ & $r_1$\
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   284
\end{tabular}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   285
\end{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   286
296
796b9b81ac8d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 291
diff changeset
   287
\noindent In each step, I underlined where a simplification
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   288
rule is applied. Our matching algorithm in the next section
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   289
will often generate such ``useless'' $\ONE$s and
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   290
$\ZERO$s, therefore simplifying them away will make the
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   291
algorithm quite a bit faster.
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   292
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   293
Finally here are three equivalences between regular expressions which are
479
52aa298211f6 updated
Christian Urban <urbanc@in.tum.de>
parents: 478
diff changeset
   294
not so obvious:
52aa298211f6 updated
Christian Urban <urbanc@in.tum.de>
parents: 478
diff changeset
   295
52aa298211f6 updated
Christian Urban <urbanc@in.tum.de>
parents: 478
diff changeset
   296
\begin{center}
52aa298211f6 updated
Christian Urban <urbanc@in.tum.de>
parents: 478
diff changeset
   297
\begin{tabular}{rcl}
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   298
$r^*$  & $\equiv$ & $\ONE + r\cdot r^*$\\
479
52aa298211f6 updated
Christian Urban <urbanc@in.tum.de>
parents: 478
diff changeset
   299
$(r_1 + r_2)^*$  & $\equiv$ & $r_1^* \cdot (r_2\cdot r_1^*)^*$\\
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   300
$(r_1 \cdot r_2)^*$ & $\equiv$ & $\ONE + r_1\cdot (r_2 \cdot r_1)^* \cdot r_2$\\
479
52aa298211f6 updated
Christian Urban <urbanc@in.tum.de>
parents: 478
diff changeset
   301
\end{tabular}
52aa298211f6 updated
Christian Urban <urbanc@in.tum.de>
parents: 478
diff changeset
   302
\end{center}
52aa298211f6 updated
Christian Urban <urbanc@in.tum.de>
parents: 478
diff changeset
   303
52aa298211f6 updated
Christian Urban <urbanc@in.tum.de>
parents: 478
diff changeset
   304
\noindent
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   305
We will not use them in our algorithm, but feel free to convince
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   306
yourself that they actually hold. As an aside, there has been a lot of
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   307
research on questions like: Can one always decide whether two regular
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   308
expressions are equivalent or not? What does an algorithm look like to
831
d499da29894c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 764
diff changeset
   309
decide this efficiently? Surprisingly, many of such questions
d499da29894c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 764
diff changeset
   310
turn out to be non-trivial problems.
d499da29894c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 764
diff changeset
   311
479
52aa298211f6 updated
Christian Urban <urbanc@in.tum.de>
parents: 478
diff changeset
   312
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   313
\subsection*{The Matching Algorithm}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   314
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   315
The regular expression matching algorithm we will introduce below
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   316
consists of two parts: One is the function $\textit{nullable}$ which
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   317
takes a regular expression as an argument and decides whether it can
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   318
match the empty string (this means it returns a boolean in
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   319
Scala). This can be easily defined recursively as follows:
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   320
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   321
\begin{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   322
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   323
$\textit{nullable}(\ZERO)$      & $\dn$ & $\textit{false}$\\
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   324
$\textit{nullable}(\ONE)$         & $\dn$ & $\textit{true}$\\
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   325
$\textit{nullable}(c)$                & $\dn$ & $\textit{false}$\\
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   326
$\textit{nullable}(r_1 + r_2)$     & $\dn$ &  $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\ 
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   327
$\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ &  $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   328
$\textit{nullable}(r^*)$              & $\dn$ & $\textit{true}$ \\
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   329
\end{tabular}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   330
\end{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   331
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   332
\noindent The idea behind this function is that the following
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   333
property holds:
124
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
   334
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
   335
\[
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   336
\textit{nullable}(r) \;\;\text{if and only if}\;\; []\in L(r)
124
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
   337
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
   338
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   339
\noindent Note on the left-hand side of the if-and-only-if we have a
874
ffe02fd574a5 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 873
diff changeset
   340
function we can implement, for example in Scala; on the right we have
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   341
its specification (which we cannot implement in a programming language).
124
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 123
diff changeset
   342
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   343
The other function of our matching algorithm calculates a
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   344
\emph{derivative} of a regular expression. This is a function
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   345
which will take a regular expression, say $r$, and a
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   346
character, say $c$, as arguments and returns a new regular
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   347
expression. Be mindful that the intuition behind this function
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   348
is not so easy to grasp on first reading. Essentially this
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   349
function solves the following problem: if $r$ can match a
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   350
string of the form $c\!::\!s$, what does a regular
325
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   351
expression look like that can match just $s$? The definition
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   352
of this function is as follows:
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   353
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   354
\begin{center}
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   355
\begin{tabular}{l@ {\hspace{2mm}}c@ {\hspace{2mm}}l}
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   356
  $\textit{der}\, c\, (\ZERO)$      & $\dn$ & $\ZERO$\\
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   357
  $\textit{der}\, c\, (\ONE)$         & $\dn$ & $\ZERO$ \\
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   358
  $\textit{der}\, c\, (d)$                & $\dn$ & if $c = d$ then $\ONE$ else $\ZERO$\\
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   359
  $\textit{der}\, c\, (r_1 + r_2)$        & $\dn$ & $\textit{der}\, c\, r_1 + \textit{der}\, c\, r_2$\\
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   360
  $\textit{der}\, c\, (r_1 \cdot r_2)$  & $\dn$  & if $\textit{nullable} (r_1)$\\
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   361
  & & then $(\textit{der}\,c\,r_1) \cdot r_2 + \textit{der}\, c\, r_2$\\ 
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   362
  & & else $(\textit{der}\, c\, r_1) \cdot r_2$\\
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   363
  $\textit{der}\, c\, (r^*)$          & $\dn$ & $(\textit{der}\,c\,r) \cdot (r^*)$
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   364
  \end{tabular}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   365
\end{center}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   366
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   367
\noindent The first two clauses can be rationalised as follows: recall
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   368
that $\textit{der}$ should calculate a regular expression so that
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   369
provided the ``input'' regular expression can match a string of the
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   370
form $c\!::\!s$, we want a regular expression for $s$. Since neither
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   371
$\ZERO$ nor $\ONE$ can match a string of the form $c\!::\!s$, we
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   372
return $\ZERO$. In the third case we have to make a case-distinction:
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   373
In case the regular expression is $c$, then clearly it can recognise a
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   374
string of the form $c\!::\!s$, just that $s$ is the empty
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   375
string. Therefore we return the $\ONE$-regular expression in this
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   376
case, as it can match the empty string.  In the other case we again
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   377
return $\ZERO$ since no string of the $c\!::\!s$ can be matched. Next
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   378
come the recursive cases, which are a bit more involved. Fortunately,
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   379
the $+$-case is still relatively straightforward: all strings of the
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   380
form $c\!::\!s$ are either matched by the regular expression $r_1$ or
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   381
$r_2$. So we just have to recursively call $\textit{der}$ with these
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   382
two regular expressions and compose the results again with $+$. Makes
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
   383
sense?
727
eb9343126625 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 646
diff changeset
   384
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   385
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   386
The $\cdot$-case is more complicated: if $r_1\cdot r_2$
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   387
matches a string of the form $c\!::\!s$, then the first part
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   388
must be matched by $r_1$. Consequently, it makes sense to
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   389
construct the regular expression for $s$ by calling $\textit{der}$ with
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   390
$r_1$ and ``appending'' $r_2$. There is however one exception
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   391
to this simple rule: if $r_1$ can match the empty string, then
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   392
all of $c\!::\!s$ is matched by $r_2$. So in case $r_1$ is
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   393
nullable (that is can match the empty string) we have to allow
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   394
the choice $\textit{der}\,c\,r_2$ for calculating the regular
325
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   395
expression that can match $s$. Therefore we have to add the
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   396
regular expression $\textit{der}\,c\,r_2$ in the result. The $*$-case
325
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   397
is again simple: if $r^*$ matches a string of the form
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   398
$c\!::\!s$, then the first part must be ``matched'' by a
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   399
single copy of $r$. Therefore we call recursively $\textit{der}\,c\,r$
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   400
and ``append'' $r^*$ in order to match the rest of $s$. Still
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   401
makes sense?
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   402
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   403
If all this did not make sense yet, here is another way to explain the
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   404
definition of $\textit{der}$ by considering the following operation on
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   405
sets:
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   406
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   407
\begin{equation}\label{Der}
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   408
\textit{Der}\,c\,A\;\dn\;\{s\,|\,c\!::\!s \in A\}
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   409
\end{equation}
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   410
291
201c2c6d8696 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 272
diff changeset
   411
\noindent This operation essentially transforms a set of
201c2c6d8696 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 272
diff changeset
   412
strings $A$ by filtering out all strings that do not start
201c2c6d8696 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 272
diff changeset
   413
with $c$ and then strips off the $c$ from all the remaining
201c2c6d8696 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 272
diff changeset
   414
strings. For example suppose $A = \{f\!oo, bar, f\!rak\}$ then
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   415
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   416
\[ \textit{Der}\,f\,A = \{oo, rak\}\quad,\quad 
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   417
   \textit{Der}\,b\,A = \{ar\} \quad \text{and} \quad 
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   418
   \textit{Der}\,a\,A = \{\} 
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   419
\]
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   420
 
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   421
\noindent
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   422
Note that in the last case $\textit{Der}$ is empty, because no string in $A$
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   423
starts with $a$. With this operation we can state the following
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   424
property about $\textit{der}$:
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   425
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   426
\[
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   427
L(\textit{der}\,c\,r) = \textit{Der}\,c\,(L(r))
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   428
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   429
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   430
\noindent
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   431
This property clarifies what regular expression $\textit{der}$ calculates,
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   432
namely take the set of strings that $r$ can match (that is $L(r)$),
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   433
filter out all strings not starting with $c$ and strip off the $c$
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   434
from the remaining strings---this is exactly the language that
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   435
$\textit{der}\,c\,r$ can match.
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   436
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   437
If we want to find out whether the string $abc$ is matched by
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   438
the regular expression $r_1$ then we can iteratively apply $\textit{der}$
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   439
as follows
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   440
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   441
\begin{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   442
\begin{tabular}{rll}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   443
Input: $r_1$, $abc$\medskip\\
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   444
Step 1: & build derivative of $a$ and $r_1$ & $(r_2 = \textit{der}\,a\,r_1)$\smallskip\\
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   445
Step 2: & build derivative of $b$ and $r_2$ & $(r_3 = \textit{der}\,b\,r_2)$\smallskip\\
433
c08290ee4f1f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 416
diff changeset
   446
Step 3: & build derivative of $c$ and $r_3$ & $(r_4 = \textit{der}\,c\,r_3)$\smallskip\\
c08290ee4f1f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 416
diff changeset
   447
Step 4: & the string is exhausted: & $(\textit{nullable}(r_4))$\\
c08290ee4f1f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 416
diff changeset
   448
        & test whether $r_4$ can recognise the\\
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   449
        & empty string\smallskip\\
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   450
Output: & result of this test $\Rightarrow \textit{true} \,\text{or}\, \textit{false}$\\        
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   451
\end{tabular}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   452
\end{center}
140
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 133
diff changeset
   453
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   454
\noindent Again the operation $\textit{Der}$ might help to rationalise
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   455
this algorithm. We want to know whether $abc \in L(r_1)$. We
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   456
do not know yet---but let us assume it is. Then $\textit{Der}\,a\,L(r_1)$
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   457
builds the set where all the strings not starting with $a$ are
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   458
filtered out. Of the remaining strings, the $a$ is stripped
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   459
off. So we should still have $bc$ in the set.
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   460
Then we continue with filtering out all strings not
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   461
starting with $b$ and stripping off the $b$ from the remaining
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   462
strings, that means we build $\textit{Der}\,b\,(\textit{Der}\,a\,(L(r_1)))$.
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   463
Finally we filter out all strings not starting with $c$ and
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   464
strip off $c$ from the remaining string. This is
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   465
$\textit{Der}\,c\,(\textit{Der}\,b\,(\textit{Der}\,a\,(L(r_1))))$. Now if $abc$ was in the 
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   466
original set ($L(r_1)$), then $\textit{Der}\,c\,(\textit{Der}\,b\,(\textit{Der}\,a\,(L(r_1))))$ 
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   467
must contain the empty string. If not, then $abc$ was not in the 
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   468
language we started with.
140
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 133
diff changeset
   469
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   470
Our matching algorithm using $\textit{der}$ and $\textit{nullable}$ works
571
499007a7bce2 updated
Christian Urban <urbanc@in.tum.de>
parents: 566
diff changeset
   471
similarly, just using regular expressions instead of sets. In order to
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   472
define our algorithm we need to extend the notion of derivatives from single
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   473
characters to strings. This can be done using the following
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   474
function, taking a string and a regular expression as input and
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   475
a regular expression as output.
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   476
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   477
\begin{center}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   478
\begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {\hspace{-10mm}}l@ {}}
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   479
  $\textit{ders}\, []\, r$     & $\dn$ & $r$ & \\
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   480
  $\textit{ders}\, (c\!::\!s)\, r$ & $\dn$ & $\textit{ders}\,s\,(\textit{der}\,c\,r)$ & \\
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   481
  \end{tabular}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   482
\end{center}
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   483
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   484
\noindent This function iterates $\textit{der}$ taking one character at
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   485
the time from the original string until the string is exhausted.
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   486
Having $\textit{der}s$ in place, we can finally define our matching
325
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   487
algorithm:
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   488
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   489
\[
764
6718ef6143b8 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 757
diff changeset
   490
\textit{matcher}\,r\,s \dn \textit{nullable}(\textit{ders}\,s\,r)
125
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   491
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   492
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 124
diff changeset
   493
\noindent
325
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   494
and we can claim that
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   495
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   496
\[
764
6718ef6143b8 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 757
diff changeset
   497
\textit{matcher}\,r\,s\quad\text{if and only if}\quad s\in L(r)
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   498
\]
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   499
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   500
\noindent holds, which means our algorithm satisfies the
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   501
specification. Of course we can claim many things\ldots
831
d499da29894c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 764
diff changeset
   502
whether the claim holds any water is a different question.
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   503
566
b153c04834eb updated
Christian Urban <urbanc@in.tum.de>
parents: 550
diff changeset
   504
This algorithm was introduced by Janusz Brzozowski in 1964, but 
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   505
is more widely known only in the last 10 or so years. Its
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   506
main attractions are simplicity and being fast, as well as
566
b153c04834eb updated
Christian Urban <urbanc@in.tum.de>
parents: 550
diff changeset
   507
being easily extendible for other regular expressions such as
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   508
$r^{\{n\}}$, $r^?$, $\sim{}r$ and so on (this is subject of
831
d499da29894c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 764
diff changeset
   509
Coursework 1). 
258
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   510
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   511
\subsection*{The Matching Algorithm in Scala}
1e4da6d2490c updated programs
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 251
diff changeset
   512
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   513
Another attraction of the algorithm is that it can be easily
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   514
implemented in a functional programming language, like Scala.
296
796b9b81ac8d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 291
diff changeset
   515
Given the implementation of regular expressions in Scala shown
796b9b81ac8d updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 291
diff changeset
   516
in the first lecture and handout, the functions and subfunctions
764
6718ef6143b8 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 757
diff changeset
   517
for \pcode{matcher} are shown in Figure~\ref{scala1}.
126
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 125
diff changeset
   518
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 125
diff changeset
   519
\begin{figure}[p]
477
b78664a24f5d updated
Christian Urban <urbanc@in.tum.de>
parents: 471
diff changeset
   520
  \lstinputlisting[numbers=left,linebackgroundcolor=
b78664a24f5d updated
Christian Urban <urbanc@in.tum.de>
parents: 471
diff changeset
   521
                  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
b78664a24f5d updated
Christian Urban <urbanc@in.tum.de>
parents: 471
diff changeset
   522
                  {../progs/app5.scala}
874
ffe02fd574a5 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 873
diff changeset
   523
\caption{A Scala implementation of the \textit{nullable} and 
ffe02fd574a5 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 873
diff changeset
   524
  the derivative function. These functions are easy to
512
a6aa52ecc1c5 updated
cu
parents: 511
diff changeset
   525
  implement in functional programming languages. This is because pattern 
325
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   526
  matching and recursion allow us to mimic the mathematical
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   527
  definitions very closely. Nearly all functional
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   528
  programming languages support pattern matching and
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   529
  recursion out of the box.\label{scala1}}
126
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 125
diff changeset
   530
\end{figure}
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   531
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   532
443
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   533
%Remember our second example involving the regular expression
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   534
%$(a^*)^* \cdot b$ which could not match strings of $n$ \texttt{a}s. 
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   535
%Java needed around 30 seconds to find this out a string with $n=28$.
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   536
%It seems our algorithm is doing rather well in comparison:
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   537
%
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   538
%\begin{center}
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   539
%\begin{tikzpicture}
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   540
%\begin{axis}[
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   541
%    title={Graph: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$},
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   542
%    xlabel={$n$},
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   543
%    x label style={at={(1.05,0.0)}},
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   544
%    ylabel={time in secs},
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   545
%    enlargelimits=false,
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   546
%    xtick={0,1000,...,6500},
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   547
%    xmax=6800,
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   548
%    ytick={0,5,...,30},
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   549
%    ymax=34,
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   550
%    scaled ticks=false,
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   551
%    axis lines=left,
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   552
%    width=8cm,
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   553
%    height=4.5cm, 
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   554
%    legend entries={Java,Scala V1},  
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   555
%    legend pos=north east,
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   556
%    legend cell align=left]
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   557
%\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   558
%\addplot[red,mark=triangle*,mark options={fill=white}] table {re1a.data};
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   559
%\end{axis}
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   560
%\end{tikzpicture}
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   561
%\end{center}
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   562
%
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   563
%\noindent
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   564
%This is not an error: it hardly takes more than half a second for
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   565
%strings up to the length of 6500. After that we receive a
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   566
%StackOverflow exception, but still\ldots
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   567
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   568
For running the algorithm with our first example, the evil
566
b153c04834eb updated
Christian Urban <urbanc@in.tum.de>
parents: 550
diff changeset
   569
regular expression $a^?{}^{\{n\}}\cdot a^{\{n\}}$, we need to implement
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   570
the optional regular expression and the `exactly $n$-times
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   571
regular expression'. This can be done with the translations
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   572
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   573
\lstinputlisting[numbers=none]{../progs/app51.scala}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   574
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   575
\noindent Running the matcher with this example, we find it is
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   576
slightly worse then the matcher in Ruby and Python.
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   577
Ooops\ldots
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   578
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   579
\begin{center}
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   580
\begin{tikzpicture}
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   581
\begin{axis}[    
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   582
    title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   583
    xlabel={$n$},
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   584
    x label style={at={(1.05,0.0)}},
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   585
    ylabel={time in secs},
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   586
    enlargelimits=false,
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   587
    xtick={0,5,...,30},
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   588
    xmax=32,
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   589
    ytick={0,5,...,30},
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   590
    scaled ticks=false,
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   591
    axis lines=left,
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   592
    width=6cm,
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   593
    height=5cm, 
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   594
    legend entries={Python,Ruby,Scala V1},  
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   595
    legend pos=outer north east,
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   596
    legend cell align=left]
434
8664ff87cd77 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 433
diff changeset
   597
\addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
8664ff87cd77 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 433
diff changeset
   598
\addplot[brown,mark=pentagon*, mark options={fill=white}] table {re-ruby.data};  
8664ff87cd77 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 433
diff changeset
   599
\addplot[red,mark=triangle*,mark options={fill=white}] table {re1.data};  
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   600
\end{axis}
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   601
\end{tikzpicture}
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   602
\end{center}
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   603
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   604
\noindent Analysing this failure we notice that for $a^{\{n\}}$, for
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   605
example, we generate quite big regular expressions:
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   606
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   607
\begin{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   608
\begin{tabular}{rl}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   609
1: & $a$\\
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   610
2: & $a\cdot a$\\
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   611
3: & $a\cdot a\cdot a$\\
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   612
& \ldots\\
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   613
13: & $a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a\cdot a$\\
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   614
& \ldots
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   615
\end{tabular}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   616
\end{center}
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   617
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   618
\noindent Our algorithm traverses such regular expressions at
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
   619
least once every time a derivative is calculated. So having
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   620
large regular expressions will cause problems. This problem
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   621
is aggravated by $a^?$ being represented as $a + \ONE$.
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   622
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   623
We can however fix this easily by having an explicit constructor for
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   624
$r^{\{n\}}$. In Scala we would introduce a constructor like
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   625
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   626
\begin{center}
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   627
\code{case class NTIMES(r: Rexp, n: Int) extends Rexp}
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   628
\end{center}
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   629
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   630
\noindent With this fix we have a constant ``size'' regular expression
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   631
for our running example no matter how large $n$ is (see the
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   632
\texttt{size} section in the implementations).  This means we have to
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   633
also add cases for \pcode{NTIMES} in the functions $\textit{nullable}$
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   634
and $\textit{der}$. Does the change have any effect?
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   635
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   636
\begin{center}
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   637
\begin{tikzpicture}
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   638
\begin{axis}[
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   639
    title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   640
    xlabel={$n$},
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   641
    x label style={at={(1.01,0.0)}},
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   642
    ylabel={time in secs},
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   643
    enlargelimits=false,
477
b78664a24f5d updated
Christian Urban <urbanc@in.tum.de>
parents: 471
diff changeset
   644
    xtick={0,200,...,1100},
b78664a24f5d updated
Christian Urban <urbanc@in.tum.de>
parents: 471
diff changeset
   645
    xmax=1200,
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   646
    ytick={0,5,...,30},
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   647
    scaled ticks=false,
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   648
    axis lines=left,
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   649
    width=10cm,
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   650
    height=5cm, 
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   651
    legend entries={Python,Ruby,Scala V1,Scala V2},  
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   652
    legend pos=outer north east,
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   653
    legend cell align=left]
434
8664ff87cd77 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 433
diff changeset
   654
\addplot[blue,mark=*, mark options={fill=white}] table {re-python.data};
8664ff87cd77 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 433
diff changeset
   655
\addplot[brown,mark=pentagon*, mark options={fill=white}] table {re-ruby.data};  
8664ff87cd77 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 433
diff changeset
   656
\addplot[red,mark=triangle*,mark options={fill=white}] table {re1.data};  
8664ff87cd77 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 433
diff changeset
   657
\addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   658
\end{axis}
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   659
\end{tikzpicture}
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   660
\end{center}
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   661
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   662
\noindent Now we are talking business! The modified matcher can within
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   663
25 seconds handle regular expressions up to $n = 1,100$ before a
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   664
StackOverflow is raised. Recall that Python and Ruby (and our first
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   665
version, Scala V1) could only handle $n = 27$ or so in 30
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   666
seconds. We have not tried our algorithm on the second example $(a^*)^* \cdot
511
   667
b$---I leave this to you.
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   668
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   669
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   670
The moral is that our algorithm is rather sensitive to the
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   671
size of regular expressions it needs to handle. This is of
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   672
course obvious because both $\textit{nullable}$ and $\textit{der}$ frequently
325
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   673
need to traverse the whole regular expression. There seems,
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   674
however, one more issue for making the algorithm run faster.
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   675
The derivative function often produces ``useless''
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   676
$\ZERO$s and $\ONE$s. To see this, consider $r = ((a
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   677
\cdot b) + b)^*$ and the following three derivatives
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   678
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   679
\begin{center}
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   680
\begin{tabular}{l}
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   681
$\textit{der}\,a\,r = ((\ONE \cdot b) + \ZERO) \cdot r$\\
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   682
$\textit{der}\,b\,r = ((\ZERO \cdot b) + \ONE)\cdot r$\\
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   683
$\textit{der}\,c\,r = ((\ZERO \cdot b) + \ZERO)\cdot r$
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   684
\end{tabular}
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   685
\end{center}
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   686
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   687
\noindent 
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   688
If we simplify them according to the simplification rules from the
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   689
beginning, we can replace the right-hand sides by the smaller
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   690
equivalent regular expressions
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   691
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   692
\begin{center}
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   693
\begin{tabular}{l}
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   694
$\textit{der}\,a\,r \equiv b \cdot r$\\
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   695
$\textit{der}\,b\,r \equiv r$\\
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   696
$\textit{der}\,c\,r \equiv \ZERO$
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   697
\end{tabular}
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   698
\end{center}
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   699
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   700
\noindent I leave it to you to contemplate whether such a
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   701
simplification can have any impact on the correctness of our algorithm
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   702
(will it change any answers?). Figure~\ref{scala2} gives a
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   703
simplification function that recursively traverses a regular
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   704
expression and simplifies it according to the rules given at the
571
499007a7bce2 updated
Christian Urban <urbanc@in.tum.de>
parents: 566
diff changeset
   705
beginning. There are only rules for $+$ and $\cdot$. There is 
499007a7bce2 updated
Christian Urban <urbanc@in.tum.de>
parents: 566
diff changeset
   706
no simplification rule for a star, because
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   707
empirical data and also a little thought showed that simplifying under
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   708
a star is a waste of computation time. The simplification function
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   709
will be called after every derivation.  This additional step removes
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   710
all the ``junk'' the derivative function introduced. Does this improve
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   711
the speed? You bet!!
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   712
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   713
\begin{figure}[p]
477
b78664a24f5d updated
Christian Urban <urbanc@in.tum.de>
parents: 471
diff changeset
   714
\lstinputlisting[numbers=left,linebackgroundcolor=
b78664a24f5d updated
Christian Urban <urbanc@in.tum.de>
parents: 471
diff changeset
   715
  {\ifodd\value{lstnumber}\color{capri!3}\fi}]
b78664a24f5d updated
Christian Urban <urbanc@in.tum.de>
parents: 471
diff changeset
   716
                {../progs/app6.scala}
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   717
\caption{The simplification function and modified 
325
794c599cee53 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 318
diff changeset
   718
\texttt{ders}-function; this function now
333
8890852e18b7 updated coursework
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 332
diff changeset
   719
calls \texttt{der} first, but then simplifies
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   720
the resulting derivative regular expressions before
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   721
building the next derivative, see
566
b153c04834eb updated
Christian Urban <urbanc@in.tum.de>
parents: 550
diff changeset
   722
Line~24.\label{scala2}}
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   723
\end{figure}
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   724
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   725
\begin{center}
268
18bef085a7ca updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 263
diff changeset
   726
\begin{tikzpicture}
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   727
\begin{axis}[
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   728
    title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   729
    xlabel={$n$},
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   730
    x label style={at={(1.04,0.0)}},
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   731
    ylabel={time in secs},
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   732
    enlargelimits=false,
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   733
    xtick={0,2500,...,10000},
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   734
    xmax=12000,
268
18bef085a7ca updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 263
diff changeset
   735
    ytick={0,5,...,30},
443
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   736
    ymax=32,
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   737
    scaled ticks=false,
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   738
    axis lines=left,
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   739
    width=9cm,
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   740
    height=5cm, 
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   741
    legend entries={Scala V2,Scala V3},
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   742
    legend pos=outer north east,
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   743
    legend cell align=left]
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   744
\addplot[green,mark=square*,mark options={fill=white}] table {re2.data};
268
18bef085a7ca updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 263
diff changeset
   745
\addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
18bef085a7ca updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 263
diff changeset
   746
\end{axis}
18bef085a7ca updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 263
diff changeset
   747
\end{tikzpicture}
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   748
\end{center}
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
   749
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   750
\noindent
510
cu
parents: 492
diff changeset
   751
To recap, Python and Ruby needed approximately 30 seconds to match a
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   752
string of 28 \texttt{a}s and the regular expression $a^{?\{n\}} \cdot
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   753
a^{\{n\}}$.  We need a third of this time to do the same with strings
566
b153c04834eb updated
Christian Urban <urbanc@in.tum.de>
parents: 550
diff changeset
   754
up to 11,000 \texttt{a}s.  Similarly, Java 8 and Python needed 30
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   755
seconds to find out the regular expression $(a^*)^* \cdot b$ does not
566
b153c04834eb updated
Christian Urban <urbanc@in.tum.de>
parents: 550
diff changeset
   756
match the string of 28 \texttt{a}s. In Java 9 and later this has been 
b153c04834eb updated
Christian Urban <urbanc@in.tum.de>
parents: 550
diff changeset
   757
cranked up to 39,000 \texttt{a}s, but we can do the same in the same 
571
499007a7bce2 updated
Christian Urban <urbanc@in.tum.de>
parents: 566
diff changeset
   758
amount of time for strings composed of nearly 6,000,000 \texttt{a}s. 
499007a7bce2 updated
Christian Urban <urbanc@in.tum.de>
parents: 566
diff changeset
   759
This is shown in the following plot.
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   760
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   761
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   762
\begin{center}
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   763
\begin{tikzpicture}
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   764
\begin{axis}[
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   765
    title={Graph: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$},
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   766
    xlabel={$n$},
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   767
    ylabel={time in secs},
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   768
    enlargelimits=false,
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   769
    ymax=35,
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   770
    ytick={0,5,...,30},
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   771
    axis lines=left,
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 512
diff changeset
   772
    %%scaled ticks=false,
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   773
    x label style={at={(1.09,0.0)}},
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 512
diff changeset
   774
    %%xmax=7700000,
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   775
    width=9cm,
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   776
    height=5cm, 
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   777
    legend entries={Scala V3},
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   778
    legend pos=outer north east,
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   779
    legend cell align=left]
478
48b842c997c7 updated
Christian Urban <urbanc@in.tum.de>
parents: 477
diff changeset
   780
%\addplot[green,mark=square*,mark options={fill=white}] table {re2a.data};
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   781
\addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   782
\end{axis}
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   783
\end{tikzpicture}
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   784
\end{center}
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   785
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   786
\subsection*{Epilogue}
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   787
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 512
diff changeset
   788
(23/Aug/2016) I found another place where this algorithm can
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   789
be sped up (this idea is not integrated with what is coming next, but
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   790
I present it nonetheless). The idea is to not define \texttt{ders}
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   791
that it iterates the derivative character-by-character, but in bigger
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   792
chunks. The resulting code for \texttt{ders2} looks as follows:
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   793
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   794
\lstinputlisting[numbers=none]{../progs/app52.scala} 
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   795
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   796
\noindent
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   797
I have not fully understood why this version is much faster,
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   798
but it seems it is a combination of the clauses for \texttt{ALT}
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   799
and \texttt{SEQ}. In the latter case we call \texttt{der} with 
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   800
a single character and this potentially produces an alternative.
510
cu
parents: 492
diff changeset
   801
The derivative of such an alternative can then be more efficiently  
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   802
calculated by \texttt{ders2} since it pushes a whole string
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   803
under an \texttt{ALT}. The numbers are that in the second case  
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   804
$(a^*)^* \cdot b$ both versions are pretty much the same, but in the 
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   805
first case $a^{?\{n\}} \cdot a^{\{n\}}$ the improvement gives 
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   806
another factor of 100 speedup. Nice!
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   807
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   808
\begin{center}
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   809
\begin{tabular}{cc}
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   810
\begin{tikzpicture}
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   811
\begin{axis}[
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   812
    title={Graph: $a^{?\{n\}} \cdot a^{\{n\}}$ and strings $\underbrace{a\ldots a}_{n}$},
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   813
    xlabel={$n$},
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   814
    x label style={at={(1.04,0.0)}},
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   815
    ylabel={time in secs},
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   816
    enlargelimits=false,
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   817
    xmax=7100000,
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   818
    ytick={0,5,...,30},
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   819
    ymax=33,
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   820
    %scaled ticks=false,
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   821
    axis lines=left,
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   822
    width=5.3cm,
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   823
    height=5cm, 
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   824
    legend entries={Scala V3, Scala V4},
443
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   825
    legend style={at={(0.1,-0.2)},anchor=north}]
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   826
\addplot[black,mark=square*,mark options={fill=white}] table {re3.data};
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   827
\addplot[purple,mark=square*,mark options={fill=white}] table {re4.data};
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   828
\end{axis}
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   829
\end{tikzpicture}
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   830
&
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   831
\begin{tikzpicture}
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   832
\begin{axis}[
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   833
    title={Graph: $(a^*)^* \cdot b$ and strings $\underbrace{a\ldots a}_{n}$},
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   834
    xlabel={$n$},
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   835
    x label style={at={(1.09,0.0)}},
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   836
    ylabel={time in secs},
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   837
    enlargelimits=false,
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   838
    xmax=8200000,
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   839
    ytick={0,5,...,30},
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   840
    ymax=33,
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   841
    %scaled ticks=false,
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   842
    axis lines=left,
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   843
    width=5.3cm,
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   844
    height=5cm, 
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   845
    legend entries={Scala V3, Scala V4},
443
cd43d8c6eb84 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 434
diff changeset
   846
    legend style={at={(0.1,-0.2)},anchor=north}]
415
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   847
\addplot[black,mark=square*,mark options={fill=white}] table {re3a.data};
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   848
\addplot[purple,mark=square*,mark options={fill=white}] table {re4a.data};
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   849
\end{axis}
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   850
\end{tikzpicture}
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   851
\end{tabular}
4ae59fd3b174 updated
Christian Urban <urbanc@in.tum.de>
parents: 414
diff changeset
   852
\end{center}
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   853
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   854
334
fd89a63e9db3 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 333
diff changeset
   855
\section*{Proofs}
fd89a63e9db3 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 333
diff changeset
   856
339
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   857
You might not like doing proofs. But they serve a very
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   858
important purpose in Computer Science: How can we be sure that
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   859
our algorithm matches its specification? We can try to test
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   860
the algorithm, but that often overlooks corner cases and an
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   861
exhaustive testing is impossible (since there are infinitely
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   862
many inputs). Proofs allow us to ensure that an algorithm
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   863
really meets its specification. 
338
f16120cb4e19 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 334
diff changeset
   864
339
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   865
For the programs we look at in this module, the proofs will
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   866
mostly by some form of induction. Remember that regular
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   867
expressions are defined as 
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   868
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   869
\begin{center}
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   870
\begin{tabular}{r@{\hspace{1mm}}r@{\hspace{1mm}}l@{\hspace{13mm}}l}
512
a6aa52ecc1c5 updated
cu
parents: 511
diff changeset
   871
  $r$ & $::=$ &   $\ZERO$         & nothing\\
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   872
        & $\mid$ & $\ONE$           & empty string / \texttt{""} / []\\
339
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   873
        & $\mid$ & $c$                  & single character\\
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   874
        & $\mid$ & $r_1 + r_2$          & alternative / choice\\
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   875
        & $\mid$ & $r_1 \cdot r_2$      & sequence\\
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   876
        & $\mid$ & $r^*$                & star (zero or more)\\
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   877
  \end{tabular}
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   878
\end{center}
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   879
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   880
\noindent If you want to show a property $P(r)$ for \emph{all} 
339
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   881
regular expressions $r$, then you have to follow essentially 
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   882
the recipe:
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   883
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   884
\begin{itemize}
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   885
\item $P$ has to hold for $\ZERO$, $\ONE$ and $c$
339
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   886
 (these are the base cases).
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   887
\item $P$ has to hold for $r_1 + r_2$ under the assumption 
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   888
   that $P$ already holds for $r_1$ and $r_2$.
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   889
\item $P$ has to hold for $r_1 \cdot r_2$ under the 
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   890
  assumption that $P$ already holds for $r_1$ and $r_2$.
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   891
\item $P$ has to hold for $r^*$ under the assumption 
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   892
  that $P$ already holds for $r$.
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   893
\end{itemize}
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   894
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   895
\noindent 
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   896
A simple proof is for example showing the following 
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   897
property:
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   898
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   899
\begin{equation}
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   900
\textit{nullable}(r) \;\;\text{if and only if}\;\; []\in L(r)
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   901
\label{nullableprop}
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   902
\end{equation}
339
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   903
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   904
\noindent
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   905
Let us say that this property is $P(r)$, then the first case
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   906
we need to check is whether $P(\ZERO)$ (see recipe 
339
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   907
above). So we have to show that
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   908
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   909
\[
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   910
\textit{nullable}(\ZERO) \;\;\text{if and only if}\;\; 
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   911
[]\in L(\ZERO)
339
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   912
\]
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   913
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   914
\noindent whereby $\textit{nullable}(\ZERO)$ is by definition of
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   915
the function $\textit{nullable}$ always $\textit{false}$. We also have
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   916
that $L(\ZERO)$ is by definition $\{\}$. It is
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   917
impossible that the empty string $[]$ is in the empty set.
339
bc395ccfba7f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 338
diff changeset
   918
Therefore also the right-hand side is false. Consequently we
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   919
verified this case: both sides are false. We would still need
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
   920
to do this for $P(\ONE)$ and $P(c)$. I leave this to
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   921
you to verify.
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   922
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   923
Next we need to check the inductive cases, for example
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   924
$P(r_1 + r_2)$, which is
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   925
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   926
\begin{equation}
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   927
\textit{nullable}(r_1 + r_2) \;\;\text{if and only if}\;\; 
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   928
[]\in L(r_1 + r_2)
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   929
\label{propalt}
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   930
\end{equation}
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   931
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   932
\noindent The difference to the base cases is that in the inductive
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   933
cases we can already assume we proved $P$ for the components, that is
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   934
we can assume.
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   935
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   936
\begin{center}
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   937
\begin{tabular}{l}
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   938
$\textit{nullable}(r_1) \;\;\text{if and only if}\;\; []\in L(r_1)$ and\\
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   939
$\textit{nullable}(r_2) \;\;\text{if and only if}\;\; []\in L(r_2)$\\
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   940
\end{tabular}
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   941
\end{center}
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   942
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   943
\noindent These are called the induction hypotheses. To check this 
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   944
case, we can start from $\textit{nullable}(r_1 + r_2)$, which by 
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   945
definition of $\textit{nullable}$ is
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   946
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   947
\[
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   948
\textit{nullable}(r_1) \vee \textit{nullable}(r_2)
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   949
\]
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   950
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   951
\noindent Using the two induction hypotheses from above,
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   952
we can transform this into 
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   953
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   954
\[
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   955
[] \in L(r_1) \vee []\in(r_2)
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   956
\]
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   957
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   958
\noindent We just replaced the $\textit{nullable}(\ldots)$ parts by
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   959
the equivalent $[] \in L(\ldots)$ from the induction 
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   960
hypotheses. A bit of thinking convinces you that if
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   961
$[] \in L(r_1) \vee []\in L(r_2)$ then the empty string
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   962
must be in the union $L(r_1)\cup L(r_2)$, that is
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   963
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   964
\[
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   965
[] \in L(r_1)\cup L(r_2)
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   966
\]
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   967
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   968
\noindent but this is by definition of $L$ exactly $[] \in L(r_1 +
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   969
r_2)$, which we needed to establish according to statement in
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   970
\eqref{propalt}. What we have shown is that starting from
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
   971
$\textit{nullable}(r_1 + r_2)$ we have done equivalent transformations
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   972
to end up with $[] \in L(r_1 + r_2)$. Consequently we have established
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   973
that $P(r_1 + r_2)$ holds.
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   974
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   975
In order to complete the proof we would now need to look 
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   976
at the cases \mbox{$P(r_1\cdot r_2)$} and $P(r^*)$. Again I let you
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   977
check the details.
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   978
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
   979
You might also have to do induction proofs over strings. 
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   980
That means you want to establish a property $P(s)$ for all
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   981
strings $s$. For this remember strings are lists of 
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   982
characters. These lists can be either the empty list or a
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   983
list of the form $c::s$. If you want to perform an induction
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   984
proof for strings you need to consider the cases
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   985
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   986
\begin{itemize}
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   987
\item $P$ has to hold for $[]$ (this is the base case).
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   988
\item $P$ has to hold for $c::s$ under the assumption 
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   989
   that $P$ already holds for $s$.
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   990
\end{itemize}
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   991
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   992
\noindent
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   993
Given this recipe, I let you show
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   994
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   995
\begin{equation}
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
   996
\textit{Ders}\,s\,(L(r)) = L(\textit{ders}\,s\,r)
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   997
\label{dersprop}
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
   998
\end{equation}
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
   999
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
  1000
\noindent by induction on $s$. Recall $\textit{Der}$ is defined for 
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
  1001
character---see \eqref{Der}; $\textit{Ders}$ is similar, but for strings:
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
  1002
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
  1003
\[
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
  1004
\textit{Ders}\,s\,A\;\dn\;\{s'\,|\,s @ s' \in A\}
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
  1005
\]
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
  1006
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
  1007
\noindent In this proof you can assume the following property
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
  1008
for $der$ and $\textit{Der}$ has already been proved, that is you can
399
5c1fbb39c93e updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 394
diff changeset
  1009
assume
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
  1010
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
  1011
\[
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
  1012
L(\textit{der}\,c\,r) = \textit{Der}\,c\,(L(r))
340
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
  1013
\]
c49122dbcdd1 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 339
diff changeset
  1014
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
  1015
\noindent holds (this would be of course another property that needs
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
  1016
to be proved in a side-lemma by induction on $r$). This is a bit
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
  1017
more challenging, but not impossible.
338
f16120cb4e19 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 334
diff changeset
  1018
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1019
To sum up, using reasoning like the one shown above allows us 
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1020
to show the correctness of our algorithm. To see this,
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1021
start from the specification
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1022
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1023
\[
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1024
s \in L(r)
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1025
\]
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1026
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1027
\noindent That is the problem we want to solve. Thinking a 
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1028
little, you will see that this problem is equivalent to the 
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1029
following problem
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1030
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1031
\begin{equation}
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
  1032
[] \in \textit{Ders}\,s\,(L(r))
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1033
\label{dersstep}
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1034
\end{equation}
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1035
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
  1036
\noindent You agree?  But we have shown above in \eqref{dersprop},
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
  1037
that the $\textit{Ders}$ can be replaced by
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
  1038
$L(\textit{ders}\ldots)$. That means \eqref{dersstep} is equivalent to
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1039
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1040
\begin{equation}
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
  1041
[] \in L(\textit{ders}\,s\,r)
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1042
\label{prefinalstep}
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1043
\end{equation}
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1044
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1045
\noindent We have also shown that testing whether the empty
412
1cef3924f7a2 updated
Christian Urban <urbanc@in.tum.de>
parents: 399
diff changeset
  1046
string is in a language is equivalent to the $\textit{nullable}$
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1047
function; see \eqref{nullableprop}. That means
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1048
\eqref{prefinalstep} is equivalent with
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1049
 
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1050
\[
414
065ca01b62ae updated
Christian Urban <urbanc@in.tum.de>
parents: 412
diff changeset
  1051
\textit{nullable}(\textit{ders}\,s\,r)
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1052
\] 
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1053
764
6718ef6143b8 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 757
diff changeset
  1054
\noindent But this is just the definition of $matcher$
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1055
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1056
\[
764
6718ef6143b8 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 757
diff changeset
  1057
matcher\,s\,r \dn nullable(\textit{ders}\,s\,r)
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1058
\]
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1059
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1060
\noindent In effect we have shown
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1061
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1062
\[
764
6718ef6143b8 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 757
diff changeset
  1063
matcher\,s\,r\;\;\text{if and only if}\;\;
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1064
s\in L(r)
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1065
\]
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1066
488
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
  1067
\noindent which is the property we set out to prove: our algorithm
598741d39d21 updated
Christian Urban <urbanc@in.tum.de>
parents: 481
diff changeset
  1068
meets its specification. To have done so, requires a few induction
926
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1069
proofs about strings and regular expressions. Following the
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1070
\emph{induction recipes} is already a big step in actually performing
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1071
these proofs.  If you do not believe it, proofs have helped me to make
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1072
sure my code is correct and in several instances prevented me of
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1073
letting slip embarrassing mistakes into the `wild'. In fact I have
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1074
found a number of mistakes in the brilliant work by Sulzmann and Lu,
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1075
which we are going to have a look at in Lecture 4. But in Lecture 3 we
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1076
should first find out why on earth are existing regular expression
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1077
matchers so abysmally slow. Are the people in Python, Ruby, Swift,
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1078
JavaScript, Java and also in Rust\footnote{Interestingly the regex
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1079
  engine in Rust says it guarantees a linear behaviour when deciding
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1080
  when a regular expression matches a
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1081
  string. \here{https://youtu.be/3N_ywhx6_K0?t=31}} just idiots? For
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1082
example could it possibly be that what we have implemented here in
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1083
Scala is faster than the regex engine that has been implemented in
42ecc3186944 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 874
diff changeset
  1084
Rust? See you next week\ldots
343
539b2e88f5b9 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 340
diff changeset
  1085
262
ee4304bc6350 updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 261
diff changeset
  1086
\end{document}
261
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
  1087
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
  1088
24531cfaa36a updated handouts
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 259
diff changeset
  1089
566
b153c04834eb updated
Christian Urban <urbanc@in.tum.de>
parents: 550
diff changeset
  1090
% !TeX program = latexmk -xelatex
123
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
  1091
%%% Local Variables: 
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
  1092
%%% mode: latex
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
  1093
%%% TeX-master: t
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
  1094
%%% End: