videos/01-evilregexes.srt
author Christian Urban <christian.urban@kcl.ac.uk>
Fri, 11 Oct 2024 19:13:00 +0100
changeset 967 ce5de01b9632
parent 837 499405058cfd
permissions -rw-r--r--
updated
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
1
00:00:06,240 --> 00:00:11,050
Welcome back. This video
is about regular expressions.

2
00:00:11,050 --> 00:00:14,230
We want to use regular
expressions in our lexer.

3
00:00:14,230 --> 00:00:16,165
And the purpose of the
lexer is to find

4
00:00:16,165 --> 00:00:18,070
out where the words in

5
00:00:18,070 --> 00:00:21,070
our programs are. However,
regular expressions

6
00:00:21,070 --> 00:00:23,875
are a fundamental tool
in computer science.

7
00:00:23,875 --> 00:00:27,910
And I'm sure you've used them
already on several occasions.

8
00:00:27,910 --> 00:00:30,370
And one would expect that for

9
00:00:30,370 --> 00:00:31,750
regular expressions, since they are

10
00:00:31,750 --> 00:00:33,850
so well known and well studied,

11
00:00:33,850 --> 00:00:37,915
everything under the
sun is known about them.

12
00:00:37,915 --> 00:00:41,080
But actually there are
still some surprising

13
00:00:41,080 --> 00:00:44,465
and interesting
problems with them.

14
00:00:44,465 --> 00:00:47,945
And I want to show them
to you in this video.

15
00:00:47,945 --> 00:00:50,720
I'm sure you've seen
regular expressions

16
00:00:50,720 --> 00:00:52,445
many, many times before.

17
00:00:52,445 --> 00:00:55,100
But just to be on the same page,

18
00:00:55,100 --> 00:00:57,110
let me just recap them.

19
00:00:57,110 --> 00:00:59,210
So here in this line,

20
00:00:59,210 --> 00:01:01,790
there is a regular expression
which is supposed to

21
00:01:01,790 --> 00:01:05,285
recognize some form
of email addresses.

22
00:01:05,285 --> 00:01:07,745
So an email address

23
00:01:07,745 --> 00:01:11,000
has a part which is
before the @ symbol,

24
00:01:11,000 --> 00:01:13,400
which is the name of the person.

25
00:01:13,400 --> 00:01:16,880
And that can be
any digits between

26
00:01:16,880 --> 00:01:20,195
0 and 9, and letters between a and z.

27
00:01:20,195 --> 00:01:24,155
Let's say we are avoiding
capital letters here.

28
00:01:24,155 --> 00:01:26,045
There can be underscores.

29
00:01:26,045 --> 00:01:29,405
There can be a dot and
there can be hyphens.

30
00:01:29,405 --> 00:01:35,390
And after the @ symbol
comes the domain name.

31
00:01:35,390 --> 00:01:37,310
So as you can see here,

32
00:01:37,310 --> 00:01:40,640
we use things like star to

33
00:01:40,640 --> 00:01:44,314
match letters
zero or more times.

34
00:01:44,314 --> 00:01:45,985
Or we have a plus,

35
00:01:45,985 --> 00:01:47,420
which means you have to match

36
00:01:47,420 --> 00:01:52,489
one or more
times. Then we have a

37
00:01:52,489 --> 00:01:55,790
question mark, which says you

38
00:01:55,790 --> 00:01:59,105
match whether it is there
or it is not there.

39
00:01:59,105 --> 00:02:01,340
There are also regular
expressions which

40
00:02:01,340 --> 00:02:03,755
match exactly n times.

41
00:02:03,755 --> 00:02:08,720
Or this is a regular expression
for between n and m times.

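The quantifiers recapped above (star, plus, question mark, exactly n times, between n and m times) can be tried out with a short sketch. It uses Python's standard `re` module; the helper name `matches`, the `EXAMPLES` table and the letter `a` are illustrative assumptions, not something shown in the video.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


# (pattern, input, expected result); the letter "a" is an arbitrary example.
EXAMPLES = [
    (r"a*", "", True),            # star: zero or more times
    (r"a*", "aaa", True),
    (r"a+", "", False),           # plus: at least once
    (r"a+", "aaa", True),
    (r"ab?", "a", True),          # question mark: "b" is there or not
    (r"ab?", "ab", True),
    (r"ab?", "abb", False),
    (r"a{3}", "aaa", True),       # exactly n times, here n = 3
    (r"a{3}", "aa", False),
    (r"a{2,4}", "aaa", True),     # between n and m times, here 2 to 4
    (r"a{2,4}", "aaaaa", False),
]

if __name__ == "__main__":
    for pattern, text, expected in EXAMPLES:
        assert matches(pattern, text) == expected
        print(f"{pattern!r} on {text!r}: {matches(pattern, text)}")
```

`re.fullmatch` is used so that the whole string has to be matched, which makes the difference between the quantifiers visible.
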
42
00:02:08,720 --> 00:02:12,065
You can see in
this email address,

43
00:02:12,065 --> 00:02:13,730
the top-level domain

44
00:02:13,730 --> 00:02:16,130
name can be any letter

45
00:02:16,130 --> 00:02:19,265
between a and z,
and contain dots,

46
00:02:19,265 --> 00:02:22,340
but can only be two
characters long

47
00:02:22,340 --> 00:02:25,685
up to six characters
and not more.

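The email regular expression itself is only visible on the slide, so the following is a minimal sketch rebuilt from the spoken description. The exact pattern, in particular the character class used for the domain name, is an assumption, and the names `EMAIL` and `is_email` are hypothetical.

```python
import re

# Assumed pattern, rebuilt from the spoken description; the slide may differ.
EMAIL = re.compile(
    r"[a-z0-9_.-]+"   # name: digits, lowercase letters, underscore, dot, hyphen
    r"@"              # the @ symbol
    r"[a-z0-9.-]+"    # domain name (this character class is an assumption)
    r"\."             # a literal dot before the top-level domain
    r"[a-z.]{2,6}"    # top-level domain: two to six letters or dots
)


def is_email(text: str) -> bool:
    """Return True if the whole of text matches the assumed email pattern."""
    return EMAIL.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["christian.urban@kcl.ac.uk", "Urban@kcl.ac.uk",
                      "user@example.technology"]:
        print(candidate, is_email(candidate))
```

Under these assumptions an address with a capital letter is rejected, and so is one whose last part after the final dot is longer than six characters.
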
48
00:02:25,685 --> 00:02:29,240
Then you also have
something like ranges.

49
00:02:29,240 --> 00:02:31,220
So you can see, letters between a

50
00:02:31,220 --> 00:02:33,635
and z and 0 to 9 and so on.

51
00:02:33,635 --> 00:02:36,545
Here you also have regular
expressions which can

52
00:02:36,545 --> 00:02:40,070
match something which
isn't in this range.

53
00:02:40,070 --> 00:02:42,560
So if
you want, for example, to match

54
00:02:42,560 --> 00:02:44,030
letters but not numbers,

55
00:02:44,030 --> 00:02:45,800
you would say, well, if

56
00:02:45,800 --> 00:02:48,990
this is a number that
should not match.

57
00:02:49,090 --> 00:02:52,804
Typically you also
have these ranges.

58
00:02:52,804 --> 00:02:55,565
Lowercase letters,
capital letters.

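Ranges and negated ranges can be sketched in the same way. The constants and the helper `non_digit_runs` below are hypothetical names chosen for illustration; the bracket syntax is that of Python's `re` module and is assumed to correspond to what the slide shows.

```python
import re

LOWER = r"[a-z]"       # range: one lowercase letter
UPPER = r"[A-Z]"       # range: one capital letter
DIGIT = r"[0-9]"       # range: one digit
NOT_DIGIT = r"[^0-9]"  # negated range: any one character that is not a digit


def non_digit_runs(text: str) -> list[str]:
    """Return the maximal runs of characters in text that are not digits."""
    return re.findall(NOT_DIGIT + "+", text)


if __name__ == "__main__":
    print(non_digit_runs("abc123def"))
    print(re.fullmatch(NOT_DIGIT, "7") is None)
```

The caret directly after the opening bracket is what turns the range into its complement, so `[^0-9]` accepts a letter but rejects a number.
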
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   267
59
82a1315c128d updated
00:02:55,565 --> 00:02:58,550
Then you have some
special regular expressions

60
00:02:58,550 --> 00:03:02,195
like this one is only
supposed to match digits.

61
00:03:02,195 --> 00:03:05,674
A dot is supposed to
match any character.

62
00:03:05,674 --> 00:03:07,370
And then they have also something

63
00:03:07,370 --> 00:03:09,800
called groups which
is supposed to be

64
00:03:09,800 --> 00:03:12,799
used when you are
trying to extract

65
00:03:12,799 --> 00:03:15,605
a string you've matched.
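
A minimal sketch of the three features just mentioned (digits, the dot, and groups), assuming Python's re module as the tool. The patterns and the address alice@example.org are made up for illustration and are not the ones shown on the slide.

```python
import re

# \d matches one digit; the + repeats it one or more times.
digits_ok = re.fullmatch(r"\d+", "2024") is not None
digits_bad = re.fullmatch(r"\d+", "20x4") is not None

# A dot matches any single character (except a newline, by default).
dot_ok = re.fullmatch(r"a.c", "a-c") is not None

# Parentheses form groups; the text each group matched can be extracted.
# Hypothetical, deliberately simplistic pattern for something email-like.
m = re.fullmatch(r"(\w+)@(\w+)\.org", "alice@example.org")
user, host = m.group(1), m.group(2)

print(digits_ok, digits_bad, dot_ok, user, host)
```

The two groups hand back the part before and after the @ sign, which is the kind of extraction the groups are meant for.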

66
00:03:15,605 --> 00:03:19,925
Okay, so these are the
typical regular expressions.

67
00:03:19,925 --> 00:03:23,075
And here's a particular one

68
00:03:23,075 --> 00:03:25,820
trying to match something

69
00:03:25,820 --> 00:03:28,770
which resembles
an email address.

70
00:03:29,590 --> 00:03:33,065
Clearly that should all be easy.

71
00:03:33,065 --> 00:03:36,230
And our technology should
be on top of that.

72
00:03:36,230 --> 00:03:37,865
That we can take a

73
00:03:37,865 --> 00:03:41,015
regular expression and
we can take a string,

74
00:03:41,015 --> 00:03:43,340
and we should have programs to

75
00:03:43,340 --> 00:03:45,680
decide whether this
string is matched

76
00:03:45,680 --> 00:03:50,330
by a regular expression or
not. That should be easy-peasy, no?

77
00:03:50,330 --> 00:03:56,150
Well, let's have a
look at two examples.

78
00:03:56,150 --> 00:04:00,860
The first regular expression
is a star star b.

79
00:04:00,860 --> 00:04:02,990
And it is supposed
to match strings of

80
00:04:02,990 --> 00:04:05,825
the form 0 or more a's,

81
00:04:05,825 --> 00:04:10,385
followed by a b. The parentheses
you can ignore.

82
00:04:10,385 --> 00:04:11,990
And a star star

83
00:04:11,990 --> 00:04:14,120
also doesn't
make any difference

84
00:04:14,120 --> 00:04:16,505
to what kind of strings
can be matched.

85
00:04:16,505 --> 00:04:21,635
It can only match 0 or more
a's followed by a b.
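
The claim that the extra star changes nothing can be checked by brute force. A minimal sketch, assuming that "a star star b" is written (a*)*b and using Python's re module; the helper all_strings is a made-up name for this illustration.

```python
import re
from itertools import product

nested = re.compile(r"(a*)*b")   # the regular expression with the extra star
simple = re.compile(r"a*b")      # 0 or more a's followed by a b

def all_strings(alphabet, max_len):
    """Yield every string over the alphabet up to the given length."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

# Collect every short string on which the two expressions disagree.
disagreements = [
    s for s in all_strings("ab", 6)
    if bool(nested.fullmatch(s)) != bool(simple.fullmatch(s))
]
print(disagreements)
```

This only tests strings of up to six characters, so it is a sanity check rather than a proof that the two expressions match the same strings.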

86
00:04:21,635 --> 00:04:23,900
And the other regular expression

87
00:04:23,900 --> 00:04:26,990
is possibly a character a,

88
00:04:26,990 --> 00:04:32,930
n times, followed by character
a exactly n times.

89
00:04:32,930 --> 00:04:35,570
And we will try out

90
00:04:35,570 --> 00:04:38,360
these two regular expressions
with strings of the form a,

91
00:04:38,360 --> 00:04:39,890
aa, and so on,

92
00:04:39,890 --> 00:04:45,770
and up to the length of n. And

93
00:04:45,770 --> 00:04:49,130
this regular expression should
actually not match any of

94
00:04:49,130 --> 00:04:53,315
the strings because the
final b is missing.

95
00:04:53,315 --> 00:04:56,150
But that is
okay. For example

96
00:04:56,150 --> 00:04:57,425
if you have a regular expression

97
00:04:57,425 --> 00:05:00,110
that is supposed to
check whether a string is

98
00:05:00,110 --> 00:05:01,490
an email address and the user

99
00:05:01,490 --> 00:05:03,380
gives some random
string in there,

100
00:05:03,380 --> 00:05:06,545
then this regular expression
should not match that string.

101
00:05:06,545 --> 00:05:08,420
And for this regular expression

102
00:05:08,420 --> 00:05:11,195
you have to scratch
your head a little bit,

103
00:05:11,195 --> 00:05:12,620
what it can actually match.

104
00:05:12,620 --> 00:05:14,720
But after a little bit
of head scratching,

105
00:05:14,720 --> 00:05:18,260
you find out it can match
any string which is of

106
00:05:18,260 --> 00:05:22,580
the length n a's up
to 2n a's.

107
00:05:22,580 --> 00:05:24,290
So anything in this range,

108
00:05:24,290 --> 00:05:27,185
this regular expression
can actually match.
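
The range from n to 2n can be checked for a small n. A sketch assuming Python's re module; the optional a is wrapped in a non-capturing group because Python does not accept a counted repetition applied directly to "a?". The function name second_regex is hypothetical.

```python
import re

def second_regex(n):
    """Build 'possibly an a, n times, followed by exactly n a's'."""
    return re.compile(r"(?:a?){%d}a{%d}" % (n, n))

n = 3
pattern = second_regex(n)

# Try strings of 0 to 9 a's and record the lengths that match completely.
matched_lengths = [k for k in range(10) if pattern.fullmatch("a" * k)]
print(matched_lengths)
```

For n = 3 the lengths that survive are those from 3 to 6, that is from n up to 2n.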

109
00:05:27,185 --> 00:05:30,395
Okay, let's
take a random tool,

110
00:05:30,395 --> 00:05:32,630
maybe for example Python.

111
00:05:32,630 --> 00:05:35,240
So here's a little
Python program.

112
00:05:35,240 --> 00:05:38,690
It uses the library
function of Python to

113
00:05:38,690 --> 00:05:42,935
match the regular expression
a star star b.

114
00:05:42,935 --> 00:05:46,805
And we measure the time with longer
and longer strings of a's.

115
00:05:46,805 --> 00:05:48,770
And so conveniently we can give

116
00:05:48,770 --> 00:05:51,140
the number of a's here
on the command line.
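
The program on screen is not reproduced in the subtitles. A minimal sketch of such a timing script, assuming (a*)*b as the regular expression and re.match as the library function; the name time_match and the default of five a's are choices made for this example only.

```python
import re
import sys
import time

def time_match(n):
    """Match (a*)*b against n a's; return (elapsed seconds, match result)."""
    text = "a" * n
    start = time.perf_counter()
    result = re.match(r"(a*)*b", text)
    elapsed = time.perf_counter() - start
    return elapsed, result

if __name__ == "__main__":
    # Number of a's from the command line, defaulting to 5 (an assumption).
    has_arg = len(sys.argv) > 1 and sys.argv[1].isdigit()
    n = int(sys.argv[1]) if has_arg else 5
    elapsed, result = time_match(n)
    print(f"{n} a's: {elapsed:.5f} secs, result: {result}")
```

Saved under a hypothetical name such as evil.py, it would be called as `python3 evil.py 5`. Since the string contains no b, the printed result is None. Keep n small when experimenting, because a backtracking matcher can need much more time on this expression as n grows.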

117
00:05:51,140 --> 00:05:56,900
If I just call
this on the command line,

118
00:05:56,900 --> 00:05:59,900
let's say we first
start with five a's.

119
00:05:59,900 --> 00:06:03,920
And I also get the time, which
in this case is next to nothing.

120
00:06:03,920 --> 00:06:05,960
And here's the string
we just matched.

121
00:06:05,960 --> 00:06:07,640
And obviously the
regular expression

122
00:06:07,640 --> 00:06:09,110
did not match the string.

123
00:06:09,110 --> 00:06:11,255
That's indicated by this None.

124
00:06:11,255 --> 00:06:13,925
Let's take ten a's.

125
00:06:13,925 --> 00:06:16,490
It's also pretty quick.

126
00:06:16,490 --> 00:06:20,780
Fifteen a's, even quicker,

127
00:06:20,780 --> 00:06:23,180
but these times always need to

128
00:06:23,180 --> 00:06:25,820
be taken with a grain of salt.

129
00:06:25,820 --> 00:06:28,040
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   591
They are not 100
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   592
percent accurate.

130
00:06:28,040 --> 00:06:31,490
So 15 is also OK.
Let's take 20.

131
00:06:31,490 --> 00:06:36,965
Hmmm, this already takes
double the time.

132
00:06:36,965 --> 00:06:42,440
Twenty-five. Then even longer.

133
00:06:42,440 --> 00:06:45,680
Okay, then suddenly
from 0.2 seconds,

134
00:06:45,680 --> 00:06:48,960
it now takes almost four seconds.

135
00:06:49,600 --> 00:06:54,890
Twenty-six, this

136
00:06:54,890 --> 00:07:01,415
takes six seconds...
already double.

137
00:07:01,415 --> 00:07:07,229
Let's go to 28. That would be
...hmmm....hmmm

138
00:07:08,890 --> 00:07:11,840
You see the string
isn't very long,

139
00:07:11,840 --> 00:07:13,340
so that could easily be

140
00:07:13,340 --> 00:07:16,070
just the size of
an email address.

141
00:07:16,070 --> 00:07:19,280
And the regular
expression matching

142
00:07:19,280 --> 00:07:22,550
engine in Python needs
quite a long time

143
00:07:22,550 --> 00:07:24,710
to find out that
this string of 28

144
00:07:24,710 --> 00:07:26,570
a's is actually not matched

145
00:07:26,570 --> 00:07:28,490
by that. You see it's
still not finished.

146
00:07:28,490 --> 00:07:32,900
I think it should take
approximately 20 seconds.

147
00:07:32,900 --> 00:07:34,400
Okay. Already 30.
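
The timing experiment described in these subtitles can be reproduced with a small sketch like the one below. It assumes that the regular expression under test is (a*)*b (the transcript only describes it verbally), and the helper name time_match is made up for illustration. The measured times depend on the machine and the Python version, so they should be read as rough indications only.

```python
import re
import time

# Assumption: the "evil" regular expression from the demo is (a*)*b.
EVIL = re.compile(r"(a*)*b")

def time_match(n):
    """Match EVIL against a string of n a's; return (result, seconds)."""
    s = "a" * n
    start = time.perf_counter()
    result = EVIL.match(s)  # None means the string was not matched
    elapsed = time.perf_counter() - start
    return result, elapsed

if __name__ == "__main__":
    # Keep n small: the running time can grow very quickly with n.
    for n in (5, 10, 15, 18):
        result, elapsed = time_match(n)
        print(f"n={n:2d}  result={result}  time={elapsed:.4f}s")
```

Raising n step by step towards the values used in the video should show the same effect, but be prepared to interrupt the run.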

148
00:07:34,400 --> 00:07:36,530
And if we would try

149
00:07:36,530 --> 00:07:40,805
30, we would be already here
for more than a minute.

150
00:07:40,805 --> 00:07:43,940
And if I were to use
something like 100,

151
00:07:43,940 --> 00:07:46,220
you remember, with a doubling in

152
00:07:46,220 --> 00:07:48,770
each step or every second step,

153
00:07:48,770 --> 00:07:50,720
the story with the chessboard,

154
00:07:50,720 --> 00:07:53,855
we probably would sit here
until the next century.
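
To get a feeling for what this doubling means, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that 28 a's take about 30 seconds (the rough figure mentioned above) and that the time doubles with every additional a, or with every second one. The function name estimate_seconds is hypothetical.

```python
BASE_N = 28          # string length of the last run mentioned above
BASE_SECONDS = 30.0  # rough time quoted for that run (an approximation)

SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def estimate_seconds(n, doubling_every=1):
    """Extrapolate the running time, assuming it doubles every
    `doubling_every` extra characters (an assumption, not a measurement)."""
    return BASE_SECONDS * 2 ** ((n - BASE_N) / doubling_every)

if __name__ == "__main__":
    for n in (30, 40, 100):
        secs = estimate_seconds(n)
        years = secs / SECONDS_PER_YEAR
        print(f"n={n}: about {secs:.3g} s, i.e. {years:.3g} years")
```

Under these assumptions, 30 a's come out at about two minutes, and 100 a's would take far longer than a century, even if the time only doubled with every second character.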

155
00:07:53,855 --> 00:07:56,820
So something is strange here.

156
00:07:57,580 --> 00:08:01,355
Okay, that might be just
a problem of Python.

157
00:08:01,355 --> 00:08:02,990
Let's have a look at another

158
00:08:02,990 --> 00:08:04,985
regular expression
matching engine.

159
00:08:04,985 --> 00:08:06,890
This time from JavaScript,

160
00:08:06,890 --> 00:08:10,040
also a pretty well-known
programming language.

161
00:08:10,040 --> 00:08:13,610
So here you can see
it's still a-star,

162
00:08:13,610 --> 00:08:16,235
star, followed by b,

163
00:08:16,235 --> 00:08:18,920
but the regular expression is
supposed to match from

164
00:08:18,920 --> 00:08:21,830
the beginning of the
string up till the end.

165
00:08:21,830 --> 00:08:23,930
So there's not any difference

166
00:08:23,930 --> 00:08:26,150
in the strings this regular
expression matches.

167
00:08:26,150 --> 00:08:28,610
We just start at the
beginning of the string

168
00:08:28,610 --> 00:08:31,460
and finish at the
end of the string.
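
The remark about matching from the beginning to the end of the string can also be illustrated in Python (the video uses JavaScript at this point). The sketch below assumes that the regular expression is (a*)*b and that the anchored variant is written with explicit ^ and $; neither is spelled out in the transcript.

```python
import re

# Assumption: the regular expression from the demo is (a*)*b.
anchored = re.compile(r"^(a*)*b$")  # explicit start and end anchors
plain = re.compile(r"(a*)*b")       # same expression without anchors

for s in ("b", "aab", "aaa", "aabx"):
    # fullmatch requires the whole string to match, like ^...$ does here
    with_anchors = anchored.match(s) is not None
    whole_string = plain.fullmatch(s) is not None
    print(f"{s!r}: anchored={with_anchors}, fullmatch={whole_string}")
```

For these sample strings both variants should agree on which strings are accepted, which is the point made in the subtitles: the anchors only say where the match has to start and finish.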

169
00:08:31,460 --> 00:08:35,285
And again, we just use
repeated a's for that.

170
00:08:35,285 --> 00:08:38,195
And similarly, we can

171
00:08:38,195 --> 00:08:41,930
call it on the command line
and can do some timing.

172
00:08:41,930 --> 00:08:44,540
So ten a's is very good.

173
00:08:44,540 --> 00:08:46,340
Here's the string.

174
00:08:46,340 --> 00:08:48,320
It cannot match that string.

175
00:08:48,320 --> 00:08:50,525
And it's pretty fast.

176
00:08:50,525 --> 00:08:54,725
Twenty... also pretty fast.

177
00:08:54,725 --> 00:08:59,120
Twenty-five... Again,

178
00:08:59,120 --> 00:09:06,650
somehow there is a kind of
threshold at 25, 26.

179
00:09:06,650 --> 00:09:09,485
Suddenly it takes much longer.

180
00:09:09,485 --> 00:09:14,360
And it has essentially the
same problem as with Python.

181
00:09:14,360 --> 00:09:17,165
So you see now, from 26 on,

182
00:09:17,165 --> 00:09:19,250
the times always
double, from

183
00:09:19,250 --> 00:09:21,860
three seconds to seven seconds.

184
00:09:21,860 --> 00:09:23,330
So you can imagine what that

185
00:09:23,330 --> 00:09:24,890
roughly takes when I put in

186
00:09:24,890 --> 00:09:30,230
27. And you see the
string isn't very long.

187
00:09:30,230 --> 00:09:32,165
It is just twenty-or-something a's.

188
00:09:32,165 --> 00:09:35,419
Imagine you have to
search a database

189
00:09:35,419 --> 00:09:38,720
with gigabytes of data

190
00:09:38,720 --> 00:09:42,260
with these regular
expressions. That would

191
00:09:42,260 --> 00:09:48,150
need years to go through with
these regular expressions.

192
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   876
00:09:48,630 --> 00:09:51,850
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   877
Okay, maybe the people in
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   878
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   879
193
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   880
00:09:51,850 --> 00:09:55,435
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   881
Python and JavaScript,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   882
they're just idiots.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   883
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   884
194
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   885
00:09:55,435 --> 00:09:58,180
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   886
Surely Java must do much better.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   887
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   888
195
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   889
00:09:58,180 --> 00:10:01,045
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   890
So here's a program.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   891
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   892
196
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   893
00:10:01,045 --> 00:10:03,415
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   894
You can see this again
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   895
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   896
197
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   897
00:10:03,415 --> 00:10:05,980
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   898
is the regular expression
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   899
and we are just having
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   900
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   901
198
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   902
00:10:05,980 --> 00:10:08,320
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   903
some scaffolding to generate
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   904
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   905
199
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   906
00:10:08,320 --> 00:10:11,905
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   907
strings from 5 up till 28.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   908
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   909
200
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   910
00:10:11,905 --> 00:10:14,305
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   911
And if we run that,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   912
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   913
201
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   914
00:10:14,305 --> 00:10:16,660
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   915
it actually does that automatically.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   916
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   917
202
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   918
00:10:16,660 --> 00:10:19,900
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   919
So up till 19, pretty fast,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   920
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   921
203
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   922
00:10:19,900 --> 00:10:24,925
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   923
but then starting from
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   924
23, it is getting pretty slow.
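As a rough illustration of the kind of scaffolding described here, the following Python sketch times a match against strings of a's of growing length. It assumes the regular expression is (a*)*b, the one named later on the timing slide. The names EVIL, time_match and run are made up for this sketch, and it is not the Java program shown on screen.

```python
import re
import time

# Assumed regular expression: nested stars followed by a b.
EVIL = re.compile(r"(a*)*b")

def time_match(n):
    """Match a string of n a's and return (matched, seconds taken)."""
    s = "a" * n
    start = time.perf_counter()
    result = EVIL.fullmatch(s)
    elapsed = time.perf_counter() - start
    return result is not None, elapsed

def run(lo=5, hi=18):
    """Collect (n, matched, seconds) for every length from lo to hi."""
    return [(n, *time_match(n)) for n in range(lo, hi + 1)]

if __name__ == "__main__":
    for n, matched, secs in run():
        print(f"{n:2d} a's: matched={matched} time={secs:.4f}s")
```

The upper bound is kept small on purpose. In a backtracking engine the time per step is expected to grow quickly, so raising hi towards 28 should be done with care.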
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   925
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   926
204
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   927
00:10:24,925 --> 00:10:27,445
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   928
So the question is
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   929
what's going on?
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   930
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   931
205
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   932
00:10:27,445 --> 00:10:29,230
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   933
By the way, I'm not gloating here.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   934
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   935
206
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   936
00:10:29,230 --> 00:10:33,755
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   937
Scala uses internally
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   938
the regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   939
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   940
207
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   941
00:10:33,755 --> 00:10:36,665
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   942
matching engine from Java.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   943
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   944
208
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   945
00:10:36,665 --> 00:10:39,065
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   946
So it would have exactly
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   947
the same problem.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   948
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   949
209
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   950
00:10:39,065 --> 00:10:41,480
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   951
Also, I have been
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   952
here very careful,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   953
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   954
210
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   955
00:10:41,480 --> 00:10:43,550
765
b294cfbb5c01 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 761
diff changeset
   956
I'm using here Java 8,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   957
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   958
211
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   959
00:10:43,550 --> 00:10:46,085
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   960
which nowadays is quite old.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   961
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   962
212
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   963
00:10:46,085 --> 00:10:50,765
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   964
But you will see also
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   965
current Java versions.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   966
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   967
213
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   968
00:10:50,765 --> 00:10:55,490
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   969
We will see we can out-compete
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   970
them by magnitudes.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   971
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   972
214
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   973
00:10:55,490 --> 00:10:57,605
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   974
So I think I can 
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   975
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   976
215
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   977
00:10:57,605 --> 00:10:59,165
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   978
now, just finish this here.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   979
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   980
216
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   981
00:10:59,165 --> 00:11:04,025
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   982
You see the problem.
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   983
Just for completeness' sake.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   984
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   985
217
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   986
00:11:04,025 --> 00:11:07,010
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   987
Here is a Ruby program.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   988
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   989
218
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   990
00:11:07,010 --> 00:11:09,935
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   991
This is using the other
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   992
regular expression.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   993
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   994
219
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   995
00:11:09,935 --> 00:11:12,935
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   996
In this case the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   997
string should match.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   998
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   999
220
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1000
00:11:12,935 --> 00:11:20,300
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1001
And again it tries out
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1002
strings between 1 and 30 here.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1003
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1004
221
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1005
00:11:20,300 --> 00:11:23,450
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1006
That's a program actually
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1007
a former student produced.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1008
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1009
222
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1010
00:11:23,450 --> 00:11:25,565
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1011
And you can see for a's
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1012
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1013
223
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1014
00:11:25,565 --> 00:11:29,780
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1015
of length up till 20
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1016
a's is pretty fast.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1017
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1018
224
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1019
00:11:29,780 --> 00:11:32,495
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1020
But then starting at 26,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1021
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1022
225
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1023
00:11:32,495 --> 00:11:35,285
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1024
it's getting really slow.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1025
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1026
226
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1027
00:11:35,285 --> 00:11:37,100
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1028
So in this case,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1029
remember the string
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1030
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1031
227
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1032
00:11:37,100 --> 00:11:38,870
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1033
is actually matched by
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1034
the regular expression.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1035
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1036
228
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1037
00:11:38,870 --> 00:11:40,130
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1038
So it has nothing to do
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1039
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1040
229
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1041
00:11:40,130 --> 00:11:41,540
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1042
with whether a regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1043
expression actually
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1044
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1045
230
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1046
00:11:41,540 --> 00:11:45,485
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1047
matches a string or does
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1048
not match a string.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1049
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1050
231
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1051
00:11:45,485 --> 00:11:48,260
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1052
I admit though these
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1053
regular expressions
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1054
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1055
232
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1056
00:11:48,260 --> 00:11:49,610
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1057
are carefully chosen,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1058
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1059
233
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1060
00:11:49,610 --> 00:11:52,250
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1061
as you will see later on.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1062
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1063
234
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1064
00:11:52,250 --> 00:11:55,620
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1065
I also just stop that here.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1066
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1067
235
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1068
00:11:55,710 --> 00:12:00,985
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1069
Okay, this slide collects
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1070
the information about times.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1071
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1072
236
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1073
00:12:00,985 --> 00:12:03,400
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1074
On the right-hand side will
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1075
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1076
237
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1077
00:12:03,400 --> 00:12:05,860
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1078
be our regular expression matcher,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1079
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1080
238
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1081
00:12:05,860 --> 00:12:08,290
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1082
which we implement next week.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1083
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1084
239
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1085
00:12:08,290 --> 00:12:10,795
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1086
On the left-hand side,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1087
are these times by
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1088
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1089
240
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1090
00:12:10,795 --> 00:12:14,260
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1091
various other regular
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1092
expression matching engines.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1093
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1094
241
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1095
00:12:14,260 --> 00:12:17,809
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1096
On the top is this
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1097
regular expression.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1098
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1099
242
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1100
00:12:19,080 --> 00:12:23,335
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1101
Optional a n-times, then a n-times.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1102
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1103
243
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1104
00:12:23,335 --> 00:12:26,890
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1105
And on the lower
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1106
is (a*)* b.
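To make the two regular expressions concrete, here is a minimal Python sketch. It assumes the first one is written (a?){n}a{n} in the usual regex syntax, that is, n optional a's followed by exactly n a's. The helper names optional_then_exact and matches are hypothetical and only serve the illustration.

```python
import re

def optional_then_exact(n):
    """Build the pattern for n optional a's followed by exactly n a's."""
    return f"(a?){{{n}}}a{{{n}}}"

# The second regular expression from the slide: nested stars, then a b.
STAR_STAR_B = r"(a*)*b"

def matches(pattern, s):
    """True if the whole string s is matched by pattern."""
    return re.fullmatch(pattern, s) is not None

if __name__ == "__main__":
    p = optional_then_exact(3)
    print(p)                          # the pattern for n = 3
    print(matches(p, "aaa"))          # n a's: should match
    print(matches(STAR_STAR_B, "aaa"))  # no b at the end: should not match
```

With the first pattern a string of n a's does match, while with the second a string of only a's never matches. Both cases can be slow in a backtracking engine once n grows, so the sketch sticks to n = 3.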
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1107
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1108
244
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1109
00:12:26,890 --> 00:12:30,370
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1110
And the x-axis shows here
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1111
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1112
245
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1113
00:12:30,370 --> 00:12:35,335
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1114
the length of the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1115
string. How many a's.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1116
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1117
246
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1118
00:12:35,335 --> 00:12:38,925
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1119
And on the y-axis is the time
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1120
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1121
247
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1122
00:12:38,925 --> 00:12:41,660
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1123
they need to decide whether
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1124
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1125
248
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1126
00:12:41,660 --> 00:12:44,615
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1127
the string is matched by
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1128
the regular expression or not.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1129
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1130
249
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1131
00:12:44,615 --> 00:12:46,415
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1132
So you can see here, Python,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1133
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1134
250
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1135
00:12:46,415 --> 00:12:47,945
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1136
Java 8 and JavaScript,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1137
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1138
251
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1139
00:12:47,945 --> 00:12:52,250
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1140
they max out approximately
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1141
between 25 and 30.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1142
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1143
252
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1144
00:12:52,250 --> 00:12:53,900
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1145
Because then it already takes
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1146
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1147
253
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1148
00:12:53,900 --> 00:12:55,160
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1149
half a minute to
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1150
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1151
254
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1152
00:12:55,160 --> 00:12:57,410
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1153
decide whether the string
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1154
is matched or not.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1155
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1156
255
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1157
00:12:57,410 --> 00:13:00,815
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1158
And similarly, in
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1159
the other example,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1160
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1161
256
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1162
00:13:00,815 --> 00:13:03,830
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1163
Python and Ruby max out
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1164
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1165
257
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1166
00:13:03,830 --> 00:13:07,220
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1167
at a similar kind of
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1168
length of the strings.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1169
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1170
258
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1171
00:13:07,220 --> 00:13:10,400
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1172
Because then they also need
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1173
half a minute to decide
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1174
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1175
259
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1176
00:13:10,400 --> 00:13:13,940
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1177
whether this regular expression
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1178
actually matches the string.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1179
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1180
260
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1181
00:13:13,940 --> 00:13:16,790
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1182
Contrast that with
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1183
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1184
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1185
261
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1186
00:13:16,790 --> 00:13:19,235
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1187
the regular expression matcher
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1188
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1189
262
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1190
00:13:19,235 --> 00:13:21,470
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1191
which we're going to implement.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1192
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1193
263
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1194
00:13:21,470 --> 00:13:25,040
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1195
This can match
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1196
approximately 10 thousand
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1197
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1198
264
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1199
00:13:25,040 --> 00:13:30,065
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1200
a's in this example and
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1201
needs less than ten seconds.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1202
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1203
265
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1204
00:13:30,065 --> 00:13:32,285
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1205
Actually, there will be
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1206
two versions of that.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1207
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1208
266
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1209
00:13:32,285 --> 00:13:34,850
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1210
The first version will be
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1211
also relatively slow.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1212
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1213
267
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1214
00:13:34,850 --> 00:13:36,410
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1215
But the second version,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1216
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1217
268
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1218
00:13:36,410 --> 00:13:38,240
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1219
in contrast to Python,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1220
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1221
269
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1222
00:13:38,240 --> 00:13:40,295
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1223
Ruby, will be blindingly fast.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1224
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1225
270
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1226
00:13:40,295 --> 00:13:42,380
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1227
And in the second example,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1228
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1229
271
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1230
00:13:42,380 --> 00:13:45,740
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1231
you have to be careful
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1232
about the x-axis because
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1233
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1234
272
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1235
00:13:45,740 --> 00:13:49,385
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1236
that means four times
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1237
ten to the power six.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1238
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1239
273
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1240
00:13:49,385 --> 00:13:51,695
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1241
It's actually 4 million a's.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1242
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1243
274
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1244
00:13:51,695 --> 00:13:55,100
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1245
So our regular
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1246
expression matcher needs
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1247
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1248
275
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1249
00:13:55,100 --> 00:13:57,635
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1250
less than ten seconds to
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1251
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1252
276
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1253
00:13:57,635 --> 00:14:00,725
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1254
match a string
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1255
of 4 million a's.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1256
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1257
277
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1258
00:14:00,725 --> 00:14:04,430
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1259
In contrast, Python, Java 8,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1260
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1261
278
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1262
00:14:04,430 --> 00:14:06,770
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1263
and JavaScript need half a minute
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1264
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1265
279
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1266
00:14:06,770 --> 00:14:09,905
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1267
already for a string
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1268
of length just 30.
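To try this kind of measurement yourself, here is a minimal Python sketch. It assumes the pattern (a*)*b, a common textbook example of an "evil" regular expression; the pattern on the slide may be a different one, and the helper name time_match is only illustrative. On a backtracking engine the time typically grows very quickly with each extra a, so the loop stops at 20.

```python
import re
import time

# Assumed example pattern (not necessarily the one on the slide):
# nested stars followed by a character that never occurs in the input.
EVIL = re.compile(r"(a*)*b")


def time_match(n):
    """Match EVIL against a string of n a's; return (matched, seconds)."""
    s = "a" * n
    start = time.perf_counter()
    matched = EVIL.match(s) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Keep n small: larger values can take a very long time.
    for n in range(10, 21, 2):
        matched, secs = time_match(n)
        print(f"n={n:2d} matched={matched} time={secs:.4f}s")
```

The strings never contain a b, so every match fails, but only after the engine has tried many ways of splitting the a's between the two stars.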
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1269
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1270
280
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1271
00:14:09,905 --> 00:14:12,365
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1272
I was very careful with Java 8.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1273
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1274
281
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1275
00:14:12,365 --> 00:14:15,725
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1276
Yes, Java 9 and above,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1277
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1278
282
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1279
00:14:15,725 --> 00:14:17,180
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1280
they already have
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1281
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1282
283
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1283
00:14:17,180 --> 00:14:19,610
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1284
a much better regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1285
expression matching engine,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1286
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1287
284
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1288
00:14:19,610 --> 00:14:22,805
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1289
but still we will be running
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1290
circles around them.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1291
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1292
285
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1293
00:14:22,805 --> 00:14:27,050
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1294
with this data.
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1295
I call this slide:
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1296
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1297
286
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1298
00:14:27,050 --> 00:14:29,675
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1299
Why bother with
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1300
regular expressions?
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1301
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1302
287
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1303
00:14:29,675 --> 00:14:33,515
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1304
But you can probably
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1305
see these are
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1306
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1307
288
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1308
00:14:33,515 --> 00:14:34,910
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1309
abysmal times by
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1310
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1311
289
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1312
00:14:34,910 --> 00:14:38,015
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1313
the existing regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1314
expression matching engines.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1315
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1316
290
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1317
00:14:38,015 --> 00:14:40,070
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1318
And it's actually
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1319
surprising that after
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1320
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1321
291
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1322
00:14:40,070 --> 00:14:42,695
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1323
one lecture we can already
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1324
do substantially better.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1325
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1326
292
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1327
00:14:42,695 --> 00:14:47,495
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1328
And if you don't believe
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1329
in the times I gave here,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1330
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1331
293
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1332
00:14:47,495 --> 00:14:50,090
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1333
please feel free to
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1334
play on your own
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1335
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1336
294
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1337
00:14:50,090 --> 00:14:52,865
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1338
with the examples
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1339
I uploaded on KEATS.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1340
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1341
295
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1342
00:14:52,865 --> 00:14:55,235
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1343
These are exactly the programs
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1344
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1345
296
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1346
00:14:55,235 --> 00:14:57,470
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1347
I used here in the examples.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1348
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1349
297
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1350
00:14:57,470 --> 00:14:59,255
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1351
So feel free.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1352
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1353
298
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1354
00:14:59,255 --> 00:15:01,970
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1355
You might however now think, hmm.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1356
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1357
299
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1358
00:15:01,970 --> 00:15:05,449
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1359
These are two very
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1360
well chosen examples,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1361
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1362
300
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1363
00:15:05,449 --> 00:15:07,145
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1364
and I admit that's true,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1365
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1366
301
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1367
00:15:07,145 --> 00:15:09,410
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1368
and such examples never
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1369
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1370
302
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1371
00:15:09,410 --> 00:15:12,540
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1372
cause any problems
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1373
in real life.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1374
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1375
303
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1376
00:15:13,300 --> 00:15:15,980
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1377
Regular expressions are used very
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1378
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1379
304
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1380
00:15:15,980 --> 00:15:19,415
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1381
frequently and they
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1382
do cause problems.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1383
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1384
305
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1385
00:15:19,415 --> 00:15:21,410
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1386
So here's my first example from
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1387
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1388
306
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1389
00:15:21,410 --> 00:15:23,885
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1390
a company called Cloudflare.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1391
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1392
307
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1393
00:15:23,885 --> 00:15:27,560
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1394
This is a huge hosting company
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1395
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1396
308
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1397
00:15:27,560 --> 00:15:30,935
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1398
which hosts very
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1399
well-known web pages.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1400
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1401
309
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1402
00:15:30,935 --> 00:15:34,970
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1403
And they really try hard
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1404
to have no outage at all.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1405
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1406
310
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1407
00:15:34,970 --> 00:15:37,340
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1408
And they managed
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1409
that for six years.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1410
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1411
311
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1412
00:15:37,340 --> 00:15:39,320
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1413
But then a regular expression,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1414
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1415
312
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1416
00:15:39,320 --> 00:15:41,180
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1417
actually this one, caused
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1418
a problem and you
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1419
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1420
313
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1421
00:15:41,180 --> 00:15:43,265
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1422
can see there are also
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1423
two stars
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1424
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1425
314
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1426
00:15:43,265 --> 00:15:44,630
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1427
at the end.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1428
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1429
315
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1430
00:15:44,630 --> 00:15:46,955
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1431
And because of that, a string needed
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1432
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1433
316
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1434
00:15:46,955 --> 00:15:49,865
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1435
too much time to be matched.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1436
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1437
317
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1438
00:15:49,865 --> 00:15:50,990
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1439
And because of that,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1440
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1441
318
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1442
00:15:50,990 --> 00:15:52,430
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1443
they had some outage for,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1444
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1445
319
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1446
00:15:52,430 --> 00:15:54,125
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1447
I think several hours,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1448
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1449
320
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1450
00:15:54,125 --> 00:15:57,920
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1451
actually in their malware
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1452
detection subsystem.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1453
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1454
321
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1455
00:15:57,920 --> 00:16:02,060
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1456
And the second example
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1457
comes from 2016,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1458
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1459
322
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1460
00:16:02,060 --> 00:16:04,040
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1461
where Stack Exchange,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1462
I guess you know
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1463
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1464
323
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1465
00:16:04,040 --> 00:16:06,650
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1466
this webpage, had
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1467
also an outage for
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1468
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1469
324
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1470
00:16:06,650 --> 00:16:08,390
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1471
I think at least an hour.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1472
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1473
325
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1474
00:16:08,390 --> 00:16:13,070
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1475
Because a regular expression,
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1476
needed to format posts,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1477
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1478
326
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1479
00:16:13,070 --> 00:16:15,575
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1480
needed too much time to
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1481
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1482
327
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1483
00:16:15,575 --> 00:16:19,010
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1484
recognize whether this post
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1485
should be accepted or not.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1486
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1487
328
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1488
00:16:19,010 --> 00:16:23,390
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1489
And again, there was a
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1490
similar kind of problem.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1491
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1492
329
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1493
00:16:23,390 --> 00:16:24,950
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1494
And you can read
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1495
the stories behind
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1496
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1497
330
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1498
00:16:24,950 --> 00:16:28,080
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1499
that on these two given links.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1500
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1501
331
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1502
00:16:28,720 --> 00:16:31,730
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1503
When I looked at
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1504
this the first time,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1505
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1506
332
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1507
00:16:31,730 --> 00:16:34,175
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1508
what surprised me is
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1509
that theoreticians,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1510
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1511
333
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1512
00:16:34,175 --> 00:16:37,520
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1513
who sometimes dedicate their
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1514
life to regular expressions
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1515
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1516
334
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1517
00:16:37,520 --> 00:16:39,440
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1518
and really know a lot about
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1519
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1520
335
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1521
00:16:39,440 --> 00:16:41,690
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1522
them, didn't know
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1523
anything about this.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1524
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1525
336
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1526
00:16:41,690 --> 00:16:43,610
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1527
But engineers, they
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1528
already created
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1529
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1530
337
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1531
00:16:43,610 --> 00:16:46,160
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1532
a name for that:
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1533
Regular Expression
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1534
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1535
338
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1536
00:16:46,160 --> 00:16:47,975
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1537
Denial of Service Attack.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1538
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1539
339
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1540
00:16:47,975 --> 00:16:49,745
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1541
Because what you can,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1542
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1543
340
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1544
00:16:49,745 --> 00:16:51,230
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1545
what can happen now is that
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1546
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1547
341
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1548
00:16:51,230 --> 00:16:54,920
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1549
attackers look for
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1550
certain strings
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1551
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1552
342
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1553
00:16:54,920 --> 00:16:56,780
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1554
that make your regular expression
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1555
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1556
343
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1557
00:16:56,780 --> 00:16:59,105
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1558
matching engine topple over.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1559
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1560
344
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1561
00:16:59,105 --> 00:17:01,370
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1562
And these kinds of
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1563
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1564
345
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1565
00:17:01,370 --> 00:17:04,160
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1566
regular expressions are called
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1567
Evil Regular Expressions.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1568
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1569
346
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1570
00:17:04,160 --> 00:17:06,350
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1571
And actually there are
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1572
quite a number of them.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1573
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1574
347
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1575
00:17:06,350 --> 00:17:08,495
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1576
So you've seen this one,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1577
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1578
348
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1579
00:17:08,495 --> 00:17:11,255
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1580
the first one, and the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1581
second one already.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1582
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1583
349
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1584
00:17:11,255 --> 00:17:13,400
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1585
But there are many, many more.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1586
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1587
350
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1588
00:17:13,400 --> 00:17:15,620
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1589
And you can easily have in
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1590
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1591
351
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1592
00:17:15,620 --> 00:17:18,560
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1593
your program one of
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1594
these regular expressions.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1595
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1596
352
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1597
00:17:18,560 --> 00:17:21,830
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1598
And then you have the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1599
problem that if you do have
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1600
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1601
353
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1602
00:17:21,830 --> 00:17:23,240
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1603
this regular expression and
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1604
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1605
354
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1606
00:17:23,240 --> 00:17:25,640
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1607
somebody finds the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1608
corresponding string,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1609
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1610
355
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1611
00:17:25,640 --> 00:17:29,945
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1612
which makes the regular expression
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1613
matching engine topple over,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1614
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1615
356
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1616
00:17:29,945 --> 00:17:31,820
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1617
then you have a problem
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1618
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1619
357
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1620
00:17:31,820 --> 00:17:34,295
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1621
because your webpage is
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1622
probably not available.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1623
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1624
358
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1625
00:17:34,295 --> 00:17:36,140
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1626
This phenomenon is also sometimes 
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1627
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1628
359
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1629
00:17:36,140 --> 00:17:39,350
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1630
called
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1631
catastrophic backtracking.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1632
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1633
360
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1634
00:17:39,350 --> 00:17:43,595
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1635
In lecture three, we will
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1636
look at this more carefully.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1637
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1638
361
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1639
00:17:43,595 --> 00:17:46,910
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1640
And actually why that
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1641
is such a problem in
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1642
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1643
362
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1644
00:17:46,910 --> 00:17:50,795
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1645
real life is actually
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1646
not to do with lexers.
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1647
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1648
363
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1649
00:17:50,795 --> 00:17:53,180
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1650
Yes, regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1651
expressions are used as
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1652
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1653
364
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1654
00:17:53,180 --> 00:17:55,040
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1655
the basic tool for implementing
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1656
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1657
365
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1658
00:17:55,040 --> 00:17:57,185
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1659
lexers. But regular expressions,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1660
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1661
366
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1662
00:17:57,185 --> 00:18:00,065
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1663
of course, are used in
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1664
a much wider area.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1665
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1666
367
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1667
00:18:00,065 --> 00:18:03,770
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1668
And they are especially used for
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1669
network intrusion detection.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1670
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1671
368
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1672
00:18:03,770 --> 00:18:06,590
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1673
Remember, say you're having to
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1674
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1675
369
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1676
00:18:06,590 --> 00:18:10,130
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1677
administer a big network
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1678
and you only want to let
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1679
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1680
370
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1681
00:18:10,130 --> 00:18:13,640
837
499405058cfd updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1682
in packets which you think are OK
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1683
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1684
371
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1685
00:18:13,640 --> 00:18:14,930
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1686
and you want to keep out
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1687
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1688
372
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1689
00:18:14,930 --> 00:18:17,645
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1690
any packet which might
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1691
hack into your network.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1692
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1693
373
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1694
00:18:17,645 --> 00:18:22,670
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1695
So what they have is they
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1696
have suites of thousands and
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1697
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1698
374
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1699
00:18:22,670 --> 00:18:25,745
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1700
sometimes even more
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1701
regular expressions which
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1702
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1703
375
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1704
00:18:25,745 --> 00:18:27,755
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1705
check whether this packet
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1706
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1707
376
00:18:27,755 --> 00:18:30,065
satisfies some patterns or not.

377
00:18:30,065 --> 00:18:31,460
And in this case it will be left

378
00:18:31,460 --> 00:18:34,205
out or it will be let in.

379
00:18:34,205 --> 00:18:36,335
And with networks,

380
00:18:36,335 --> 00:18:39,080
the problem is that our
hardware is already

381
00:18:39,080 --> 00:18:43,190
so fast that the regular
expressions

382
00:18:43,190 --> 00:18:45,169
really become a bottleneck.

383
00:18:45,169 --> 00:18:47,060
Because what do you do if now

384
00:18:47,060 --> 00:18:49,880
suddenly a regular expression
takes too much time?

385
00:18:49,880 --> 00:18:52,670
Do you just stop the matching

386
00:18:52,670 --> 00:18:55,100
and let the packet
in regardless?

387
00:18:55,100 --> 00:18:58,190
Or do you just hold
the network up

388
00:18:58,190 --> 00:19:01,715
and don't let anything in
until you have decided?

389
00:19:01,715 --> 00:19:04,895
So that's actually a
really hard problem.

390
00:19:04,895 --> 00:19:06,650
But the first time I came across

391
00:19:06,650 --> 00:19:09,965
that problem was actually
by this engineer.

392
00:19:09,965 --> 00:19:13,820
And it's always said that
Germans don't have any humor.

393
00:19:13,820 --> 00:19:16,985
But I found that
video quite funny.

394
00:19:16,985 --> 00:19:19,145
Maybe you have a
different opinion,

395
00:19:19,145 --> 00:19:21,095
but feel free to
have a look.

396
00:19:21,095 --> 00:19:23,705
It explains exactly that problem.

397
00:19:23,705 --> 00:19:25,610
So in the next video,

398
00:19:25,610 --> 00:19:28,445
we will start to
implement this matcher.

399
00:19:28,445 --> 00:19:30,870
So I hope to see you there.