videos/01-evilregexes.srt
author Christian Urban <christian.urban@kcl.ac.uk>
Sat, 04 Sep 2021 14:08:09 +0100
changeset 834 d3e38dd3b449
parent 765 b294cfbb5c01
child 837 499405058cfd
permissions -rw-r--r--
cwupdates
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
1
00:00:06,240 --> 00:00:11,050
Welcome back. This video
is about regular expressions.

2
00:00:11,050 --> 00:00:14,230
We want to use regular
expressions in our lexer.

3
00:00:14,230 --> 00:00:16,165
And the purpose of the
lexer is to find

4
00:00:16,165 --> 00:00:18,070
out where the words in

5
00:00:18,070 --> 00:00:21,070
our programs are. However,
regular expressions

6
00:00:21,070 --> 00:00:23,875
are a fundamental tool
in computer science.

7
00:00:23,875 --> 00:00:27,910
And I'm sure you've used them
already on several occasions.

8
00:00:27,910 --> 00:00:30,370
And one would expect that about

9
00:00:30,370 --> 00:00:31,750
regular expressions since they are

10
00:00:31,750 --> 00:00:33,850
so well-known and well studied,

11
00:00:33,850 --> 00:00:37,915
that everything under the
sun is known about them.

12
00:00:37,915 --> 00:00:41,080
But actually there's
still some surprising

13
00:00:41,080 --> 00:00:44,465
and interesting
problems with them.

14
00:00:44,465 --> 00:00:47,945
And I want to show you
them in this video.

15
00:00:47,945 --> 00:00:50,720
I'm sure you've seen
regular expressions

16
00:00:50,720 --> 00:00:52,445
many, many times before.

17
00:00:52,445 --> 00:00:55,100
But just to be on the same page,

18
00:00:55,100 --> 00:00:57,110
let me just recap them.

19
00:00:57,110 --> 00:00:59,210
So here in this line,

20
00:00:59,210 --> 00:01:01,790
there is a regular expression
which is supposed to

21
00:01:01,790 --> 00:01:05,285
recognize some form
of email addresses.

22
00:01:05,285 --> 00:01:07,745
So an email address

23
00:01:07,745 --> 00:01:11,000
has a part which is
before the @ symbol,

24
00:01:11,000 --> 00:01:13,400
which is the name of the person.

25
00:01:13,400 --> 00:01:16,880
And that can be
any number between

26
00:01:16,880 --> 00:01:20,195
0 and 9, and letters between a and z.

27
00:01:20,195 --> 00:01:24,155
Let's say we are avoiding
capital letters here.

28
00:01:24,155 --> 00:01:26,045
There can be underscores.

29
00:01:26,045 --> 00:01:29,405
There can be a dot and
there can be hyphens.

30
00:01:29,405 --> 00:01:35,390
And after the @ symbol
comes the domain name.

31
00:01:35,390 --> 00:01:37,310
So as you can see here,

32
00:01:37,310 --> 00:01:40,640
we use things like star to

33
00:01:40,640 --> 00:01:44,314
match letters
zero or more times.

34
00:01:44,314 --> 00:01:45,985
Or we have a plus,

35
00:01:45,985 --> 00:01:47,420
which means you have to match

36
00:01:47,420 --> 00:01:52,489
at least once or more
times. Then we have a

37
00:01:52,489 --> 00:01:55,790
question mark, which says you

38
00:01:55,790 --> 00:01:59,105
match either it is there
or it is not there.

39
00:01:59,105 --> 00:02:01,340
There are also regular
expressions which

40
00:02:01,340 --> 00:02:03,755
match exactly n times.

41
00:02:03,755 --> 00:02:08,720
Or this is a regular expression
for between n and m times.

42
00:02:08,720 --> 00:02:12,065
You can see in
this email address,

43
00:02:12,065 --> 00:02:13,730
the top-level domain

44
00:02:13,730 --> 00:02:16,130
name can be any letter

45
00:02:16,130 --> 00:02:19,265
between a and z,
and contain dots,

46
00:02:19,265 --> 00:02:22,340
but can only be two
characters long

47
00:02:22,340 --> 00:02:25,685
up to six characters
and not more.

48
00:02:25,685 --> 00:02:29,240
Then you also have
something like ranges.

49
00:02:29,240 --> 00:02:31,220
So you can see, letters between a

50
00:02:31,220 --> 00:02:33,635
and z and 0 to 9 and so on.

51
00:02:33,635 --> 00:02:36,545
Here you also have a regular
expression which can

52
00:02:36,545 --> 00:02:40,070
match something which
isn't in this range.

53
00:02:40,070 --> 00:02:42,560
So for example, if
you want to match

54
00:02:42,560 --> 00:02:44,030
letters but not numbers,

55
00:02:44,030 --> 00:02:45,800
you would say, well, if

56
00:02:45,800 --> 00:02:48,990
this is a number that
should not match.

57
00:02:49,090 --> 00:02:52,804
Typically you also
have these ranges.

58
00:02:52,804 --> 00:02:55,565
Lowercase letters,
capital letters.

59
00:02:55,565 --> 00:02:58,550
Then you have some
special regular expressions

60
00:02:58,550 --> 00:03:02,195
like this one is only
supposed to match digits.

61
00:03:02,195 --> 00:03:05,674
A dot is supposed to
match any character.

62
00:03:05,674 --> 00:03:07,370
And then you also have something

63
00:03:07,370 --> 00:03:09,800
called groups which
is supposed to be

82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   291
64
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   292
00:03:09,800 --> 00:03:12,799
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   293
used when you are
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   294
trying to extract
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   295
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   296
65
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   297
00:03:12,799 --> 00:03:15,605
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   298
a string you've matched.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   299
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   300
66
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   301
00:03:15,605 --> 00:03:19,925
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   302
Okay, so these are the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   303
typical regular expressions.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   304
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   305
67
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   306
00:03:19,925 --> 00:03:23,075
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   307
And here's a particular one.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   308
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   309
68
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   310
00:03:23,075 --> 00:03:25,820
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   311
Trying to match something
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   312
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   313
69
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   314
00:03:25,820 --> 00:03:28,770
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   315
which resembles
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   316
an email address.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   317
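As a small illustration of the constructs just mentioned (digits, the dot, and groups), here is a hedged sketch using Python's re module. The address-like pattern and the sample strings are made up for this example; they are not the regular expression shown on the slide, and the pattern is far too simple for real email validation.

```python
import re

# \d matches one digit, so \d+ matches one or more digits.
only_digits = re.fullmatch(r"\d+", "2021") is not None
not_digits = re.fullmatch(r"\d+", "20x1") is not None

# A dot matches any single character (except a newline by default).
dot_match = re.fullmatch(r"a.c", "abc") is not None

# Groups (parentheses) let us extract parts of the matched string.
# Hypothetical, deliberately simplified address shape: name@domain.
m = re.fullmatch(r"([a-z.]+)@([a-z.]+)", "alice@example.org")
name, domain = m.group(1), m.group(2)

print(only_digits, not_digits, dot_match, name, domain)
```

Here group 1 is the part before the @ sign and group 2 is the part after it.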
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   318
70
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   319
00:03:29,590 --> 00:03:33,065
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   320
Clearly that should all be easy.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   321
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   322
71
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   323
00:03:33,065 --> 00:03:36,230
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   324
And our technology should
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   325
be on top of that.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   326
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   327
72
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   328
00:03:36,230 --> 00:03:37,865
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   329
That we can take a
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   330
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   331
73
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   332
00:03:37,865 --> 00:03:41,015
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   333
regular expression and
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   334
we can take a string,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   335
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   336
74
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   337
00:03:41,015 --> 00:03:43,340
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   338
and we should have programs to
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   339
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   340
75
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   341
00:03:43,340 --> 00:03:45,680
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   342
decide whether this
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   343
string is matched
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   344
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   345
76
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   346
00:03:45,680 --> 00:03:50,330
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   347
by a regular expression or
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   348
not and should be easy-peasy, no?
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   349
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   350
77
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   351
00:03:50,330 --> 00:03:56,150
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   352
Well, let's have a
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   353
look at two examples.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   354
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   355
78
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   356
00:03:56,150 --> 00:04:00,860
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   357
The first regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   358
is (a*)*b, read a star star b.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   359
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   360
79
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   361
00:04:00,860 --> 00:04:02,990
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   362
And it is supposed
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   363
to match strings of
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   364
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   365
80
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   366
00:04:02,990 --> 00:04:05,825
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   367
the form 0 or more a's,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   368
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   369
81
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   370
00:04:05,825 --> 00:04:10,385
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   371
followed by a b. The parentheses
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   372
you can ignore.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   373
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   374
82
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   375
00:04:10,385 --> 00:04:11,990
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   376
And a star star
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   377
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   378
83
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   379
00:04:11,990 --> 00:04:14,120
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   380
also doesn't
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   381
make any difference
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   382
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   383
84
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   384
00:04:14,120 --> 00:04:16,505
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   385
to what kind of strings
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   386
can be matched.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   387
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   388
85
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   389
00:04:16,505 --> 00:04:21,635
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   390
It can only match 0 or more
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   391
a's followed by a b.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   392
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   393
86
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   394
00:04:21,635 --> 00:04:23,900
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   395
And the other regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   396
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   397
87
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   398
00:04:23,900 --> 00:04:26,990
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   399
is possibly a character a,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   400
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   401
88
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   402
00:04:26,990 --> 00:04:32,930
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   403
n times, followed by character
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   404
a exactly n times.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   405
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   406
89
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   407
00:04:32,930 --> 00:04:35,570
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   408
And we will try out
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   409
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   410
90
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   411
00:04:35,570 --> 00:04:38,360
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   412
these two regular expressions
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   413
with strings of the form a,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   414
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   415
91
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   416
00:04:38,360 --> 00:04:39,890
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   417
aa, and so on,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   418
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   419
92
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   420
00:04:39,890 --> 00:04:45,770
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   421
up to the length n. And
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   422
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   423
93
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   424
00:04:45,770 --> 00:04:49,130
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   425
this regular expression should
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   426
actually not match any of
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   427
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   428
94
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   429
00:04:49,130 --> 00:04:53,315
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   430
the strings because the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   431
final b is missing.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   432
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   433
95
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   434
00:04:53,315 --> 00:04:56,150
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   435
But that is
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   436
okay. For example
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   437
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   438
96
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   439
00:04:56,150 --> 00:04:57,425
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   440
if you have a regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   441
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   442
97
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   443
00:04:57,425 --> 00:05:00,110
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   444
that is supposed to
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   445
check whether a string is
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   446
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   447
98
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   448
00:05:00,110 --> 00:05:01,490
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   449
an email address and the user
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   450
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   451
99
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   452
00:05:01,490 --> 00:05:03,380
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   453
gives some random
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   454
strings in there,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   455
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   456
100
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   457
00:05:03,380 --> 00:05:06,545
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   458
then this regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   459
should not match that string.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   460
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   461
101
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   462
00:05:06,545 --> 00:05:08,420
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   463
And for this regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   464
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   465
102
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   466
00:05:08,420 --> 00:05:11,195
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   467
you have to scratch your
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   468
head a little bit about
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   469
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   470
103
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   471
00:05:11,195 --> 00:05:12,620
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   472
what it can actually match.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   473
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   474
104
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   475
00:05:12,620 --> 00:05:14,720
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   476
But after a little bit
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   477
of head scratching,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   478
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   479
105
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   480
00:05:14,720 --> 00:05:18,260
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   481
you find out it can match
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   482
any string which is of
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   483
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   484
106
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   485
00:05:18,260 --> 00:05:22,580
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   486
the length n a's up
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   487
to 2n a's.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   488
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   489
107
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   490
00:05:22,580 --> 00:05:24,290
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   491
So anything in this range,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   492
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   493
108
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   494
00:05:24,290 --> 00:05:27,185
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   495
this regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   496
can actually match.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   497
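The claim about the second regular expression can be checked for a small n. The sketch below is an assumption-laden illustration, not code from the video: it writes the expression in expanded form (n copies of a? followed by n copies of a), and the helper names are made up. Keep n small, since backtracking matchers can become slow on this pattern.

```python
import re

def optional_then_exact(n):
    # Expanded form of the second expression:
    # n optional a's followed by exactly n a's.
    return "a?" * n + "a" * n

def matched_lengths(n):
    # Lengths k in 0..2n+1 for which a string of k a's is matched in full.
    pattern = re.compile(optional_then_exact(n))
    return [k for k in range(2 * n + 2) if pattern.fullmatch("a" * k)]

print(matched_lengths(3))  # prints [3, 4, 5, 6]
```

For n = 3 only the lengths 3 to 6 are matched, which is the range from n to 2n.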
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   498
109
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   499
00:05:27,185 --> 00:05:30,395
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   500
Okay, let's
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   501
take a random tool,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   502
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   503
110
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   504
00:05:30,395 --> 00:05:32,630
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   505
maybe for example Python.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   506
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   507
111
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   508
00:05:32,630 --> 00:05:35,240
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   509
So here's a little
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   510
Python program.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   511
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   512
112
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   513
00:05:35,240 --> 00:05:38,690
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   514
It uses the library
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   515
function of Python to
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   516
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   517
113
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   518
00:05:38,690 --> 00:05:42,935
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   519
match the regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   520
a star star b.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   521
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   522
114
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   523
00:05:42,935 --> 00:05:46,805
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   524
And we measure time with longer
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   525
and longer strings of a.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   526
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   527
115
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   528
00:05:46,805 --> 00:05:48,770
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   529
And so conveniently we can give
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   530
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   531
116
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   532
00:05:48,770 --> 00:05:51,140
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   533
the number of a's here
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   534
on the command line.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   535
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   536
117
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   537
00:05:51,140 --> 00:05:56,900
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   538
If I just call
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   539
this on the command line,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   540
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   541
118
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   542
00:05:56,900 --> 00:05:59,900
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   543
Let's say we first
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   544
start with five a's.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   545
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   546
119
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   547
00:05:59,900 --> 00:06:03,920
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   548
And I also get the time, which
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   549
in this case is next to nothing.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   550
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   551
120
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   552
00:06:03,920 --> 00:06:05,960
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   553
And here's the string
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   554
we just matched.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   555
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   556
121
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   557
00:06:05,960 --> 00:06:07,640
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   558
And obviously the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   559
regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   560
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   561
122
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   562
00:06:07,640 --> 00:06:09,110
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   563
did not match the string.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   564
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   565
123
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   566
00:06:09,110 --> 00:06:11,255
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   567
That's indicated by this None.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   568
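A program along these lines might look like the following. This is a minimal sketch, not the actual script shown in the video: the function name time_match and the default of five a's are assumptions. It builds a string of n a's, tries to match (a*)*b against it with Python's re library, and reports the time taken.

```python
import re
import sys
import time

def time_match(n):
    # Build a string of n a's; it has no final b, so no match is expected.
    s = "a" * n
    start = time.perf_counter()
    result = re.match(r"(a*)*b", s)  # returns None when nothing matches
    elapsed = time.perf_counter() - start
    return s, result, elapsed

if __name__ == "__main__":
    # Number of a's from the command line; falls back to 5 if none is given.
    has_arg = len(sys.argv) > 1 and sys.argv[1].isdigit()
    n = int(sys.argv[1]) if has_arg else 5
    s, result, elapsed = time_match(n)
    print(s, result, f"{elapsed:.4f}s")
```

Start with small values of n such as 5 or 10; with a backtracking matcher the running time can grow very quickly as n increases.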
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   569
124
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   570
00:06:11,255 --> 00:06:13,925
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   571
Let's take ten a's.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   572
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   573
125
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   574
00:06:13,925 --> 00:06:16,490
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   575
It's also pretty quick.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   576
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   577
126
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   578
00:06:16,490 --> 00:06:20,780
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   579
Fifteen a's, even quicker,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   580
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   581
127
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   582
00:06:20,780 --> 00:06:23,180
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   583
but these times always need to
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   584
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   585
128
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   586
00:06:23,180 --> 00:06:25,820
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   587
be taken with a grain of salt.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   588
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   589
129
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   590
00:06:25,820 --> 00:06:28,040
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   591
They are not 100
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   592
percent accurate.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   593
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   594
130
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   595
00:06:28,040 --> 00:06:31,490
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   596
So 15 is also OK. Let's take
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   597
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   598
131
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   599
00:06:31,490 --> 00:06:36,965
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   600
20. That's already
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   601
double the time.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   602
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   603
132
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   604
00:06:36,965 --> 00:06:42,440
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   605
Twenty-five longer.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   606
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   607
133
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   608
00:06:42,440 --> 00:06:45,680
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   609
Okay, that suddenly
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   610
from 0.2 seconds,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   611
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   612
134
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   613
00:06:45,680 --> 00:06:48,960
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   614
it takes almost four seconds.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   615
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   616
135
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   617
00:06:49,600 --> 00:06:54,890
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   618
Twenty-six, this
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   619
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   620
136
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   621
00:06:54,890 --> 00:07:01,415
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   622
takes six seconds
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   623
already double, okay?
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   624
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   625
137
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   626
00:07:01,415 --> 00:07:07,229
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   627
Go to 28. That would be now.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   628
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   629
138
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   630
00:07:08,890 --> 00:07:11,840
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   631
You see the string
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   632
isn't very long,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   633
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   634
139
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   635
00:07:11,840 --> 00:07:13,340
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   636
so that could be easily like
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   637
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   638
140
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   639
00:07:13,340 --> 00:07:16,070
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   640
just the size of
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   641
an email address.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   642
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   643
141
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   644
00:07:16,070 --> 00:07:19,280
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   645
And the regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   646
expression matching
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   647
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   648
142
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   649
00:07:19,280 --> 00:07:22,550
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   650
engine in Python needs
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   651
quite a long time
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   652
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   653
143
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   654
00:07:22,550 --> 00:07:24,710
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   655
to find out that
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   656
this string of 28
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   657
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   658
144
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   659
00:07:24,710 --> 00:07:26,570
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   660
a's is actually not matched
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   661
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   662
145
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   663
00:07:26,570 --> 00:07:28,490
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   664
by that. You see it's
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   665
still not finished.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   666
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   667
146
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   668
00:07:28,490 --> 00:07:32,900
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   669
I think it should take
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   670
approximately like 20 seconds.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   671
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   672
147
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   673
00:07:32,900 --> 00:07:34,400
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   674
Okay. Already 30.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   675
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   676
148
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   677
00:07:34,400 --> 00:07:36,530
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   678
And if we would try
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   679
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   680
149
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   681
00:07:36,530 --> 00:07:40,805
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   682
30 would be already
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   683
more than a minute.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   684
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   685
150
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   686
00:07:40,805 --> 00:07:43,940
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   687
And if I would try
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   688
something like hundreds,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   689
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   690
151
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   691
00:07:43,940 --> 00:07:46,220
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   692
you remember, with a doubling in
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   693
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   694
152
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   695
00:07:46,220 --> 00:07:48,770
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   696
each step or every second step,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   697
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   698
153
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   699
00:07:48,770 --> 00:07:50,720
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   700
the story with the chess board,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   701
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   702
154
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   703
00:07:50,720 --> 00:07:53,855
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   704
we probably would sit here
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   705
until the next century.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   706
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   707
155
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   708
00:07:53,855 --> 00:07:56,820
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   709
So something strange here.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   710
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   711
156
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   712
00:07:57,580 --> 00:08:01,355
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   713
Okay, that might be just
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   714
a problem of Python.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   715
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   716
157
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   717
00:08:01,355 --> 00:08:02,990
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   718
Let's have a look at another
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   719
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   720
158
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   721
00:08:02,990 --> 00:08:04,985
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   722
regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   723
matching engine.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   724
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   725
159
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   726
00:08:04,985 --> 00:08:06,890
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   727
This time from JavaScript,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   728
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   729
160
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   730
00:08:06,890 --> 00:08:10,040
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   731
also a pretty well-known
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   732
programming language.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   733
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   734
161
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   735
00:08:10,040 --> 00:08:13,610
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   736
So here you can see
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   737
it's still a star,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   738
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   739
162
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   740
00:08:13,610 --> 00:08:16,235
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   741
star followed by b,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   742
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   743
163
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   744
00:08:16,235 --> 00:08:18,920
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   745
but the regular expression is
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   746
supposed to match that from
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   747
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   748
164
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   749
00:08:18,920 --> 00:08:21,830
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   750
the beginning of the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   751
string up till the end.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   752
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   753
165
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   754
00:08:21,830 --> 00:08:23,930
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   755
So there's not any difference
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   756
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   757
166
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   758
00:08:23,930 --> 00:08:26,150
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   759
in the strings this regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   760
expression matches.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   762
167
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   763
00:08:26,150 --> 00:08:28,610
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   764
We'll just start at the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   765
beginning of the string
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   766
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   767
168
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   768
00:08:28,610 --> 00:08:31,460
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   769
and finish at the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   770
end of the string.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   771
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   772
169
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   773
00:08:31,460 --> 00:08:35,285
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   774
And again, we just use
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   775
repeated a's for that.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   776
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   777
170
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   778
00:08:35,285 --> 00:08:38,195
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   779
And similarly, we can
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   780
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   781
171
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   782
00:08:38,195 --> 00:08:41,930
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   783
call it on the command line
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   784
and can do some timing.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   785
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   786
172
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   787
00:08:41,930 --> 00:08:44,540
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   788
So ten a's, good.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   789
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   790
173
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   791
00:08:44,540 --> 00:08:46,340
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   792
Here's the string.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   793
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   794
174
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   795
00:08:46,340 --> 00:08:48,320
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   796
It cannot match that string.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   797
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   798
175
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   799
00:08:48,320 --> 00:08:50,525
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   800
And it's pretty fast.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   801
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   802
176
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   803
00:08:50,525 --> 00:08:54,725
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   804
Twenty, also pretty fast.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   805
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   806
177
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   807
00:08:54,725 --> 00:08:59,120
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   808
Twenty-five, again,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   809
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   810
178
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   811
00:08:59,120 --> 00:09:06,650
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   812
somehow there is a kind of
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   813
threshold at 25, 26.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   814
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   815
179
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   816
00:09:06,650 --> 00:09:09,485
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   817
Suddenly it takes much longer.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   818
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   819
180
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   820
00:09:09,485 --> 00:09:14,360
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   821
And it has essentially the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   822
same problem as with Python.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   823
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   824
181
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   825
00:09:14,360 --> 00:09:17,165
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   826
So you see now, from 26 on,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   827
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   828
182
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   829
00:09:17,165 --> 00:09:19,250
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   830
the times are always
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   831
doubling from
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   832
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   833
183
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   834
00:09:19,250 --> 00:09:21,860
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   835
three seconds to seven seconds.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   836
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   837
184
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   838
00:09:21,860 --> 00:09:23,330
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   839
So you can imagine what that
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   840
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   841
185
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   842
00:09:23,330 --> 00:09:24,890
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   843
roughly takes when I put in
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   844
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   845
186
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   846
00:09:24,890 --> 00:09:30,230
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   847
27 and you see the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   848
string isn't very long.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   849
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   850
187
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   851
00:09:30,230 --> 00:09:32,165
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   852
That's just twenty-seven a's.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   853
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   854
188
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   855
00:09:32,165 --> 00:09:35,419
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   856
Imagine you have to
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   857
search a database
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   858
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   859
189
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   860
00:09:35,419 --> 00:09:38,720
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   861
with kilobytes of data.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   862
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   863
190
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   864
00:09:38,720 --> 00:09:42,260
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   865
With these regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   866
expressions, that would take years;
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   867
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   868
191
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   869
00:09:42,260 --> 00:09:48,150
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   870
you would need years to go through with
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   871
these regular expressions.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   872
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   873
192
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   874
00:09:48,630 --> 00:09:51,850
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   875
Okay, maybe the people in
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   876
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   877
193
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   878
00:09:51,850 --> 00:09:55,435
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   879
Python and JavaScript,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   880
they're just idiots.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   881
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   882
194
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   883
00:09:55,435 --> 00:09:58,180
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   884
Surely Java must do much better.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   885
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   886
195
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   887
00:09:58,180 --> 00:10:01,045
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   888
So here's a program.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   889
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   890
196
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   891
00:10:01,045 --> 00:10:03,415
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   892
You can see this again
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   893
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   894
197
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   895
00:10:03,415 --> 00:10:05,980
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   896
is the regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   897
and we are just having
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   898
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   899
198
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   900
00:10:05,980 --> 00:10:08,320
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   901
some scaffolding to generate
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   902
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   903
199
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   904
00:10:08,320 --> 00:10:11,905
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   905
strings from 5 up till 28.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   906
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   907
200
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   908
00:10:11,905 --> 00:10:14,305
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   909
And if we run that,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   910
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   911
201
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   912
00:10:14,305 --> 00:10:16,660
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   913
it actually does that automatically.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   914
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   915
202
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   916
00:10:16,660 --> 00:10:19,900
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   917
So up till 19, pretty fast,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   918
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   919
203
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   920
00:10:19,900 --> 00:10:24,925
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   921
but then starting from
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   922
23, it's getting pretty slow.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   923
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   924
204
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   925
00:10:24,925 --> 00:10:27,445
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   926
So the question is
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   927
what's going on?
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   928
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   929
205
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   930
00:10:27,445 --> 00:10:29,230
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   931
By the way, I'm not showing here
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   932
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   933
206
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   934
00:10:29,230 --> 00:10:33,755
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   935
Scala, which uses internally
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   936
the regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   937
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   938
207
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   939
00:10:33,755 --> 00:10:36,665
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   940
matching engine from Java.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   941
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   942
208
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   943
00:10:36,665 --> 00:10:39,065
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   944
So it would have exactly
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   945
the same problem.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   946
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   947
209
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   948
00:10:39,065 --> 00:10:41,480
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   949
Also, I have been
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   950
here very careful,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   951
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   952
210
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   953
00:10:41,480 --> 00:10:43,550
765
b294cfbb5c01 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 761
diff changeset
   954
I'm using here Java 8,
761
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   955
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   956
211
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   957
00:10:43,550 --> 00:10:46,085
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   958
which nowadays is quite old.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   959
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   960
212
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   961
00:10:46,085 --> 00:10:50,765
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   962
But you will see also
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   963
current Java versions.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   964
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   965
213
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   966
00:10:50,765 --> 00:10:55,490
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   967
We will see we can out-compete
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   968
them by orders of magnitude.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   969
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   970
214
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   971
00:10:55,490 --> 00:10:57,605
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   972
So I think I can stop that.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   973
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   974
215
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   975
00:10:57,605 --> 00:10:59,165
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   976
Now, just finish here.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   977
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   978
216
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   979
00:10:59,165 --> 00:11:04,025
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   980
You see the problem. Just
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   981
for completeness' sake.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   982
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   983
217
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   984
00:11:04,025 --> 00:11:07,010
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   985
Here is a Ruby program.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   986
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   987
218
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   988
00:11:07,010 --> 00:11:09,935
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   989
This is using the other
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   990
regular expression.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   991
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   992
219
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   993
00:11:09,935 --> 00:11:12,935
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   994
In this case the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   995
string should match.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   996
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   997
220
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   998
00:11:12,935 --> 00:11:20,300
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   999
And again it tries out
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1000
strings between 1 and 30 here.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1001
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1002
221
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1003
00:11:20,300 --> 00:11:23,450
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1004
That's a program actually
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1005
a former student produced.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1006
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1007
222
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1008
00:11:23,450 --> 00:11:25,565
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1009
And you can see, for a's
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1010
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1011
223
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1012
00:11:25,565 --> 00:11:29,780
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1013
of length up till 20
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1014
a's, it is pretty fast.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1015
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1016
224
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1017
00:11:29,780 --> 00:11:32,495
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1018
But then starting at 26,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1019
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1020
225
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1021
00:11:32,495 --> 00:11:35,285
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1022
it's getting really slow.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1023
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1024
226
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1025
00:11:35,285 --> 00:11:37,100
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1026
So in this case,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1027
remember the string
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1028
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1029
227
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1030
00:11:37,100 --> 00:11:38,870
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1031
is actually matched by
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1032
the regular expression.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1033
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1034
228
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1035
00:11:38,870 --> 00:11:40,130
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1036
So it has nothing to do
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1037
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1038
229
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1039
00:11:40,130 --> 00:11:41,540
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1040
with whether a regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1041
expression actually
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1042
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1043
230
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1044
00:11:41,540 --> 00:11:45,485
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1045
matches a string or does
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1046
not match a string.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1047
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1048
231
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1049
00:11:45,485 --> 00:11:48,260
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1050
I admit though these
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1051
regular expressions
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1052
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1053
232
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1054
00:11:48,260 --> 00:11:49,610
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1055
are carefully chosen,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1056
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1057
233
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1058
00:11:49,610 --> 00:11:52,250
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1059
as you will see later on.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1060
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1061
234
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1062
00:11:52,250 --> 00:11:55,620
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1063
Okay, I also just stop that here.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1064
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1065
235
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1066
00:11:55,710 --> 00:12:00,985
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1067
Okay, this slide collects
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1068
this information about times.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1069
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1070
236
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1071
00:12:00,985 --> 00:12:03,400
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1072
On the right hand side will
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1073
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1074
237
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1075
00:12:03,400 --> 00:12:05,860
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1076
be our regular expression matcher,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1077
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1078
238
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1079
00:12:05,860 --> 00:12:08,290
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1080
which we implement next week.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1081
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1082
239
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1083
00:12:08,290 --> 00:12:10,795
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1084
On the left-hand side,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1085
are these times by
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1086
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1087
240
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1088
00:12:10,795 --> 00:12:14,260
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1089
various other regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1090
expression matching engines.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1091
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1092
241
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1093
00:12:14,260 --> 00:12:17,809
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1094
On the top is this
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1095
regular expression.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1096
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1097
242
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1098
00:12:19,080 --> 00:12:23,335
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1099
Possibly a, n times; then a, n times.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1100
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1101
243
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1102
00:12:23,335 --> 00:12:26,890
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1103
And on the bottom
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1104
is a star, star b.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1105
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1106
244
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1107
00:12:26,890 --> 00:12:30,370
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1108
And the x-axis shows here
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1109
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1110
245
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1111
00:12:30,370 --> 00:12:35,335
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1112
the length of the
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1113
string. How many a's.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1114
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1115
246
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1116
00:12:35,335 --> 00:12:38,925
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1117
And on the y-axis is the time.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1118
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1119
247
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1120
00:12:38,925 --> 00:12:41,660
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1121
They need to decide whether
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1122
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1123
248
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1124
00:12:41,660 --> 00:12:44,615
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1125
the string is matched by
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1126
the regular expression or not.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1127
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1128
249
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1129
00:12:44,615 --> 00:12:46,415
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1130
So you can see here, Python,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1131
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1132
250
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1133
00:12:46,415 --> 00:12:47,945
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1134
Java 8 and JavaScript,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1135
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1136
251
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1137
00:12:47,945 --> 00:12:52,250
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1138
they max out approximately
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1139
at between 25 and 30.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1140
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1141
252
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1142
00:12:52,250 --> 00:12:53,900
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1143
Because then it takes already
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1144
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1145
253
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1146
00:12:53,900 --> 00:12:55,160
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1147
a half a minute to
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1148
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1149
254
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1150
00:12:55,160 --> 00:12:57,410
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1151
decide whether the string
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1152
is matched or not.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1153
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1154
255
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1155
00:12:57,410 --> 00:13:00,815
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1156
And similarly, in
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1157
the other example,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1158
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1159
256
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1160
00:13:00,815 --> 00:13:03,830
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1161
Python and Ruby max out
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1162
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1163
257
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1164
00:13:03,830 --> 00:13:07,220
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1165
at a similar kind of
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1166
length of the strings.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1167
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1168
258
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1169
00:13:07,220 --> 00:13:10,400
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1170
Because then they use also
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1171
half a minute to decide
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1172
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1173
259
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1174
00:13:10,400 --> 00:13:13,940
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1175
whether this regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1176
actually matches the string.
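
A rough way to reproduce this kind of slowdown yourself is sketched below. It assumes a regular expression with nested stars, (a*)*b, matched against strings consisting only of a's; that pattern and the helper name time_match are assumptions for illustration, and the measured times depend on your machine and Python version.

```python
import re
import time

# Assumed "evil" pattern with nested stars; strings of only a's can never match it.
EVIL = re.compile(r"(a*)*b")

def time_match(n):
    """Return (matched, seconds) for matching a string of n a's against EVIL."""
    s = "a" * n
    start = time.perf_counter()
    matched = EVIL.match(s) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    # Keep n small: the running time of a backtracking matcher
    # can roughly double with every additional 'a'.
    for n in range(10, 23, 2):
        matched, secs = time_match(n)
        print(f"n={n:2d} matched={matched} time={secs:.4f}s")
```

The result is always "no match"; what changes is only how long the engine needs to find that out.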
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1177
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1178
260
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1179
00:13:13,940 --> 00:13:16,790
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1180
Contrast that with
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1181
the regular expression
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1182
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1183
261
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1184
00:13:16,790 --> 00:13:19,235
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1185
matcher, our regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1186
expression matcher,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1187
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1188
262
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1189
00:13:19,235 --> 00:13:21,470
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1190
which we're going to implement.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1191
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1192
263
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1193
00:13:21,470 --> 00:13:25,040
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1194
This can match
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1195
approximately 10 thousand
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1196
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1197
264
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1198
00:13:25,040 --> 00:13:30,065
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1199
a's in this example and
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1200
needs less than ten seconds.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1201
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1202
265
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1203
00:13:30,065 --> 00:13:32,285
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1204
Actually, there will be
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1205
two versions of that.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1206
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1207
266
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1208
00:13:32,285 --> 00:13:34,850
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1209
First version may be
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1210
also relatively slow.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1211
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1212
267
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1213
00:13:34,850 --> 00:13:36,410
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1214
But the second version,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1215
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1216
268
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1217
00:13:36,410 --> 00:13:38,240
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1218
in contrast to Python,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1219
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1220
269
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1221
00:13:38,240 --> 00:13:40,295
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1222
Ruby, will be blindingly fast.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1223
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1224
270
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1225
00:13:40,295 --> 00:13:42,380
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1226
And in the second example,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1227
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1228
271
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1229
00:13:42,380 --> 00:13:45,740
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1230
you have to be careful
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1231
about the x axis because
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1232
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1233
272
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1234
00:13:45,740 --> 00:13:49,385
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1235
that means four times
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1236
ten to the power six.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1237
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1238
273
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1239
00:13:49,385 --> 00:13:51,695
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1240
It's actually 4 million a's.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1241
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1242
274
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1243
00:13:51,695 --> 00:13:55,100
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1244
So our regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1245
expression matcher needs
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1246
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1247
275
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1248
00:13:55,100 --> 00:13:57,635
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1249
less than ten seconds to
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1250
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1251
276
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1252
00:13:57,635 --> 00:14:00,725
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1253
match a string of length
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1254
4 million a's.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1255
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1256
277
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1257
00:14:00,725 --> 00:14:04,430
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1258
Contrast that with Python, Java eight,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1259
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1260
278
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1261
00:14:04,430 --> 00:14:06,770
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1262
and JavaScript need half a minute
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1263
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1264
279
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1265
00:14:06,770 --> 00:14:09,905
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1266
already for a string
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1267
of length just 30,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1268
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1269
280
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1270
00:14:09,905 --> 00:14:12,365
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1271
unless you're very
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1272
careful with Java eight.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1273
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1274
281
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1275
00:14:12,365 --> 00:14:15,725
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1276
Yes, Java nine and above,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1277
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1278
282
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1279
00:14:15,725 --> 00:14:17,180
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1280
they already have
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1281
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1282
283
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1283
00:14:17,180 --> 00:14:19,610
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1284
a much better regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1285
expression matching engine,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1286
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1287
284
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1288
00:14:19,610 --> 00:14:22,805
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1289
but still we will be running
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1290
circles around them.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1291
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1292
285
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1293
00:14:22,805 --> 00:14:27,050
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1294
With this data, I
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1295
call this slide:
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1296
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1297
286
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1298
00:14:27,050 --> 00:14:29,675
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1299
Why bother with
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1300
regular expressions?
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1301
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1302
287
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1303
00:14:29,675 --> 00:14:33,515
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1304
But you can probably
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1305
see these are
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1306
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1307
288
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1308
00:14:33,515 --> 00:14:34,910
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1309
at least poor times by
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1310
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1311
289
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1312
00:14:34,910 --> 00:14:38,015
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1313
the existing regular
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1314
expression matching engines.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1315
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1316
290
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1317
00:14:38,015 --> 00:14:40,070
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1318
And it's actually
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1319
surprising that after
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1320
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1321
291
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1322
00:14:40,070 --> 00:14:42,695
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1323
one lecture we can already
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1324
do substantially better.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1325
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1326
292
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1327
00:14:42,695 --> 00:14:47,495
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1328
And if you don't believe
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1329
in the times I gave here,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1330
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1331
293
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1332
00:14:47,495 --> 00:14:50,090
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1333
please feel free to
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1334
play on your own
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1335
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1336
294
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1337
00:14:50,090 --> 00:14:52,865
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1338
with the examples
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1339
I uploaded on KEATS.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1340
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1341
295
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1342
00:14:52,865 --> 00:14:55,235
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1343
These are exactly the programs
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1344
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1345
296
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1346
00:14:55,235 --> 00:14:57,470
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1347
I used here in the examples.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1348
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1349
297
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1350
00:14:57,470 --> 00:14:59,255
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1351
So feel free.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1352
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1353
298
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1354
00:14:59,255 --> 00:15:01,970
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1355
You might however now think, hmm.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1356
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1357
299
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1358
00:15:01,970 --> 00:15:05,449
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1359
These are two very
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1360
well-chosen examples.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1361
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1362
300
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1363
00:15:05,449 --> 00:15:07,145
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1364
And I admit that's true.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1365
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1366
301
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1367
00:15:07,145 --> 00:15:09,410
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1368
And such problems are never
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1369
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1370
302
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1371
00:15:09,410 --> 00:15:12,540
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1372
causing any problems
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1373
in real life.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1374
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1375
303
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1376
00:15:13,300 --> 00:15:15,980
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1377
Regular expressions are used very
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1378
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1379
304
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1380
00:15:15,980 --> 00:15:19,415
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1381
frequently and they
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1382
do cause problems.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1383
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1384
305
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1385
00:15:19,415 --> 00:15:21,410
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1386
So here's my first example from
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1387
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1388
306
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1389
00:15:21,410 --> 00:15:23,885
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1390
a company called Cloudflare.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1391
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1392
307
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1393
00:15:23,885 --> 00:15:27,560
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1394
This is a huge hosting company
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1395
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1396
308
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1397
00:15:27,560 --> 00:15:30,935
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1398
which hosts very
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1399
well-known web pages.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1400
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1401
309
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1402
00:15:30,935 --> 00:15:34,970
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1403
And they really try hard
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1404
to have no outage at all.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1405
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1406
310
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1407
00:15:34,970 --> 00:15:37,340
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1408
And they managed
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1409
that for six years.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1410
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1411
311
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1412
00:15:37,340 --> 00:15:39,320
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1413
But then a regular expression,
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1414
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1415
312
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1416
00:15:39,320 --> 00:15:41,180
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1417
actually this one caused
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1418
a problem and you
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1419
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1420
313
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1421
00:15:41,180 --> 00:15:43,265
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1422
can see there are also
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1423
like two stars.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1424
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1425
314
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1426
00:15:43,265 --> 00:15:44,630
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1427
There at the end.
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1428
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1429
315
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1430
00:15:44,630 --> 00:15:46,955
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1431
And because of that, a string needed
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1432
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1433
316
82a1315c128d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1434
00:15:46,955 --> 00:15:49,865
too much time to be matched.

317
00:15:49,865 --> 00:15:50,990
And because of that,

318
00:15:50,990 --> 00:15:52,430
they had some outage for,

319
00:15:52,430 --> 00:15:54,125
I think several hours,

320
00:15:54,125 --> 00:15:57,920
actually in their malware
detection subsystem.

321
00:15:57,920 --> 00:16:02,060
And the second example
comes from 2016,

322
00:16:02,060 --> 00:16:04,040
where Stack Exchange,
I guess you know

323
00:16:04,040 --> 00:16:06,650
this webpage, also had
an outage of,

324
00:16:06,650 --> 00:16:08,390
I think at least an hour.

325
00:16:08,390 --> 00:16:13,070
Because a regular expression
they needed to format posts,

326
00:16:13,070 --> 00:16:15,575
needed too much time to

327
00:16:15,575 --> 00:16:19,010
recognize whether this post
should be accepted or not.

328
00:16:19,010 --> 00:16:23,390
And again, there was a
similar kind of problem.

329
00:16:23,390 --> 00:16:24,950
And you can read
the stories behind

330
00:16:24,950 --> 00:16:28,080
that on these two given links.

331
00:16:28,720 --> 00:16:31,730
When I looked at
this the first time,

332
00:16:31,730 --> 00:16:34,175
what surprised me is
that theoreticians,

333
00:16:34,175 --> 00:16:37,520
who sometimes dedicate their
lives to regular expressions

334
00:16:37,520 --> 00:16:39,440
and know really a lot about

335
00:16:39,440 --> 00:16:41,690
them, didn't know
anything about this.

336
00:16:41,690 --> 00:16:43,610
But engineers, they
already created

337
00:16:43,610 --> 00:16:46,160
a name for that:
regular expression

338
00:16:46,160 --> 00:16:47,975
denial of service attack.

339
00:16:47,975 --> 00:16:49,745
Because what you can,

340
00:16:49,745 --> 00:16:51,230
what can happen now is that

341
00:16:51,230 --> 00:16:54,920
attackers look for
certain strings.

342
00:16:54,920 --> 00:16:56,780
They make your regular expression

343
00:16:56,780 --> 00:16:59,105
matching engine topple over.

344
00:16:59,105 --> 00:17:01,370
And these kinds of expressions,

345
00:17:01,370 --> 00:17:04,160
regular expressions, are called
evil regular expressions.

346
00:17:04,160 --> 00:17:06,350
And actually there are
quite a number of them.

347
00:17:06,350 --> 00:17:08,495
So you've seen this one,

348
00:17:08,495 --> 00:17:11,255
the first one, and the
second one already.

349
00:17:11,255 --> 00:17:13,400
But there are many, many more.

350
00:17:13,400 --> 00:17:15,620
And you can easily have in

351
00:17:15,620 --> 00:17:18,560
your program one of
these regular expressions.

352
00:17:18,560 --> 00:17:21,830
And then you have the
problem that if you do have

353
00:17:21,830 --> 00:17:23,240
this regular expression and

354
00:17:23,240 --> 00:17:25,640
somebody finds the
corresponding string,

355
00:17:25,640 --> 00:17:29,945
which makes the regex
matching engine topple over,

356
00:17:29,945 --> 00:17:31,820
then you have a problem

357
00:17:31,820 --> 00:17:34,295
because your webpage is
probably not available.

358
00:17:34,295 --> 00:17:36,140
This is also sometimes called

359
00:17:36,140 --> 00:17:39,350
this phenomenon,
catastrophic backtracking.

360
00:17:39,350 --> 00:17:43,595
In lecture three, we will
look at this more carefully.

361
00:17:43,595 --> 00:17:46,910
And actually why that
is such a problem in

362
00:17:46,910 --> 00:17:50,795
real life is actually
not to do with lexers.

363
00:17:50,795 --> 00:17:53,180
Yes, regular
expressions are used as

364
00:17:53,180 --> 00:17:55,040
the basic tool for implementing

365
00:17:55,040 --> 00:17:57,185
lexers. But regular expressions are,

366
00:17:57,185 --> 00:18:00,065
of course, used in
a much wider area.

367
00:18:00,065 --> 00:18:03,770
And they are especially used for
network intrusion detection.

368
00:18:03,770 --> 00:18:06,590
Imagine you have to

369
00:18:06,590 --> 00:18:10,130
administer a big network
and you only want to let

370
00:18:10,130 --> 00:18:13,640
in packets which you think are OK

371
00:18:13,640 --> 00:18:14,930
and you want to keep out

372
00:18:14,930 --> 00:18:17,645
any packet which might
hack into your network.

373
00:18:17,645 --> 00:18:22,670
So what they have is they
have suites of thousands and

374
00:18:22,670 --> 00:18:25,745
sometimes even more
regular expressions which

375
00:18:25,745 --> 00:18:27,755
check whether this packet

376
00:18:27,755 --> 00:18:30,065
satisfies some patterns or not.

377
00:18:30,065 --> 00:18:31,460
And in this case it will be left

378
00:18:31,460 --> 00:18:34,205
out or it will be let in.

379
00:18:34,205 --> 00:18:36,335
And with networks,

380
00:18:36,335 --> 00:18:39,080
the problem is that our
hardware is already

381
00:18:39,080 --> 00:18:43,190
so fast that the regular expressions

382
00:18:43,190 --> 00:18:45,169
really become a bottleneck.

383
00:18:45,169 --> 00:18:47,060
Because what do you do if now

384
00:18:47,060 --> 00:18:49,880
suddenly a regular expression
takes too much time

385
00:18:49,880 --> 00:18:52,670
Do you just stop the matching

386
00:18:52,670 --> 00:18:55,100
and let the packet
in regardless?

387
00:18:55,100 --> 00:18:58,190
Or do you just hold
the network up

388
00:18:58,190 --> 00:19:01,715
and not let anything in
until you have decided?

389
00:19:01,715 --> 00:19:04,895
So that's actually a
really hard problem.

390
00:19:04,895 --> 00:19:06,650
But the first time I came across

391
00:19:06,650 --> 00:19:09,965
that problem was actually
in a video by this engineer.

392
00:19:09,965 --> 00:19:13,820
And they always say that
Germans don't have any humour.

393
00:19:13,820 --> 00:19:16,985
But I found that
video quite funny.

394
00:19:16,985 --> 00:19:19,145
Maybe you have a
different opinion,

395
00:19:19,145 --> 00:19:21,095
but feel free to
have a look, as it

396
00:19:21,095 --> 00:19:23,705
explains exactly that problem.

397
00:19:23,705 --> 00:19:25,610
So in the next video,

398
00:19:25,610 --> 00:19:28,445
we will start to
implement this matcher.

399
00:19:28,445 --> 00:19:30,870
So I hope to see you there.