videos/01-evilregexes.srt
author Christian Urban <christian.urban@kcl.ac.uk>
Mon, 30 Oct 2023 18:46:27 +0000
changeset 950 285da21f44c0
parent 837 cb31a037049c
permissions -rw-r--r--
updated
1
00:00:06,240 --> 00:00:11,050
Welcome back. This video
is about regular expressions.

2
00:00:11,050 --> 00:00:14,230
We want to use regular
expressions in our lexer.

3
00:00:14,230 --> 00:00:16,165
And the purpose of the
lexer is to find

4
00:00:16,165 --> 00:00:18,070
out where the words in

5
00:00:18,070 --> 00:00:21,070
our programs are. However,
regular expressions

6
00:00:21,070 --> 00:00:23,875
are a fundamental tool
in computer science.

7
00:00:23,875 --> 00:00:27,910
And I'm sure you've used them
already on several occasions.

8
00:00:27,910 --> 00:00:30,370
And one would expect about

9
00:00:30,370 --> 00:00:31,750
regular expressions, since they are

10
00:00:31,750 --> 00:00:33,850
so well-known and well-studied,

11
00:00:33,850 --> 00:00:37,915
that everything under the
sun is known about them.

12
00:00:37,915 --> 00:00:41,080
But actually there are
still some surprising

13
00:00:41,080 --> 00:00:44,465
and interesting
problems with them.

14
00:00:44,465 --> 00:00:47,945
And I want to show you
them in this video.

15
00:00:47,945 --> 00:00:50,720
I'm sure you've seen
regular expressions

16
00:00:50,720 --> 00:00:52,445
many, many times before.

17
00:00:52,445 --> 00:00:55,100
But just to be on the same page,

18
00:00:55,100 --> 00:00:57,110
let me just recap them.

19
00:00:57,110 --> 00:00:59,210
So here in this line,

20
00:00:59,210 --> 00:01:01,790
there is a regular expression
which is supposed to

21
00:01:01,790 --> 00:01:05,285
recognize some form
of email addresses.

22
00:01:05,285 --> 00:01:07,745
So an email address

23
00:01:07,745 --> 00:01:11,000
has a part which is
before the @ symbol,

24
00:01:11,000 --> 00:01:13,400
which is the name of the person.

25
00:01:13,400 --> 00:01:16,880
And that can be
any number between

26
00:01:16,880 --> 00:01:20,195
0 and 9, and letters between a and z.

27
00:01:20,195 --> 00:01:24,155
Let's say we are avoiding
capital letters here.

28
00:01:24,155 --> 00:01:26,045
There can be underscores.

29
00:01:26,045 --> 00:01:29,405
There can be a dot and
there can be hyphens.

30
00:01:29,405 --> 00:01:35,390
And after the @ symbol
comes the domain name.

31
00:01:35,390 --> 00:01:37,310
So as you can see here,

32
00:01:37,310 --> 00:01:40,640
we use things like star to

33
00:01:40,640 --> 00:01:44,314
match letters
zero or more times.

34
00:01:44,314 --> 00:01:45,985
Or we have a plus,

35
00:01:45,985 --> 00:01:47,420
which means you have to match

36
00:01:47,420 --> 00:01:52,489
one or more
times. Then we have a

37
00:01:52,489 --> 00:01:55,790
question mark, which says you

38
00:01:55,790 --> 00:01:59,105
match whether it is there
or it is not there.

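[Editor's note] The three operators just described (star, plus, question mark) can be tried out with a small sketch. It uses Python's standard `re` module rather than anything shown in the video; the helper name `matches` and the sample patterns are assumptions made for illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` is matched by `pattern`."""
    return re.fullmatch(pattern, text) is not None

# Star: the preceding item may occur zero or more times.
print(matches(r"[a-z]*", ""))         # True
print(matches(r"[a-z]*", "abc"))      # True

# Plus: the preceding item must occur at least once.
print(matches(r"[a-z]+", ""))         # False
print(matches(r"[a-z]+", "abc"))      # True

# Question mark: the preceding item is either there or not there.
print(matches(r"colou?r", "color"))    # True
print(matches(r"colou?r", "colour"))   # True
print(matches(r"colou?r", "colouur"))  # False
```

`re.fullmatch` is used so that the whole string has to match, which is closest to the question "does this regular expression recognize this string?".
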
39
00:01:59,105 --> 00:02:01,340
There are also regular
expressions which

40
00:02:01,340 --> 00:02:03,755
match exactly n times.

41
00:02:03,755 --> 00:02:08,720
Or this is a regular expression
for between n and m times.

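[Editor's note] A minimal sketch of the bounded repetitions mentioned here, assuming the usual `{n}` and `{n,m}` notation as implemented by Python's `re` module. The names `EXACTLY_THREE` and `TWO_TO_FOUR` are hypothetical and chosen only for this example.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` is matched by `pattern`."""
    return re.fullmatch(pattern, text) is not None

EXACTLY_THREE = r"a{3}"    # exactly n times, here n = 3
TWO_TO_FOUR = r"a{2,4}"    # between n and m times, here n = 2, m = 4

print(matches(EXACTLY_THREE, "aaa"))    # True
print(matches(EXACTLY_THREE, "aa"))     # False
print(matches(TWO_TO_FOUR, "aa"))       # True
print(matches(TWO_TO_FOUR, "aaaa"))     # True
print(matches(TWO_TO_FOUR, "aaaaa"))    # False
```
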
42
00:02:08,720 --> 00:02:12,065
You can see in
this email address,

43
00:02:12,065 --> 00:02:13,730
the top-level domain

44
00:02:13,730 --> 00:02:16,130
name can be any letter

45
00:02:16,130 --> 00:02:19,265
between a and z,
and contain dots,

46
00:02:19,265 --> 00:02:22,340
but can only be from two
characters long

47
00:02:22,340 --> 00:02:25,685
up to six characters
and not more.

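[Editor's note] The transcript does not reproduce the regular expression on the slide, so the pattern below is an assumption reconstructed from the spoken description: a name part of lowercase letters, digits, underscores, dots and hyphens, then the @ symbol, then a domain name, then a top-level domain of two to six lowercase letters or dots. The constant name `EMAIL` and the test addresses are hypothetical.

```python
import re

# Assumed reconstruction of the email pattern described in the video.
EMAIL = r"[a-z0-9_.-]+@[a-z0-9.-]+\.[a-z.]{2,6}"

def is_email(text: str) -> bool:
    """Return True if the whole of `text` matches the assumed EMAIL pattern."""
    return re.fullmatch(EMAIL, text) is not None

print(is_email("christian.urban@kcl.ac.uk"))   # True
print(is_email("Christian.Urban@kcl.ac.uk"))   # False, capital letters are avoided
print(is_email("user@example.abcdefgh"))       # False, last part longer than six
```

This is only "some form" of email addresses, as the video says; real addresses allow considerably more than this pattern accepts.
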
48
00:02:25,685 --> 00:02:29,240
Then you also have
something like ranges.

49
00:02:29,240 --> 00:02:31,220
So you can see, letters between a

50
00:02:31,220 --> 00:02:33,635
and z and 0 to 9 and so on.

51
00:02:33,635 --> 00:02:36,545
Here you also have regular
expressions which can

52
00:02:36,545 --> 00:02:40,070
match something which
isn't in this range.

53
00:02:40,070 --> 00:02:42,560
So for example, if
you want to match

54
00:02:42,560 --> 00:02:44,030
letters but not numbers,

55
00:02:44,030 --> 00:02:45,800
you would say, well, if

56
00:02:45,800 --> 00:02:48,990
this is a number, that
should not match.

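[Editor's note] A short sketch of ranges and negated ranges, assuming the common bracket notation `[a-z]`, `[0-9]` and `[^0-9]` as implemented by Python's `re` module. The helper name `matches` is hypothetical. Note that `[^0-9]` accepts any character that is not a digit, so it also lets punctuation through, not only letters.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` is matched by `pattern`."""
    return re.fullmatch(pattern, text) is not None

# Ranges: lowercase letters, or lowercase letters and digits.
print(matches(r"[a-z]+", "hello"))        # True
print(matches(r"[a-z0-9]+", "hello42"))   # True
print(matches(r"[a-z]+", "hello42"))      # False

# Negated range: anything that is not a digit.
print(matches(r"[^0-9]+", "hello"))       # True
print(matches(r"[^0-9]+", "hello42"))     # False
print(matches(r"[^0-9]+", "hi!"))         # True, "!" is not a digit either
```
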
57
00:02:49,090 --> 00:02:52,804
Typically you also
have these ranges.

58
00:02:52,804 --> 00:02:55,565
Lowercase letters,
capital letters.

59
00:02:55,565 --> 00:02:58,550
Then you have some
special regular expressions,

60
00:02:58,550 --> 00:03:02,195
like this one, which is only
supposed to match digits.

61
00:03:02,195 --> 00:03:05,674
A dot is supposed to
match any character.

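[Editor's note] The transcript does not say which notation "this one" refers to; the sketch below assumes it is the common `\d` shorthand for digits. It also shows the dot. One detail specific to Python's `re` module: by default the dot matches any character except a newline, unless the `re.DOTALL` flag is given.

```python
import re

# \d (assumed shorthand): matches a digit.
print(re.fullmatch(r"\d+", "2023") is not None)   # True
print(re.fullmatch(r"\d+", "20a3") is not None)   # False

# Dot: matches any single character.
print(re.fullmatch(r"a.c", "abc") is not None)    # True
print(re.fullmatch(r"a.c", "a-c") is not None)    # True

# In Python the dot skips newlines unless re.DOTALL is set.
print(re.fullmatch(r"a.c", "a\nc") is not None)             # False
print(re.fullmatch(r"a.c", "a\nc", re.DOTALL) is not None)  # True
```
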
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   282
62
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   283
00:03:05,674 --> 00:03:07,370
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   284
And then they have also something
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   285
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   286
63
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   287
00:03:07,370 --> 00:03:09,800
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   288
called groups which
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   289
is supposed to be
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   290
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   291
64
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   292
00:03:09,800 --> 00:03:12,799
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   293
used when you are
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   294
trying to extract
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   295
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   296
65
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   297
00:03:12,799 --> 00:03:15,605
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   298
a string you've matched.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   299
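As a minimal sketch of the three features just mentioned (a digit class, the dot, and groups), here is how they might look with Python's standard re module. The email-like pattern and the sample strings are made up for illustration and are not the ones shown in the video.

```python
import re

# \d matches a single digit; \d+ matches one or more digits.
digits_only = re.fullmatch(r"\d+", "2023")

# A dot matches any single character (except a newline by default).
dot_match = re.fullmatch(r"a.c", "abc")

# Parentheses form groups, which let us extract parts of the match.
# Hypothetical, heavily simplified email-like pattern.
email = re.fullmatch(r"(\w+)@(\w+)\.(\w+)", "alice@example.org")

print(digits_only is not None)          # True
print(dot_match is not None)            # True
print(email.group(1), email.group(2))   # alice example
```

Here group(1) and group(2) return the parts of the string captured by the first and second pair of parentheses.
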
66
00:03:15,605 --> 00:03:19,925
Okay, so these are the
typical regular expressions.

67
00:03:19,925 --> 00:03:23,075
And here's a particular one

68
00:03:23,075 --> 00:03:25,820
trying to match something

69
00:03:25,820 --> 00:03:28,770
which resembles
an email address.

70
00:03:29,590 --> 00:03:33,065
Clearly that should all be easy.

71
00:03:33,065 --> 00:03:36,230
And our technology should
be on top of that.

72
00:03:36,230 --> 00:03:37,865
That we can take a

73
00:03:37,865 --> 00:03:41,015
regular expression and
we can take a string,

74
00:03:41,015 --> 00:03:43,340
and we should have programs to

75
00:03:43,340 --> 00:03:45,680
decide whether this
string is matched

76
00:03:45,680 --> 00:03:50,330
by the regular expression or
not, and that should be easy-peasy, no?

77
00:03:50,330 --> 00:03:56,150
Well, let's have a
look at two examples.

78
00:03:56,150 --> 00:04:00,860
The first regular expression
is a star star b.

79
00:04:00,860 --> 00:04:02,990
And it is supposed
to match strings of

80
00:04:02,990 --> 00:04:05,825
the form 0 or more a's,

81
00:04:05,825 --> 00:04:10,385
followed by a b. The parentheses
you can ignore.

82
00:04:10,385 --> 00:04:11,990
And a star star

83
00:04:11,990 --> 00:04:14,120
also doesn't
make any difference

84
00:04:14,120 --> 00:04:16,505
to what kind of strings
can be matched.

85
00:04:16,505 --> 00:04:21,635
It can only match 0 or more
a's followed by a b.

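The claim that the extra star changes nothing can be checked by brute force. The sketch below, assuming Python's re module and writing the two expressions as (a*)*b and a*b, compares them on every string over the letters a and b up to a small length. The helper name same_language and the length bound are illustrative choices.

```python
import re
from itertools import product

NESTED = re.compile(r"(a*)*b")   # "a star star b"
SIMPLE = re.compile(r"a*b")      # "a star b"

def same_language(max_len=6):
    """Check that both patterns accept exactly the same strings
    over the alphabet {a, b} up to length max_len."""
    for length in range(max_len + 1):
        for chars in product("ab", repeat=length):
            s = "".join(chars)
            nested_ok = NESTED.fullmatch(s) is not None
            simple_ok = SIMPLE.fullmatch(s) is not None
            if nested_ok != simple_ok:
                return False
    return True

print(same_language())
```

This only tests short strings, so it is evidence rather than a proof that the two expressions match the same strings.
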
86
00:04:21,635 --> 00:04:23,900
And the other regular expression

87
00:04:23,900 --> 00:04:26,990
is possibly a character a,

88
00:04:26,990 --> 00:04:32,930
n times, followed by character
a exactly n times.

89
00:04:32,930 --> 00:04:35,570
And we will try out

90
00:04:35,570 --> 00:04:38,360
these two regular expressions
with strings of the form a,

91
00:04:38,360 --> 00:04:39,890
aa, and so on,

92
00:04:39,890 --> 00:04:45,770
and up to the length of n. And

93
00:04:45,770 --> 00:04:49,130
this regular expression should
actually not match any of

94
00:04:49,130 --> 00:04:53,315
the strings because the
final b is missing.

95
00:04:53,315 --> 00:04:56,150
But that is
okay. For example

96
00:04:56,150 --> 00:04:57,425
if you have a regular expression

97
00:04:57,425 --> 00:05:00,110
that is supposed to
check whether a string is

98
00:05:00,110 --> 00:05:01,490
an email address and the user

99
00:05:01,490 --> 00:05:03,380
gives some random
string in there,

100
00:05:03,380 --> 00:05:06,545
then this regular expression
should not match that string.

101
00:05:06,545 --> 00:05:08,420
And for this regular expression

102
00:05:08,420 --> 00:05:11,195
you have to scratch
your head a little bit

103
00:05:11,195 --> 00:05:12,620
about what it can actually match.

104
00:05:12,620 --> 00:05:14,720
But after a little bit
of head scratching,

105
00:05:14,720 --> 00:05:18,260
you find out it can match
any string of a's whose

106
00:05:18,260 --> 00:05:22,580
length is between n
and 2n.

107
00:05:22,580 --> 00:05:24,290
So anything in this range,

108
00:05:24,290 --> 00:05:27,185
this regular expression
can actually match.

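The range from n to 2n can be confirmed experimentally. This sketch assumes Python's re module and, as an implementation choice, unrolls the counted repetitions by string repetition, so for n = 3 the pattern is a?a?a?aaa. The function name matched_lengths is made up for the example.

```python
import re

def matched_lengths(n):
    """Return all k such that a string of k a's is matched by
    n optional a's followed by exactly n a's."""
    pattern = re.compile("a?" * n + "a" * n)
    return [k for k in range(3 * n + 1)
            if pattern.fullmatch("a" * k) is not None]

print(matched_lengths(3))   # [3, 4, 5, 6]
```

Strings shorter than n lack the mandatory a's, and strings longer than 2n have more a's than the optional and mandatory parts can absorb together.
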
109
00:05:27,185 --> 00:05:30,395
Okay, let's
take a random tool,

110
00:05:30,395 --> 00:05:32,630
maybe for example Python.

111
00:05:32,630 --> 00:05:35,240
So here's a little
Python program.

112
00:05:35,240 --> 00:05:38,690
It uses the library
function of Python to

113
00:05:38,690 --> 00:05:42,935
match the regular expression
a star star b.

114
00:05:42,935 --> 00:05:46,805
And we measure the time with longer
and longer strings of a's.

115
00:05:46,805 --> 00:05:48,770
And so conveniently we can give

116
00:05:48,770 --> 00:05:51,140
the number of a's here
on the command line.

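The program itself is only shown on screen, so the following is a guess at its general shape rather than the author's actual script: it builds a string of n a's, matches it against (a*)*b with Python's re.match, and reports the elapsed time. The names time_match and parse_count, and the default of five a's, are assumptions.

```python
import re
import sys
import time

def time_match(n):
    """Match n a's against (a*)*b; return (result, seconds)."""
    text = "a" * n
    start = time.perf_counter()
    result = re.match(r"(a*)*b", text)   # None: the final b is missing
    elapsed = time.perf_counter() - start
    return result, elapsed

def parse_count(args, default=5):
    """Read the number of a's from the command line, if given."""
    if len(args) > 1 and args[1].isdigit():
        return int(args[1])
    return default

if __name__ == "__main__":
    n = parse_count(sys.argv)
    result, elapsed = time_match(n)
    print("a" * n)                 # the string that was tried
    print(result)                  # None means no match
    print(f"{elapsed:.5f} secs")
```

Saved under a hypothetical file name such as evil.py, it could be called as python3 evil.py 5 to try five a's.
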
117
00:05:51,140 --> 00:05:56,900
If I just call
this on the command line,

118
00:05:56,900 --> 00:05:59,900
let's say we first
start with five a's.

119
00:05:59,900 --> 00:06:03,920
And I also get the time, which
in this case is next to nothing.

120
00:06:03,920 --> 00:06:05,960
And here's the string
we just matched.

121
00:06:05,960 --> 00:06:07,640
And obviously the
regular expression

122
00:06:07,640 --> 00:06:09,110
did not match the string.

123
00:06:09,110 --> 00:06:11,255
That's indicated by this None.

124
00:06:11,255 --> 00:06:13,925
Let's take ten a's.

125
00:06:13,925 --> 00:06:16,490
It's also pretty quick.

126
00:06:16,490 --> 00:06:20,780
Fifteen a's, even quicker,

127
00:06:20,780 --> 00:06:23,180
but these times always need to

128
00:06:23,180 --> 00:06:25,820
be taken with a grain of salt.

129
00:06:25,820 --> 00:06:28,040
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   591
They are not 100
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   592
percent accurate.

130
00:06:28,040 --> 00:06:31,490
So 15 is also OK.
Let's take 20.

131
00:06:31,490 --> 00:06:36,965
Hmm, this already takes
double the time.

132
00:06:36,965 --> 00:06:42,440
Twenty-five. Then even longer.

133
00:06:42,440 --> 00:06:45,680
Okay, then suddenly
from 0.2 seconds,

134
00:06:45,680 --> 00:06:48,960
it now takes almost four seconds.

135
00:06:49,600 --> 00:06:54,890
Twenty-six, this

136
00:06:54,890 --> 00:07:01,415
takes six seconds...
already double.

137
00:07:01,415 --> 00:07:07,229
Let's go to 28. That would be
... hmm ... hmm

138
00:07:08,890 --> 00:07:11,840
You see the string
isn't very long,

139
00:07:11,840 --> 00:07:13,340
so that could be easily like

140
00:07:13,340 --> 00:07:16,070
just the size of
an email address.

141
00:07:16,070 --> 00:07:19,280
And the regular
expression matching

142
00:07:19,280 --> 00:07:22,550
engine in Python needs
quite a long time

143
00:07:22,550 --> 00:07:24,710
to find out that
this string of 28

144
00:07:24,710 --> 00:07:26,570
a's is actually not matched

145
00:07:26,570 --> 00:07:28,490
by that. You see it's
still not finished.

146
00:07:28,490 --> 00:07:32,900
I think it should take
approximately like 20 seconds.

147
00:07:32,900 --> 00:07:34,400
Okay. Already 30.

148
00:07:34,400 --> 00:07:36,530
And if we would try

149
00:07:36,530 --> 00:07:40,805
30, we would be already here
for more than a minute.

150
00:07:40,805 --> 00:07:43,940
And if I could use
something like 100,

151
00:07:43,940 --> 00:07:46,220
you remember, with a doubling in

152
00:07:46,220 --> 00:07:48,770
each step or every second step,

153
00:07:48,770 --> 00:07:50,720
the story with the chess board,

154
00:07:50,720 --> 00:07:53,855
we probably would sit here
until the next century.

155
00:07:53,855 --> 00:07:56,820
So something strange here.
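
A minimal sketch of the Python timing experiment described above, assuming the regular expression is (a*)*b anchored at the beginning and end of the string. The names EVIL and time_match are hypothetical, chosen only for illustration, and the absolute times depend on the machine and the Python version.

```python
import re
import time

# Assumed pattern: (a*)* followed by b, matched against the whole string.
EVIL = re.compile(r"^(a*)*b$")


def time_match(n):
    """Match a string of n a's against EVIL.

    Returns the match result (None if there is no match)
    and the elapsed wall-clock time in seconds.
    """
    s = "a" * n
    start = time.perf_counter()
    result = EVIL.match(s)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Kept small on purpose: the time grows very quickly with n.
    for n in (5, 10, 15, 20):
        result, secs = time_match(n)
        print(f"{n:2d} a's: {result} in {secs:.4f} s")
```

The string never contains a b, so the result is always None; only the time needed to find that out changes with the number of a's.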

156
00:07:57,580 --> 00:08:01,355
Okay, that might be just
a problem of Python.

157
00:08:01,355 --> 00:08:02,990
Let's have a look at another

158
00:08:02,990 --> 00:08:04,985
regular expression
matching engine.

159
00:08:04,985 --> 00:08:06,890
This time from JavaScript,

160
00:08:06,890 --> 00:08:10,040
also a pretty well-known
programming language.

161
00:08:10,040 --> 00:08:13,610
So here you can see
it's still a star,

162
00:08:13,610 --> 00:08:16,235
star followed by b,

163
00:08:16,235 --> 00:08:18,920
and the regular expression is
supposed to match that from

164
00:08:18,920 --> 00:08:21,830
the beginning of the
string up till the end.

165
00:08:21,830 --> 00:08:23,930
So there's not any difference

166
00:08:23,930 --> 00:08:26,150
in the strings this regular
expression matches.

167
00:08:26,150 --> 00:08:28,610
It will just start at the
beginning of the string

168
00:08:28,610 --> 00:08:31,460
and finish at the
end of the string.

169
00:08:31,460 --> 00:08:35,285
And again, we just use
repeated a's for that.

170
00:08:35,285 --> 00:08:38,195
And similarly, we can

171
00:08:38,195 --> 00:08:41,930
call it on the command line
and can do some timing.

172
00:08:41,930 --> 00:08:44,540
So ten a's is very good.

173
00:08:44,540 --> 00:08:46,340
Here's the string.

174
00:08:46,340 --> 00:08:48,320
It cannot match that string.

175
00:08:48,320 --> 00:08:50,525
And it's pretty fast.

176
00:08:50,525 --> 00:08:54,725
Twenty...also pretty fast.

177
00:08:54,725 --> 00:08:59,120
Twenty-five... Again,

178
00:08:59,120 --> 00:09:06,650
somehow there is a kind of
threshold at 25, 26.

179
00:09:06,650 --> 00:09:09,485
Suddenly it takes much longer.

180
00:09:09,485 --> 00:09:14,360
And it has essentially the
same problem as with Python.

181
00:09:14,360 --> 00:09:17,165
So you see now, from 26 on,

182
00:09:17,165 --> 00:09:19,250
the times always
double from

183
00:09:19,250 --> 00:09:21,860
three seconds to seven seconds.

184
00:09:21,860 --> 00:09:23,330
So you can imagine what that

185
00:09:23,330 --> 00:09:24,890
roughly takes when I put in

186
00:09:24,890 --> 00:09:30,230
27 and you see the
string isn't very long.

187
00:09:30,230 --> 00:09:32,165
It is just twenty-or-something a's.

188
00:09:32,165 --> 00:09:35,419
Imagine you have to
search a database

189
00:09:35,419 --> 00:09:38,720
with gigabytes of data

190
00:09:38,720 --> 00:09:42,260
with these regular
expressions that would

191
00:09:42,260 --> 00:09:48,150
need years to go through with
these regular expressions.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   874
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   875
192
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   876
00:09:48,630 --> 00:09:51,850
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   877
Okay, maybe the people in
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   878
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   879
193
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   880
00:09:51,850 --> 00:09:55,435
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   881
Python and JavaScript,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   882
they're just idiots.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   883
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   884
194
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   885
00:09:55,435 --> 00:09:58,180
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   886
Surely Java must do much better.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   887
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   888
195
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   889
00:09:58,180 --> 00:10:01,045
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   890
So here's a program.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   891
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   892
196
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   893
00:10:01,045 --> 00:10:03,415
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   894
You can see this again
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   895
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   896
197
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   897
00:10:03,415 --> 00:10:05,980
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   898
is the regular expression
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   899
and we just have
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   900
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   901
198
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   902
00:10:05,980 --> 00:10:08,320
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   903
some scaffolding to generate
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   904
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   905
199
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   906
00:10:08,320 --> 00:10:11,905
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   907
strings from 5 up till 28.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   908
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   909
200
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   910
00:10:11,905 --> 00:10:14,305
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   911
And if we run that,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   912
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   913
201
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   914
00:10:14,305 --> 00:10:16,660
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   915
it actually does that automatically.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   916
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   917
202
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   918
00:10:16,660 --> 00:10:19,900
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   919
So up till 19, pretty fast,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   920
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   921
203
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   922
00:10:19,900 --> 00:10:24,925
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   923
but then starting from
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   924
23, it is getting pretty slow.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   925
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   926
204
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   927
00:10:24,925 --> 00:10:27,445
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   928
So the question is
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   929
what's going on?
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   930
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   931
205
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   932
00:10:27,445 --> 00:10:29,230
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   933
By the way, I'm not gloating here.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   934
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   935
206
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   936
00:10:29,230 --> 00:10:33,755
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   937
Scala uses internally
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   938
the regular expression
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   939
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   940
207
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   941
00:10:33,755 --> 00:10:36,665
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   942
matching engine from Java.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   943
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   944
208
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   945
00:10:36,665 --> 00:10:39,065
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   946
So it would have exactly
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   947
the same problem.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   948
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   949
209
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   950
00:10:39,065 --> 00:10:41,480
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   951
Also, I have been
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   952
here very careful,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   953
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   954
210
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   955
00:10:41,480 --> 00:10:43,550
765
b66602e0b42d updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 761
diff changeset
   956
I'm using here Java 8,
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   957
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   958
211
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   959
00:10:43,550 --> 00:10:46,085
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   960
which nowadays is quite old.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   961
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   962
212
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   963
00:10:46,085 --> 00:10:50,765
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   964
But you will see also
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   965
current Java versions.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   966
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   967
213
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   968
00:10:50,765 --> 00:10:55,490
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   969
We will see we can out-compete
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   970
them by orders of magnitude.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   971
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   972
214
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   973
00:10:55,490 --> 00:10:57,605
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   974
So I think I can 
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   975
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   976
215
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   977
00:10:57,605 --> 00:10:59,165
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   978
now, just finish this here.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   979
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   980
216
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   981
00:10:59,165 --> 00:11:04,025
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   982
You see the problem.
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
   983
Just for completeness' sake.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   984
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   985
217
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   986
00:11:04,025 --> 00:11:07,010
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   987
Here is a Ruby program.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   988
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   989
218
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   990
00:11:07,010 --> 00:11:09,935
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   991
This is using the other
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   992
regular expression.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   993
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   994
219
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   995
00:11:09,935 --> 00:11:12,935
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   996
In this case the
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   997
string should match.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   998
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   999
220
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1000
00:11:12,935 --> 00:11:20,300
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1001
And again it tries out
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1002
strings between 1 and 30 here.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1003
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1004
221
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1005
00:11:20,300 --> 00:11:23,450
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1006
That's a program actually
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1007
a former student produced.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1008
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1009
222
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1010
00:11:23,450 --> 00:11:25,565
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1011
And you can see for a's
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1012
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1013
223
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1014
00:11:25,565 --> 00:11:29,780
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1015
of length up till 20
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1016
a's is pretty fast.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1017
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1018
224
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1019
00:11:29,780 --> 00:11:32,495
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1020
But then starting at 26,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1021
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1022
225
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1023
00:11:32,495 --> 00:11:35,285
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1024
it's getting really slow.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1025
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1026
226
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1027
00:11:35,285 --> 00:11:37,100
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1028
So in this case,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1029
remember the string
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1030
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1031
227
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1032
00:11:37,100 --> 00:11:38,870
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1033
is actually matched by
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1034
the regular expression.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1035
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1036
228
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1037
00:11:38,870 --> 00:11:40,130
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1038
So it has nothing to do
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1039
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1040
229
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1041
00:11:40,130 --> 00:11:41,540
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1042
with whether a regular
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1043
expression actually
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1044
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1045
230
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1046
00:11:41,540 --> 00:11:45,485
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1047
matches a string or does
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1048
not match a string.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1049
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1050
231
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1051
00:11:45,485 --> 00:11:48,260
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1052
I admit though these
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1053
regular expressions
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1054
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1055
232
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1056
00:11:48,260 --> 00:11:49,610
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1057
are carefully chosen,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1058
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1059
233
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1060
00:11:49,610 --> 00:11:52,250
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1061
as you will see later on.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1062
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1063
234
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1064
00:11:52,250 --> 00:11:55,620
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1065
I also just stop that here.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1066
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1067
235
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1068
00:11:55,710 --> 00:12:00,985
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1069
Okay, this slide collects
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1070
the information about times.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1071
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1072
236
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1073
00:12:00,985 --> 00:12:03,400
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1074
On the right-hand side will
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1075
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1076
237
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1077
00:12:03,400 --> 00:12:05,860
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1078
be our regular expression matcher,
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1079
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1080
238
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1081
00:12:05,860 --> 00:12:08,290
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1082
which we implement next week.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1083
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1084
239
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1085
00:12:08,290 --> 00:12:10,795
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1086
On the left-hand side,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1087
are these times by
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1088
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1089
240
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1090
00:12:10,795 --> 00:12:14,260
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1091
various other regular
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1092
expression matching engines.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1093
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1094
241
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1095
00:12:14,260 --> 00:12:17,809
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1096
On the top is this
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1097
regular expression.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1098
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1099
242
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1100
00:12:19,080 --> 00:12:23,335
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1101
Possibly a n-times, a n-times.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1102
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1103
243
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1104
00:12:23,335 --> 00:12:26,890
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1105
And on the lower
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1106
is (a*)* b.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1107
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1108
244
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1109
00:12:26,890 --> 00:12:30,370
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1110
And the x-axis shows here
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1111
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1112
245
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1113
00:12:30,370 --> 00:12:35,335
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1114
the length of the
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1115
string. How many a's.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1116
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1117
246
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1118
00:12:35,335 --> 00:12:38,925
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1119
And on the y-axis is the time
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1120
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1121
247
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1122
00:12:38,925 --> 00:12:41,660
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1123
they need to decide whether
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1124
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1125
248
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1126
00:12:41,660 --> 00:12:44,615
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1127
the string is matched by
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1128
the regular expression or not.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1129
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1130
249
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1131
00:12:44,615 --> 00:12:46,415
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1132
So you can see here, Python,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1133
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1134
250
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1135
00:12:46,415 --> 00:12:47,945
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1136
Java 8 and JavaScript,
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1137
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1138
251
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1139
00:12:47,945 --> 00:12:52,250
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1140
they max out approximately
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1141
between 25 and 30.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1142
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1143
252
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1144
00:12:52,250 --> 00:12:53,900
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1145
Because then it already takes
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1146
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1147
253
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1148
00:12:53,900 --> 00:12:55,160
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1149
half a minute to
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1150
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1151
254
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1152
00:12:55,160 --> 00:12:57,410
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1153
decide whether the string
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1154
is matched or not.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1155
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1156
255
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1157
00:12:57,410 --> 00:13:00,815
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1158
And similarly, in
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1159
the other example,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1160
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1161
256
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1162
00:13:00,815 --> 00:13:03,830
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1163
Python and Ruby max out
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1164
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1165
257
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1166
00:13:03,830 --> 00:13:07,220
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1167
at a similar kind of
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1168
length of the strings.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1169
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1170
258
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1171
00:13:07,220 --> 00:13:10,400
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1172
Because then they also need
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1173
half a minute to decide
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1174
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1175
259
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1176
00:13:10,400 --> 00:13:13,940
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1177
whether this regular expression
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1178
actually matches the string.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1179
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1180
260
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1181
00:13:13,940 --> 00:13:16,790
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1182
Contrast that with
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1183
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1184
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1185
261
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1186
00:13:16,790 --> 00:13:19,235
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1187
the regular expression matcher
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1188
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1189
262
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1190
00:13:19,235 --> 00:13:21,470
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1191
which we're going to implement.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1192
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1193
263
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1194
00:13:21,470 --> 00:13:25,040
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1195
This can match
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1196
approximately 10 thousand
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1197
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1198
264
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1199
00:13:25,040 --> 00:13:30,065
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1200
a's in this example and
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1201
needs less than ten seconds.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1202
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1203
265
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1204
00:13:30,065 --> 00:13:32,285
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1205
Actually, there will be
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1206
two versions of that.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1207
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1208
266
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1209
00:13:32,285 --> 00:13:34,850
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1210
The first version will be
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1211
also relatively slow.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1212
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1213
267
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1214
00:13:34,850 --> 00:13:36,410
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1215
But the second version,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1216
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1217
268
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1218
00:13:36,410 --> 00:13:38,240
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1219
in contrast to Python,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1220
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1221
269
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1222
00:13:38,240 --> 00:13:40,295
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1223
Ruby, will be blindingly fast.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1224
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1225
270
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1226
00:13:40,295 --> 00:13:42,380
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1227
And in the second example,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1228
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1229
271
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1230
00:13:42,380 --> 00:13:45,740
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1231
you have to be careful
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1232
about the x-axis because
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1233
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1234
272
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1235
00:13:45,740 --> 00:13:49,385
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1236
that means four times
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1237
ten to the power six.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1238
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1239
273
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1240
00:13:49,385 --> 00:13:51,695
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1241
It's actually 4 million a's.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1242
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1243
274
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1244
00:13:51,695 --> 00:13:55,100
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1245
So our regular
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1246
expression matcher needs
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1247
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1248
275
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1249
00:13:55,100 --> 00:13:57,635
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1250
less than ten seconds to
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1251
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1252
276
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1253
00:13:57,635 --> 00:14:00,725
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1254
match a string
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1255
of 4 million a's.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1256
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1257
277
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1258
00:14:00,725 --> 00:14:04,430
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1259
In contrast, Python, Java 8,
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1260
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1261
278
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1262
00:14:04,430 --> 00:14:06,770
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1263
and JavaScript need half a minute
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1264
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1265
279
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1266
00:14:06,770 --> 00:14:09,905
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1267
already for a string
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1268
of length just 30.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1269
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1270
280
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1271
00:14:09,905 --> 00:14:12,365
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1272
I was very careful with Java 8.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1273
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1274
281
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1275
00:14:12,365 --> 00:14:15,725
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1276
Yes, Java 9 and above,
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1277
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1278
282
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1279
00:14:15,725 --> 00:14:17,180
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1280
they already have
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1281
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1282
283
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1283
00:14:17,180 --> 00:14:19,610
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1284
a much better regular
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1285
expression matching engine,
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1286
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1287
284
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1288
00:14:19,610 --> 00:14:22,805
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1289
but still we will be running
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1290
circles around them.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1291
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1292
285
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1293
00:14:22,805 --> 00:14:27,050
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1294
with this data.
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1295
I call this slide:
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1296
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1297
286
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1298
00:14:27,050 --> 00:14:29,675
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1299
Why bother with
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1300
regular expressions?
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1301
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1302
287
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1303
00:14:29,675 --> 00:14:33,515
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1304
But you can probably
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1305
see these are
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1306
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1307
288
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1308
00:14:33,515 --> 00:14:34,910
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1309
abysmal times by
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1310
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1311
289
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1312
00:14:34,910 --> 00:14:38,015
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1313
the existing regular
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1314
expression matching engines.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1315
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1316
290
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1317
00:14:38,015 --> 00:14:40,070
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1318
And it's actually
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1319
surprising that after
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1320
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1321
291
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1322
00:14:40,070 --> 00:14:42,695
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1323
one lecture we can already
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1324
do substantially better.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1325
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1326
292
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1327
00:14:42,695 --> 00:14:47,495
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1328
And if you don't believe
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1329
the times I gave here,
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1330
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1331
293
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1332
00:14:47,495 --> 00:14:50,090
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1333
please feel free to
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1334
play on your own
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1335
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1336
294
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1337
00:14:50,090 --> 00:14:52,865
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1338
with the examples
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1339
I uploaded on KEATS.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1340
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1341
295
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1342
00:14:52,865 --> 00:14:55,235
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1343
These are exactly the programs
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1344
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1345
296
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1346
00:14:55,235 --> 00:14:57,470
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1347
I used here in the examples.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1348
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1349
297
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1350
00:14:57,470 --> 00:14:59,255
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1351
So feel free.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1352
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1353
298
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1354
00:14:59,255 --> 00:15:01,970
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1355
You might however now think, hmm.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1356
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1357
299
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1358
00:15:01,970 --> 00:15:05,449
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1359
These are two very
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1360
well chosen examples,
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1361
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1362
300
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1363
00:15:05,449 --> 00:15:07,145
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1364
and I admit that's true,
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1365
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1366
301
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1367
00:15:07,145 --> 00:15:09,410
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1368
and such examples never
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1369
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1370
302
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1371
00:15:09,410 --> 00:15:12,540
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1372
cause any problems
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1373
in real life.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1374
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1375
303
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1376
00:15:13,300 --> 00:15:15,980
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1377
Regular expressions are used very
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1378
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1379
304
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1380
00:15:15,980 --> 00:15:19,415
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1381
frequently and they
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1382
do cause problems.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1383
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1384
305
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1385
00:15:19,415 --> 00:15:21,410
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1386
So here's my first example from
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1387
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1388
306
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1389
00:15:21,410 --> 00:15:23,885
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1390
a company called Cloudflare.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1391
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1392
307
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1393
00:15:23,885 --> 00:15:27,560
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1394
This is a huge hosting company
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1395
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1396
308
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1397
00:15:27,560 --> 00:15:30,935
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1398
which hosts very
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1399
well-known web pages.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1400
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1401
309
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1402
00:15:30,935 --> 00:15:34,970
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1403
And they really try hard
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1404
to have no outage at all.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1405
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1406
310
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1407
00:15:34,970 --> 00:15:37,340
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1408
And they managed
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1409
that for six years.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1410
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1411
311
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1412
00:15:37,340 --> 00:15:39,320
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1413
But then a regular expression,
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1414
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1415
312
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1416
00:15:39,320 --> 00:15:41,180
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1417
actually this one, caused
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1418
a problem and you
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1419
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1420
313
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1421
00:15:41,180 --> 00:15:43,265
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1422
can see there are also
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1423
two stars
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1424
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1425
314
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1426
00:15:43,265 --> 00:15:44,630
837
cb31a037049c updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 765
diff changeset
  1427
at the end.
761
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1428
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1429
315
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1430
00:15:44,630 --> 00:15:46,955
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1431
And because of that string needed
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1432
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1433
316
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1434
00:15:46,955 --> 00:15:49,865
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1435
too much time to be matched.

317
00:15:49,865 --> 00:15:50,990
And because of that,

318
00:15:50,990 --> 00:15:52,430
they had some outage for,

319
00:15:52,430 --> 00:15:54,125
I think several hours,

320
00:15:54,125 --> 00:15:57,920
actually in their malware
detection subsystem.

321
00:15:57,920 --> 00:16:02,060
And the second example
comes from 2016,

322
00:16:02,060 --> 00:16:04,040
where Stack Exchange,
I guess you know

323
00:16:04,040 --> 00:16:06,650
this webpage, also
had an outage for

324
00:16:06,650 --> 00:16:08,390
I think at least an hour.

325
00:16:08,390 --> 00:16:13,070
Because a regular expression,
needed to format posts,

326
00:16:13,070 --> 00:16:15,575
needed too much time to

327
00:16:15,575 --> 00:16:19,010
recognize whether this post
should be accepted or not.

328
00:16:19,010 --> 00:16:23,390
And again, there was a
similar kind of problem.

329
00:16:23,390 --> 00:16:24,950
And you can read
the stories behind

330
00:16:24,950 --> 00:16:28,080
that on these two given links.

331
00:16:28,720 --> 00:16:31,730
When I looked at
this the first time,

332
00:16:31,730 --> 00:16:34,175
what surprised me is
that theoreticians,

333
00:16:34,175 --> 00:16:37,520
who sometimes dedicate their
lives to regular expressions

334
00:16:37,520 --> 00:16:39,440
and know really a lot about

335
00:16:39,440 --> 00:16:41,690
them, didn't know
anything about this.

336
00:16:41,690 --> 00:16:43,610
But engineers, they
already created

337
00:16:43,610 --> 00:16:46,160
a name for that:
Regular Expression

338
00:16:46,160 --> 00:16:47,975
Denial of Service Attack.

339
00:16:47,975 --> 00:16:49,745
Because what you can,

340
00:16:49,745 --> 00:16:51,230
what can happen now is that

341
00:16:51,230 --> 00:16:54,920
attackers look for
certain strings

342
00:16:54,920 --> 00:16:56,780
that make your regular expression

343
00:16:56,780 --> 00:16:59,105
matching engine topple over.

344
00:16:59,105 --> 00:17:01,370
And these kinds of

345
00:17:01,370 --> 00:17:04,160
regular expressions are called
Evil Regular Expressions.

346
00:17:04,160 --> 00:17:06,350
And actually there are
quite a number of them.

347
00:17:06,350 --> 00:17:08,495
So you have seen this one,

348
00:17:08,495 --> 00:17:11,255
the first one, and the
second one already.

349
00:17:11,255 --> 00:17:13,400
But there are many, many more.

350
00:17:13,400 --> 00:17:15,620
And you can easily have in

351
00:17:15,620 --> 00:17:18,560
your program one of
these regular expressions.

352
00:17:18,560 --> 00:17:21,830
And then you have the
problem that if you do have

353
00:17:21,830 --> 00:17:23,240
this regular expression and

354
00:17:23,240 --> 00:17:25,640
somebody finds the
corresponding string,

355
00:17:25,640 --> 00:17:29,945
which makes the regular expression
matching engine topple over,

356
00:17:29,945 --> 00:17:31,820
then you have a problem

357
00:17:31,820 --> 00:17:34,295
because your webpage is
probably not available.

358
00:17:34,295 --> 00:17:36,140
This phenomenon is also sometimes

359
00:17:36,140 --> 00:17:39,350
called
catastrophic backtracking.

360
00:17:39,350 --> 00:17:43,595
In lecture three, we will
look at this more carefully.

361
00:17:43,595 --> 00:17:46,910
And actually why that
is such a problem in

362
00:17:46,910 --> 00:17:50,795
real life is actually
not to do with lexers.

363
00:17:50,795 --> 00:17:53,180
Yes, regular
expressions are used as

364
00:17:53,180 --> 00:17:55,040
the basic tool for implementing

365
00:17:55,040 --> 00:17:57,185
lexers. But regular expressions,

366
00:17:57,185 --> 00:18:00,065
of course, are used in
a much wider area.

367
00:18:00,065 --> 00:18:03,770
And they are especially used for
network intrusion detection.

368
00:18:03,770 --> 00:18:06,590
Remember, say you're having to

369
00:18:06,590 --> 00:18:10,130
administer a big network
and you only want to let

370
00:18:10,130 --> 00:18:13,640
in packets which you think are OK

371
00:18:13,640 --> 00:18:14,930
and you want to keep out

372
00:18:14,930 --> 00:18:17,645
any packet which might
hack into your network.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1692
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1693
373
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1694
00:18:17,645 --> 00:18:22,670
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1695
So what they have is they
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1696
have suites of thousands and
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1697
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1698
374
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1699
00:18:22,670 --> 00:18:25,745
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1700
sometimes even more
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1701
regular expressions which
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1702
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1703
375
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1704
00:18:25,745 --> 00:18:27,755
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1705
check whether this package
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1706
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1707
376
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1708
00:18:27,755 --> 00:18:30,065
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1709
satisfies some patterns or not.
fb07ac060866 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
  1710
377
00:18:30,065 --> 00:18:31,460
And in this case it will be left

378
00:18:31,460 --> 00:18:34,205
out or it will be let in.

379
00:18:34,205 --> 00:18:36,335
And with networks,

380
00:18:36,335 --> 00:18:39,080
the problem is that our
hardware is already

381
00:18:39,080 --> 00:18:43,190
so fast that the regular
expressions

382
00:18:43,190 --> 00:18:45,169
really become a bottleneck.

383
00:18:45,169 --> 00:18:47,060
Because what do you do if now

384
00:18:47,060 --> 00:18:49,880
suddenly a regular expression
takes too much time?

385
00:18:49,880 --> 00:18:52,670
Do you just stop the matching

386
00:18:52,670 --> 00:18:55,100
and let the packet
in regardless?

387
00:18:55,100 --> 00:18:58,190
Or do you just hold
the network up

388
00:18:58,190 --> 00:19:01,715
and don't let anything in
until you have decided?

389
00:19:01,715 --> 00:19:04,895
So that's actually a
really hard problem.

390
00:19:04,895 --> 00:19:06,650
But the first time I came across

391
00:19:06,650 --> 00:19:09,965
that problem was actually
through this engineer.

392
00:19:09,965 --> 00:19:13,820
And it's always said that
Germans don't have any humor.

393
00:19:13,820 --> 00:19:16,985
But I found that
video quite funny.

394
00:19:16,985 --> 00:19:19,145
Maybe you have a
different opinion,

395
00:19:19,145 --> 00:19:21,095
but feel free to
have a look.

396
00:19:21,095 --> 00:19:23,705
It explains exactly that problem.

397
00:19:23,705 --> 00:19:25,610
So in the next video,

398
00:19:25,610 --> 00:19:28,445
we will start to
implement this matcher.

399
00:19:28,445 --> 00:19:30,870
So I hope to see you there.