videos/01-basics1.srt
author Christian Urban <christian.urban@kcl.ac.uk>
Fri, 29 Nov 2024 18:59:32 +0000
changeset 976 e9eac62928f5
parent 763 4e628958c01a
permissions -rw-r--r--
updated
1
00:00:06,710 --> 00:00:09,225
Thanks for tuning in again.

2
00:00:09,225 --> 00:00:11,640
In this video, we want to specify

3
00:00:11,640 --> 00:00:14,370
what problem our regular
expression matcher

4
00:00:14,370 --> 00:00:16,155
is actually supposed to solve.

5
00:00:16,155 --> 00:00:18,900
The reason is that
we know that some of

6
00:00:18,900 --> 00:00:21,585
the existing regular
expression matching engines

7
00:00:21,585 --> 00:00:25,200
are not just abysmally
slow in some examples,

8
00:00:25,200 --> 00:00:27,105
as you've seen in the
previous video,

9
00:00:27,105 --> 00:00:30,570
but also sometimes produce
incorrect results.

10
00:00:30,570 --> 00:00:33,330
In order to avoid
this with our matcher,

11
00:00:33,330 --> 00:00:35,325
we need to somehow explain

12
00:00:35,325 --> 00:00:39,255
precisely what problem
our algorithm solves.

13
00:00:39,255 --> 00:00:41,935
This will require
a bit of theory, but

14
00:00:41,935 --> 00:00:45,335
I hope it is nevertheless
a bit of fun.

15
00:00:45,335 --> 00:00:47,915
First, we have to specify

16
00:00:47,915 --> 00:00:50,585
what we mean by a
regular expression.

17
00:00:50,585 --> 00:00:53,210
You've seen some
examples earlier. They were

18
00:00:53,210 --> 00:00:56,060
actually taken from or
inspired by what

19
00:00:56,060 --> 00:00:58,850
is available in standard
regular expression matching

20
00:00:58,850 --> 00:01:02,330
engines, like star, plus and n-times.

21
00:01:02,330 --> 00:01:05,690
But for many tasks,
for our algorithm,

22
00:01:05,690 --> 00:01:10,174
we will focus only on what I call
basic regular expressions.

23
00:01:10,174 --> 00:01:11,840
Since I'm lazy, I will refer to

24
00:01:11,840 --> 00:01:13,550
these basic regular expressions

25
00:01:13,550 --> 00:01:15,485
just as regular expressions.

26
00:01:15,485 --> 00:01:17,405
And to the ones you've seen earlier

27
00:01:17,405 --> 00:01:19,400
as extended regular expressions.

28
00:01:19,400 --> 00:01:22,940
So the basic regular expressions,
or just regular expressions,

29
00:01:22,940 --> 00:01:25,280
they will have characters.

30
00:01:25,280 --> 00:01:27,170
So you can match any character,

31
00:01:27,170 --> 00:01:31,370
a, b, c to z, or 0 to 9.

32
00:01:31,370 --> 00:01:35,525
Any ASCII character. 'c' here
is just a representative.

33
00:01:35,525 --> 00:01:38,825
So we can match
single characters.

34
00:01:38,825 --> 00:01:42,440
Then we can match alternatives.

35
00:01:42,440 --> 00:01:44,930
That means a string
is either matched

36
00:01:44,930 --> 00:01:46,730
by the regular expression r1

37
00:01:46,730 --> 00:01:49,324
or by the regular expression r2.

38
00:01:49,324 --> 00:01:52,790
And for the
alternative we write +.

39
00:01:52,790 --> 00:01:55,175
Then we also have sequences.

40
00:01:55,175 --> 00:01:57,410
This sequence regular
expression essentially

41
00:01:57,410 --> 00:01:59,915
says that a string needs to be matched,

42
00:01:59,915 --> 00:02:02,210
the first part by
the regular expression r1

43
00:02:02,210 --> 00:02:06,275
and then the second
part by r2.

44
00:02:06,275 --> 00:02:10,190
And then we also have the
star regular expression,

45
00:02:10,190 --> 00:02:12,980
which says the regular
expression needs to match

46
00:02:12,980 --> 00:02:16,520
the string with zero
or more copies.

47
00:02:16,520 --> 00:02:18,140
And then we also have some

48
00:02:18,140 --> 00:02:20,060
slightly strange
regular expressions.

49
00:02:20,060 --> 00:02:22,505
We have the regular expression 1,

50
00:02:22,505 --> 00:02:25,910
which can only match
the empty string.

51
00:02:25,910 --> 00:02:29,075
I'm using here the
notation 1 for that

52
00:02:29,075 --> 00:02:31,340
and in my writing I will always

53
00:02:31,340 --> 00:02:33,440
make sure that for the
regular expression

54
00:02:33,440 --> 00:02:35,765
I will write the
1 in a bold font.

55
00:02:35,765 --> 00:02:38,510
So whenever you see
a 1 in bold font,

56
00:02:38,510 --> 00:02:40,395
this is not the number 1, but

57
00:02:40,395 --> 00:02:44,300
the regular expression which
can match the empty string.

58
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   264
00:02:44,300 --> 00:02:48,050
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   265
And we also have the
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   266
regular expression 0,
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   267
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   268
59
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   269
00:02:48,050 --> 00:02:50,315
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   270
which cannot match
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   271
anything at all.
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   272
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   273
60
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   274
00:02:50,315 --> 00:02:51,695
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   275
You might think, well,
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   276
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   277
61
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   278
00:02:51,695 --> 00:02:54,635
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   279
that's not much use if it cannot
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   280
match anything at all,
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   281
62
00:02:54,635 --> 00:02:58,130
but you will see why that
one is important later on.

63
00:02:58,130 --> 00:03:00,785
So our basic regular expressions,

64
00:03:00,785 --> 00:03:02,375
they will be 0,

65
00:03:02,375 --> 00:03:08,390
1, characters, alternatives,
sequences and stars.

66
00:03:08,390 --> 00:03:12,170
And these are all the
basic regular expressions.

67
00:03:12,170 --> 00:03:16,280
If this definition is a
bit too abstract for you,

68
00:03:16,280 --> 00:03:18,560
we can also look at
the concrete code,

69
00:03:18,560 --> 00:03:23,060
how that would pan out when
actually writing some Scala.

70
00:03:23,060 --> 00:03:28,040
I promised you I would always
show you my code in Scala.

71
00:03:28,040 --> 00:03:29,480
So here you would have

72
00:03:29,480 --> 00:03:32,885
first an abstract class
for regular expressions.

73
00:03:32,885 --> 00:03:37,580
Then you have one regular
expression for 0,

74
00:03:37,580 --> 00:03:41,540
one regular expression for 1,

75
00:03:41,540 --> 00:03:42,875
one regular expression, which
takes an argument,

76
00:03:42,875 --> 00:03:45,050
the character you want to match,

77
00:03:45,050 --> 00:03:47,915
the characters a, b, c and so on.

78
00:03:47,915 --> 00:03:50,945
Then we have an alternative
regular expression,

79
00:03:50,945 --> 00:03:53,480
which takes the first
alternative and

80
00:03:53,480 --> 00:03:56,435
the second alternative
as arguments.

81
00:03:56,435 --> 00:03:59,690
And we have a sequence
regular expression. Again,

82
00:03:59,690 --> 00:04:01,850
which takes the
first component and

83
00:04:01,850 --> 00:04:04,730
the second component
as two arguments.

84
00:04:04,730 --> 00:04:07,249
And we have the star
regular expression,

85
00:04:07,249 --> 00:04:10,880
which just takes one regular
expression as its argument.

86
00:04:10,880 --> 00:04:16,115
And all these regular expressions
extend our abstract class.

87
00:04:16,115 --> 00:04:20,300
For whatever I do in
this module, I have

88
00:04:20,300 --> 00:04:23,300
the convention that all
the regular expressions

89
00:04:23,300 --> 00:04:25,550
are written with capital letters.

90
00:04:25,550 --> 00:04:26,885
As you can see here,

91
00:04:26,885 --> 00:04:31,685
0, 1, character: these will
always be regular expressions.

92
00:04:31,685 --> 00:04:34,370
They are all in capital letters.

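The class hierarchy described above could be sketched roughly as follows. This is only a sketch: the names `Rexp`, `ZERO`, `ONE`, `CHAR`, `ALT`, `SEQ` and `STAR` are assumptions chosen to follow the capital-letter convention, and the code shown on screen may differ in its details.

```scala
// Hypothetical names following the capital-letter convention.
abstract class Rexp                                // common type of all regular expressions

case object ZERO extends Rexp                      // the regular expression 0: matches nothing at all
case object ONE extends Rexp                       // the regular expression 1
case class CHAR(c: Char) extends Rexp              // takes the character to match
case class ALT(r1: Rexp, r2: Rexp) extends Rexp    // first and second alternative
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp    // first and second component
case class STAR(r: Rexp) extends Rexp              // takes a single regular expression
```

Using case objects for 0 and 1 and case classes for the rest means each constructor is a node of a tree, with its arguments as the subtrees.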
93
00:04:34,370 --> 00:04:36,484
Let's for a moment,

94
00:04:36,484 --> 00:04:38,720
play around with this definition.

95
00:04:38,720 --> 00:04:41,945
Here I'm using the
Ammonite REPL.

96
00:04:41,945 --> 00:04:46,950
And I can evaluate
this definition.

97
00:04:53,430 --> 00:04:55,810
And now I can start to

98
00:04:55,810 --> 00:04:58,570
define particular
regular expressions.

99
00:04:58,570 --> 00:05:00,340
For example, if I need
a regular expression

100
00:05:00,340 --> 00:05:02,860
which can recognise
the character a,

101
00:05:02,860 --> 00:05:06,025
then I would write
something like this.

102
00:05:06,025 --> 00:05:08,710
So this regular expression
takes an argument,

103
00:05:08,710 --> 00:05:13,615
the character 'a' to specify
which character to match.

104
00:05:13,615 --> 00:05:16,945
We obviously do the same with 'b'.

105
00:05:16,945 --> 00:05:19,405
And I can do that with

106
00:05:19,405 --> 00:05:22,975
'c'. So now we have three
regular expressions.

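As a rough illustration of the three definitions just mentioned, something like the following could be typed into a REPL such as Ammonite, or run as a script. The constructor name `CHAR` and the value names `a`, `b` and `c` are assumptions, not necessarily what appears on screen.

```scala
// Hypothetical names; only the parts needed for this example are defined.
abstract class Rexp
case class CHAR(c: Char) extends Rexp   // takes the character to match

val a = CHAR('a')   // regular expression recognising the character a
val b = CHAR('b')   // the same for b
val c = CHAR('c')   // and for c
```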
107
00:05:22,975 --> 00:05:25,570
If you look very carefully
at this definition,

108
00:05:25,570 --> 00:05:27,070
you can actually see

109
00:05:27,070 --> 00:05:29,940
these regular
expressions are trees.

110
00:05:29,940 --> 00:05:33,365
So no matter what we
write down on paper,

111
00:05:33,365 --> 00:05:36,755
behind the scenes
they are always trees.

112
00:05:36,755 --> 00:05:40,010
And you can see that
actually in this definition.

113
00:05:40,010 --> 00:05:44,330
If you define two regular
expressions r1 and r2.

114
00:05:44,330 --> 00:05:49,310
They are essentially
the alternative of a, b and c.

115
00:05:49,310 --> 00:05:52,760
Then this regular expression

116
00:05:52,760 --> 00:05:54,710
can match either the character

117
00:05:54,710 --> 00:05:57,980
a or the character b
or the character c.

118
00:05:57,980 --> 00:06:01,640
And the same for the
regular expression r2.

119
00:06:01,640 --> 00:06:03,875
So let me just evaluate that.

120
00:06:03,875 --> 00:06:05,690
And even though these are

121
00:06:05,690 --> 00:06:07,175
two regular expressions
which can match

122
00:06:07,175 --> 00:06:11,750
exactly the same things,
they are different trees.

123
00:06:11,750 --> 00:06:14,195
So if I ask Scala,

124
00:06:14,195 --> 00:06:16,460
are these trees different?

4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   576
125
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   577
00:06:16,460 --> 00:06:19,250
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   578
Or ask if they're
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   579
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   580
126
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   581
00:06:19,250 --> 00:06:21,865
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   582
the same, then Scala will say No,
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   583
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   584
127
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   585
00:06:21,865 --> 00:06:25,440
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   586
they actually different trees.
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   587
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   588
128
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   589
00:06:25,450 --> 00:06:28,459
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   590
Let's come back to
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   591
this definition.
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   592
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   593
129
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   594
00:06:28,459 --> 00:06:31,760
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   595
If we want to write down
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   596
regular expressions on paper,
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   597
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   598
130
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   599
00:06:31,760 --> 00:06:33,620
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   600
then we want to be sloppy as
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   601
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   602
131
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   603
00:06:33,620 --> 00:06:35,750
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   604
mathematicians rather than as
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   605
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   606
132
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   607
00:06:35,750 --> 00:06:37,745
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   608
precise as computer scientists.
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   609
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   610
133
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   611
00:06:37,745 --> 00:06:40,490
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   612
So when we want to write down
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   613
a regular expression which can
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   614
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   615
134
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   616
00:06:40,490 --> 00:06:43,955
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   617
either match the character
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   618
a or the character b,
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   619
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   620
135
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   621
00:06:43,955 --> 00:06:49,130
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   622
then we would write down
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   623
something like this, a plus b.
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   624
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   625
136
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   626
00:06:49,130 --> 00:06:51,170
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   627
And if you want to have
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   628
the regular expression
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   629
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   630
137
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   631
00:06:51,170 --> 00:06:52,625
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   632
which can either match
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   633
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   634
138
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   635
00:06:52,625 --> 00:06:55,925
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   636
the character a or b or c,
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   637
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   638
139
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   639
00:06:55,925 --> 00:06:58,340
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   640
we will write
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   641
something like this.
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   642
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   643
140
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   644
00:06:58,340 --> 00:07:01,370
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   645
But of course behind the
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   646
scenes, these are trees.
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   647
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   648
141
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   649
00:07:01,370 --> 00:07:04,460
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   650
So we should have written
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   651
them with parentheses.
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   652
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   653
142
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   654
00:07:04,460 --> 00:07:06,440
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   655
And you can see
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   656
actually, there are two
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   657
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   658
143
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   659
00:07:06,440 --> 00:07:08,990
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   660
regular expressions I
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   661
could have written down.
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   662
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   663
144
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   664
00:07:08,990 --> 00:07:11,270
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   665
They're different.
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   666
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   667
145
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   668
00:07:11,270 --> 00:07:12,710
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   669
Just by convention,
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   670
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   671
146
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   672
00:07:12,710 --> 00:07:15,575
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   673
we on't write these parentheses.
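[Editor's sketch] The point about a + b + c can be tried out in Scala roughly as follows. This is only a minimal sketch: the class names Rexp, ZERO, ONE, CHAR, ALT, SEQ and STAR are assumptions made for this illustration, and the definition shown in the video may use different names. The sketch is written for Scala 3.

```scala
// Basic regular expressions represented as trees
// (all constructor names here are assumed for illustration).
abstract class Rexp
case object ZERO extends Rexp                    // matches nothing
case object ONE extends Rexp                     // matches only the empty string
case class CHAR(c: Char) extends Rexp            // matches the single character c
case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // alternative, written r1 + r2
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp  // sequence, r1 followed by r2
case class STAR(r: Rexp) extends Rexp            // star, written r*

// The two ways of putting parentheses into a + b + c.
val left: Rexp  = ALT(ALT(CHAR('a'), CHAR('b')), CHAR('c'))  // (a + b) + c
val right: Rexp = ALT(CHAR('a'), ALT(CHAR('b'), CHAR('c')))  // a + (b + c)

@main def compareTrees(): Unit =
  // == on case classes compares the structure of the two trees
  println(left == right)   // prints false
```

Both trees match exactly the strings "a", "b" and "c", yet the comparison reports that they are different trees, which is why the parentheses can only be dropped by convention.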

147
00:07:15,575 --> 00:07:18,740
And that is similar with sequences.

148
00:07:18,740 --> 00:07:20,000
If I want to write down

149
00:07:20,000 --> 00:07:22,955
the regular expression which
can match first an 'a',

150
00:07:22,955 --> 00:07:25,010
then a 'b', and then a 'c',

151
00:07:25,010 --> 00:07:28,160
then I would write down
something like this.

152
00:07:28,160 --> 00:07:32,120
Just, there are again

153
00:07:32,120 --> 00:07:35,735
two regular expressions I
could have written down.

154
00:07:35,735 --> 00:07:38,480
Again by convention we don't

155
00:07:38,480 --> 00:07:40,670
write these parentheses though.

156
00:07:40,670 --> 00:07:42,350
However, sometimes we have to be

157
00:07:42,350 --> 00:07:43,940
very careful with parentheses,

158
00:07:43,940 --> 00:07:47,195
especially with star.

159
00:07:47,195 --> 00:07:50,525
Because this regular expression
is definitely not

160
00:07:50,525 --> 00:07:54,900
the same as this regular expression.

161
00:07:56,100 --> 00:07:59,410
The first one here can match

162
00:07:59,410 --> 00:08:03,610
any string containing a's or b's.

163
00:08:03,610 --> 00:08:05,860
While this regular expression can

164
00:08:05,860 --> 00:08:07,945
only match the single character

165
00:08:07,945 --> 00:08:13,300
a or any string
containing only b's.

166
00:08:13,300 --> 00:08:15,265
So to make the difference clear,

167
00:08:15,265 --> 00:08:20,065
in this example, we would have
to use the parentheses.
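[Editor's sketch] To see that the parentheses around star really change what is matched, here is a hedged Scala 3 sketch. It assumes the two expressions on the slide are (a + b)* and a + b*, which is what the spoken description suggests. The constructor names are assumptions, and the small matcher below uses Brzozowski derivatives purely as a convenient way to test strings; no matcher has been introduced in the video at this point.

```scala
// Assumed datatype for basic regular expressions (names are illustrative).
abstract class Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp

// Can r match the empty string?
def nullable(r: Rexp): Boolean = r match {
  case ZERO => false
  case ONE => true
  case CHAR(_) => false
  case ALT(r1, r2) => nullable(r1) || nullable(r2)
  case SEQ(r1, r2) => nullable(r1) && nullable(r2)
  case STAR(_) => true
}

// Derivative of r with respect to the character c: a regular expression
// for what may follow after c has been consumed.
def der(c: Char, r: Rexp): Rexp = r match {
  case ZERO => ZERO
  case ONE => ZERO
  case CHAR(d) => if (c == d) ONE else ZERO
  case ALT(r1, r2) => ALT(der(c, r1), der(c, r2))
  case SEQ(r1, r2) =>
    if (nullable(r1)) ALT(SEQ(der(c, r1), r2), der(c, r2))
    else SEQ(der(c, r1), r2)
  case STAR(r1) => SEQ(der(c, r1), STAR(r1))
}

// r matches s if, after taking derivatives for every character of s,
// the remaining expression can match the empty string.
def matches(r: Rexp, s: String): Boolean =
  nullable(s.foldLeft(r)((acc, c) => der(c, acc)))

val starOfAlt: Rexp = STAR(ALT(CHAR('a'), CHAR('b')))  // (a + b)*
val altOfStar: Rexp = ALT(CHAR('a'), STAR(CHAR('b')))  // a + b*

@main def compareStar(): Unit = {
  println(matches(starOfAlt, "abba"))  // true
  println(matches(altOfStar, "abba"))  // false
  println(matches(altOfStar, "bbb"))   // true
}
```

The string "abba" mixes a's and b's, so only the version with the star around the whole alternative accepts it.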

168
00:08:20,065 --> 00:08:23,140
There's one more issue
with this definition.

169
00:08:23,140 --> 00:08:26,635
Why do we focus on these
basic regular expressions?

170
00:08:26,635 --> 00:08:28,660
Why don't we also include

171
00:08:28,660 --> 00:08:31,285
the ones from the
extended regular expressions?

172
00:08:31,285 --> 00:08:33,055
The answer is very easy.

173
00:08:33,055 --> 00:08:35,680
These basic regular
expressions can be used

174
00:08:35,680 --> 00:08:38,370
to represent also
the extended ones.

175
00:08:38,370 --> 00:08:40,220
Let me give you some examples.

176
00:08:40,220 --> 00:08:44,225
If I have a regular
expression r+, for example,

177
00:08:44,225 --> 00:08:46,280
then the meaning
was that I have to use

178
00:08:46,280 --> 00:08:49,115
one or more copies

179
00:08:49,115 --> 00:08:51,200
of this r to
match a string.

180
00:08:51,200 --> 00:08:53,810
Well, one or more copies
can be represented by

181
00:08:53,810 --> 00:08:58,385
the basic ones as just
r followed by r*.

182
00:08:58,385 --> 00:09:01,760
Meaning I have to use one
copy of r, followed by

183
00:09:01,760 --> 00:09:05,150
0 or more copies of r.
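[Editor's sketch] The encoding of r+ as r followed by r* can be written as a small helper function. This is a minimal Scala 3 sketch under assumptions: the constructor names and the helper name PLUS are made up for this illustration and are not taken from the video.

```scala
// Assumed datatype for basic regular expressions (names are illustrative).
abstract class Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
case class CHAR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp
case class STAR(r: Rexp) extends Rexp

// r+ is not a new constructor: it is built from the basic ones
// as one copy of r followed by zero or more copies of r.
def PLUS(r: Rexp): Rexp = SEQ(r, STAR(r))

val aPlus: Rexp = PLUS(CHAR('a'))   // stands for a+

@main def showPlus(): Unit =
  println(aPlus)   // prints SEQ(CHAR(a),STAR(CHAR(a)))
```

Because PLUS only produces a tree made of SEQ and STAR, anything defined for the basic regular expressions applies to r+ without extra cases.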

184
00:09:05,150 --> 00:09:07,895
Similarly, if I have the optional
regular expression,

185
00:09:07,895 --> 00:09:10,715
which is supposed to
match a string

186
00:09:10,715 --> 00:09:13,865
by using r, or match
the empty string.

187
00:09:13,865 --> 00:09:19,295
Then this can be obviously
defined as r + 1.
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   859
4e628958c01a updated
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff changeset
   860
188
00:09:19,295 --> 00:09:23,945
So here is the bold
regular expression 1,

189
00:09:23,945 --> 00:09:26,180
which means it either can

190
00:09:26,180 --> 00:09:28,205
recognize whatever
r can recognize,

191
00:09:28,205 --> 00:09:30,470
or it can recognize
the empty string.

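To make the definition of the optional regular expression concrete, here is a minimal Scala sketch. The constructor names ONE, CHAR and ALT and the helper opt are assumptions made for illustration; they are not taken from this transcript. ONE stands for the bold 1 and ALT(r1, r2) for r1 + r2.

```scala
// Hypothetical datatype for (a fragment of) the basic regular expressions.
sealed trait Rexp
case object ONE extends Rexp                     // the bold 1: matches only the empty string
case class CHAR(c: Char) extends Rexp            // matches the single character c
case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // the alternative r1 + r2

// The optional regular expression is only shorthand for r + 1:
// it recognizes whatever r recognizes, or the empty string.
def opt(r: Rexp): Rexp = ALT(r, ONE)
```

With these assumed names, opt(CHAR('a')) unfolds to ALT(CHAR('a'), ONE), so no new constructor is needed for the optional case.
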
192
00:09:30,470 --> 00:09:35,150
And if I have ranges, like a
to z, then I can define

193
00:09:35,150 --> 00:09:41,135
that as a + b + c + ...
and so on until z.

194
00:09:41,135 --> 00:09:45,920
Maybe this definition is not
good in terms of runtime,

195
00:09:45,920 --> 00:09:47,960
but in terms of just being able

196
00:09:47,960 --> 00:09:50,780
to recognize strings
or match strings,

197
00:09:50,780 --> 00:09:54,680
the basic regular expressions
will be just sufficient.

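The range from a to z can be unfolded into nested alternatives in the same spirit. The following is only a sketch: the names CHAR, ALT and range are assumptions for illustration, and the function assumes that the lower character is not greater than the upper one.

```scala
// Hypothetical fragment of the basic regular expressions.
sealed trait Rexp
case class CHAR(c: Char) extends Rexp            // matches the single character c
case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // the alternative r1 + r2

// Unfolds the range lo..hi into lo + (... + hi).
// Assumes lo <= hi; reduceRight would fail on an empty list.
def range(lo: Char, hi: Char): Rexp = {
  val chars: List[Rexp] = (lo to hi).toList.map(c => CHAR(c))
  chars.reduceRight[Rexp]((r1, r2) => ALT(r1, r2))
}
```

For example, range('a', 'c') gives ALT(CHAR('a'), ALT(CHAR('b'), CHAR('c'))). The result grows with the size of the range, which is the runtime concern mentioned above, but it recognizes the same strings.
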
198
00:09:54,680 --> 00:09:56,690
Unfortunately, we
also need to have

199
00:09:56,690 --> 00:09:58,850
a quick chat about strings.

200
00:09:58,850 --> 00:10:02,255
In Scala, it's crystal
clear what a string is.

201
00:10:02,255 --> 00:10:05,480
There's a separate datatype
which is called string.

202
00:10:05,480 --> 00:10:07,895
So here, for example,
is a string.

203
00:10:07,895 --> 00:10:09,200
And as you can see,

204
00:10:09,200 --> 00:10:11,105
it is of the type string.

205
00:10:11,105 --> 00:10:13,985
And the empty string
will be just that.

206
00:10:13,985 --> 00:10:16,160
However, when we write things down on

207
00:10:16,160 --> 00:10:18,320
paper and think
about our algorithm,

208
00:10:18,320 --> 00:10:22,790
we want to think of strings
as lists of characters.

209
00:10:22,790 --> 00:10:26,070
So more something like this.

210
00:10:27,070 --> 00:10:31,745
You can see here, this is actually
a list of characters.

211
00:10:31,745 --> 00:10:35,150
And the two operations
we need are taking

212
00:10:35,150 --> 00:10:37,280
the head of this list and

213
00:10:37,280 --> 00:10:39,770
the rest of the list
or tail of the list.

214
00:10:39,770 --> 00:10:41,720
That's why we want
to regard them as

215
00:10:41,720 --> 00:10:45,260
lists rather than strings.

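The view of strings as lists of characters, with head and tail as the two operations, can be tried out in Scala roughly as follows. The string "hello" is an assumed example; the transcript does not say which string is shown on screen.

```scala
// An ordinary Scala string and the same content as a list of characters.
val s: String = "hello"            // assumed example string
val cs: List[Char] = s.toList      // List('h', 'e', 'l', 'l', 'o')

// The two operations needed on strings seen as lists:
val first: Char = cs.head          // the first character
val rest: List[Char] = cs.tail     // everything after the first character

// The empty string corresponds to the empty list.
val empty: List[Char] = "".toList
```

Note that head and tail are only defined for non-empty lists, so code using them has to treat the empty list as a separate case.
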
216
00:10:45,260 --> 00:10:48,200
So if I'm using a
string like this,

217
00:10:48,200 --> 00:10:51,935
then on paper I always will
write something like that.

218
00:10:51,935 --> 00:10:54,575
Or since I'm lazy, just that.

219
00:10:54,575 --> 00:10:56,675
And for the empty string,

220
00:10:56,675 --> 00:10:59,210
I will write either
the empty list, with

221
00:10:59,210 --> 00:11:03,920
two brackets or,
being lazy, just that.

222
00:11:03,920 --> 00:11:06,620
Actually there is one
more operation we need on

223
00:11:06,620 --> 00:11:09,410
strings and that
is concatenation.

224
00:11:09,410 --> 00:11:11,255
If you have a string s1,

225
00:11:11,255 --> 00:11:14,510
string s2, and put an
at symbol in between,

226
00:11:14,510 --> 00:11:18,050
that means we want to
concatenate both strings.

227
00:11:18,050 --> 00:11:22,625
So foo concatenated with
bar, would be foobar.

228
00:11:22,625 --> 00:11:25,085
And any string concatenated with

229
00:11:25,085 --> 00:11:27,950
the empty string
is left untouched.

230
00:11:27,950 --> 00:11:31,310
So baz concatenated with

231
00:11:31,310 --> 00:11:33,545
the empty string, is just baz.

232
00:11:33,545 --> 00:11:37,295
So that's like if we have
strings as lists of characters,

233
00:11:37,295 --> 00:11:39,755
that will be just list append.

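Concatenation as list append can be sketched in Scala like this. The at symbol is the on-paper notation; the sketch assumes that Scala's list append operator ::: plays that role. The value names are chosen only for illustration.

```scala
// Strings viewed as lists of characters.
val foo: List[Char] = "foo".toList
val bar: List[Char] = "bar".toList
val baz: List[Char] = "baz".toList

// On paper: foo @ bar. In this sketch: list append with :::
val foobar: List[Char] = foo ::: bar

// Appending the empty list (the empty string) changes nothing.
val bazRight: List[Char] = baz ::: Nil
val bazLeft: List[Char] = Nil ::: baz
```

Here foobar.mkString gives back the ordinary string "foobar", and both bazRight and bazLeft are equal to baz.
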
234
00:11:39,755 --> 00:11:41,480
In the next video,

235
00:11:41,480 --> 00:11:43,160
we will use these definitions

236
00:11:43,160 --> 00:11:45,050
and introduce the notion of what

237
00:11:45,050 --> 00:11:46,850
a language is and

238
00:11:46,850 --> 00:11:49,920
what the meaning of a
regular expression is.