763
Christian Urban <christian.urban@kcl.ac.uk>

1
00:00:06,710 --> 00:00:09,225
Thanks for tuning in again.

2
00:00:09,225 --> 00:00:11,640
In this video, we want to specify

3
00:00:11,640 --> 00:00:14,370
what problem our regular
expression matcher

4
00:00:14,370 --> 00:00:16,155
is actually supposed to solve.

5
00:00:16,155 --> 00:00:18,900
The reason is that
we know that some of

6
00:00:18,900 --> 00:00:21,585
the existing regular
expression matching engines

7
00:00:21,585 --> 00:00:25,200
are not just abysmally
slow in some examples,

8
00:00:25,200 --> 00:00:27,105
as you've seen in the
previous video,

9
00:00:27,105 --> 00:00:30,570
but also sometimes produce
incorrect results.

10
00:00:30,570 --> 00:00:33,330
In order to avoid
this with our matcher,

11
00:00:33,330 --> 00:00:35,325
we need to somehow explain

12
00:00:35,325 --> 00:00:39,255
precisely what problem
our algorithm solves.

13
00:00:39,255 --> 00:00:41,935
This will require
a bit of theory, but

14
00:00:41,935 --> 00:00:45,335
I hope it is nevertheless
a bit of fun.

15
00:00:45,335 --> 00:00:47,915
First, we have to specify

16
00:00:47,915 --> 00:00:50,585
what we mean by a
regular expression.

17
00:00:50,585 --> 00:00:53,210
You've seen some examples
earlier. They were

18
00:00:53,210 --> 00:00:56,060
actually taken from or
inspired by what

19
00:00:56,060 --> 00:00:58,850
is available in standard
regular expression matching

20
00:00:58,850 --> 00:01:02,330
engines, like star, plus and n-times.

21
00:01:02,330 --> 00:01:05,690
But for many tasks,
for our algorithm,

22
00:01:05,690 --> 00:01:10,174
we will focus only on what I call
basic regular expressions.

23
00:01:10,174 --> 00:01:11,840
Since I'm lazy, I will call

24
00:01:11,840 --> 00:01:13,550
these basic regular expressions

25
00:01:13,550 --> 00:01:15,485
just regular expressions.

26
00:01:15,485 --> 00:01:17,405
And the ones you've seen earlier

27
00:01:17,405 --> 00:01:19,400
extended regular expressions.

28
00:01:19,400 --> 00:01:22,940
So the basic regular expressions,
or just regular expressions,

29
00:01:22,940 --> 00:01:25,280
they will have characters.

30
00:01:25,280 --> 00:01:27,170
So you can match any character,

31
00:01:27,170 --> 00:01:31,370
a, b, c to z or 0 to 9.

32
00:01:31,370 --> 00:01:35,525
Any ASCII character. 'c' here
is just a representative.

33
00:01:35,525 --> 00:01:38,825
So we can match
single characters.

34
00:01:38,825 --> 00:01:42,440
Then we can match alternatives.

35
00:01:42,440 --> 00:01:44,930
That means a string
is either matched

36
00:01:44,930 --> 00:01:46,730
by the regular expression r1

37
00:01:46,730 --> 00:01:49,324
or by the regular expression r2.

38
00:01:49,324 --> 00:01:52,790
And for the
alternative we write +.

39
00:01:52,790 --> 00:01:55,175
Then we also have sequences.

40
00:01:55,175 --> 00:01:57,410
This sequence regular
expression essentially

41
00:01:57,410 --> 00:01:59,915
says that the first part of a string

42
00:01:59,915 --> 00:02:02,210
needs to be matched by
the regular expression r1

43
00:02:02,210 --> 00:02:06,275
and then the second
part by r2.

44
00:02:06,275 --> 00:02:10,190
And then we also have the
star regular expression,

45
00:02:10,190 --> 00:02:12,980
which says the string
needs to be matched

46
00:02:12,980 --> 00:02:16,520
by zero or more copies
of the regular expression.

47
00:02:16,520 --> 00:02:18,140
And then we also have some

48
00:02:18,140 --> 00:02:20,060
slightly strange
regular expressions.

49
00:02:20,060 --> 00:02:22,505
We have the regular expression 1,

50
00:02:22,505 --> 00:02:25,910
which can only match
the empty string.

51
00:02:25,910 --> 00:02:29,075
Here I'm using the
notation 1 for that

52
00:02:29,075 --> 00:02:31,340
and in my writing I will always

53
00:02:31,340 --> 00:02:33,440
make sure that for the
regular expression

54
00:02:33,440 --> 00:02:35,765
I will write the
1 in a bold font.

55
00:02:35,765 --> 00:02:38,510
So whenever you see
a 1 in bold font,

56
00:02:38,510 --> 00:02:40,395
this is not the number 1, but

57
00:02:40,395 --> 00:02:44,300
the regular expression which
can match the empty string.

58
00:02:44,300 --> 00:02:48,050
And we also have the
regular expression 0,

59
00:02:48,050 --> 00:02:50,315
which cannot match
anything at all.

60
00:02:50,315 --> 00:02:51,695
You might think, well,

61
00:02:51,695 --> 00:02:54,635
that's not much use if it cannot
match anything at all,

62
00:02:54,635 --> 00:02:58,130
but you will see why that
one is important later on.

63
00:02:58,130 --> 00:03:00,785
So our basic regular expressions,

64
00:03:00,785 --> 00:03:02,375
they will be 0,

65
00:03:02,375 --> 00:03:08,390
1, characters, alternatives,
sequences and stars.

66
00:03:08,390 --> 00:03:12,170
And these are all the
basic regular expressions.

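Written as a grammar, the six cases just listed might be summarised as follows. This is only a sketch: the bold 0 and the centred dot for sequences are notational assumptions. The transcript itself only fixes + for alternatives and a bold 1 for the regular expression that matches the empty string.

```latex
\[
r \;::=\; \mathbf{0}
   \;\mid\; \mathbf{1}
   \;\mid\; c
   \;\mid\; r_1 + r_2
   \;\mid\; r_1 \cdot r_2
   \;\mid\; r^{*}
\]
```

Here c stands for any single character, and r, r1 and r2 stand for arbitrary basic regular expressions.
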
67
00:03:12,170 --> 00:03:16,280
If this definition is a
bit too abstract for you,

68
00:03:16,280 --> 00:03:18,560
we can also look at
the concrete code,

69
00:03:18,560 --> 00:03:23,060
how that would pan out when
actually writing some Scala.

70
00:03:23,060 --> 00:03:28,040
I promised you I would always
show you my code in Scala.

Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
325 |
71
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
326 |
00:03:28,040 --> 00:03:29,480
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
327 |
So here you would have
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
328 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
329 |
72
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
330 |
00:03:29,480 --> 00:03:32,885
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
331 |
first an abstract class
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
332 |
for regular expressions.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
333 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
334 |
73
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
335 |
00:03:32,885 --> 00:03:37,580
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
336 |
Then you have one regular
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
337 |
expression for 0,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
338 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
339 |
74
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
340 |
00:03:37,580 --> 00:03:41,540
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
341 |
one regular expression for 1,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
342 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
343 |
75
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
344 |
00:03:41,540 --> 00:03:42,875
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
345 |
one regular expression, which
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
346 |
takes an argument,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
347 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
348 |
76
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
349 |
00:03:42,875 --> 00:03:45,050
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
350 |
the character you want to match,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
351 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
352 |
77
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
353 |
00:03:45,050 --> 00:03:47,915
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
354 |
the characters a, b, c and so on.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
355 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
356 |
78
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
357 |
00:03:47,915 --> 00:03:50,945
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
358 |
Then we have an alternative
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
359 |
regular expression,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
360 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
361 |
79
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
362 |
00:03:50,945 --> 00:03:53,480
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
363 |
which takes the first
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
364 |
alternative and
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
365 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
366 |
80
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
367 |
00:03:53,480 --> 00:03:56,435
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
368 |
the second alternative
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
369 |
as arguments.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
370 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
371 |
81
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
372 |
00:03:56,435 --> 00:03:59,690
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
373 |
And we have a sequence
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
374 |
regular expression. Again,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
375 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
376 |
82
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
377 |
00:03:59,690 --> 00:04:01,850
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
378 |
which takes the
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
379 |
first component and
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
380 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
381 |
83
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
382 |
00:04:01,850 --> 00:04:04,730
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
383 |
the second component
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
384 |
as two arguments.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
385 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
386 |
84
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
387 |
00:04:04,730 --> 00:04:07,249
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
388 |
And we have the star
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
389 |
regular expression,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
390 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
391 |
85
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
392 |
00:04:07,249 --> 00:04:10,880
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
393 |
which just takes one regular
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
394 |
expression as argument.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
395 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
396 |
86
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
397 |
00:04:10,880 --> 00:04:16,115
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
398 |
And all these regular expressions
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
399 |
extend our abstract class.
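
The definition described here can be sketched roughly as follows. This is only a sketch: the names `Rexp`, `ZERO`, `ONE`, `CHAR`, `ALT`, `SEQ` and `STAR` are assumptions chosen to follow the capital-letter convention; the code shown on screen may differ in detail.

```scala
// Sketch of the datatype described in the transcript.
// All names are assumed, not taken from the original code.
abstract class Rexp
case object ZERO extends Rexp                     // the regular expression for 0
case object ONE extends Rexp                      // the regular expression for 1
case class CHAR(c: Char) extends Rexp             // takes the character to match
case class ALT(r1: Rexp, r2: Rexp) extends Rexp   // first and second alternative
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp   // first and second component
case class STAR(r: Rexp) extends Rexp             // one regular expression as argument
```

Case classes and case objects are used so that regular expressions can be built without `new` and compared structurally.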
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
400 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
401 |
87
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
402 |
00:04:16,115 --> 00:04:20,300
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
403 |
For whatever I do in
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
404 |
this module here I have
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
405 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
406 |
88
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
407 |
00:04:20,300 --> 00:04:23,300
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
408 |
the convention that all
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
409 |
the regular expressions
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
410 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
411 |
89
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
412 |
00:04:23,300 --> 00:04:25,550
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
413 |
are written with capital letters.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
414 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
415 |
90
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
416 |
00:04:25,550 --> 00:04:26,885
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
417 |
As you can see here,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
418 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
419 |
91
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
420 |
00:04:26,885 --> 00:04:31,685
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
421 |
0, 1, character, these will be
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
422 |
always regular expressions.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
423 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
424 |
92
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
425 |
00:04:31,685 --> 00:04:34,370
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
426 |
They are all in capital letters.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
427 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
428 |
93
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
429 |
00:04:34,370 --> 00:04:36,484
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
430 |
Let's for a moment,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
431 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
432 |
94
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
433 |
00:04:36,484 --> 00:04:38,720
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
434 |
play around with this definition.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
435 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
436 |
95
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
437 |
00:04:38,720 --> 00:04:41,945
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
438 |
I'm using here the
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
439 |
Ammonite REPL.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
440 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
441 |
96
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
442 |
00:04:41,945 --> 00:04:46,950
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
443 |
And I can evaluate
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
444 |
this definition.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
445 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
446 |
97
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
447 |
00:04:53,430 --> 00:04:55,810
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
448 |
And now I can start to
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
449 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
450 |
98
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
451 |
00:04:55,810 --> 00:04:58,570
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
452 |
define particular
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
453 |
regular expressions.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
454 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
455 |
99
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
456 |
00:04:58,570 --> 00:05:00,340
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
457 |
For example, if I need
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
458 |
a regular expression
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
459 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
460 |
100
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
461 |
00:05:00,340 --> 00:05:02,860
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
462 |
which can recognise
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
463 |
the character a,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
464 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
465 |
101
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
466 |
00:05:02,860 --> 00:05:06,025
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
467 |
then I would write
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
468 |
something like this.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
469 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
470 |
102
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
471 |
00:05:06,025 --> 00:05:08,710
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
472 |
So this regular expression
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
473 |
takes an argument,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
474 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
475 |
103
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
476 |
00:05:08,710 --> 00:05:13,615
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
477 |
the character 'a' to specify
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
478 |
which character to match.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
479 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
480 |
104
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
481 |
00:05:13,615 --> 00:05:16,945
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
482 |
We do this obviously also with 'b'.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
483 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
484 |
105
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
485 |
00:05:16,945 --> 00:05:19,405
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
486 |
And I can do that with
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
487 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
488 |
106
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
489 |
00:05:19,405 --> 00:05:22,975
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
490 |
'c'. So now we have three
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
491 |
regular expressions.
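
What is typed into the REPL at this point might look like the sketch below. The constructor name `CHAR` and the value names `a`, `b` and `c` are assumptions; only the part of the datatype needed here is repeated so the snippet can be pasted into a REPL on its own.

```scala
// Minimal part of the (assumed) datatype needed for this step.
abstract class Rexp
case class CHAR(c: Char) extends Rexp

// Three regular expressions, each recognising exactly one character.
val a = CHAR('a')
val b = CHAR('b')
val c = CHAR('c')
```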
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
492 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
493 |
107
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
494 |
00:05:22,975 --> 00:05:25,570
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
495 |
If you look very carefully
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
496 |
at this definition,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
497 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
498 |
108
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
499 |
00:05:25,570 --> 00:05:27,070
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
500 |
you can actually see
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
501 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
502 |
109
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
503 |
00:05:27,070 --> 00:05:29,940
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
504 |
these regular
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
505 |
expressions are trees.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
506 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
507 |
110
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
508 |
00:05:29,940 --> 00:05:33,365
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
509 |
So no matter what we
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
510 |
write down on paper,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
511 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
512 |
111
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
513 |
00:05:33,365 --> 00:05:36,755
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
514 |
they are behind the
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
515 |
scenes always trees.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
516 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
517 |
112
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
518 |
00:05:36,755 --> 00:05:40,010
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
519 |
And you can see that
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
520 |
actually in this definition.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
521 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
522 |
113
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
523 |
00:05:40,010 --> 00:05:44,330
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
524 |
If you define two regular
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
525 |
expressions r1 and r2.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
526 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
527 |
114
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
528 |
00:05:44,330 --> 00:05:49,310
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
529 |
They are essentially
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
530 |
the alternative of a, b and c.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
531 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
532 |
115
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
533 |
00:05:49,310 --> 00:05:52,760
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
534 |
Then this regular expression
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
535 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
536 |
116
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
537 |
00:05:52,760 --> 00:05:54,710
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
538 |
can match either the character
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
539 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
540 |
117
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
541 |
00:05:54,710 --> 00:05:57,980
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
542 |
a or the character b
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
543 |
or the character c.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
544 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
545 |
118
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
546 |
00:05:57,980 --> 00:06:01,640
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
547 |
And the same for the
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
548 |
regular expression r2.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
549 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
550 |
119
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
551 |
00:06:01,640 --> 00:06:03,875
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
552 |
So let me just evaluate that.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
553 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
554 |
120
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
555 |
00:06:03,875 --> 00:06:05,690
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
556 |
And even though these are
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
557 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
558 |
121
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
559 |
00:06:05,690 --> 00:06:07,175
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
560 |
two regular expressions
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
561 |
which can match
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
562 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
563 |
122
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
564 |
00:06:07,175 --> 00:06:11,750
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
565 |
exactly the same things,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
566 |
they are different trees.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
567 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
568 |
123
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
569 |
00:06:11,750 --> 00:06:14,195
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
570 |
So if I ask Scala,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
571 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
572 |
124
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
573 |
00:06:14,195 --> 00:06:16,460
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
574 |
are these trees different?
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
575 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
576 |
125
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
577 |
00:06:16,460 --> 00:06:19,250
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
578 |
Or ask if they're
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
579 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
580 |
126
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
581 |
00:06:19,250 --> 00:06:21,865
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
582 |
the same, then Scala will say No,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
583 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
584 |
127
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
585 |
00:06:21,865 --> 00:06:25,440
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
586 |
they are actually different trees.
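
The comparison can be sketched like this. The names `ALT` and `CHAR` are assumed, and which of `r1` and `r2` nests to the left or to the right is a guess; the point is only that both match a, b or c while being different trees.

```scala
// Minimal part of the (assumed) datatype needed for this step.
abstract class Rexp
case class CHAR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp

// Two ways of building "a or b or c"; the nesting chosen here is an assumption.
val r1: Rexp = ALT(CHAR('a'), ALT(CHAR('b'), CHAR('c')))   // a + (b + c)
val r2: Rexp = ALT(ALT(CHAR('a'), CHAR('b')), CHAR('c'))   // (a + b) + c

// Case classes are compared structurally, so this is false:
// the two values are different trees.
val same = r1 == r2
```

Equality here is about the shape of the tree, not about which strings the regular expressions can match.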
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
587 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
588 |
128
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
589 |
00:06:25,450 --> 00:06:28,459
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
590 |
Let's come back to
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
591 |
this definition.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
592 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
593 |
129
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
594 |
00:06:28,459 --> 00:06:31,760
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
595 |
If we want to write down
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
596 |
regular expressions on paper,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
597 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
598 |
130
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
599 |
00:06:31,760 --> 00:06:33,620
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
600 |
then we want to be sloppy as
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
601 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
602 |
131
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
603 |
00:06:33,620 --> 00:06:35,750
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
604 |
mathematicians rather than as
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
605 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
606 |
132
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
607 |
00:06:35,750 --> 00:06:37,745
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
608 |
precise as computer scientists.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
609 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
610 |
133
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
611 |
00:06:37,745 --> 00:06:40,490
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
612 |
So when we want to write down
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
613 |
a regular expression which can
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
614 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
615 |
134
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
616 |
00:06:40,490 --> 00:06:43,955
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
617 |
either match the character
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
618 |
a or the character b,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
619 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
620 |
135
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
621 |
00:06:43,955 --> 00:06:49,130
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
622 |
then we would write down
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
623 |
something like this, a plus b.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
624 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
625 |
136
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
626 |
00:06:49,130 --> 00:06:51,170
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
627 |
And if you want to have
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
628 |
the regular expression
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
629 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
630 |
137
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
631 |
00:06:51,170 --> 00:06:52,625
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
632 |
which can either match
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
633 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
634 |
138
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
635 |
00:06:52,625 --> 00:06:55,925
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
636 |
the character a or b or c,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
637 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
638 |
139
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
639 |
00:06:55,925 --> 00:06:58,340
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
640 |
we will write
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
641 |
something like this.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
642 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
643 |
140
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
644 |
00:06:58,340 --> 00:07:01,370
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
645 |
But of course behind the
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
646 |
scenes, these are trees.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
647 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
648 |
141
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
649 |
00:07:01,370 --> 00:07:04,460
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
650 |
So we should have written
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
651 |
them with parentheses.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
652 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
653 |
142
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
654 |
00:07:04,460 --> 00:07:06,440
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
655 |
And you can see
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
656 |
actually, there are two
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
657 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
658 |
143
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
659 |
00:07:06,440 --> 00:07:08,990
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
660 |
regular expressions I
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
661 |
could have written down.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
662 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
663 |
144
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
664 |
00:07:08,990 --> 00:07:11,270
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
665 |
They're different.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
666 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
667 |
145
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
668 |
00:07:11,270 --> 00:07:12,710
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
669 |
Just by convention,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
670 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
671 |
146
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
672 |
00:07:12,710 --> 00:07:15,575
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
673 |
we don't write these parentheses.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
674 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
675 |
147
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
676 |
00:07:15,575 --> 00:07:18,740
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
677 |
And that is similar with sequences.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
678 |
|
148
00:07:18,740 --> 00:07:20,000
If I want to write down

149
00:07:20,000 --> 00:07:22,955
the regular expression which
can match first an 'a',

150
00:07:22,955 --> 00:07:25,010
then a 'b', and then a 'c',

151
00:07:25,010 --> 00:07:28,160
then I would write down
something like this.

152
00:07:28,160 --> 00:07:32,120
Just, there are again

153
00:07:32,120 --> 00:07:35,735
two regular expressions I
could have written down.

154
00:07:35,735 --> 00:07:38,480
Again by convention we don't

155
00:07:38,480 --> 00:07:40,670
write these parentheses though.

156
00:07:40,670 --> 00:07:42,350
However, sometimes we have to be

157
00:07:42,350 --> 00:07:43,940
very careful with parentheses,

158
00:07:43,940 --> 00:07:47,195
especially with star.

159
00:07:47,195 --> 00:07:50,525
Because this regular expression
is definitely not

160
00:07:50,525 --> 00:07:54,900
the same as this regular expression.

161
00:07:56,100 --> 00:07:59,410
The first one here can match

162
00:07:59,410 --> 00:08:03,610
any string consisting of a's and b's.

163
00:08:03,610 --> 00:08:05,860
While this regular expression can

164
00:08:05,860 --> 00:08:07,945
only match the single character

165
00:08:07,945 --> 00:08:13,300
a or any string
containing only b's.

166
00:08:13,300 --> 00:08:15,265
So to make the difference clear,

167
00:08:15,265 --> 00:08:20,065
in this example, we would have
to use the parentheses.

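To make the point about parentheses concrete, here is a minimal Scala sketch. It assumes the basic regular expressions are written as a small datatype; the constructor names (ZERO, ONE, CHAR, ALT, SEQ, STAR) and the helper `matches` are assumptions for this illustration. The matcher is a naive backtracking one, included only to check the claims; it is not the matcher this lecture is working towards.

```scala
// Constructor names (ZERO, ONE, CHAR, ALT, SEQ, STAR) are assumptions for this sketch.
sealed trait Rexp
case object ZERO extends Rexp                    // matches no string at all
case object ONE extends Rexp                     // matches only the empty string
case class CHAR(c: Char) extends Rexp            // matches the single character c
case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // r1 + r2
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp  // r1 followed by r2
case class STAR(r: Rexp) extends Rexp            // r*

object Parens {
  val a: Rexp = CHAR('a')
  val b: Rexp = CHAR('b')
  val c: Rexp = CHAR('c')

  // "abc" written without parentheses stands for one of these two trees
  val abcLeft: Rexp  = SEQ(SEQ(a, b), c)
  val abcRight: Rexp = SEQ(a, SEQ(b, c))

  // with star the parentheses cannot be dropped
  val starOfAlt: Rexp = STAR(ALT(a, b))  // (a + b)*
  val altOfStar: Rexp = ALT(a, STAR(b))  // a + b*

  // Naive backtracking matcher (an assumption of this sketch):
  // k decides whether the remaining input is acceptable.
  def m(r: Rexp, s: List[Char], k: List[Char] => Boolean): Boolean = r match {
    case ZERO => false
    case ONE => k(s)
    case CHAR(ch) => s match {
      case h :: t if h == ch => k(t)
      case _ => false
    }
    case ALT(r1, r2) => m(r1, s, k) || m(r2, s, k)
    case SEQ(r1, r2) => m(r1, s, rest => m(r2, rest, k))
    // iterate again only if r1 consumed at least one character
    case STAR(r1) => k(s) || m(r1, s, rest => rest.length < s.length && m(r, rest, k))
  }

  def matches(r: Rexp, s: String): Boolean = m(r, s.toList, _.isEmpty)
}
```

With these definitions, `Parens.matches(Parens.starOfAlt, "abba")` is true, while `Parens.matches(Parens.altOfStar, "abba")` is false. The two ways of bracketing abc are different trees, but both match the string "abc".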
168
00:08:20,065 --> 00:08:23,140
There's one more issue
with this definition.

169
00:08:23,140 --> 00:08:26,635
Why do we focus on these
basic regular expressions?

170
00:08:26,635 --> 00:08:28,660
Why don't we also include

171
00:08:28,660 --> 00:08:31,285
the ones from the
extended regular expressions?

172
00:08:31,285 --> 00:08:33,055
The answer is very easy.

173
00:08:33,055 --> 00:08:35,680
These basic regular
expressions can be used

174
00:08:35,680 --> 00:08:38,370
to represent also
the extended ones.

175
00:08:38,370 --> 00:08:40,220
Let me give you some examples.

176
00:08:40,220 --> 00:08:44,225
If I have a regular
expression r+, for example,

177
00:08:44,225 --> 00:08:46,280
then the meaning
is that I have to use

178
00:08:46,280 --> 00:08:49,115
one or more copies

179
00:08:49,115 --> 00:08:51,200
of this r to
match a string.

180
00:08:51,200 --> 00:08:53,810
Well, one or more copies
can be represented by

181
00:08:53,810 --> 00:08:58,385
the basic ones as just
r followed by r*.

182
00:08:58,385 --> 00:09:01,760
Meaning I have to use one
copy of r, followed by

183
00:09:01,760 --> 00:09:05,150
0 or more copies of r.

184
00:09:05,150 --> 00:09:07,895
Similarly, if I have the optional
regular expression,

185
00:09:07,895 --> 00:09:10,715
which is supposed to
match a string

186
00:09:10,715 --> 00:09:13,865
by using r, or match
the empty string,

187
00:09:13,865 --> 00:09:19,295
then this can obviously be
defined as r + 1.

188
00:09:19,295 --> 00:09:23,945
So here is the bold
regular expression 1,

189
00:09:23,945 --> 00:09:26,180
which means it either can

190
00:09:26,180 --> 00:09:28,205
recognize whatever
r can recognize,

191
00:09:28,205 --> 00:09:30,470
or it can recognize
the empty string.

192
00:09:30,470 --> 00:09:35,150
And if I have ranges, like a
to z, then I can define

193
00:09:35,150 --> 00:09:41,135
that as a + b + c + ...
and so on until z.

194
00:09:41,135 --> 00:09:45,920
Maybe this definition is not
good in terms of runtime,

195
00:09:45,920 --> 00:09:47,960
but in terms of just being able

196
00:09:47,960 --> 00:09:50,780
to recognize strings
or match strings,

197
00:09:50,780 --> 00:09:54,680
the basic regular expressions
will be just sufficient.

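The three translations just described (r+ as r followed by r*, the optional r as r + 1, and a range as a long alternative) can be sketched as ordinary Scala functions. The datatype and the function names `plus`, `opt` and `range` are assumptions made up for this illustration.

```scala
// Constructor names are assumptions for this sketch.
sealed trait Rexp
case object ZERO extends Rexp                    // matches no string at all
case object ONE extends Rexp                     // the bold 1: matches only the empty string
case class CHAR(c: Char) extends Rexp
case class ALT(r1: Rexp, r2: Rexp) extends Rexp  // r1 + r2
case class SEQ(r1: Rexp, r2: Rexp) extends Rexp  // r1 followed by r2
case class STAR(r: Rexp) extends Rexp            // r*

object Extended {
  // r+ : one copy of r, followed by zero or more copies of r
  def plus(r: Rexp): Rexp = SEQ(r, STAR(r))

  // optional r : whatever r recognizes, or the empty string
  def opt(r: Rexp): Rexp = ALT(r, ONE)

  // a range such as a to z : lo + ... + hi, nested to the left;
  // an empty range (lo > hi) is taken to match nothing
  def range(lo: Char, hi: Char): Rexp =
    (lo to hi).toList
      .map(ch => (CHAR(ch): Rexp))
      .reduceLeftOption((r1, r2) => ALT(r1, r2))
      .getOrElse(ZERO)
}
```

For example, `Extended.range('a', 'z')` builds an alternative with 26 characters. That is fine for recognizing strings, but, as said above, probably not good in terms of runtime.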
198
00:09:54,680 --> 00:09:56,690
Unfortunately, we
also need to have

199
00:09:56,690 --> 00:09:58,850
a quick chat about strings.

200
00:09:58,850 --> 00:10:02,255
In Scala, it's crystal
clear what a string is.

201
00:10:02,255 --> 00:10:05,480
There's a separate datatype
which is called String.

202
00:10:05,480 --> 00:10:07,895
So here, for example,
is a string.

203
00:10:07,895 --> 00:10:09,200
And as you can see,

204
00:10:09,200 --> 00:10:11,105
it is of the type String.

205
00:10:11,105 --> 00:10:13,985
And the empty string
will be just that.

206
00:10:13,985 --> 00:10:16,160
However, when we write things down on

207
00:10:16,160 --> 00:10:18,320
paper and think
about our algorithm,

208
00:10:18,320 --> 00:10:22,790
we want to think of strings
as lists of characters.

209
00:10:22,790 --> 00:10:26,070
So something more like this.

210
00:10:27,070 --> 00:10:31,745
You can see here, this is actually
a list of characters.

211
00:10:31,745 --> 00:10:35,150
And the two operations
we need are taking

212
00:10:35,150 --> 00:10:37,280
the head of this list and

213
00:10:37,280 --> 00:10:39,770
the rest of the list
or tail of the list.

214
00:10:39,770 --> 00:10:41,720
That's why we want
to regard them as

215
00:10:41,720 --> 00:10:45,260
lists rather than strings.

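As a small sketch of this view of strings in Scala: the string "hello" and the names inside the object are arbitrary choices for this illustration and do not come from the lecture slides.

```scala
object StringsAsLists {
  val s: String = "hello"             // an ordinary Scala String
  val cs: List[Char] = s.toList       // the same string seen as a list of characters

  val first: Char = cs.head           // the head of the list: 'h'
  val rest: List[Char] = cs.tail      // the rest, or tail, of the list

  val empty: List[Char] = "".toList   // the empty string becomes the empty list
}
```

Note that `head` and `tail` throw an exception on the empty list, so code working on such lists usually checks for the empty case first, for example by pattern matching on `Nil` and `c :: cs`.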
216
00:10:45,260 --> 00:10:48,200
So if I'm using a
string like this,

217
00:10:48,200 --> 00:10:51,935
then on paper I will always
write something like that.

218
00:10:51,935 --> 00:10:54,575
Or since I'm lazy, just that.

219
00:10:54,575 --> 00:10:56,675
And for the empty string,

220
00:10:56,675 --> 00:10:59,210
I will write either
the empty list, with

221
00:10:59,210 --> 00:11:03,920
two brackets or,
being lazy, just that.

222
00:11:03,920 --> 00:11:06,620
Actually there is one
more operation we need on

223
00:11:06,620 --> 00:11:09,410
strings and that
is concatenation.

Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1029 |
224
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1030 |
00:11:09,410 --> 00:11:11,255
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1031 |
If you have a string s1,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1032 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1033 |
225
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1034 |
00:11:11,255 --> 00:11:14,510
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1035 |
string s2, and put an
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1036 |
at symbol in between,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1037 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1038 |
226
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1039 |
00:11:14,510 --> 00:11:18,050
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1040 |
that means we want to
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1041 |
concatenate both strings.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1042 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1043 |
227
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1044 |
00:11:18,050 --> 00:11:22,625
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1045 |
So foo concatenated with
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1046 |
bar, would be foobar.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1047 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1048 |
228
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1049 |
00:11:22,625 --> 00:11:25,085
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1050 |
And any string concatenated with
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1051 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1052 |
229
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1053 |
00:11:25,085 --> 00:11:27,950
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1054 |
the empty string
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1055 |
is left untouched.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1056 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1057 |
230
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1058 |
00:11:27,950 --> 00:11:31,310
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1059 |
So baz concatenated with
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1060 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1061 |
231
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1062 |
00:11:31,310 --> 00:11:33,545
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1063 |
the empty string, is just baz.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1064 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1065 |
232
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1066 |
00:11:33,545 --> 00:11:37,295
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1067 |
So that's like if we have
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1068 |
strings as lists of characters,
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1069 |
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1070 |
233
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1071 |
00:11:37,295 --> 00:11:39,755
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1072 |
that will be just list append.
|
Christian Urban <christian.urban@kcl.ac.uk>
parents:
diff
changeset
|
1073 |
|
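The concatenation just described can be sketched in Python, assuming strings are represented as lists of characters. The helper name `concat` and the constant `EMPTY` are hypothetical and only stand in for the at symbol and the empty-list notation mentioned in the video; this is an illustration, not the course's own code.

```python
# Assumption for illustration: a string is modelled as a list of characters.
EMPTY = []  # the empty string is the empty list


def concat(s1, s2):
    """Concatenate two strings given as lists of characters (the role of s1 @ s2)."""
    # List append: builds a new list and leaves both inputs unchanged.
    return s1 + s2


foo, bar, baz = list("foo"), list("bar"), list("baz")

# foo concatenated with bar is foobar.
assert concat(foo, bar) == list("foobar")

# Concatenating with the empty string leaves a string untouched, on either side.
assert concat(baz, EMPTY) == baz
assert concat(EMPTY, baz) == baz
```

Modelling strings as character lists keeps the empty string and concatenation as ordinary list operations, which is the view the transcript takes.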
234
00:11:39,755 --> 00:11:41,480
In the next video,

235
00:11:41,480 --> 00:11:43,160
we will use these definitions

236
00:11:43,160 --> 00:11:45,050
and introduce the notion of what

237
00:11:45,050 --> 00:11:46,850
a language is and

238
00:11:46,850 --> 00:11:49,920
what the meaning of a
regular expression is.