1
00:00:05,810 --> 00:00:11,039
Okay, so we have strings.
The empty string and string "abc".
2
00:00:11,039 --> 00:00:13,200
And we can take the head
3
00:00:13,200 --> 00:00:15,615
of a string and the
tail of the string,
4
00:00:15,615 --> 00:00:18,480
since we regard them as
lists of characters.
5
00:00:18,480 --> 00:00:22,230
We also introduced the
notion of a language.
6
00:00:22,230 --> 00:00:25,305
A language is just
a set of strings.
7
00:00:25,305 --> 00:00:26,700
So here's a language.
8
00:00:26,700 --> 00:00:28,275
It contains the empty string,
9
00:00:28,275 --> 00:00:30,360
the string hello, string foobar,
10
00:00:30,360 --> 00:00:31,965
the string, which is
11
00:00:31,965 --> 00:00:34,874
just a character a and
the string abc.
12
00:00:34,874 --> 00:00:37,800
There will be also other
languages, which for example,
13
00:00:37,800 --> 00:00:39,780
contain infinitely many strings.
14
00:00:39,780 --> 00:00:42,190
Actually that will
happen quite often.
15
00:00:42,190 --> 00:00:45,560
Now, you've seen this operation
16
00:00:45,560 --> 00:00:47,660
of concatenation of two strings.
17
00:00:47,660 --> 00:00:50,840
So if you have the string
foo and a string bar,
18
00:00:50,840 --> 00:00:53,255
we can just put them
next to each other.
19
00:00:53,255 --> 00:00:57,725
We will now extend that
notion also told languages.
20
00:00:57,725 --> 00:01:02,270
So assume you have a
language A and a language B.
21
00:01:02,270 --> 00:01:05,825
That means we take every
string from the language
22
00:01:05,825 --> 00:01:11,195
A and concatenate it with every
string of the language B.
23
00:01:11,195 --> 00:01:13,160
Here's an example.
24
00:01:13,160 --> 00:01:14,750
If you have the language A
25
00:01:14,750 --> 00:01:17,030
containing foo and bar as strings,
26
00:01:17,030 --> 00:01:20,585
and the language B containing
a and b as strings.
27
00:01:20,585 --> 00:01:23,000
Then concatenating
these two languages
28
00:01:23,000 --> 00:01:27,110
means I take foo with this a and b,
29
00:01:27,110 --> 00:01:28,625
giving fooa and foob,
30
00:01:28,625 --> 00:01:30,905
and I take bar and
concatenated with
31
00:01:30,905 --> 00:01:34,160
a and b, giving bara and barb.
32
00:01:34,160 --> 00:01:36,185
So the pointer is we're
33
00:01:36,185 --> 00:01:39,110
overloading this notion
of concatenating
34
00:01:39,110 --> 00:01:45,020
two strings also to
concatenating two languages.
35
00:01:45,020 --> 00:01:48,890
Okay, here are two corner
cases of this definition.
36
00:01:48,890 --> 00:01:51,395
What happens if I
have any language,
37
00:01:51,395 --> 00:01:54,470
say A, and I concatenate it with
38
00:01:54,470 --> 00:01:58,640
the language which contains
only the empty string.
39
00:01:58,640 --> 00:02:02,270
And secondly, if I
have any language,
40
00:02:02,270 --> 00:02:04,220
say again A, and I
41
00:02:04,220 --> 00:02:05,870
concatenate it with the language
42
00:02:05,870 --> 00:02:08,345
which doesn't contain any string.
43
00:02:08,345 --> 00:02:11,720
How would these two
cases be defined?
44
00:02:11,720 --> 00:02:16,055
Remember, this language here
contains a single element,
45
00:02:16,055 --> 00:02:17,645
namely the empty string,
46
00:02:17,645 --> 00:02:20,210
while this language
47
00:02:20,210 --> 00:02:23,585
does not contain any
string. It is empty.
48
00:02:23,585 --> 00:02:26,930
So these are two
different languages.
49
00:02:26,930 --> 00:02:29,960
Think about how this
would be defined.
50
00:02:29,960 --> 00:02:33,455
So what's the point of
all these definitions?
51
00:02:33,455 --> 00:02:36,110
Well, we want to make precise
52
00:02:36,110 --> 00:02:38,735
what is the meaning of
a regular expression.
53
00:02:38,735 --> 00:02:41,510
For this, we use
this function L. It
54
00:02:41,510 --> 00:02:44,180
will take a regular expression as
55
00:02:44,180 --> 00:02:47,150
argument and generates a
56
00:02:47,150 --> 00:02:49,970
set of strings or a language.
57
00:02:49,970 --> 00:02:52,670
And it's supposed to be the
language which contains
58
00:02:52,670 --> 00:02:56,330
all the strings this regular
expression can match.
59
00:02:56,330 --> 00:02:58,670
So remember the 0
regular expression,
60
00:02:58,670 --> 00:03:01,115
it's not supposed to
match any string.
61
00:03:01,115 --> 00:03:05,105
So we give it the empty
language or empty set.
62
00:03:05,105 --> 00:03:06,740
This regular expression,
63
00:03:06,740 --> 00:03:09,380
the one regular expression
is supposed to match
64
00:03:09,380 --> 00:03:13,445
exactly one string,
namely the empty string.
65
00:03:13,445 --> 00:03:15,710
the regular expression
66
00:03:15,710 --> 00:03:18,875
recognizing just the
character c, in this case.
67
00:03:18,875 --> 00:03:22,820
Well, that will be the
language which contains
68
00:03:22,820 --> 00:03:28,175
the string only containing
the single character c.
69
00:03:28,175 --> 00:03:31,295
Now, what is this regular
expression supposed to match?
70
00:03:31,295 --> 00:03:34,835
Well, a string is matched
by this regular expression
71
00:03:34,835 --> 00:03:37,355
if it is matched by
the first component, r1,
72
00:03:37,355 --> 00:03:40,970
or by the second component, r2.
73
00:03:40,970 --> 00:03:43,190
That's why this plus
is the alternative.
74
00:03:43,190 --> 00:03:45,605
So which are the strings
75
00:03:45,605 --> 00:03:48,320
this regular expression
can match? Well,
76
00:03:48,320 --> 00:03:51,275
all the strings
r1 one can match,
77
00:03:51,275 --> 00:03:54,170
given by L(r1) union...
78
00:03:54,170 --> 00:03:57,380
all the strings
R2 can match.
79
00:03:57,380 --> 00:04:01,325
And we use here the union
because this alternative,
80
00:04:01,325 --> 00:04:05,945
even if it says this string
is matched by r1 or by r2.
81
00:04:05,945 --> 00:04:08,465
The meaning of this
reg expression
82
00:04:08,465 --> 00:04:11,209
is the union of both languages.
83
00:04:11,209 --> 00:04:14,315
Now the sequence case.
84
00:04:14,315 --> 00:04:17,960
This means a string is
matched by this regular expression
85
00:04:17,960 --> 00:04:20,510
if the first part of this
string is matched by r1
86
00:04:20,510 --> 00:04:24,935
and the second part of
the string is matched by r2.
87
00:04:24,935 --> 00:04:28,490
Now we have to say, which
are all the strings
88
00:04:28,490 --> 00:04:31,645
this regular expression
can match. Well, it's
89
00:04:31,645 --> 00:04:34,495
all the strings
r1 can match,
90
00:04:34,495 --> 00:04:39,205
concatenated with all the
strings r2 can match.
91
00:04:39,205 --> 00:04:42,895
So for this, we have to use
this concatenation operation.
92
00:04:42,895 --> 00:04:47,440
So if r1 can match a string
and r2 can match a string,
93
00:04:47,440 --> 00:04:51,220
then in the meaning will be
the concatenation of that.
94
00:04:51,220 --> 00:04:53,170
So with this function L
95
00:04:53,170 --> 00:04:56,995
we can make precise
what are the strings,
96
00:04:56,995 --> 00:05:00,205
*all* the strings a regular
expression can match.
97
00:05:00,205 --> 00:05:04,689
The only case we haven't
specified yet is the r*.
98
00:05:04,689 --> 00:05:07,750
For this, we need
another operation
99
00:05:07,750 --> 00:05:12,470
on languages or sets of
strings, which we do next.
100
00:05:12,490 --> 00:05:17,285
We introduce a power
operation for languages.
101
00:05:17,285 --> 00:05:19,100
The power operation or
102
00:05:19,100 --> 00:05:22,835
the nth power is
defined recursively.
103
00:05:22,835 --> 00:05:26,615
So the A to the power of 0
104
00:05:26,615 --> 00:05:31,205
would be defined as the set
containing the empty string.
105
00:05:31,205 --> 00:05:36,200
And A to the power of
n + 1 would be A
106
00:05:36,200 --> 00:05:38,705
concatenated with
A to the power of
107
00:05:38,705 --> 00:05:42,160
n. Here are a few examples.
108
00:05:42,160 --> 00:05:45,380
A to the power of four
would be essentially
109
00:05:45,380 --> 00:05:47,660
A concatenated with
A concatenated with
110
00:05:47,660 --> 00:05:49,640
A concatenated with A,
111
00:05:49,640 --> 00:05:51,650
and also concatenated with
112
00:05:51,650 --> 00:05:55,580
the language which
contains the empty string.
113
00:05:55,580 --> 00:05:57,275
Because that's how we define
114
00:05:57,275 --> 00:05:59,975
the base case, A
to the power of 0.
115
00:05:59,975 --> 00:06:03,410
And A to the power
one would be just A,
116
00:06:03,410 --> 00:06:05,330
again followed by that one,
117
00:06:05,330 --> 00:06:07,790
but this would be just A.
118
00:06:07,790 --> 00:06:10,385
And A to the power 0
119
00:06:10,385 --> 00:06:14,270
is this language which
contains the empty string.
120
00:06:14,270 --> 00:06:16,670
With this power operation,
121
00:06:16,670 --> 00:06:19,505
I can also fill in this case.
122
00:06:19,505 --> 00:06:23,210
Remember this r* operation
is supposed to match
123
00:06:23,210 --> 00:06:27,680
a string with either
0 or more copies of r.
124
00:06:27,680 --> 00:06:31,940
So the meaning of this
regular expression would be
125
00:06:31,940 --> 00:06:37,475
now the Union of all the
powers greater or equal than 0,
126
00:06:37,475 --> 00:06:40,835
of the language that r can match.
127
00:06:40,835 --> 00:06:44,945
And we take now at the
union of all these powers,
128
00:06:44,945 --> 00:06:47,150
that means essentially what
129
00:06:47,150 --> 00:06:48,530
can the regular expression match
130
00:06:48,530 --> 00:06:51,440
if I take L of r
to the 0th power,
131
00:06:51,440 --> 00:06:53,030
what can I match with
132
00:06:53,030 --> 00:06:57,305
the first power, the
second power, and so on.
133
00:06:57,305 --> 00:07:00,544
And I take the union
of all these powers.
134
00:07:00,544 --> 00:07:03,830
That's the definition
of which strings
135
00:07:03,830 --> 00:07:05,510
this regular expression
is supposed to
136
00:07:05,510 --> 00:07:08,510
match. The meaning of
this regular expression.
137
00:07:08,510 --> 00:07:11,300
This is such an
important definition,
138
00:07:11,300 --> 00:07:13,250
that there's actually
a name for it.
139
00:07:13,250 --> 00:07:15,020
It's called the Kleene star.
140
00:07:15,020 --> 00:07:16,400
And it's written like this.
141
00:07:16,400 --> 00:07:18,335
If I have a language A,
142
00:07:18,335 --> 00:07:21,785
I can build the star
of this language
143
00:07:21,785 --> 00:07:26,255
by union up all the powers
greater or equal than 0.
144
00:07:26,255 --> 00:07:28,580
The A-star of
145
00:07:28,580 --> 00:07:31,580
a language would expand
to a to the power 0,
146
00:07:31,580 --> 00:07:33,770
union A to the power of one,
147
00:07:33,770 --> 00:07:36,665
A to the power of two, and so on.
148
00:07:36,665 --> 00:07:39,230
And it would
essentially mean, well,
149
00:07:39,230 --> 00:07:43,460
it's the language which contains
the empty string because
150
00:07:43,460 --> 00:07:48,290
of the A to the 0, one copy of A,
151
00:07:48,290 --> 00:07:51,020
two concatenated copies of A,
152
00:07:51,020 --> 00:07:55,070
three concatenated
copies of A, and so on.
153
00:07:55,070 --> 00:07:59,060
So that's what this A-star
is defined as.
154
00:07:59,060 --> 00:08:03,725
Essentially as the Union
of all the powers.
155
00:08:03,725 --> 00:08:05,990
And it's essentially
each power is how many
156
00:08:05,990 --> 00:08:08,750
times I have to
concatenate this language A.
157
00:08:08,750 --> 00:08:13,549
And note that this language
A-star will always contain
158
00:08:13,549 --> 00:08:21,240
the empty string because that
is how the A^0 is defined.
159
00:08:21,310 --> 00:08:23,540
So in this definition,
160
00:08:23,540 --> 00:08:25,969
I could have also written star
161
00:08:25,969 --> 00:08:29,180
here because the
meaning of this r*,
162
00:08:29,180 --> 00:08:34,055
regular expression
is the Kleene star of
163
00:08:34,055 --> 00:08:37,130
the language of r.
Remember that's
164
00:08:37,130 --> 00:08:41,525
the union of all powers
greater or equal than 0.
165
00:08:41,525 --> 00:08:43,640
It's behind.
I hope you don't get
166
00:08:43,640 --> 00:08:45,800
confused by the use of the stars.
167
00:08:45,800 --> 00:08:48,845
This star is for
the regular expressions.
168
00:08:48,845 --> 00:08:52,205
That star is for languages.
169
00:08:52,205 --> 00:08:54,274
They are two
different operations.
170
00:08:54,274 --> 00:08:56,555
They don't even
have the same type.
171
00:08:56,555 --> 00:08:58,954
Here might be
something interesting.
172
00:08:58,954 --> 00:09:00,560
Remember I asked you to think
173
00:09:00,560 --> 00:09:03,035
about these two corner cases.
174
00:09:03,035 --> 00:09:04,730
I hope you have done so,
175
00:09:04,730 --> 00:09:07,070
but let's also have look
at this together.
176
00:09:07,070 --> 00:09:09,785
According to the definition
177
00:09:09,785 --> 00:09:11,930
for this append of languages,
178
00:09:11,930 --> 00:09:13,670
I have to take every string in
179
00:09:13,670 --> 00:09:16,130
this set and have
180
00:09:16,130 --> 00:09:18,695
two concatenated with
every string in that set.
181
00:09:18,695 --> 00:09:22,115
So this set contains only
the empty string as element.
182
00:09:22,115 --> 00:09:24,820
So if I concatenate anything in
183
00:09:24,820 --> 00:09:27,745
there with the empty string,
that will be left untouched.
184
00:09:27,745 --> 00:09:31,855
So this one will be
actually A.
185
00:09:31,855 --> 00:09:34,600
This one I have to
take every string in
186
00:09:34,600 --> 00:09:36,190
this language and I have to
187
00:09:36,190 --> 00:09:39,115
concatenate with every
string in that language.
188
00:09:39,115 --> 00:09:41,110
But here isn't any string.
189
00:09:41,110 --> 00:09:43,300
So I can't concatenate that.
190
00:09:43,300 --> 00:09:46,900
That will be actually
the empty set (not empty string).
191
00:09:46,900 --> 00:09:49,570
So now let's have
192
00:09:49,570 --> 00:09:51,670
a look at regular expressions.
193
00:09:51,670 --> 00:09:53,230
That was with languages.
194
00:09:53,230 --> 00:09:56,035
But let's have a look at
regular expressions now.
195
00:09:56,035 --> 00:09:58,660
What would be the
meaning, for example,
196
00:09:58,660 --> 00:10:01,945
of r followed by 1?
197
00:10:01,945 --> 00:10:04,149
That is a regular expression.
198
00:10:04,149 --> 00:10:06,255
And it essentially says:
199
00:10:06,255 --> 00:10:07,850
this regular expression matches
200
00:10:07,850 --> 00:10:10,685
some strings and this matches
the empty string.
201
00:10:10,685 --> 00:10:13,730
So if you find out
what the meaning is,
202
00:10:13,730 --> 00:10:16,955
we would apply this
L-function to it.
203
00:10:16,955 --> 00:10:19,430
And if we now look
up the definition,
204
00:10:19,430 --> 00:10:27,360
that would be L of r
appended L of 1.
205
00:10:27,850 --> 00:10:32,255
If you look up what
this is defined,
206
00:10:32,255 --> 00:10:41,640
then this would be L of r
appended with this language.
207
00:10:41,950 --> 00:10:46,325
And if you now look up
our definition earlier,
208
00:10:46,325 --> 00:10:50,455
that will be just L of r.
209
00:10:50,455 --> 00:10:52,690
What that essentially
210
00:10:52,690 --> 00:10:55,765
means is if you have
this regular expression,
211
00:10:55,765 --> 00:10:58,885
this will match
exactly those strings
212
00:10:58,885 --> 00:11:01,000
which this regular
expression can match.
213
00:11:01,000 --> 00:11:04,794
And that pretty much
coincides with our intuition.
214
00:11:04,794 --> 00:11:05,950
This is supposed to match
215
00:11:05,950 --> 00:11:08,410
the empty string at
the end of the string,
216
00:11:08,410 --> 00:11:11,005
so it doesn't really
make any difference.
217
00:11:11,005 --> 00:11:13,480
And the meaning
really tells us that.
218
00:11:13,480 --> 00:11:15,880
But here's the
interesting thing.
219
00:11:15,880 --> 00:11:17,979
When you were in school,
220
00:11:17,979 --> 00:11:23,124
how would you think
about this expression?
221
00:11:23,124 --> 00:11:29,515
Well, r times 1
equals just our, okay?
222
00:11:29,515 --> 00:11:33,634
Furthermore, if you had r times 0.
223
00:11:33,634 --> 00:11:35,045
What would that be equal?
224
00:11:35,045 --> 00:11:37,205
Well, it would be 0.
225
00:11:37,205 --> 00:11:39,650
Ok, let's have
226
00:11:39,650 --> 00:11:42,605
a look how that works out
with regular expressions.
227
00:11:42,605 --> 00:11:46,355
So if you take r followed by 0,
228
00:11:46,355 --> 00:11:48,260
remember this is
the regular expression
229
00:11:48,260 --> 00:11:49,895
that cannot match anything.
230
00:11:49,895 --> 00:11:52,415
By unfolding the definition,
231
00:11:52,415 --> 00:11:59,104
I would get L of r
appended with L of 0.
232
00:11:59,104 --> 00:12:01,775
This is of course defined as L of r
233
00:12:01,775 --> 00:12:05,915
appended with the empty set.
234
00:12:05,915 --> 00:12:10,760
And this would, according
to the definition earlier,
235
00:12:10,760 --> 00:12:13,830
well just the empty set.
236
00:12:14,160 --> 00:12:20,260
And this would be
of course L of 0.
237
00:12:20,260 --> 00:12:24,580
So it seems like these laws,
238
00:12:24,580 --> 00:12:27,175
at least for the times,
239
00:12:27,175 --> 00:12:29,785
seem to be valid from math,
240
00:12:29,785 --> 00:12:31,420
are also valid with regular expressions,
241
00:12:31,420 --> 00:12:33,370
if we look at their meaning.
242
00:12:33,370 --> 00:12:36,670
These laws on natural numbers
243
00:12:36,670 --> 00:12:40,105
and regular expressions and
their close correspondance
244
00:12:40,105 --> 00:12:42,460
probably explain why I use
245
00:12:42,460 --> 00:12:46,975
a times for the sequence
regular expression.
246
00:12:46,975 --> 00:12:51,040
So r followed by the
regular expression 1,
247
00:12:51,040 --> 00:12:54,505
will have the same meaning
as that regular expression.
248
00:12:54,505 --> 00:12:59,120
And r followed by the
0 regular expression
249
00:12:59,120 --> 00:13:01,295
will have this meaning.
250
00:13:01,295 --> 00:13:03,590
You might now think, but I also
251
00:13:03,590 --> 00:13:06,605
wrote the alternative with a plus.
252
00:13:06,605 --> 00:13:10,370
Does a similar law
holds for plus?
253
00:13:10,370 --> 00:13:15,965
Of course, if I take r
plus 0 on natural numbers,
254
00:13:15,965 --> 00:13:20,060
that would be just r. Does
hold for regular expressions?
255
00:13:20,060 --> 00:13:22,085
Yes, indeed it holds.
256
00:13:22,085 --> 00:13:26,735
If I have this
regular expression and
257
00:13:26,735 --> 00:13:33,245
unfold the definition that
would be L(r) union L(0).
258
00:13:33,245 --> 00:13:38,060
This would be equal
to L(r) union...
259
00:13:38,060 --> 00:13:41,150
What is this? Well, that's
just the empty set.
260
00:13:41,150 --> 00:13:43,760
And if I union something
with the empty set,
261
00:13:43,760 --> 00:13:47,180
then this will be
just of L(r). So yes,
262
00:13:47,180 --> 00:13:50,120
indeed, plus is also very much
263
00:13:50,120 --> 00:13:54,184
inspired by the laws
on natural numbers.
264
00:13:54,184 --> 00:13:57,065
There exists other notations too,
265
00:13:57,065 --> 00:14:01,310
but I prefer this
using the plus for
266
00:14:01,310 --> 00:14:03,844
the alternatives and the times
267
00:14:03,844 --> 00:14:05,765
for the sequence expression.
268
00:14:05,765 --> 00:14:07,460
It's also the reason why I call
269
00:14:07,460 --> 00:14:10,325
this regular expression for
the empty string 1.
270
00:14:10,325 --> 00:14:14,915
And for matching
nothing at all 0.
271
00:14:14,915 --> 00:14:18,665
This correspondence to
natural numbers doesn't quite
272
00:14:18,665 --> 00:14:22,055
extend to the star
regular expression.
273
00:14:22,055 --> 00:14:25,055
But there's still a
connection. There too.
274
00:14:25,055 --> 00:14:26,510
Can you remember how
275
00:14:26,510 --> 00:14:29,345
the power operation on
languages was defined?
276
00:14:29,345 --> 00:14:31,370
The 0 case was defined
277
00:14:31,370 --> 00:14:34,520
as the set containing
the empty string.
278
00:14:34,520 --> 00:14:37,039
Why is that? It looks
a bit arbitrary.
279
00:14:37,039 --> 00:14:41,150
Why is it not the empty set
or why is it not defined as A?
280
00:14:41,150 --> 00:14:43,745
Why is defined in this
particular way?
281
00:14:43,745 --> 00:14:46,880
Well, can you remember how
282
00:14:46,880 --> 00:14:48,950
the power operation r to the
283
00:14:48,950 --> 00:14:51,440
0 is defined on natural numbers?
284
00:14:51,440 --> 00:14:53,930
Yes, that's defined as 1.
285
00:14:53,930 --> 00:14:57,125
Does that also apply to
regular expressions?
286
00:14:57,125 --> 00:15:00,725
Well, if we take the meaning
of a regular expression,
287
00:15:00,725 --> 00:15:04,730
let's say r, and raise
it to the 0th power,
288
00:15:04,730 --> 00:15:07,685
then this will be, of
course, by definition,
289
00:15:07,685 --> 00:15:12,245
be defined as the set
containing the empty string.
290
00:15:12,245 --> 00:15:17,160
And that is of course
equal to L(1).
291
00:15:17,830 --> 00:15:20,570
Again, you can see that
292
00:15:20,570 --> 00:15:23,300
this law on natural numbers also
293
00:15:23,300 --> 00:15:29,000
holds on the laws of regular
expressions - on heir meaning.
294
00:15:29,000 --> 00:15:32,810
What I find really beautiful
here is that of course,
295
00:15:32,810 --> 00:15:36,244
the meaning is defined on
sets, sets of strings.
296
00:15:36,244 --> 00:15:38,975
This of course, on natural numbers.
297
00:15:38,975 --> 00:15:41,060
You can probably
see now where I'm
298
00:15:41,060 --> 00:15:43,085
coming from with my notation.
299
00:15:43,085 --> 00:15:46,205
That is actually a slight
problem in this field,
300
00:15:46,205 --> 00:15:48,590
since it's so old.
301
00:15:48,590 --> 00:15:52,205
Pretty much everybody
introduced their own notation.
302
00:15:52,205 --> 00:15:53,870
And you now have heaps of
303
00:15:53,870 --> 00:15:55,550
different notations
for the same thing.
304
00:15:55,550 --> 00:15:57,379
Okay, that is slightly
exaggerating.
305
00:15:57,379 --> 00:16:00,830
And also the notation
I used for times and
306
00:16:00,830 --> 00:16:04,295
plus and 0 and 1s...definitely
I'm not the only one.
307
00:16:04,295 --> 00:16:05,435
There are many people
308
00:16:05,435 --> 00:16:07,625
who have used this
notation as well.
309
00:16:07,625 --> 00:16:10,190
It's just, it's not universal.
310
00:16:10,190 --> 00:16:12,290
Well, here's a question now.
311
00:16:12,290 --> 00:16:15,635
Why did we go through this
kerfuffle in the first place?
312
00:16:15,635 --> 00:16:19,370
Why did we introduce these
operations on languages?
313
00:16:19,370 --> 00:16:21,260
And why did we introduce
314
00:16:21,260 --> 00:16:23,255
the meaning of a
regular expression?
315
00:16:23,255 --> 00:16:26,300
Well, remember at the beginning,
316
00:16:26,300 --> 00:16:28,265
we wanted to specify
317
00:16:28,265 --> 00:16:31,880
what problem our algorithms
actually supposed to solve.
318
00:16:31,880 --> 00:16:35,960
And we want to make that
very precise so that we can
319
00:16:35,960 --> 00:16:38,555
be sure that our implementation
320
00:16:38,555 --> 00:16:40,295
really solves the problem.
321
00:16:40,295 --> 00:16:41,540
And that's what you can do now.
322
00:16:41,540 --> 00:16:45,605
We can say a regular
expression, r say,
323
00:16:45,605 --> 00:16:50,015
is matching a string
s if and only if
324
00:16:50,015 --> 00:16:55,474
the string s is in the language
of the regular expression.
325
00:16:55,474 --> 00:17:00,770
So that is the problem our
matcher is supposed to solve.
326
00:17:00,770 --> 00:17:03,320
And it's supposed to
solve that a bit faster
327
00:17:03,320 --> 00:17:06,860
than in Python, Ruby and Java.
328
00:17:06,860 --> 00:17:09,585
And unfortunately we cannot use
329
00:17:09,585 --> 00:17:12,260
the definition of L
directly for that.
330
00:17:12,260 --> 00:17:15,815
Because remember, in
the case of the star,
331
00:17:15,815 --> 00:17:17,690
sometimes the meaning of
332
00:17:17,690 --> 00:17:18,830
a regular expression is actually
333
00:17:18,830 --> 00:17:20,925
an infinite set of strings.
334
00:17:20,925 --> 00:17:23,575
And it's a tiny bit difficult
335
00:17:23,575 --> 00:17:27,040
to decide whether a string
is an infinite set.
336
00:17:27,040 --> 00:17:30,790
So we have to do something
more clever in our algorithm.
337
00:17:30,790 --> 00:17:33,535
But that's for next week.
So see you next week. Bye.
338
00:17:38,535 --> 00:17:41,680
Okay, just to troll you.
Here's one more slide.
339
00:17:41,680 --> 00:17:45,850
So when you go over the
handouts and the videos,
340
00:17:45,850 --> 00:17:47,875
think about this example.
341
00:17:47,875 --> 00:17:49,840
Imagine you have a language A,
342
00:17:49,840 --> 00:17:51,970
which contains four strings.
343
00:17:51,970 --> 00:17:53,725
First string is just the character a,
344
00:17:53,725 --> 00:17:57,445
second string is just
the character b, and so on.
345
00:17:57,445 --> 00:18:02,335
How many strings are there
in A to the power four.
346
00:18:02,335 --> 00:18:05,210
Okay, that should be
relatively simple.
347
00:18:05,210 --> 00:18:07,310
If not, just try it out by
348
00:18:07,310 --> 00:18:11,165
implementing it in Scala and see
how many strings there are.
349
00:18:11,165 --> 00:18:13,850
But now the more
interesting question,
350
00:18:13,850 --> 00:18:16,670
imagine the same language,
351
00:18:16,670 --> 00:18:19,400
except that instead
of the character d,
352
00:18:19,400 --> 00:18:22,250
we have here, the empty string.
353
00:18:22,250 --> 00:18:26,630
How many strings are in
A to the power four?
354
00:18:26,630 --> 00:18:31,320
Okay, I'll let you think
about this. Bye now.