videos/01-basics2.srt
author Christian Urban <christian.urban@kcl.ac.uk>
Wed, 29 May 2024 13:25:30 +0100
changeset 960 c7009356ddd8
parent 766 e8402d8ec8e6
permissions -rw-r--r--
updated

1
00:00:05,810 --> 00:00:11,039
Okay, so we have strings.
The empty string and string "abc".

2
00:00:11,039 --> 00:00:13,200
And we can take the head

3
00:00:13,200 --> 00:00:15,615
of a string and the
tail of the string,

4
00:00:15,615 --> 00:00:18,480
since we regard them as
lists of characters.

5
00:00:18,480 --> 00:00:22,230
We also introduced the
notion of a language.

6
00:00:22,230 --> 00:00:25,305
A language is just
a set of strings.

7
00:00:25,305 --> 00:00:26,700
So here's a language.

8
00:00:26,700 --> 00:00:28,275
It contains the empty string,

9
00:00:28,275 --> 00:00:30,360
the string hello, string foobar,

10
00:00:30,360 --> 00:00:31,965
the string, which is

11
00:00:31,965 --> 00:00:34,874
just a character a and
the string abc.

12
00:00:34,874 --> 00:00:37,800
There will be also other
languages, which for example,

13
00:00:37,800 --> 00:00:39,780
contain infinitely many strings.

14
00:00:39,780 --> 00:00:42,190
Actually that will
happen quite often.

15
00:00:42,190 --> 00:00:45,560
Now, you've seen this operation

16
00:00:45,560 --> 00:00:47,660
of concatenation of two strings.

17
00:00:47,660 --> 00:00:50,840
So if you have the string
foo and a string bar,

18
00:00:50,840 --> 00:00:53,255
we can just put them
next to each other.

19
00:00:53,255 --> 00:00:57,725
We will now extend that
notion also told languages.

20
00:00:57,725 --> 00:01:02,270
So assume you have a
language A and a language B.

21
00:01:02,270 --> 00:01:05,825
That means we take every
string from the language

22
00:01:05,825 --> 00:01:11,195
A and concatenate it with every
string of the language B.

23
00:01:11,195 --> 00:01:13,160
Here's an example.

24
00:01:13,160 --> 00:01:14,750
If you have the language A

25
00:01:14,750 --> 00:01:17,030
containing foo and bar as strings,

26
00:01:17,030 --> 00:01:20,585
and the language B containing
a and b as strings.

27
00:01:20,585 --> 00:01:23,000
Then concatenating
these two languages

28
00:01:23,000 --> 00:01:27,110
means I take foo with this a and b,

29
00:01:27,110 --> 00:01:28,625
giving fooa and foob,

30
00:01:28,625 --> 00:01:30,905
and I take bar and
concatenated with

31
00:01:30,905 --> 00:01:34,160
a and b, giving bara and barb.

32
00:01:34,160 --> 00:01:36,185
So the pointer is we're

33
00:01:36,185 --> 00:01:39,110
overloading this notion
of concatenating

34
00:01:39,110 --> 00:01:45,020
two strings also to
concatenating two languages.

35
00:01:45,020 --> 00:01:48,890
Okay, here are two corner
cases of this definition.

36
00:01:48,890 --> 00:01:51,395
What happens if I
have any language,

37
00:01:51,395 --> 00:01:54,470
say A, and I concatenate it with

38
00:01:54,470 --> 00:01:58,640
the language which contains
only the empty string.

39
00:01:58,640 --> 00:02:02,270
And secondly, if I
have any language,

40
00:02:02,270 --> 00:02:04,220
say again A, and I

41
00:02:04,220 --> 00:02:05,870
concatenate it with the language

42
00:02:05,870 --> 00:02:08,345
which doesn't contain any string.

43
00:02:08,345 --> 00:02:11,720
How would these two
cases be defined?

44
00:02:11,720 --> 00:02:16,055
Remember, this language here
contains a single element,

45
00:02:16,055 --> 00:02:17,645
namely the empty string,

46
00:02:17,645 --> 00:02:20,210
while this language 

47
00:02:20,210 --> 00:02:23,585
does not contain any
string. It is empty.

48
00:02:23,585 --> 00:02:26,930
So these are two
different languages.

49
00:02:26,930 --> 00:02:29,960
Think about how this
would be defined.

50
00:02:29,960 --> 00:02:33,455
So what's the point of
all these definitions?

51
00:02:33,455 --> 00:02:36,110
Well, we want to make precise

52
00:02:36,110 --> 00:02:38,735
what is the meaning of
a regular expression.

53
00:02:38,735 --> 00:02:41,510
For this, we use
this function L. It

54
00:02:41,510 --> 00:02:44,180
will take a regular expression as

55
00:02:44,180 --> 00:02:47,150
argument and generates a

56
00:02:47,150 --> 00:02:49,970
set of strings or a language.

57
00:02:49,970 --> 00:02:52,670
And it's supposed to be the
language which contains

58
00:02:52,670 --> 00:02:56,330
all the strings this regular
expression can match.

59
00:02:56,330 --> 00:02:58,670
So remember the 0
regular expression,

60
00:02:58,670 --> 00:03:01,115
it's not supposed to
match any string.

61
00:03:01,115 --> 00:03:05,105
So we give it the empty
language or empty set.

62
00:03:05,105 --> 00:03:06,740
This regular expression,

63
00:03:06,740 --> 00:03:09,380
the one regular expression
is supposed to match

64
00:03:09,380 --> 00:03:13,445
exactly one string,
namely the empty string.

65
00:03:13,445 --> 00:03:15,710
the regular expression

66
00:03:15,710 --> 00:03:18,875
recognizing just the
character c, in this case.

67
00:03:18,875 --> 00:03:22,820
Well, that will be the
language which contains

68
00:03:22,820 --> 00:03:28,175
the string only containing
the single character c. 

69
00:03:28,175 --> 00:03:31,295
Now, what is this regular
expression supposed to match?

70
00:03:31,295 --> 00:03:34,835
Well, a string is matched
by this regular expression

71
00:03:34,835 --> 00:03:37,355
if it is matched by
the first component, r1,

72
00:03:37,355 --> 00:03:40,970
or by the second component, r2.

73
00:03:40,970 --> 00:03:43,190
That's why this plus
is the alternative.

74
00:03:43,190 --> 00:03:45,605
So which are the strings

75
00:03:45,605 --> 00:03:48,320
this regular expression
can match? Well,

76
00:03:48,320 --> 00:03:51,275
all the strings
r1 one can match,

77
00:03:51,275 --> 00:03:54,170
given by L(r1) union...

78
00:03:54,170 --> 00:03:57,380
all the strings
R2 can match.

79
00:03:57,380 --> 00:04:01,325
And we use here the union
because this alternative,

80
00:04:01,325 --> 00:04:05,945
even if it says this string
is matched by r1 or by r2.

81
00:04:05,945 --> 00:04:08,465
The meaning of this
reg expression

82
00:04:08,465 --> 00:04:11,209
is the union of both languages.

83
00:04:11,209 --> 00:04:14,315
Now the sequence case.

84
00:04:14,315 --> 00:04:17,960
This means a string is
matched by this regular expression

85
00:04:17,960 --> 00:04:20,510
if the first part of this
string is matched by r1

86
00:04:20,510 --> 00:04:24,935
and the second part of
the string is matched by r2.

87
00:04:24,935 --> 00:04:28,490
Now we have to say, which
are all the strings

88
00:04:28,490 --> 00:04:31,645
this regular expression
can match. Well, it's

89
00:04:31,645 --> 00:04:34,495
all the strings
r1 can match,

90
00:04:34,495 --> 00:04:39,205
concatenated with all the
strings r2 can match.

91
00:04:39,205 --> 00:04:42,895
So for this, we have to use
this concatenation operation.

92
00:04:42,895 --> 00:04:47,440
So if r1 can match a string
and r2 can match a string,

93
00:04:47,440 --> 00:04:51,220
then in the meaning will be
the concatenation of that.

94
00:04:51,220 --> 00:04:53,170
So with this function L

95
00:04:53,170 --> 00:04:56,995
we can make precise
what are the strings,

96
00:04:56,995 --> 00:05:00,205
*all* the strings a regular
expression can match.

97
00:05:00,205 --> 00:05:04,689
The only case we haven't
specified yet is the r*.

98
00:05:04,689 --> 00:05:07,750
For this, we need
another operation

99
00:05:07,750 --> 00:05:12,470
on languages or sets of
strings, which we do next.

100
00:05:12,490 --> 00:05:17,285
We introduce a power
operation for languages.

101
00:05:17,285 --> 00:05:19,100
The power operation or

102
00:05:19,100 --> 00:05:22,835
the nth power is
defined recursively.

103
00:05:22,835 --> 00:05:26,615
So the A to the power of 0

104
00:05:26,615 --> 00:05:31,205
would be defined as the set
containing the empty string.

105
00:05:31,205 --> 00:05:36,200
And A to the power of
n + 1 would be A

106
00:05:36,200 --> 00:05:38,705
concatenated with
A to the power of

107
00:05:38,705 --> 00:05:42,160
n. Here are a few examples.

108
00:05:42,160 --> 00:05:45,380
A to the power of four
would be essentially

109
00:05:45,380 --> 00:05:47,660
A concatenated with
A concatenated with

110
00:05:47,660 --> 00:05:49,640
A concatenated with A,

111
00:05:49,640 --> 00:05:51,650
and also concatenated with

112
00:05:51,650 --> 00:05:55,580
the language which
contains the empty string.

113
00:05:55,580 --> 00:05:57,275
Because that's how we define

114
00:05:57,275 --> 00:05:59,975
the base case, A
to the power of 0.

115
00:05:59,975 --> 00:06:03,410
And A to the power
one would be just A,

116
00:06:03,410 --> 00:06:05,330
again followed by that one,

117
00:06:05,330 --> 00:06:07,790
but this would be just A.

118
00:06:07,790 --> 00:06:10,385
And A to the power 0

119
00:06:10,385 --> 00:06:14,270
is this language which
contains the empty string.

120
00:06:14,270 --> 00:06:16,670
With this power operation,

121
00:06:16,670 --> 00:06:19,505
I can also fill in this case.

122
00:06:19,505 --> 00:06:23,210
Remember this r* operation
is supposed to match

123
00:06:23,210 --> 00:06:27,680
a string with either
0 or more copies of r.

124
00:06:27,680 --> 00:06:31,940
So the meaning of this
regular expression would be

125
00:06:31,940 --> 00:06:37,475
now the Union of all the
powers greater or equal than 0,

126
00:06:37,475 --> 00:06:40,835
of the language that r can match.

127
00:06:40,835 --> 00:06:44,945
And we take now at the
union of all these powers,

128
00:06:44,945 --> 00:06:47,150
that means essentially what

129
00:06:47,150 --> 00:06:48,530
can the regular expression match

130
00:06:48,530 --> 00:06:51,440
if I take L of r
to the 0th power,

131
00:06:51,440 --> 00:06:53,030
what can I match with

132
00:06:53,030 --> 00:06:57,305
the first power, the
second power, and so on.

133
00:06:57,305 --> 00:07:00,544
And I take the union
of all these powers.

134
00:07:00,544 --> 00:07:03,830
That's the definition
of which strings

135
00:07:03,830 --> 00:07:05,510
this regular expression
is supposed to

136
00:07:05,510 --> 00:07:08,510
match. The meaning of
this regular expression.

137
00:07:08,510 --> 00:07:11,300
This is such an
important definition,

138
00:07:11,300 --> 00:07:13,250
that there's actually
a name for it.

139
00:07:13,250 --> 00:07:15,020
It's called the Kleene star.

140
00:07:15,020 --> 00:07:16,400
And it's written like this.

141
00:07:16,400 --> 00:07:18,335
If I have a language A,

142
00:07:18,335 --> 00:07:21,785
I can build the star
of this language

143
00:07:21,785 --> 00:07:26,255
by union up all the powers
greater or equal than 0.

144
00:07:26,255 --> 00:07:28,580
The A-star of

145
00:07:28,580 --> 00:07:31,580
a language would expand
to a to the power 0,

146
00:07:31,580 --> 00:07:33,770
union A to the power of one,

147
00:07:33,770 --> 00:07:36,665
A to the power of two, and so on.

148
00:07:36,665 --> 00:07:39,230
And it would
essentially mean, well,

149
00:07:39,230 --> 00:07:43,460
it's the language which contains
the empty string because

150
00:07:43,460 --> 00:07:48,290
of the A to the 0, one copy of A,

151
00:07:48,290 --> 00:07:51,020
two concatenated copies of A,

152
00:07:51,020 --> 00:07:55,070
three concatenated
copies of A, and so on.

153
00:07:55,070 --> 00:07:59,060
So that's what this A-star
is defined as.

154
00:07:59,060 --> 00:08:03,725
Essentially as the Union
of all the powers.

155
00:08:03,725 --> 00:08:05,990
And it's essentially
each power is how many

156
00:08:05,990 --> 00:08:08,750
times I have to
concatenate this language A.

157
00:08:08,750 --> 00:08:13,549
And note that this language
A-star will always contain

158
00:08:13,549 --> 00:08:21,240
the empty string because that
is how the A^0 is defined.

159
00:08:21,310 --> 00:08:23,540
So in this definition,

160
00:08:23,540 --> 00:08:25,969
I could have also written star

161
00:08:25,969 --> 00:08:29,180
here because the
meaning of this r*,

162
00:08:29,180 --> 00:08:34,055
regular expression
is the Kleene star of

163
00:08:34,055 --> 00:08:37,130
the language of r.
Remember that's

164
00:08:37,130 --> 00:08:41,525
the union of all powers
greater or equal than 0.

165
00:08:41,525 --> 00:08:43,640
It's behind.
I hope you don't get

166
00:08:43,640 --> 00:08:45,800
confused by the use of the stars.

167
00:08:45,800 --> 00:08:48,845
This star is for
the regular expressions.

168
00:08:48,845 --> 00:08:52,205
That star is for languages.

169
00:08:52,205 --> 00:08:54,274
They are two
different operations.

170
00:08:54,274 --> 00:08:56,555
They don't even
have the same type.

171
00:08:56,555 --> 00:08:58,954
Here might be
something interesting.

172
00:08:58,954 --> 00:09:00,560
Remember I asked you to think

173
00:09:00,560 --> 00:09:03,035
about these two corner cases.

174
00:09:03,035 --> 00:09:04,730
I hope you have done so,

175
00:09:04,730 --> 00:09:07,070
but let's also have look
at this together.

176
00:09:07,070 --> 00:09:09,785
According to the definition

177
00:09:09,785 --> 00:09:11,930
for this append of languages,

178
00:09:11,930 --> 00:09:13,670
I have to take every string in

179
00:09:13,670 --> 00:09:16,130
this set and have

180
00:09:16,130 --> 00:09:18,695
two concatenated with
every string in that set.

181
00:09:18,695 --> 00:09:22,115
So this set contains only
the empty string as element.

182
00:09:22,115 --> 00:09:24,820
So if I concatenate anything in

183
00:09:24,820 --> 00:09:27,745
there with the empty string,
that will be left untouched.

184
00:09:27,745 --> 00:09:31,855
So this one will be
actually A.

185
00:09:31,855 --> 00:09:34,600
This one I have to
take every string in

186
00:09:34,600 --> 00:09:36,190
this language and I have to

187
00:09:36,190 --> 00:09:39,115
concatenate with every
string in that language.

188
00:09:39,115 --> 00:09:41,110
But here isn't any string.

189
00:09:41,110 --> 00:09:43,300
So I can't concatenate that.

190
00:09:43,300 --> 00:09:46,900
That will be actually
the empty set (not empty string).

191
00:09:46,900 --> 00:09:49,570
So now let's have

192
00:09:49,570 --> 00:09:51,670
a look at regular expressions.

193
00:09:51,670 --> 00:09:53,230
That was with languages.

194
00:09:53,230 --> 00:09:56,035
But let's have a look at
regular expressions now.

195
00:09:56,035 --> 00:09:58,660
What would be the
meaning, for example,

196
00:09:58,660 --> 00:10:01,945
of r followed by 1?

197
00:10:01,945 --> 00:10:04,149
That is a regular expression.

198
00:10:04,149 --> 00:10:06,255
And it essentially says:

199
00:10:06,255 --> 00:10:07,850
this regular expression matches

200
00:10:07,850 --> 00:10:10,685
some strings and this matches
the empty string.

201
00:10:10,685 --> 00:10:13,730
So if you find out
what the meaning is,

202
00:10:13,730 --> 00:10:16,955
we would apply this
L-function to it.

203
00:10:16,955 --> 00:10:19,430
And if we now look
up the definition,

204
00:10:19,430 --> 00:10:27,360
that would be L of r
appended L of 1.

205
00:10:27,850 --> 00:10:32,255
If you look up what
this is defined,

206
00:10:32,255 --> 00:10:41,640
then this would be L of r
appended with this language.

207
00:10:41,950 --> 00:10:46,325
And if you now look up
our definition earlier,

208
00:10:46,325 --> 00:10:50,455
that will be just L of r.

209
00:10:50,455 --> 00:10:52,690
What that essentially

210
00:10:52,690 --> 00:10:55,765
means is if you have
this regular expression,

211
00:10:55,765 --> 00:10:58,885
this will match
exactly those strings

212
00:10:58,885 --> 00:11:01,000
which this regular
expression can match.

213
00:11:01,000 --> 00:11:04,794
And that pretty much
coincides with our intuition.

214
00:11:04,794 --> 00:11:05,950
This is supposed to match

215
00:11:05,950 --> 00:11:08,410
the empty string at
the end of the string,

216
00:11:08,410 --> 00:11:11,005
so it doesn't really
make any difference.

217
00:11:11,005 --> 00:11:13,480
And the meaning
really tells us that.

218
00:11:13,480 --> 00:11:15,880
But here's the
interesting thing.

219
00:11:15,880 --> 00:11:17,979
When you were in school,

220
00:11:17,979 --> 00:11:23,124
how would you think
about this expression?

221
00:11:23,124 --> 00:11:29,515
Well, r times 1
equals just our, okay?

222
00:11:29,515 --> 00:11:33,634
Furthermore, if you had r times 0.

223
00:11:33,634 --> 00:11:35,045
What would that be equal?

224
00:11:35,045 --> 00:11:37,205
Well, it would be 0.

225
00:11:37,205 --> 00:11:39,650
Ok, let's have

226
00:11:39,650 --> 00:11:42,605
a look how that works out
with regular expressions.

227
00:11:42,605 --> 00:11:46,355
So if you take r followed by 0,

228
00:11:46,355 --> 00:11:48,260
remember this is
the regular expression

229
00:11:48,260 --> 00:11:49,895
that cannot match anything.

230
00:11:49,895 --> 00:11:52,415
By unfolding the definition,

231
00:11:52,415 --> 00:11:59,104
I would get L of r
appended with L of 0.

232
00:11:59,104 --> 00:12:01,775
This is of course defined as L of r

233
00:12:01,775 --> 00:12:05,915
appended with the empty set.

234
00:12:05,915 --> 00:12:10,760
And this would, according
to the definition earlier,

235
00:12:10,760 --> 00:12:13,830
well just the empty set.

236
00:12:14,160 --> 00:12:20,260
And this would be
of course L of 0.

237
00:12:20,260 --> 00:12:24,580
So it seems like these laws,

238
00:12:24,580 --> 00:12:27,175
at least for the times,

239
00:12:27,175 --> 00:12:29,785
seem to be valid from math,

240
00:12:29,785 --> 00:12:31,420
are also valid with regular expressions,

241
00:12:31,420 --> 00:12:33,370
if we look at their meaning.

242
00:12:33,370 --> 00:12:36,670
These laws on natural numbers

243
00:12:36,670 --> 00:12:40,105
and regular expressions and
their close correspondance

244
00:12:40,105 --> 00:12:42,460
probably explain why I use 

245
00:12:42,460 --> 00:12:46,975
a times for the sequence
regular expression.

246
00:12:46,975 --> 00:12:51,040
So r followed by the
regular expression 1,

247
00:12:51,040 --> 00:12:54,505
will have the same meaning
as that regular expression.

248
00:12:54,505 --> 00:12:59,120
And r followed by the
0 regular expression

249
00:12:59,120 --> 00:13:01,295
will have this meaning.

250
00:13:01,295 --> 00:13:03,590
You might now think, but I also

251
00:13:03,590 --> 00:13:06,605
wrote the alternative with a plus.

252
00:13:06,605 --> 00:13:10,370
Does a similar law
holds for plus?

253
00:13:10,370 --> 00:13:15,965
Of course, if I take r
plus 0 on natural numbers,

254
00:13:15,965 --> 00:13:20,060
that would be just r. Does
hold for regular expressions?

255
00:13:20,060 --> 00:13:22,085
Yes, indeed it holds.

256
00:13:22,085 --> 00:13:26,735
If I have this
regular expression and

257
00:13:26,735 --> 00:13:33,245
unfold the definition that
would be L(r) union L(0).

258
00:13:33,245 --> 00:13:38,060
This would be equal
to L(r) union...

259
00:13:38,060 --> 00:13:41,150
What is this? Well, that's
just the empty set.

260
00:13:41,150 --> 00:13:43,760
And if I union something
with the empty set,

261
00:13:43,760 --> 00:13:47,180
then this will be
just of L(r). So yes,

262
00:13:47,180 --> 00:13:50,120
indeed, plus is also very much

263
00:13:50,120 --> 00:13:54,184
inspired by the laws
on natural numbers.

264
00:13:54,184 --> 00:13:57,065
There exists other notations too,

265
00:13:57,065 --> 00:14:01,310
but I prefer this
using the plus for

266
00:14:01,310 --> 00:14:03,844
the alternatives and the times

267
00:14:03,844 --> 00:14:05,765
for the sequence expression.

268
00:14:05,765 --> 00:14:07,460
It's also the reason why I call

269
00:14:07,460 --> 00:14:10,325
this regular expression for
the empty string 1.

270
00:14:10,325 --> 00:14:14,915
And for matching
nothing at all 0.

271
00:14:14,915 --> 00:14:18,665
This correspondence to
natural numbers doesn't quite

272
00:14:18,665 --> 00:14:22,055
extend to the star
regular expression.

273
00:14:22,055 --> 00:14:25,055
But there's still a
connection. There too.

274
00:14:25,055 --> 00:14:26,510
Can you remember how

275
00:14:26,510 --> 00:14:29,345
the power operation on
languages was defined?

276
00:14:29,345 --> 00:14:31,370
The 0 case was defined

277
00:14:31,370 --> 00:14:34,520
as the set containing
the empty string.

278
00:14:34,520 --> 00:14:37,039
Why is that? It looks
a bit arbitrary.

279
00:14:37,039 --> 00:14:41,150
Why is it not the empty set
or why is it not defined as A?

280
00:14:41,150 --> 00:14:43,745
Why is defined in this
particular way?

281
00:14:43,745 --> 00:14:46,880
Well, can you remember how

282
00:14:46,880 --> 00:14:48,950
the power operation r to the

283
00:14:48,950 --> 00:14:51,440
0 is defined on natural numbers?

284
00:14:51,440 --> 00:14:53,930
Yes, that's defined as 1.

285
00:14:53,930 --> 00:14:57,125
Does that also apply to
regular expressions?

286
00:14:57,125 --> 00:15:00,725
Well, if we take the meaning
of a regular expression,

287
00:15:00,725 --> 00:15:04,730
let's say r, and raise
it to the 0th power,

288
00:15:04,730 --> 00:15:07,685
then this will be, of
course, by definition,

289
00:15:07,685 --> 00:15:12,245
be defined as the set 
containing the empty string.

290
00:15:12,245 --> 00:15:17,160
And that is of course
equal to L(1).

291
00:15:17,830 --> 00:15:20,570
Again, you can see that

292
00:15:20,570 --> 00:15:23,300
this law on natural numbers also

293
00:15:23,300 --> 00:15:29,000
holds on the laws of regular
expressions - on heir meaning.

294
00:15:29,000 --> 00:15:32,810
What I find really beautiful
here is that of course,

295
00:15:32,810 --> 00:15:36,244
the meaning is defined on
sets, sets of strings.

296
00:15:36,244 --> 00:15:38,975
This of course, on natural numbers.

297
00:15:38,975 --> 00:15:41,060
You can probably
see now where I'm

298
00:15:41,060 --> 00:15:43,085
coming from with my notation.

299
00:15:43,085 --> 00:15:46,205
That is actually a slight
problem in this field,

300
00:15:46,205 --> 00:15:48,590
since it's so old.

301
00:15:48,590 --> 00:15:52,205
Pretty much everybody
introduced their own notation.

302
00:15:52,205 --> 00:15:53,870
And you now have heaps of

303
00:15:53,870 --> 00:15:55,550
different notations
for the same thing.

304
00:15:55,550 --> 00:15:57,379
Okay, that is slightly
exaggerating.

305
00:15:57,379 --> 00:16:00,830
And also the notation
I used for times and

306
00:16:00,830 --> 00:16:04,295
plus and 0 and 1s...definitely
I'm not the only one.

307
00:16:04,295 --> 00:16:05,435
There are many people

308
00:16:05,435 --> 00:16:07,625
who have used this
notation as well.

309
00:16:07,625 --> 00:16:10,190
It's just, it's not universal.

310
00:16:10,190 --> 00:16:12,290
Well, here's a question now.

311
00:16:12,290 --> 00:16:15,635
Why did we go through this
kerfuffle in the first place?

312
00:16:15,635 --> 00:16:19,370
Why did we introduce these
operations on languages?

313
00:16:19,370 --> 00:16:21,260
And why did we introduce

314
00:16:21,260 --> 00:16:23,255
the meaning of a
regular expression?

315
00:16:23,255 --> 00:16:26,300
Well, remember at the beginning,

316
00:16:26,300 --> 00:16:28,265
we wanted to specify

317
00:16:28,265 --> 00:16:31,880
what problem our algorithms
actually supposed to solve.

318
00:16:31,880 --> 00:16:35,960
And we want to make that
very precise so that we can

319
00:16:35,960 --> 00:16:38,555
be sure that our implementation

320
00:16:38,555 --> 00:16:40,295
really solves the problem.

321
00:16:40,295 --> 00:16:41,540
And that's what you can do now.

322
00:16:41,540 --> 00:16:45,605
We can say a regular
expression, r say,

323
00:16:45,605 --> 00:16:50,015
is matching a string
s if and only if

324
00:16:50,015 --> 00:16:55,474
the string s is in the language
of the regular expression.

325
00:16:55,474 --> 00:17:00,770
So that is the problem our
matcher is supposed to solve.

326
00:17:00,770 --> 00:17:03,320
And it's supposed to
solve that a bit faster

327
00:17:03,320 --> 00:17:06,860
than in Python, Ruby and Java.

328
00:17:06,860 --> 00:17:09,585
And unfortunately we cannot use

329
00:17:09,585 --> 00:17:12,260
the definition of L
directly for that.

330
00:17:12,260 --> 00:17:15,815
Because remember, in
the case of the star,

331
00:17:15,815 --> 00:17:17,690
sometimes the meaning of

332
00:17:17,690 --> 00:17:18,830
a regular expression is actually

333
00:17:18,830 --> 00:17:20,925
an infinite set of strings.

334
00:17:20,925 --> 00:17:23,575
And it's a tiny bit difficult

335
00:17:23,575 --> 00:17:27,040
to decide whether a string
is an infinite set.

336
00:17:27,040 --> 00:17:30,790
So we have to do something
more clever in our algorithm.

337
00:17:30,790 --> 00:17:33,535
But that's for next week.
So see you next week. Bye.

338
00:17:38,535 --> 00:17:41,680
Okay, just to troll you.
Here's one more slide.

339
00:17:41,680 --> 00:17:45,850
So when you go over the
handouts and the videos,

340
00:17:45,850 --> 00:17:47,875
think about this example.

341
00:17:47,875 --> 00:17:49,840
Imagine you have a language A,

342
00:17:49,840 --> 00:17:51,970
which contains four strings.

343
00:17:51,970 --> 00:17:53,725
First string is just the character a,

344
00:17:53,725 --> 00:17:57,445
second string is just
the character b, and so on.

345
00:17:57,445 --> 00:18:02,335
How many strings are there
in A to the power four.

346
00:18:02,335 --> 00:18:05,210
Okay, that should be
relatively simple.

347
00:18:05,210 --> 00:18:07,310
If not, just try it out by

348
00:18:07,310 --> 00:18:11,165
implementing it in Scala and see
how many strings there are.

349
00:18:11,165 --> 00:18:13,850
But now the more
interesting question,

350
00:18:13,850 --> 00:18:16,670
imagine the same language,

351
00:18:16,670 --> 00:18:19,400
except that instead
of the character d,

352
00:18:19,400 --> 00:18:22,250
we have here, the empty string.

353
00:18:22,250 --> 00:18:26,630
How many strings are in
A to the power four?

354
00:18:26,630 --> 00:18:31,320
Okay, I'll let you think
about this. Bye now.