videos/02-proof.srt
author Christian Urban <christian.urban@kcl.ac.uk>
Wed, 16 Oct 2024 13:14:13 +0100
changeset 968 d8d8911a3d6f
parent 776 939c10745a3a
permissions -rw-r--r--
updated

1
00:00:06,260 --> 00:00:10,275
I come back, I know
the topic of proofs

2
00:00:10,275 --> 00:00:13,950
is usually not the most
favorite topics with students.

3
00:00:13,950 --> 00:00:17,220
But I'm really passionate
about proofs in

4
00:00:17,220 --> 00:00:19,320
this module because they really

5
00:00:19,320 --> 00:00:21,885
help you with understanding
what's going on.

6
00:00:21,885 --> 00:00:25,170
And very often they help
you with preventing errors.

7
00:00:25,170 --> 00:00:28,695
You seen me earlier doing
these calculations.

8
00:00:28,695 --> 00:00:30,210
And if I hadn't had

9
00:00:30,210 --> 00:00:33,330
the safety net of
knowing what I'm doing,

10
00:00:33,330 --> 00:00:35,130
I could have easily gone wrong.

11
00:00:35,130 --> 00:00:36,390
And that's actually a problem is

12
00:00:36,390 --> 00:00:38,935
existing wreck Expression
Matching engines.

13
00:00:38,935 --> 00:00:41,330
Sometimes in corner
cases as we seen,

14
00:00:41,330 --> 00:00:43,130
they produce slow results,

15
00:00:43,130 --> 00:00:45,455
which is actually not so bad.

16
00:00:45,455 --> 00:00:48,020
Much worse is that
we also know they

17
00:00:48,020 --> 00:00:50,570
sometimes produce
incorrect results.

18
00:00:50,570 --> 00:00:51,980
So for us here,

19
00:00:51,980 --> 00:00:53,390
the proofs will be

20
00:00:53,390 --> 00:00:55,100
essentially the safety nets

21
00:00:55,100 --> 00:00:58,295
of making sure that our
algorithms actually correct.

22
00:00:58,295 --> 00:01:00,095
And that's actually
the beauty of that.

23
00:01:00,095 --> 00:01:03,170
It doesn't happen very often
in real life that we can

24
00:01:03,170 --> 00:01:06,500
actually prove the
correctness of our algorithm.

25
00:01:06,500 --> 00:01:08,405
But we can do this here.

26
00:01:08,405 --> 00:01:11,000
So we will start
a bit slow first,

27
00:01:11,000 --> 00:01:14,330
I will show you how
properties are stated.

28
00:01:14,330 --> 00:01:17,255
And then along the way
we will see how proofs

29
00:01:17,255 --> 00:01:20,940
actually can be performed
and how they can help you.

30
00:01:20,980 --> 00:01:25,205
You can probably remember this
slide from the beginning.

31
00:01:25,205 --> 00:01:29,300
Now we would probably stayed
the same property that

32
00:01:29,300 --> 00:01:32,420
our algorithm matches
the specification

33
00:01:32,420 --> 00:01:33,695
of what that means.

34
00:01:33,695 --> 00:01:35,450
Maybe slightly more mathematical,

35
00:01:35,450 --> 00:01:37,130
we would write
something like this.

36
00:01:37,130 --> 00:01:39,545
If our Marcia is given as

37
00:01:39,545 --> 00:01:42,740
a string direct expression
R and says yes,

38
00:01:42,740 --> 00:01:45,110
then this string
really needs to be in

39
00:01:45,110 --> 00:01:48,350
the language of R. And
if the matcher says no,

40
00:01:48,350 --> 00:01:50,120
then this string cannot be in

41
00:01:50,120 --> 00:01:52,325
the language of r. And we spent

42
00:01:52,325 --> 00:01:54,590
quite some effort to make precise

43
00:01:54,590 --> 00:01:57,065
what the language is
of a wreck expression.

44
00:01:57,065 --> 00:01:59,090
And we spend some effort to

45
00:01:59,090 --> 00:02:01,505
make precise what
our algorithm is.

46
00:02:01,505 --> 00:02:02,840
So we could now take

47
00:02:02,840 --> 00:02:06,875
this property and we could
generate some test cases,

48
00:02:06,875 --> 00:02:09,410
and that is all fine and good.

49
00:02:09,410 --> 00:02:13,340
Testing is a very important
thing for our software.

50
00:02:13,340 --> 00:02:16,715
But we can only do
this testing so far.

51
00:02:16,715 --> 00:02:19,385
Remember in the homework,

52
00:02:19,385 --> 00:02:21,275
there was a single string,

53
00:02:21,275 --> 00:02:23,750
abcd, and there were
actually infinitely

54
00:02:23,750 --> 00:02:27,530
many wreck expressions which
can match only that string.

55
00:02:27,530 --> 00:02:30,470
So testing something for
infinitely many things,

56
00:02:30,470 --> 00:02:32,000
that's a bit hard.

57
00:02:32,000 --> 00:02:36,710
And Summa, as soon as we have
an expression with a star,

58
00:02:36,710 --> 00:02:39,289
this l function returns,

59
00:02:39,289 --> 00:02:41,360
in the general case
an infinite set.

60
00:02:41,360 --> 00:02:43,940
And testing whether a
string is an infinite set,

61
00:02:43,940 --> 00:02:45,500
that's also a bit hard.

62
00:02:45,500 --> 00:02:47,510
So testing can only do so much.

63
00:02:47,510 --> 00:02:49,820
We could generate some
wreck expressions

64
00:02:49,820 --> 00:02:51,560
and we can generate some strings.

65
00:02:51,560 --> 00:02:53,990
And we can test
whether that is true.

66
00:02:53,990 --> 00:02:57,470
So B would be still
in the quandary that

67
00:02:57,470 --> 00:03:00,950
maybe there's a corner
case which doesn't work.

68
00:03:00,950 --> 00:03:03,110
So what we want to
achieve instead

69
00:03:03,110 --> 00:03:05,615
is you want to really make sure

70
00:03:05,615 --> 00:03:07,325
that this algorithm works for

71
00:03:07,325 --> 00:03:10,850
all reg expression
and for all strings.

72
00:03:10,850 --> 00:03:16,400
And that ONE proving will
help us to establish.

73
00:03:16,400 --> 00:03:19,040
Testing cannot do that.

74
00:03:19,040 --> 00:03:22,310
Remember, metro
mainly consisted of

75
00:03:22,310 --> 00:03:25,444
these functions,
nullable and derivative.

76
00:03:25,444 --> 00:03:27,755
So if you want to prove
anything about Metro.

77
00:03:27,755 --> 00:03:30,200
Bet on, make sure that
we know what actually

78
00:03:30,200 --> 00:03:33,890
the specification of
nullable and derivative is.

79
00:03:33,890 --> 00:03:35,750
That's actually quite
instructive to also

80
00:03:35,750 --> 00:03:37,850
understand how they work.

81
00:03:37,850 --> 00:03:39,695
So let's do that next.

82
00:03:39,695 --> 00:03:41,975
The central property of nullable,

83
00:03:41,975 --> 00:03:43,445
Actually that's not so hard.

84
00:03:43,445 --> 00:03:48,740
It's just says a function or
a reg expression is nullable

85
00:03:48,740 --> 00:03:50,360
if and only if

86
00:03:50,360 --> 00:03:52,160
the empty string is in

87
00:03:52,160 --> 00:03:54,410
the language that was the
purpose of this novel,

88
00:03:54,410 --> 00:03:55,670
it takes work, expression and

89
00:03:55,670 --> 00:03:58,490
test whether can match
the empty string.

90
00:03:58,490 --> 00:04:01,010
Meaning this empty string needs

91
00:04:01,010 --> 00:04:03,230
to be in the language
of R. And again,

92
00:04:03,230 --> 00:04:04,385
we have this if an only,

93
00:04:04,385 --> 00:04:05,885
if this one says yes,

94
00:04:05,885 --> 00:04:08,465
then the empty string needs
to be in this language.

95
00:04:08,465 --> 00:04:10,385
And if this nullable says no,

96
00:04:10,385 --> 00:04:13,085
the empty string CoMP
unless language.

97
00:04:13,085 --> 00:04:15,290
So that's relatively easy.

98
00:04:15,290 --> 00:04:18,110
The next one might be also
very instructive to look

99
00:04:18,110 --> 00:04:22,775
at what's the property the
derivative should satisfy.

100
00:04:22,775 --> 00:04:26,329
And remember, the derivative
was a kind of operation.

101
00:04:26,329 --> 00:04:28,850
We taking a wrecker
expression and we're looking

102
00:04:28,850 --> 00:04:31,955
at all the strings this
reg expression can match,

103
00:04:31,955 --> 00:04:35,510
but which starts with
C. And then me somehow

104
00:04:35,510 --> 00:04:37,250
looking for reg expression
which can match

105
00:04:37,250 --> 00:04:39,335
the same string starting with C,

106
00:04:39,335 --> 00:04:42,440
except that this
c is chopped off.

107
00:04:42,440 --> 00:04:47,150
And that actually saves
what this property states.

108
00:04:47,150 --> 00:04:50,300
Take the, take a
break expression are,

109
00:04:50,300 --> 00:04:53,045
and build the derivative
according to see,

110
00:04:53,045 --> 00:04:56,150
okay, so this gives
us a reg expression,

111
00:04:56,150 --> 00:04:58,025
the derivative of that

112
00:04:58,025 --> 00:05:01,775
r. Now we take the
a language of that,

113
00:05:01,775 --> 00:05:06,620
are that supposed to be
the same set as if I take

114
00:05:06,620 --> 00:05:12,845
the language of R and then
build the semantic derivative.

115
00:05:12,845 --> 00:05:16,055
Remember the semantic derivative
was doing exactly that.

116
00:05:16,055 --> 00:05:18,380
It goes through all
the strings in this

117
00:05:18,380 --> 00:05:21,350
that filters out everything which

118
00:05:21,350 --> 00:05:24,770
doesn't start with C.
So throws them away of

119
00:05:24,770 --> 00:05:26,510
the remaining strings
which start with

120
00:05:26,510 --> 00:05:28,670
C. It chops off so that C,

121
00:05:28,670 --> 00:05:31,370
So this will be exactly

122
00:05:31,370 --> 00:05:34,205
what our derivative
is supposed to do.

123
00:05:34,205 --> 00:05:36,605
And yes, if you can
establish that,

124
00:05:36,605 --> 00:05:38,450
then the language of

125
00:05:38,450 --> 00:05:43,355
this derivative operation
will produce the same sat.

126
00:05:43,355 --> 00:05:45,815
This actually also
how I'm going to

127
00:05:45,815 --> 00:05:49,760
test in cases icons
yet at the beginning,

128
00:05:49,760 --> 00:05:52,910
how your coursework is correct.

129
00:05:52,910 --> 00:05:55,625
If you give me a
definition for derivative

130
00:05:55,625 --> 00:05:58,730
of one of these bounded
record expressions,

131
00:05:58,730 --> 00:06:01,085
and I can't see
whether it's correct.

132
00:06:01,085 --> 00:06:03,380
Then what it will do
is I will just try to

133
00:06:03,380 --> 00:06:05,675
establish that this
property holds.

134
00:06:05,675 --> 00:06:07,265
And if I can establish that,

135
00:06:07,265 --> 00:06:08,870
then you coursework is find.

136
00:06:08,870 --> 00:06:10,550
And if I can't establish that,

137
00:06:10,550 --> 00:06:12,410
then usually it shows

138
00:06:12,410 --> 00:06:14,360
there is a counterexample
somewhere.

139
00:06:14,360 --> 00:06:19,220
Because this derivative,
as you've seen desk times,

140
00:06:19,220 --> 00:06:21,725
more than one solution
you can give.

141
00:06:21,725 --> 00:06:23,915
Some of them are more
efficient than others.

142
00:06:23,915 --> 00:06:25,685
But in terms of correctness,

143
00:06:25,685 --> 00:06:30,680
there are multiple solutions
for the coursework.

144
00:06:30,680 --> 00:06:34,145
And in the past I really had
to scratch my head and say,

145
00:06:34,145 --> 00:06:36,995
Is that definition
really correct?

146
00:06:36,995 --> 00:06:39,620
And so what I did is
I just went ahead and

147
00:06:39,620 --> 00:06:42,785
tried to prove that or
establish this property.

148
00:06:42,785 --> 00:06:45,110
The reason why we writing

149
00:06:45,110 --> 00:06:47,300
these properties
down so precisely,

150
00:06:47,300 --> 00:06:49,880
remember we know here
what the derivative is,

151
00:06:49,880 --> 00:06:51,290
we know what the
languages we know

152
00:06:51,290 --> 00:06:53,495
what this semantic derivative is.

153
00:06:53,495 --> 00:06:56,960
This is usually when you are
trying to prove something.

154
00:06:56,960 --> 00:06:58,715
And very important step.

155
00:06:58,715 --> 00:07:01,385
If you know what you're
actually trying to prove,

156
00:07:01,385 --> 00:07:05,510
then you have already
done half of the proof.

157
00:07:05,510 --> 00:07:08,300
Remember, we want to
establish these properties

158
00:07:08,300 --> 00:07:10,460
not just for some reg expression,

159
00:07:10,460 --> 00:07:13,940
but for all wrecker expression
and for all characters.

160
00:07:13,940 --> 00:07:17,225
So proving is the only
method to achieve that.

161
00:07:17,225 --> 00:07:18,590
And I said at the beginning of

162
00:07:18,590 --> 00:07:21,170
this video's proving is

163
00:07:21,170 --> 00:07:23,810
not the most favorite
subject of students.

164
00:07:23,810 --> 00:07:25,730
But I can promise you,

165
00:07:25,730 --> 00:07:29,630
if you just follow a very
simple minded recipe,

166
00:07:29,630 --> 00:07:33,695
that brings, you're already
very far in these proofs.

167
00:07:33,695 --> 00:07:35,720
Of course, there's
always some creativity

168
00:07:35,720 --> 00:07:37,070
involved in these proofs.

169
00:07:37,070 --> 00:07:39,380
But a simple-minded
recipe you can

170
00:07:39,380 --> 00:07:42,620
follow brings you're already
very close to the end.

171
00:07:42,620 --> 00:07:46,610
So let's have a look how
this recipe can be derived.

172
00:07:46,610 --> 00:07:48,740
Establishing a property is for

173
00:07:48,740 --> 00:07:50,900
all Rekha expression nearly

174
00:07:50,900 --> 00:07:55,805
always requires an induction
over wreck expressions.

175
00:07:55,805 --> 00:07:57,350
And so we have to look at what's

176
00:07:57,350 --> 00:07:59,150
the recipe for doing inductions

177
00:07:59,150 --> 00:08:02,390
over wreck expressions
and inductions.

178
00:08:02,390 --> 00:08:03,590
You probably remember from

179
00:08:03,590 --> 00:08:07,100
school time you did induction
over natural numbers.

180
00:08:07,100 --> 00:08:11,120
You had to establish that
some property holds for 0.

181
00:08:11,120 --> 00:08:13,775
And you had to establish it

182
00:08:13,775 --> 00:08:14,960
that if you assume that

183
00:08:14,960 --> 00:08:16,970
the property holds
for the N case,

184
00:08:16,970 --> 00:08:21,365
it has to also hold for
the n plus one case.

185
00:08:21,365 --> 00:08:24,350
And induction over wreck
expressions is very,

186
00:08:24,350 --> 00:08:26,810
very similar to this
kind of induction.

187
00:08:26,810 --> 00:08:30,754
Just that in case of
break expressions,

188
00:08:30,754 --> 00:08:33,200
we have essentially
three base cases.

189
00:08:33,200 --> 00:08:35,360
So in our grandma
or we say it rec,

190
00:08:35,360 --> 00:08:38,705
expressions are 01
and characters.

191
00:08:38,705 --> 00:08:42,260
And these will be the base
cases in our induction.

192
00:08:42,260 --> 00:08:43,895
So we have to establish

193
00:08:43,895 --> 00:08:46,475
any property we want
to prove by induction,

194
00:08:46,475 --> 00:08:49,055
you have to show
that it holds for 0,

195
00:08:49,055 --> 00:08:50,495
it holds for one,

196
00:08:50,495 --> 00:08:52,505
and it holds for the character.

197
00:08:52,505 --> 00:08:57,350
So there's nothing else we
can assume in these cases.

198
00:08:57,350 --> 00:08:59,300
We just have to prove
that this property

199
00:08:59,300 --> 00:09:01,835
holds in these three base cases.

200
00:09:01,835 --> 00:09:04,490
It is very much the
same as if you proved

201
00:09:04,490 --> 00:09:06,050
something over natural numbers

202
00:09:06,050 --> 00:09:08,270
and you had to prove it for 0.

203
00:09:08,270 --> 00:09:10,880
Now, the sequence case,

204
00:09:10,880 --> 00:09:12,755
the alternative in a star.

205
00:09:12,755 --> 00:09:15,815
These are the kind
of inductive cases.

206
00:09:15,815 --> 00:09:17,750
Remember in natural numbers,

207
00:09:17,750 --> 00:09:20,450
you could assume that
a property holds for

208
00:09:20,450 --> 00:09:24,770
n and then you had to
establish it for n plus one.

209
00:09:24,770 --> 00:09:26,945
This is here very simmer.

210
00:09:26,945 --> 00:09:29,000
You can assume that

211
00:09:29,000 --> 00:09:30,920
these properties hold for

212
00:09:30,920 --> 00:09:33,140
the components of
these reg expression.

213
00:09:33,140 --> 00:09:35,840
So you can assume that it
holds for this on one,

214
00:09:35,840 --> 00:09:38,960
and you can assume that
it holds for this R2.

215
00:09:38,960 --> 00:09:41,090
And your task is now to

216
00:09:41,090 --> 00:09:44,240
establish this property
also for sequences.

217
00:09:44,240 --> 00:09:46,255
The same for alternative soup.

218
00:09:46,255 --> 00:09:48,710
Assume that it holds
already for R1.

219
00:09:48,710 --> 00:09:51,005
You can assume that
it holds for the R2.

220
00:09:51,005 --> 00:09:53,930
You have to establish that
this property holds also for

221
00:09:53,930 --> 00:09:58,460
the entire alternative,
the same four star.

222
00:09:58,460 --> 00:10:00,440
So there you can assume it for r,

223
00:10:00,440 --> 00:10:02,330
but you have to now
establish it that

224
00:10:02,330 --> 00:10:04,760
it also holds for r star.

225
00:10:04,760 --> 00:10:07,910
Ok, that leads to this
simple-minded recipe

226
00:10:07,910 --> 00:10:09,560
which you can always follow.

227
00:10:09,560 --> 00:10:10,970
So first you have to know

228
00:10:10,970 --> 00:10:13,085
what's the property
you want to show.

229
00:10:13,085 --> 00:10:16,790
Luckily for you, that's
almost always given by me.

230
00:10:16,790 --> 00:10:19,100
So there's nothing
to think about it.

231
00:10:19,100 --> 00:10:22,100
But then what you have to
do in an induction proof,

232
00:10:22,100 --> 00:10:23,855
you have to show that
this property holds

233
00:10:23,855 --> 00:10:26,345
for 041 and characters.

234
00:10:26,345 --> 00:10:28,790
These are the base
cases in the induction.

235
00:10:28,790 --> 00:10:31,280
And then in the induction cases,

236
00:10:31,280 --> 00:10:32,555
you just have to say, well,

237
00:10:32,555 --> 00:10:34,535
I know that this
property holds for one,

238
00:10:34,535 --> 00:10:36,980
I know that the
property holds for R2.

239
00:10:36,980 --> 00:10:38,720
What does it look like that

240
00:10:38,720 --> 00:10:40,895
this property holds
for the alternative,

241
00:10:40,895 --> 00:10:44,480
for the sequence
and for the r-star,

242
00:10:44,480 --> 00:10:46,955
that will then requires
some reasoning.

243
00:10:46,955 --> 00:10:48,230
But if you know what you can

244
00:10:48,230 --> 00:10:50,359
assume and what do you
have to establish?

245
00:10:50,359 --> 00:10:52,940
Often can already see bought

246
00:10:52,940 --> 00:10:55,490
needs to be proved and
what's the argument like?

247
00:10:55,490 --> 00:10:58,325
So let's try that and
then concrete example.

248
00:10:58,325 --> 00:11:01,265
So how does approve
work in real life?

249
00:11:01,265 --> 00:11:03,530
And say that the beginning
first you have to know

250
00:11:03,530 --> 00:11:05,630
what's the property
you want to prove.

251
00:11:05,630 --> 00:11:10,280
So in our case, the P of R is if

252
00:11:10,280 --> 00:11:14,900
nullable of R holds if

253
00:11:14,900 --> 00:11:19,760
and only if the empty string
is in the language of.

254
00:11:19,760 --> 00:11:23,690
So we want to do an induction
of rec expressions.

255
00:11:23,690 --> 00:11:28,400
So we better have Fan property
which depends on this

256
00:11:28,400 --> 00:11:31,130
R. And so now in

257
00:11:31,130 --> 00:11:34,985
each case I would start with
a blank piece of paper.

258
00:11:34,985 --> 00:11:38,030
And I would write on the top,

259
00:11:38,030 --> 00:11:40,190
what are my assumptions?

260
00:11:40,190 --> 00:11:42,470
And on the bottom, I would write

261
00:11:42,470 --> 00:11:45,485
down what I need to prove.

262
00:11:45,485 --> 00:11:49,280
And then I would try
to somehow navigate

263
00:11:49,280 --> 00:11:53,540
my way from these assumptions
to what I want to prove,

264
00:11:53,540 --> 00:11:57,740
to somehow infer
small steps so that I

265
00:11:57,740 --> 00:11:59,960
can assure that what I'm trying

266
00:11:59,960 --> 00:12:02,780
to prove holds for
my assumptions.

267
00:12:02,780 --> 00:12:06,905
Obviously. That depends
on how hard that case is.

268
00:12:06,905 --> 00:12:08,930
And if you look at
our first case,

269
00:12:08,930 --> 00:12:10,835
what we have to show is if to

270
00:12:10,835 --> 00:12:13,655
prove that this P of 0 holds.

271
00:12:13,655 --> 00:12:15,410
And actually no assumption in

272
00:12:15,410 --> 00:12:17,450
this case because it's
one of the base cases.

273
00:12:17,450 --> 00:12:23,015
So what I have to show is I
have to show that nullable of

274
00:12:23,015 --> 00:12:29,570
0 if and only if the empty
string is in L of 0.

275
00:12:29,570 --> 00:12:31,340
Okay? And I have to

276
00:12:31,340 --> 00:12:34,190
somehow show that this
implementation of nullable in

277
00:12:34,190 --> 00:12:35,600
this case produces

278
00:12:35,600 --> 00:12:39,185
exactly the same answer
as that specification.

279
00:12:39,185 --> 00:12:41,660
Okay? So what is nullable of 0 is

280
00:12:41,660 --> 00:12:45,605
defined where debt is
just false by definition.

281
00:12:45,605 --> 00:12:50,225
And what is this
language of LCA defined?

282
00:12:50,225 --> 00:12:52,535
Well, that is just
the empty string.

283
00:12:52,535 --> 00:12:55,010
In the empty language.

284
00:12:55,010 --> 00:12:57,110
Is the empty string and
the empty language,

285
00:12:57,110 --> 00:12:59,240
no, that is false too.

286
00:12:59,240 --> 00:13:01,610
So I have shown that

287
00:13:01,610 --> 00:13:03,890
both of them produce
the same result.

288
00:13:03,890 --> 00:13:08,765
So I would already be done
in the first base case.

289
00:13:08,765 --> 00:13:12,125
The second base case is not
so much more difficult.

290
00:13:12,125 --> 00:13:15,455
We want to prove
that P of one holds.

291
00:13:15,455 --> 00:13:21,680
And that would say
if nullable of one,

292
00:13:21,680 --> 00:13:23,825
if and only if

293
00:13:23,825 --> 00:13:27,665
the empty string is in
the language of one.

294
00:13:27,665 --> 00:13:31,235
Okay? I can again look
at what's that defined,

295
00:13:31,235 --> 00:13:33,275
that is defined as true.

296
00:13:33,275 --> 00:13:35,450
And what is this language to

297
00:13:35,450 --> 00:13:37,640
find where that is defined as

298
00:13:37,640 --> 00:13:39,320
the empty string in

299
00:13:39,320 --> 00:13:42,230
the language containing
the empty string.

300
00:13:42,230 --> 00:13:44,150
And is the empty string in

301
00:13:44,150 --> 00:13:46,055
the language containing
the empty string?

302
00:13:46,055 --> 00:13:47,975
Yes, that's true as well.

303
00:13:47,975 --> 00:13:50,735
So again, I've shown that
the actual implementation

304
00:13:50,735 --> 00:13:55,250
produces the same result
as the specification.

305
00:13:55,250 --> 00:13:57,635
I leave the character base case

306
00:13:57,635 --> 00:14:00,690
for you to fill in the details.

307
00:14:01,150 --> 00:14:03,320
More interesting are these

308
00:14:03,320 --> 00:14:05,300
inductive cases as
you can imagine.

309
00:14:05,300 --> 00:14:08,360
So let's try to prove
it for the alternative.

310
00:14:08,360 --> 00:14:10,639
So I want to show a property

311
00:14:10,639 --> 00:14:13,790
for this record expression way.

312
00:14:13,790 --> 00:14:15,410
What does our property save

313
00:14:15,410 --> 00:14:16,970
for that reg expression
where we have to

314
00:14:16,970 --> 00:14:20,915
say that nullable of

315
00:14:20,915 --> 00:14:25,490
R1 plus R2 if and only if

316
00:14:25,490 --> 00:14:31,519
the empty string is in the
language if R1 plus R2.

317
00:14:31,519 --> 00:14:35,090
Ok? So I would now go ahead with

318
00:14:35,090 --> 00:14:39,695
my plank piece of paper and
I would write down on top.

319
00:14:39,695 --> 00:14:41,475
Now what can I assume?

320
00:14:41,475 --> 00:14:45,170
Well, I can assume
that P of R1 holds and

321
00:14:45,170 --> 00:14:49,175
I can assume what
p of r two holds.

322
00:14:49,175 --> 00:14:50,750
And let's just write that down.

323
00:14:50,750 --> 00:14:56,210
We can say that nullable of R1 if

324
00:14:56,210 --> 00:14:58,655
and only if the empty string

325
00:14:58,655 --> 00:15:02,000
is in the language of
R1, I can assume that.

326
00:15:02,000 --> 00:15:03,500
And I can assume

327
00:15:03,500 --> 00:15:10,355
that none above our two if
and only if the empty string.

328
00:15:10,355 --> 00:15:14,675
Or to Hawaii. What's now,

329
00:15:14,675 --> 00:15:15,830
what do I want to prove?

330
00:15:15,830 --> 00:15:18,455
Where I want to
prove this property?

331
00:15:18,455 --> 00:15:21,935
So let me write this here
again on the bottom.

332
00:15:21,935 --> 00:15:30,935
So p of R1 plus R2
is narrow form of R1

333
00:15:30,935 --> 00:15:34,595
plus R2 if and only if

334
00:15:34,595 --> 00:15:40,610
the empty string is in the
language of R1 plus R2.

335
00:15:40,610 --> 00:15:43,280
Well, the first thing I can do is

336
00:15:43,280 --> 00:15:46,760
I can infer what's the
definition of that?

337
00:15:46,760 --> 00:15:51,920
Where that would
be nullable of R1

338
00:15:51,920 --> 00:15:57,530
or nullable of our two.

339
00:15:57,530 --> 00:15:59,960
That's how it was defined, okay?

340
00:15:59,960 --> 00:16:01,730
And what can I do with this?

341
00:16:01,730 --> 00:16:04,700
I can also look at how is

342
00:16:04,700 --> 00:16:08,510
this language to find
where that is defined as

343
00:16:08,510 --> 00:16:16,730
the empty string in the
language of R1 union R2.

344
00:16:16,730 --> 00:16:19,910
Okay? So what does that mean?

345
00:16:19,910 --> 00:16:23,585
Well, Venice, the empty
string in a union.

346
00:16:23,585 --> 00:16:27,095
Well, it has to be
the case that either

347
00:16:27,095 --> 00:16:31,190
the empty string
is in L of R one,

348
00:16:31,190 --> 00:16:37,670
all the empty string
is o. L of R two.

349
00:16:37,670 --> 00:16:43,715
Ok, so that's, I just
applied all the definitions.

350
00:16:43,715 --> 00:16:47,930
And now I have to look better
my induction hypothesis.

351
00:16:47,930 --> 00:16:50,930
These somehow help.

352
00:16:50,930 --> 00:16:55,820
Well, I know nullable
of R1 holds if

353
00:16:55,820 --> 00:17:00,815
and only if the empty
string is in L of one.

354
00:17:00,815 --> 00:17:05,330
So that is by the induction
hypothesis, a hypothesis.

355
00:17:05,330 --> 00:17:07,670
And I also know by this,

356
00:17:07,670 --> 00:17:10,040
that this holds only if

357
00:17:10,040 --> 00:17:17,465
the empty string is in R2
and there's an in-between.

358
00:17:17,465 --> 00:17:19,580
Now you can see again,

359
00:17:19,580 --> 00:17:23,690
this is what my specification
say It must hold.

360
00:17:23,690 --> 00:17:25,640
And this is essentially

361
00:17:25,640 --> 00:17:28,685
what I could infer from
my implementation.

362
00:17:28,685 --> 00:17:30,140
And they are the same.

363
00:17:30,140 --> 00:17:32,270
So also this case,

364
00:17:32,270 --> 00:17:35,520
I've shown my property holds.

365
00:17:35,740 --> 00:17:39,275
Just for fun. Let's do this
also for the sequence.

366
00:17:39,275 --> 00:17:43,835
So we want to show
P of 014 by two.

367
00:17:43,835 --> 00:17:51,050
And that means I have to
prove that nullable of R1 by

368
00:17:51,050 --> 00:17:54,095
R2 if and only if

369
00:17:54,095 --> 00:18:00,440
the empty string is
one followed by two.

370
00:18:00,440 --> 00:18:04,880
Ok? And in this case
I'm just writing

371
00:18:04,880 --> 00:18:09,695
my induction
hypothesis on the top.

372
00:18:09,695 --> 00:18:11,060
And that would be,

373
00:18:11,060 --> 00:18:13,460
I know that p of r one holds.

374
00:18:13,460 --> 00:18:18,290
So another ball of R one

375
00:18:18,290 --> 00:18:24,860
if and only if the empty
string is in L one.

376
00:18:24,860 --> 00:18:29,525
And the P of R to say
is that I know already

377
00:18:29,525 --> 00:18:36,815
nullable of R2 if and only
if the empty string then R2.

378
00:18:36,815 --> 00:18:39,230
Okay? And I know some hard to

379
00:18:39,230 --> 00:18:42,285
infer that under
these assumptions,

380
00:18:42,285 --> 00:18:45,970
this property, like
in the earlier case,

381
00:18:45,970 --> 00:18:48,070
I just have a look how

382
00:18:48,070 --> 00:18:51,145
everything is defined and
see if I can go from there.

383
00:18:51,145 --> 00:18:56,485
So nullable of the
sequence would be nullable

384
00:18:56,485 --> 00:19:04,555
of R1 and nullable of R2.

385
00:19:04,555 --> 00:19:08,559
Okay? By the
induction hypothesis,

386
00:19:08,559 --> 00:19:16,255
I can now infer that
the empty string is

387
00:19:16,255 --> 00:19:24,750
in L one and the empty
string is in L two.

388
00:19:24,750 --> 00:19:28,175
Ok? Not much as I
can do on this side.

389
00:19:28,175 --> 00:19:31,520
But I can now try
to infer something

390
00:19:31,520 --> 00:19:33,110
here because I know how

391
00:19:33,110 --> 00:19:35,735
this L of this
sequence is defined.

392
00:19:35,735 --> 00:19:39,665
That says the empty
string has to be n l,

393
00:19:39,665 --> 00:19:45,110
one append of L of R, two.

394
00:19:45,110 --> 00:19:48,365
Ok? So they are not yet equal.

395
00:19:48,365 --> 00:19:50,150
But now I have to
essentially scratch

396
00:19:50,150 --> 00:19:52,430
my head. What does that mean?

397
00:19:52,430 --> 00:19:54,965
So this is the language
concatenation,

398
00:19:54,965 --> 00:19:57,035
Venice, the empty string.

399
00:19:57,035 --> 00:20:01,265
In this concatenation of
these two language as well.

400
00:20:01,265 --> 00:20:05,990
It will be there if
L of our wormholes

401
00:20:05,990 --> 00:20:12,560
and The empty string
of L R2 holds.

402
00:20:12,560 --> 00:20:14,600
And so again, I have shown that

403
00:20:14,600 --> 00:20:17,780
this is equivalent to that.

404
00:20:17,780 --> 00:20:20,480
Which means that this must be

405
00:20:20,480 --> 00:20:24,540
equal to that o holds
if and only if.

406
00:20:25,600 --> 00:20:28,940
Finally, the star case,

407
00:20:28,940 --> 00:20:31,235
p of r star.

408
00:20:31,235 --> 00:20:35,450
Ok? So I know my
induction hypothesis,

409
00:20:35,450 --> 00:20:41,075
let me write it again on
the top that P of R holds,

410
00:20:41,075 --> 00:20:46,835
which means nullable of R if and

411
00:20:46,835 --> 00:20:53,060
only if the empty
string is in L. Okay?

412
00:20:53,060 --> 00:20:58,580
And I have to prove
that nullable of

413
00:20:58,580 --> 00:21:08,930
r star if and only if the
empty string is n l of r star.

414
00:21:08,930 --> 00:21:15,020
Ok? So again, I just follow
how are things define?

415
00:21:15,020 --> 00:21:18,125
So this is defined
actually as true.

416
00:21:18,125 --> 00:21:21,365
So a star's always nullable.

417
00:21:21,365 --> 00:21:23,090
What is this defined where,

418
00:21:23,090 --> 00:21:26,225
remember that wants to find
using the Kleene star.

419
00:21:26,225 --> 00:21:28,685
So that is essentially L of

420
00:21:28,685 --> 00:21:33,680
R. And then we had this
star on languages.

421
00:21:33,680 --> 00:21:36,260
And does the star on

422
00:21:36,260 --> 00:21:39,380
languages always contain
the empty string?

423
00:21:39,380 --> 00:21:42,980
Yes, that is true as well.

424
00:21:42,980 --> 00:21:45,320
And so again, I've
shown their clone.

425
00:21:45,320 --> 00:21:47,420
And here you can see, even if I

426
00:21:47,420 --> 00:21:49,730
have here in this case
an induction hypothesis,

427
00:21:49,730 --> 00:21:51,860
I actually didn't need
that in my reasoning.

428
00:21:51,860 --> 00:21:53,960
Well, that might happen.

429
00:21:53,960 --> 00:21:56,840
This might have been the
first induction proof

430
00:21:56,840 --> 00:21:58,445
for regular expressions for you.

431
00:21:58,445 --> 00:22:00,680
But you've seen at
Watson so difficult,

432
00:22:00,680 --> 00:22:02,480
you just follow this recipe.

433
00:22:02,480 --> 00:22:04,280
It told us we can

434
00:22:04,280 --> 00:22:06,725
assume in each case
what we have to prove.

435
00:22:06,725 --> 00:22:09,860
And then we essentially just
followed what are things to

436
00:22:09,860 --> 00:22:13,595
find and try to infer
whether in each case,

437
00:22:13,595 --> 00:22:15,560
both of them produce
the same result,

438
00:22:15,560 --> 00:22:18,245
the specification and
the implementation.

439
00:22:18,245 --> 00:22:20,300
And the great thing about this,

440
00:22:20,300 --> 00:22:22,250
if you've done all these cases,

441
00:22:22,250 --> 00:22:25,490
we not just know that it holds
for some reg expression,

442
00:22:25,490 --> 00:22:28,280
but actually it holds for
all wreck expressions be

443
00:22:28,280 --> 00:22:30,890
couldn't achieve the same
result by looking just

444
00:22:30,890 --> 00:22:33,710
at testing because
we would not be

445
00:22:33,710 --> 00:22:37,280
able to test for all
record expressions.

446
00:22:37,280 --> 00:22:39,560
Ok, of course, there will be

447
00:22:39,560 --> 00:22:43,010
also induction proofs which
are more complicated.

448
00:22:43,010 --> 00:22:46,170
But let's leave that
for another time.

449
00:22:49,210 --> 00:22:53,405
You know, I always like to
talk about the subject.

450
00:22:53,405 --> 00:22:55,700
So here's one last thought.

451
00:22:55,700 --> 00:22:58,040
You might think these proofs for

452
00:22:58,040 --> 00:22:59,930
this little property
about nullable are

453
00:22:59,930 --> 00:23:04,295
totally overkill and
total waste of your time.

454
00:23:04,295 --> 00:23:07,310
Please rest assured disproves

455
00:23:07,310 --> 00:23:10,160
have saved me from so
much embarrassment.

456
00:23:10,160 --> 00:23:13,745
You cannot imagine,
once I have a proof,

457
00:23:13,745 --> 00:23:15,425
I can sleep much better.

458
00:23:15,425 --> 00:23:16,820
If I don't have a proof,

459
00:23:16,820 --> 00:23:19,560
I don't believe what I'm doing.