1
00:00:06,260 --> 00:00:10,275
I come back, I know
the topic of proofs
2
00:00:10,275 --> 00:00:13,950
is usually not the most
favorite topics with students.
3
00:00:13,950 --> 00:00:17,220
But I'm really passionate
about proofs in
4
00:00:17,220 --> 00:00:19,320
this module because they really
5
00:00:19,320 --> 00:00:21,885
help you with understanding
what's going on.
6
00:00:21,885 --> 00:00:25,170
And very often they help
you with preventing errors.
7
00:00:25,170 --> 00:00:28,695
You seen me earlier doing
these calculations.
8
00:00:28,695 --> 00:00:30,210
And if I hadn't had
9
00:00:30,210 --> 00:00:33,330
the safety net of
knowing what I'm doing,
10
00:00:33,330 --> 00:00:35,130
I could have easily gone wrong.
11
00:00:35,130 --> 00:00:36,390
And that's actually a problem is
12
00:00:36,390 --> 00:00:38,935
existing wreck Expression
Matching engines.
13
00:00:38,935 --> 00:00:41,330
Sometimes in corner
cases as we seen,
14
00:00:41,330 --> 00:00:43,130
they produce slow results,
15
00:00:43,130 --> 00:00:45,455
which is actually not so bad.
16
00:00:45,455 --> 00:00:48,020
Much worse is that
we also know they
17
00:00:48,020 --> 00:00:50,570
sometimes produce
incorrect results.
18
00:00:50,570 --> 00:00:51,980
So for us here,
19
00:00:51,980 --> 00:00:53,390
the proofs will be
20
00:00:53,390 --> 00:00:55,100
essentially the safety nets
21
00:00:55,100 --> 00:00:58,295
of making sure that our
algorithms actually correct.
22
00:00:58,295 --> 00:01:00,095
And that's actually
the beauty of that.
23
00:01:00,095 --> 00:01:03,170
It doesn't happen very often
in real life that we can
24
00:01:03,170 --> 00:01:06,500
actually prove the
correctness of our algorithm.
25
00:01:06,500 --> 00:01:08,405
But we can do this here.
26
00:01:08,405 --> 00:01:11,000
So we will start
a bit slow first,
27
00:01:11,000 --> 00:01:14,330
I will show you how
properties are stated.
28
00:01:14,330 --> 00:01:17,255
And then along the way
we will see how proofs
29
00:01:17,255 --> 00:01:20,940
actually can be performed
and how they can help you.
30
00:01:20,980 --> 00:01:25,205
You can probably remember this
slide from the beginning.
31
00:01:25,205 --> 00:01:29,300
Now we would probably stayed
the same property that
32
00:01:29,300 --> 00:01:32,420
our algorithm matches
the specification
33
00:01:32,420 --> 00:01:33,695
of what that means.
34
00:01:33,695 --> 00:01:35,450
Maybe slightly more mathematical,
35
00:01:35,450 --> 00:01:37,130
we would write
something like this.
36
00:01:37,130 --> 00:01:39,545
If our Marcia is given as
37
00:01:39,545 --> 00:01:42,740
a string direct expression
R and says yes,
38
00:01:42,740 --> 00:01:45,110
then this string
really needs to be in
39
00:01:45,110 --> 00:01:48,350
the language of R. And
if the matcher says no,
40
00:01:48,350 --> 00:01:50,120
then this string cannot be in
41
00:01:50,120 --> 00:01:52,325
the language of r. And we spent
42
00:01:52,325 --> 00:01:54,590
quite some effort to make precise
43
00:01:54,590 --> 00:01:57,065
what the language is
of a wreck expression.
44
00:01:57,065 --> 00:01:59,090
And we spend some effort to
45
00:01:59,090 --> 00:02:01,505
make precise what
our algorithm is.
46
00:02:01,505 --> 00:02:02,840
So we could now take
47
00:02:02,840 --> 00:02:06,875
this property and we could
generate some test cases,
48
00:02:06,875 --> 00:02:09,410
and that is all fine and good.
49
00:02:09,410 --> 00:02:13,340
Testing is a very important
thing for our software.
50
00:02:13,340 --> 00:02:16,715
But we can only do
this testing so far.
51
00:02:16,715 --> 00:02:19,385
Remember in the homework,
52
00:02:19,385 --> 00:02:21,275
there was a single string,
53
00:02:21,275 --> 00:02:23,750
abcd, and there were
actually infinitely
54
00:02:23,750 --> 00:02:27,530
many wreck expressions which
can match only that string.
55
00:02:27,530 --> 00:02:30,470
So testing something for
infinitely many things,
56
00:02:30,470 --> 00:02:32,000
that's a bit hard.
57
00:02:32,000 --> 00:02:36,710
And Summa, as soon as we have
an expression with a star,
58
00:02:36,710 --> 00:02:39,289
this l function returns,
59
00:02:39,289 --> 00:02:41,360
in the general case
an infinite set.
60
00:02:41,360 --> 00:02:43,940
And testing whether a
string is an infinite set,
61
00:02:43,940 --> 00:02:45,500
that's also a bit hard.
62
00:02:45,500 --> 00:02:47,510
So testing can only do so much.
63
00:02:47,510 --> 00:02:49,820
We could generate some
wreck expressions
64
00:02:49,820 --> 00:02:51,560
and we can generate some strings.
65
00:02:51,560 --> 00:02:53,990
And we can test
whether that is true.
66
00:02:53,990 --> 00:02:57,470
So B would be still
in the quandary that
67
00:02:57,470 --> 00:03:00,950
maybe there's a corner
case which doesn't work.
68
00:03:00,950 --> 00:03:03,110
So what we want to
achieve instead
69
00:03:03,110 --> 00:03:05,615
is you want to really make sure
70
00:03:05,615 --> 00:03:07,325
that this algorithm works for
71
00:03:07,325 --> 00:03:10,850
all reg expression
and for all strings.
72
00:03:10,850 --> 00:03:16,400
And that ONE proving will
help us to establish.
73
00:03:16,400 --> 00:03:19,040
Testing cannot do that.
74
00:03:19,040 --> 00:03:22,310
Remember, metro
mainly consisted of
75
00:03:22,310 --> 00:03:25,444
these functions,
nullable and derivative.
76
00:03:25,444 --> 00:03:27,755
So if you want to prove
anything about Metro.
77
00:03:27,755 --> 00:03:30,200
Bet on, make sure that
we know what actually
78
00:03:30,200 --> 00:03:33,890
the specification of
nullable and derivative is.
79
00:03:33,890 --> 00:03:35,750
That's actually quite
instructive to also
80
00:03:35,750 --> 00:03:37,850
understand how they work.
81
00:03:37,850 --> 00:03:39,695
So let's do that next.
82
00:03:39,695 --> 00:03:41,975
The central property of nullable,
83
00:03:41,975 --> 00:03:43,445
Actually that's not so hard.
84
00:03:43,445 --> 00:03:48,740
It's just says a function or
a reg expression is nullable
85
00:03:48,740 --> 00:03:50,360
if and only if
86
00:03:50,360 --> 00:03:52,160
the empty string is in
87
00:03:52,160 --> 00:03:54,410
the language that was the
purpose of this novel,
88
00:03:54,410 --> 00:03:55,670
it takes work, expression and
89
00:03:55,670 --> 00:03:58,490
test whether can match
the empty string.
90
00:03:58,490 --> 00:04:01,010
Meaning this empty string needs
91
00:04:01,010 --> 00:04:03,230
to be in the language
of R. And again,
92
00:04:03,230 --> 00:04:04,385
we have this if an only,
93
00:04:04,385 --> 00:04:05,885
if this one says yes,
94
00:04:05,885 --> 00:04:08,465
then the empty string needs
to be in this language.
95
00:04:08,465 --> 00:04:10,385
And if this nullable says no,
96
00:04:10,385 --> 00:04:13,085
the empty string CoMP
unless language.
97
00:04:13,085 --> 00:04:15,290
So that's relatively easy.
98
00:04:15,290 --> 00:04:18,110
The next one might be also
very instructive to look
99
00:04:18,110 --> 00:04:22,775
at what's the property the
derivative should satisfy.
100
00:04:22,775 --> 00:04:26,329
And remember, the derivative
was a kind of operation.
101
00:04:26,329 --> 00:04:28,850
We taking a wrecker
expression and we're looking
102
00:04:28,850 --> 00:04:31,955
at all the strings this
reg expression can match,
103
00:04:31,955 --> 00:04:35,510
but which starts with
C. And then me somehow
104
00:04:35,510 --> 00:04:37,250
looking for reg expression
which can match
105
00:04:37,250 --> 00:04:39,335
the same string starting with C,
106
00:04:39,335 --> 00:04:42,440
except that this
c is chopped off.
107
00:04:42,440 --> 00:04:47,150
And that actually saves
what this property states.
108
00:04:47,150 --> 00:04:50,300
Take the, take a
break expression are,
109
00:04:50,300 --> 00:04:53,045
and build the derivative
according to see,
110
00:04:53,045 --> 00:04:56,150
okay, so this gives
us a reg expression,
111
00:04:56,150 --> 00:04:58,025
the derivative of that
112
00:04:58,025 --> 00:05:01,775
r. Now we take the
a language of that,
113
00:05:01,775 --> 00:05:06,620
are that supposed to be
the same set as if I take
114
00:05:06,620 --> 00:05:12,845
the language of R and then
build the semantic derivative.
115
00:05:12,845 --> 00:05:16,055
Remember the semantic derivative
was doing exactly that.
116
00:05:16,055 --> 00:05:18,380
It goes through all
the strings in this
117
00:05:18,380 --> 00:05:21,350
that filters out everything which
118
00:05:21,350 --> 00:05:24,770
doesn't start with C.
So throws them away of
119
00:05:24,770 --> 00:05:26,510
the remaining strings
which start with
120
00:05:26,510 --> 00:05:28,670
C. It chops off so that C,
121
00:05:28,670 --> 00:05:31,370
So this will be exactly
122
00:05:31,370 --> 00:05:34,205
what our derivative
is supposed to do.
123
00:05:34,205 --> 00:05:36,605
And yes, if you can
establish that,
124
00:05:36,605 --> 00:05:38,450
then the language of
125
00:05:38,450 --> 00:05:43,355
this derivative operation
will produce the same sat.
126
00:05:43,355 --> 00:05:45,815
This actually also
how I'm going to
127
00:05:45,815 --> 00:05:49,760
test in cases icons
yet at the beginning,
128
00:05:49,760 --> 00:05:52,910
how your coursework is correct.
129
00:05:52,910 --> 00:05:55,625
If you give me a
definition for derivative
130
00:05:55,625 --> 00:05:58,730
of one of these bounded
record expressions,
131
00:05:58,730 --> 00:06:01,085
and I can't see
whether it's correct.
132
00:06:01,085 --> 00:06:03,380
Then what it will do
is I will just try to
133
00:06:03,380 --> 00:06:05,675
establish that this
property holds.
134
00:06:05,675 --> 00:06:07,265
And if I can establish that,
135
00:06:07,265 --> 00:06:08,870
then you coursework is find.
136
00:06:08,870 --> 00:06:10,550
And if I can't establish that,
137
00:06:10,550 --> 00:06:12,410
then usually it shows
138
00:06:12,410 --> 00:06:14,360
there is a counterexample
somewhere.
139
00:06:14,360 --> 00:06:19,220
Because this derivative,
as you've seen desk times,
140
00:06:19,220 --> 00:06:21,725
more than one solution
you can give.
141
00:06:21,725 --> 00:06:23,915
Some of them are more
efficient than others.
142
00:06:23,915 --> 00:06:25,685
But in terms of correctness,
143
00:06:25,685 --> 00:06:30,680
there are multiple solutions
for the coursework.
144
00:06:30,680 --> 00:06:34,145
And in the past I really had
to scratch my head and say,
145
00:06:34,145 --> 00:06:36,995
Is that definition
really correct?
146
00:06:36,995 --> 00:06:39,620
And so what I did is
I just went ahead and
147
00:06:39,620 --> 00:06:42,785
tried to prove that or
establish this property.
148
00:06:42,785 --> 00:06:45,110
The reason why we writing
149
00:06:45,110 --> 00:06:47,300
these properties
down so precisely,
150
00:06:47,300 --> 00:06:49,880
remember we know here
what the derivative is,
151
00:06:49,880 --> 00:06:51,290
we know what the
languages we know
152
00:06:51,290 --> 00:06:53,495
what this semantic derivative is.
153
00:06:53,495 --> 00:06:56,960
This is usually when you are
trying to prove something.
154
00:06:56,960 --> 00:06:58,715
And very important step.
155
00:06:58,715 --> 00:07:01,385
If you know what you're
actually trying to prove,
156
00:07:01,385 --> 00:07:05,510
then you have already
done half of the proof.
157
00:07:05,510 --> 00:07:08,300
Remember, we want to
establish these properties
158
00:07:08,300 --> 00:07:10,460
not just for some reg expression,
159
00:07:10,460 --> 00:07:13,940
but for all wrecker expression
and for all characters.
160
00:07:13,940 --> 00:07:17,225
So proving is the only
method to achieve that.
161
00:07:17,225 --> 00:07:18,590
And I said at the beginning of
162
00:07:18,590 --> 00:07:21,170
this video's proving is
163
00:07:21,170 --> 00:07:23,810
not the most favorite
subject of students.
164
00:07:23,810 --> 00:07:25,730
But I can promise you,
165
00:07:25,730 --> 00:07:29,630
if you just follow a very
simple minded recipe,
166
00:07:29,630 --> 00:07:33,695
that brings, you're already
very far in these proofs.
167
00:07:33,695 --> 00:07:35,720
Of course, there's
always some creativity
168
00:07:35,720 --> 00:07:37,070
involved in these proofs.
169
00:07:37,070 --> 00:07:39,380
But a simple-minded
recipe you can
170
00:07:39,380 --> 00:07:42,620
follow brings you're already
very close to the end.
171
00:07:42,620 --> 00:07:46,610
So let's have a look how
this recipe can be derived.
172
00:07:46,610 --> 00:07:48,740
Establishing a property is for
173
00:07:48,740 --> 00:07:50,900
all Rekha expression nearly
174
00:07:50,900 --> 00:07:55,805
always requires an induction
over wreck expressions.
175
00:07:55,805 --> 00:07:57,350
And so we have to look at what's
176
00:07:57,350 --> 00:07:59,150
the recipe for doing inductions
177
00:07:59,150 --> 00:08:02,390
over wreck expressions
and inductions.
178
00:08:02,390 --> 00:08:03,590
You probably remember from
179
00:08:03,590 --> 00:08:07,100
school time you did induction
over natural numbers.
180
00:08:07,100 --> 00:08:11,120
You had to establish that
some property holds for 0.
181
00:08:11,120 --> 00:08:13,775
And you had to establish it
182
00:08:13,775 --> 00:08:14,960
that if you assume that
183
00:08:14,960 --> 00:08:16,970
the property holds
for the N case,
184
00:08:16,970 --> 00:08:21,365
it has to also hold for
the n plus one case.
185
00:08:21,365 --> 00:08:24,350
And induction over wreck
expressions is very,
186
00:08:24,350 --> 00:08:26,810
very similar to this
kind of induction.
187
00:08:26,810 --> 00:08:30,754
Just that in case of
break expressions,
188
00:08:30,754 --> 00:08:33,200
we have essentially
three base cases.
189
00:08:33,200 --> 00:08:35,360
So in our grandma
or we say it rec,
190
00:08:35,360 --> 00:08:38,705
expressions are 01
and characters.
191
00:08:38,705 --> 00:08:42,260
And these will be the base
cases in our induction.
192
00:08:42,260 --> 00:08:43,895
So we have to establish
193
00:08:43,895 --> 00:08:46,475
any property we want
to prove by induction,
194
00:08:46,475 --> 00:08:49,055
you have to show
that it holds for 0,
195
00:08:49,055 --> 00:08:50,495
it holds for one,
196
00:08:50,495 --> 00:08:52,505
and it holds for the character.
197
00:08:52,505 --> 00:08:57,350
So there's nothing else we
can assume in these cases.
198
00:08:57,350 --> 00:08:59,300
We just have to prove
that this property
199
00:08:59,300 --> 00:09:01,835
holds in these three base cases.
200
00:09:01,835 --> 00:09:04,490
It is very much the
same as if you proved
201
00:09:04,490 --> 00:09:06,050
something over natural numbers
202
00:09:06,050 --> 00:09:08,270
and you had to prove it for 0.
203
00:09:08,270 --> 00:09:10,880
Now, the sequence case,
204
00:09:10,880 --> 00:09:12,755
the alternative in a star.
205
00:09:12,755 --> 00:09:15,815
These are the kind
of inductive cases.
206
00:09:15,815 --> 00:09:17,750
Remember in natural numbers,
207
00:09:17,750 --> 00:09:20,450
you could assume that
a property holds for
208
00:09:20,450 --> 00:09:24,770
n and then you had to
establish it for n plus one.
209
00:09:24,770 --> 00:09:26,945
This is here very simmer.
210
00:09:26,945 --> 00:09:29,000
You can assume that
211
00:09:29,000 --> 00:09:30,920
these properties hold for
212
00:09:30,920 --> 00:09:33,140
the components of
these reg expression.
213
00:09:33,140 --> 00:09:35,840
So you can assume that it
holds for this on one,
214
00:09:35,840 --> 00:09:38,960
and you can assume that
it holds for this R2.
215
00:09:38,960 --> 00:09:41,090
And your task is now to
216
00:09:41,090 --> 00:09:44,240
establish this property
also for sequences.
217
00:09:44,240 --> 00:09:46,255
The same for alternative soup.
218
00:09:46,255 --> 00:09:48,710
Assume that it holds
already for R1.
219
00:09:48,710 --> 00:09:51,005
You can assume that
it holds for the R2.
220
00:09:51,005 --> 00:09:53,930
You have to establish that
this property holds also for
221
00:09:53,930 --> 00:09:58,460
the entire alternative,
the same four star.
222
00:09:58,460 --> 00:10:00,440
So there you can assume it for r,
223
00:10:00,440 --> 00:10:02,330
but you have to now
establish it that
224
00:10:02,330 --> 00:10:04,760
it also holds for r star.
225
00:10:04,760 --> 00:10:07,910
Ok, that leads to this
simple-minded recipe
226
00:10:07,910 --> 00:10:09,560
which you can always follow.
227
00:10:09,560 --> 00:10:10,970
So first you have to know
228
00:10:10,970 --> 00:10:13,085
what's the property
you want to show.
229
00:10:13,085 --> 00:10:16,790
Luckily for you, that's
almost always given by me.
230
00:10:16,790 --> 00:10:19,100
So there's nothing
to think about it.
231
00:10:19,100 --> 00:10:22,100
But then what you have to
do in an induction proof,
232
00:10:22,100 --> 00:10:23,855
you have to show that
this property holds
233
00:10:23,855 --> 00:10:26,345
for 041 and characters.
234
00:10:26,345 --> 00:10:28,790
These are the base
cases in the induction.
235
00:10:28,790 --> 00:10:31,280
And then in the induction cases,
236
00:10:31,280 --> 00:10:32,555
you just have to say, well,
237
00:10:32,555 --> 00:10:34,535
I know that this
property holds for one,
238
00:10:34,535 --> 00:10:36,980
I know that the
property holds for R2.
239
00:10:36,980 --> 00:10:38,720
What does it look like that
240
00:10:38,720 --> 00:10:40,895
this property holds
for the alternative,
241
00:10:40,895 --> 00:10:44,480
for the sequence
and for the r-star,
242
00:10:44,480 --> 00:10:46,955
that will then requires
some reasoning.
243
00:10:46,955 --> 00:10:48,230
But if you know what you can
244
00:10:48,230 --> 00:10:50,359
assume and what do you
have to establish?
245
00:10:50,359 --> 00:10:52,940
Often can already see bought
246
00:10:52,940 --> 00:10:55,490
needs to be proved and
what's the argument like?
247
00:10:55,490 --> 00:10:58,325
So let's try that and
then concrete example.
248
00:10:58,325 --> 00:11:01,265
So how does approve
work in real life?
249
00:11:01,265 --> 00:11:03,530
And say that the beginning
first you have to know
250
00:11:03,530 --> 00:11:05,630
what's the property
you want to prove.
251
00:11:05,630 --> 00:11:10,280
So in our case, the P of R is if
252
00:11:10,280 --> 00:11:14,900
nullable of R holds if
253
00:11:14,900 --> 00:11:19,760
and only if the empty string
is in the language of.
254
00:11:19,760 --> 00:11:23,690
So we want to do an induction
of rec expressions.
255
00:11:23,690 --> 00:11:28,400
So we better have Fan property
which depends on this
256
00:11:28,400 --> 00:11:31,130
R. And so now in
257
00:11:31,130 --> 00:11:34,985
each case I would start with
a blank piece of paper.
258
00:11:34,985 --> 00:11:38,030
And I would write on the top,
259
00:11:38,030 --> 00:11:40,190
what are my assumptions?
260
00:11:40,190 --> 00:11:42,470
And on the bottom, I would write
261
00:11:42,470 --> 00:11:45,485
down what I need to prove.
262
00:11:45,485 --> 00:11:49,280
And then I would try
to somehow navigate
263
00:11:49,280 --> 00:11:53,540
my way from these assumptions
to what I want to prove,
264
00:11:53,540 --> 00:11:57,740
to somehow infer
small steps so that I
265
00:11:57,740 --> 00:11:59,960
can assure that what I'm trying
266
00:11:59,960 --> 00:12:02,780
to prove holds for
my assumptions.
267
00:12:02,780 --> 00:12:06,905
Obviously. That depends
on how hard that case is.
268
00:12:06,905 --> 00:12:08,930
And if you look at
our first case,
269
00:12:08,930 --> 00:12:10,835
what we have to show is if to
270
00:12:10,835 --> 00:12:13,655
prove that this P of 0 holds.
271
00:12:13,655 --> 00:12:15,410
And actually no assumption in
272
00:12:15,410 --> 00:12:17,450
this case because it's
one of the base cases.
273
00:12:17,450 --> 00:12:23,015
So what I have to show is I
have to show that nullable of
274
00:12:23,015 --> 00:12:29,570
0 if and only if the empty
string is in L of 0.
275
00:12:29,570 --> 00:12:31,340
Okay? And I have to
276
00:12:31,340 --> 00:12:34,190
somehow show that this
implementation of nullable in
277
00:12:34,190 --> 00:12:35,600
this case produces
278
00:12:35,600 --> 00:12:39,185
exactly the same answer
as that specification.
279
00:12:39,185 --> 00:12:41,660
Okay? So what is nullable of 0 is
280
00:12:41,660 --> 00:12:45,605
defined where debt is
just false by definition.
281
00:12:45,605 --> 00:12:50,225
And what is this
language of LCA defined?
282
00:12:50,225 --> 00:12:52,535
Well, that is just
the empty string.
283
00:12:52,535 --> 00:12:55,010
In the empty language.
284
00:12:55,010 --> 00:12:57,110
Is the empty string and
the empty language,
285
00:12:57,110 --> 00:12:59,240
no, that is false too.
286
00:12:59,240 --> 00:13:01,610
So I have shown that
287
00:13:01,610 --> 00:13:03,890
both of them produce
the same result.
288
00:13:03,890 --> 00:13:08,765
So I would already be done
in the first base case.
289
00:13:08,765 --> 00:13:12,125
The second base case is not
so much more difficult.
290
00:13:12,125 --> 00:13:15,455
We want to prove
that P of one holds.
291
00:13:15,455 --> 00:13:21,680
And that would say
if nullable of one,
292
00:13:21,680 --> 00:13:23,825
if and only if
293
00:13:23,825 --> 00:13:27,665
the empty string is in
the language of one.
294
00:13:27,665 --> 00:13:31,235
Okay? I can again look
at what's that defined,
295
00:13:31,235 --> 00:13:33,275
that is defined as true.
296
00:13:33,275 --> 00:13:35,450
And what is this language to
297
00:13:35,450 --> 00:13:37,640
find where that is defined as
298
00:13:37,640 --> 00:13:39,320
the empty string in
299
00:13:39,320 --> 00:13:42,230
the language containing
the empty string.
300
00:13:42,230 --> 00:13:44,150
And is the empty string in
301
00:13:44,150 --> 00:13:46,055
the language containing
the empty string?
302
00:13:46,055 --> 00:13:47,975
Yes, that's true as well.
303
00:13:47,975 --> 00:13:50,735
So again, I've shown that
the actual implementation
304
00:13:50,735 --> 00:13:55,250
produces the same result
as the specification.
305
00:13:55,250 --> 00:13:57,635
I leave the character base case
306
00:13:57,635 --> 00:14:00,690
for you to fill in the details.
307
00:14:01,150 --> 00:14:03,320
More interesting are these
308
00:14:03,320 --> 00:14:05,300
inductive cases as
you can imagine.
309
00:14:05,300 --> 00:14:08,360
So let's try to prove
it for the alternative.
310
00:14:08,360 --> 00:14:10,639
So I want to show a property
311
00:14:10,639 --> 00:14:13,790
for this record expression way.
312
00:14:13,790 --> 00:14:15,410
What does our property save
313
00:14:15,410 --> 00:14:16,970
for that reg expression
where we have to
314
00:14:16,970 --> 00:14:20,915
say that nullable of
315
00:14:20,915 --> 00:14:25,490
R1 plus R2 if and only if
316
00:14:25,490 --> 00:14:31,519
the empty string is in the
language if R1 plus R2.
317
00:14:31,519 --> 00:14:35,090
Ok? So I would now go ahead with
318
00:14:35,090 --> 00:14:39,695
my plank piece of paper and
I would write down on top.
319
00:14:39,695 --> 00:14:41,475
Now what can I assume?
320
00:14:41,475 --> 00:14:45,170
Well, I can assume
that P of R1 holds and
321
00:14:45,170 --> 00:14:49,175
I can assume what
p of r two holds.
322
00:14:49,175 --> 00:14:50,750
And let's just write that down.
323
00:14:50,750 --> 00:14:56,210
We can say that nullable of R1 if
324
00:14:56,210 --> 00:14:58,655
and only if the empty string
325
00:14:58,655 --> 00:15:02,000
is in the language of
R1, I can assume that.
326
00:15:02,000 --> 00:15:03,500
And I can assume
327
00:15:03,500 --> 00:15:10,355
that none above our two if
and only if the empty string.
328
00:15:10,355 --> 00:15:14,675
Or to Hawaii. What's now,
329
00:15:14,675 --> 00:15:15,830
what do I want to prove?
330
00:15:15,830 --> 00:15:18,455
Where I want to
prove this property?
331
00:15:18,455 --> 00:15:21,935
So let me write this here
again on the bottom.
332
00:15:21,935 --> 00:15:30,935
So p of R1 plus R2
is narrow form of R1
333
00:15:30,935 --> 00:15:34,595
plus R2 if and only if
334
00:15:34,595 --> 00:15:40,610
the empty string is in the
language of R1 plus R2.
335
00:15:40,610 --> 00:15:43,280
Well, the first thing I can do is
336
00:15:43,280 --> 00:15:46,760
I can infer what's the
definition of that?
337
00:15:46,760 --> 00:15:51,920
Where that would
be nullable of R1
338
00:15:51,920 --> 00:15:57,530
or nullable of our two.
339
00:15:57,530 --> 00:15:59,960
That's how it was defined, okay?
340
00:15:59,960 --> 00:16:01,730
And what can I do with this?
341
00:16:01,730 --> 00:16:04,700
I can also look at how is
342
00:16:04,700 --> 00:16:08,510
this language to find
where that is defined as
343
00:16:08,510 --> 00:16:16,730
the empty string in the
language of R1 union R2.
344
00:16:16,730 --> 00:16:19,910
Okay? So what does that mean?
345
00:16:19,910 --> 00:16:23,585
Well, Venice, the empty
string in a union.
346
00:16:23,585 --> 00:16:27,095
Well, it has to be
the case that either
347
00:16:27,095 --> 00:16:31,190
the empty string
is in L of R one,
348
00:16:31,190 --> 00:16:37,670
all the empty string
is o. L of R two.
349
00:16:37,670 --> 00:16:43,715
Ok, so that's, I just
applied all the definitions.
350
00:16:43,715 --> 00:16:47,930
And now I have to look better
my induction hypothesis.
351
00:16:47,930 --> 00:16:50,930
These somehow help.
352
00:16:50,930 --> 00:16:55,820
Well, I know nullable
of R1 holds if
353
00:16:55,820 --> 00:17:00,815
and only if the empty
string is in L of one.
354
00:17:00,815 --> 00:17:05,330
So that is by the induction
hypothesis, a hypothesis.
355
00:17:05,330 --> 00:17:07,670
And I also know by this,
356
00:17:07,670 --> 00:17:10,040
that this holds only if
357
00:17:10,040 --> 00:17:17,465
the empty string is in R2
and there's an in-between.
358
00:17:17,465 --> 00:17:19,580
Now you can see again,
359
00:17:19,580 --> 00:17:23,690
this is what my specification
say It must hold.
360
00:17:23,690 --> 00:17:25,640
And this is essentially
361
00:17:25,640 --> 00:17:28,685
what I could infer from
my implementation.
362
00:17:28,685 --> 00:17:30,140
And they are the same.
363
00:17:30,140 --> 00:17:32,270
So also this case,
364
00:17:32,270 --> 00:17:35,520
I've shown my property holds.
365
00:17:35,740 --> 00:17:39,275
Just for fun. Let's do this
also for the sequence.
366
00:17:39,275 --> 00:17:43,835
So we want to show
P of 014 by two.
367
00:17:43,835 --> 00:17:51,050
And that means I have to
prove that nullable of R1 by
368
00:17:51,050 --> 00:17:54,095
R2 if and only if
369
00:17:54,095 --> 00:18:00,440
the empty string is
one followed by two.
370
00:18:00,440 --> 00:18:04,880
Ok? And in this case
I'm just writing
371
00:18:04,880 --> 00:18:09,695
my induction
hypothesis on the top.
372
00:18:09,695 --> 00:18:11,060
And that would be,
373
00:18:11,060 --> 00:18:13,460
I know that p of r one holds.
374
00:18:13,460 --> 00:18:18,290
So another ball of R one
375
00:18:18,290 --> 00:18:24,860
if and only if the empty
string is in L one.
376
00:18:24,860 --> 00:18:29,525
And the P of R to say
is that I know already
377
00:18:29,525 --> 00:18:36,815
nullable of R2 if and only
if the empty string then R2.
378
00:18:36,815 --> 00:18:39,230
Okay? And I know some hard to
379
00:18:39,230 --> 00:18:42,285
infer that under
these assumptions,
380
00:18:42,285 --> 00:18:45,970
this property, like
in the earlier case,
381
00:18:45,970 --> 00:18:48,070
I just have a look how
382
00:18:48,070 --> 00:18:51,145
everything is defined and
see if I can go from there.
383
00:18:51,145 --> 00:18:56,485
So nullable of the
sequence would be nullable
384
00:18:56,485 --> 00:19:04,555
of R1 and nullable of R2.
385
00:19:04,555 --> 00:19:08,559
Okay? By the
induction hypothesis,
386
00:19:08,559 --> 00:19:16,255
I can now infer that
the empty string is
387
00:19:16,255 --> 00:19:24,750
in L one and the empty
string is in L two.
388
00:19:24,750 --> 00:19:28,175
Ok? Not much as I
can do on this side.
389
00:19:28,175 --> 00:19:31,520
But I can now try
to infer something
390
00:19:31,520 --> 00:19:33,110
here because I know how
391
00:19:33,110 --> 00:19:35,735
this L of this
sequence is defined.
392
00:19:35,735 --> 00:19:39,665
That says the empty
string has to be n l,
393
00:19:39,665 --> 00:19:45,110
one append of L of R, two.
394
00:19:45,110 --> 00:19:48,365
Ok? So they are not yet equal.
395
00:19:48,365 --> 00:19:50,150
But now I have to
essentially scratch
396
00:19:50,150 --> 00:19:52,430
my head. What does that mean?
397
00:19:52,430 --> 00:19:54,965
So this is the language
concatenation,
398
00:19:54,965 --> 00:19:57,035
Venice, the empty string.
399
00:19:57,035 --> 00:20:01,265
In this concatenation of
these two language as well.
400
00:20:01,265 --> 00:20:05,990
It will be there if
L of our wormholes
401
00:20:05,990 --> 00:20:12,560
and The empty string
of L R2 holds.
402
00:20:12,560 --> 00:20:14,600
And so again, I have shown that
403
00:20:14,600 --> 00:20:17,780
this is equivalent to that.
404
00:20:17,780 --> 00:20:20,480
Which means that this must be
405
00:20:20,480 --> 00:20:24,540
equal to that o holds
if and only if.
406
00:20:25,600 --> 00:20:28,940
Finally, the star case,
407
00:20:28,940 --> 00:20:31,235
p of r star.
408
00:20:31,235 --> 00:20:35,450
Ok? So I know my
induction hypothesis,
409
00:20:35,450 --> 00:20:41,075
let me write it again on
the top that P of R holds,
410
00:20:41,075 --> 00:20:46,835
which means nullable of R if and
411
00:20:46,835 --> 00:20:53,060
only if the empty
string is in L. Okay?
412
00:20:53,060 --> 00:20:58,580
And I have to prove
that nullable of
413
00:20:58,580 --> 00:21:08,930
r star if and only if the
empty string is n l of r star.
414
00:21:08,930 --> 00:21:15,020
Ok? So again, I just follow
how are things define?
415
00:21:15,020 --> 00:21:18,125
So this is defined
actually as true.
416
00:21:18,125 --> 00:21:21,365
So a star's always nullable.
417
00:21:21,365 --> 00:21:23,090
What is this defined where,
418
00:21:23,090 --> 00:21:26,225
remember that wants to find
using the Kleene star.
419
00:21:26,225 --> 00:21:28,685
So that is essentially L of
420
00:21:28,685 --> 00:21:33,680
R. And then we had this
star on languages.
421
00:21:33,680 --> 00:21:36,260
And does the star on
422
00:21:36,260 --> 00:21:39,380
languages always contain
the empty string?
423
00:21:39,380 --> 00:21:42,980
Yes, that is true as well.
424
00:21:42,980 --> 00:21:45,320
And so again, I've
shown their clone.
425
00:21:45,320 --> 00:21:47,420
And here you can see, even if I
426
00:21:47,420 --> 00:21:49,730
have here in this case
an induction hypothesis,
427
00:21:49,730 --> 00:21:51,860
I actually didn't need
that in my reasoning.
428
00:21:51,860 --> 00:21:53,960
Well, that might happen.
429
00:21:53,960 --> 00:21:56,840
This might have been the
first induction proof
430
00:21:56,840 --> 00:21:58,445
for regular expressions for you.
431
00:21:58,445 --> 00:22:00,680
But you've seen at
Watson so difficult,
432
00:22:00,680 --> 00:22:02,480
you just follow this recipe.
433
00:22:02,480 --> 00:22:04,280
It told us we can
434
00:22:04,280 --> 00:22:06,725
assume in each case
what we have to prove.
435
00:22:06,725 --> 00:22:09,860
And then we essentially just
followed what are things to
436
00:22:09,860 --> 00:22:13,595
find and try to infer
whether in each case,
437
00:22:13,595 --> 00:22:15,560
both of them produce
the same result,
438
00:22:15,560 --> 00:22:18,245
the specification and
the implementation.
439
00:22:18,245 --> 00:22:20,300
And the great thing about this,
440
00:22:20,300 --> 00:22:22,250
if you've done all these cases,
441
00:22:22,250 --> 00:22:25,490
we not just know that it holds
for some reg expression,
442
00:22:25,490 --> 00:22:28,280
but actually it holds for
all wreck expressions be
443
00:22:28,280 --> 00:22:30,890
couldn't achieve the same
result by looking just
444
00:22:30,890 --> 00:22:33,710
at testing because
we would not be
445
00:22:33,710 --> 00:22:37,280
able to test for all
record expressions.
446
00:22:37,280 --> 00:22:39,560
Ok, of course, there will be
447
00:22:39,560 --> 00:22:43,010
also induction proofs which
are more complicated.
448
00:22:43,010 --> 00:22:46,170
But let's leave that
for another time.
449
00:22:49,210 --> 00:22:53,405
You know, I always like to
talk about the subject.
450
00:22:53,405 --> 00:22:55,700
So here's one last thought.
451
00:22:55,700 --> 00:22:58,040
You might think these proofs for
452
00:22:58,040 --> 00:22:59,930
this little property
about nullable are
453
00:22:59,930 --> 00:23:04,295
totally overkill and
total waste of your time.
454
00:23:04,295 --> 00:23:07,310
Please rest assured disproves
455
00:23:07,310 --> 00:23:10,160
have saved me from so
much embarrassment.
456
00:23:10,160 --> 00:23:13,745
You cannot imagine,
once I have a proof,
457
00:23:13,745 --> 00:23:15,425
I can sleep much better.
458
00:23:15,425 --> 00:23:16,820
If I don't have a proof,
459
00:23:16,820 --> 00:23:19,560
I don't believe what I'm doing.