1
00:00:06,020 --> 00:00:09,945
Welcome back. After
this disappointment
2
00:00:09,945 --> 00:00:14,115
and case of over-promising
and under-achieving,
3
00:00:14,115 --> 00:00:17,295
let's have a look whether
we can recover from that.
4
00:00:17,295 --> 00:00:20,535
So here's one problem.
5
00:00:20,535 --> 00:00:23,790
When we looked at this
n-times regular expression, we
6
00:00:23,790 --> 00:00:27,330
were not able to represent
that directly.
7
00:00:27,330 --> 00:00:31,275
We had to represent it as a
sequence regular expression.
8
00:00:31,275 --> 00:00:32,850
So we expanded it.
9
00:00:32,850 --> 00:00:36,510
So times-one would be just a,
10
00:00:36,510 --> 00:00:40,595
times-two would
be a o a, and so on.
11
00:00:40,595 --> 00:00:43,535
And so you can see if
you go already to 13,
12
00:00:43,535 --> 00:00:47,960
n equals 13, the regular
expression becomes quite large.
13
00:00:47,960 --> 00:00:52,865
And now the functions
nullable and also derivative,
14
00:00:52,865 --> 00:00:56,360
they need to traverse
this regular expression.
15
00:00:56,360 --> 00:00:59,060
Remember this tree, sometimes,
16
00:00:59,060 --> 00:01:01,820
they have to traverse
that even several times.
17
00:01:01,820 --> 00:01:04,535
So that will slow
down the algorithm.
18
00:01:04,535 --> 00:01:08,150
And in particular, because
our first regular expression was
19
00:01:08,150 --> 00:01:11,840
actually not just a, but
a + 1. So we had
20
00:01:11,840 --> 00:01:14,330
in the case of 13,
we had a + 1, a + 1,...
21
00:01:14,330 --> 00:01:17,330
a + 1... 13 times.
22
00:01:17,330 --> 00:01:20,150
And this regular
expression just grows
23
00:01:20,150 --> 00:01:25,475
with the number n. Just to
show you this also in code,
24
00:01:25,475 --> 00:01:28,115
I'm in the file re1.sc,
25
00:01:28,115 --> 00:01:30,920
and in the end I have a size function.
26
00:01:30,920 --> 00:01:33,140
The size function takes
a regular expression as
27
00:01:33,140 --> 00:01:36,215
argument and produces an integer.
28
00:01:36,215 --> 00:01:39,980
And essentially it takes
this regular expression, seen as
29
00:01:39,980 --> 00:01:44,045
a tree and counts how many
nodes there are in this tree.
30
00:01:44,045 --> 00:01:49,490
And so if I take this EVIL1
regular expression, this one,
31
00:01:49,490 --> 00:01:52,160
and increase now this n. So you
32
00:01:52,160 --> 00:01:54,680
can already see
for n = 1,
33
00:01:54,680 --> 00:01:56,540
the size of this
record expression is 5
34
00:01:56,540 --> 00:01:59,615
and you see it
slowly increases.
35
00:01:59,615 --> 00:02:02,150
And this 20, i.e. n = 20,
36
00:02:02,150 --> 00:02:07,100
the size of this regular
expression is now already 119.
37
00:02:07,100 --> 00:02:10,145
The interesting
line is this one.
38
00:02:10,145 --> 00:02:13,295
Remember, when we
built the derivative,
39
00:02:13,295 --> 00:02:17,150
very often parts of a regular
expression gets copied.
40
00:02:17,150 --> 00:02:19,280
So if you have like EVIL1
41
00:02:19,280 --> 00:02:22,325
of 20, and you have 119 nodes,
42
00:02:22,325 --> 00:02:25,475
now this will be often copied.
43
00:02:25,475 --> 00:02:31,475
And we want to see what's the
resulting tree look like, i.e.
44
00:02:31,475 --> 00:02:32,780
how many nodes does it have.
45
00:02:32,780 --> 00:02:34,985
So I take here EVIL1 of 20
46
00:02:34,985 --> 00:02:38,405
and the derivative
according to 20 a's.
47
00:02:38,405 --> 00:02:41,820
And just have a look
what the size is.
48
00:02:43,210 --> 00:02:45,680
Okay, that takes a while.
49
00:02:45,680 --> 00:02:48,410
You see now this recular expression,
50
00:02:48,410 --> 00:02:52,280
the derivative has already
8 million plus nodes.
51
00:02:52,280 --> 00:02:55,400
And now nullable and
derivative have to
52
00:02:55,400 --> 00:02:58,430
traverse these trees with
8 million plus nodes.
53
00:02:58,430 --> 00:03:01,400
So it's no wonder
that this is slow.
54
00:03:01,400 --> 00:03:03,860
The first thing we
can try to mitigate
55
00:03:03,860 --> 00:03:06,350
this explosion problem is to
56
00:03:06,350 --> 00:03:09,050
have an explicit
n-times regular expression.
57
00:03:09,050 --> 00:03:11,600
So instead of expanding
58
00:03:11,600 --> 00:03:13,415
it to the sequence
regular expression,
59
00:03:13,415 --> 00:03:17,510
let's just have a single
regular expression n-times,
60
00:03:17,510 --> 00:03:20,540
which takes an expression and
61
00:03:20,540 --> 00:03:24,470
a number, and keeps that
regular expression small.
62
00:03:24,470 --> 00:03:27,095
I'm here in the file re2.sc,
63
00:03:27,095 --> 00:03:29,090
which is also on KEATS.
64
00:03:29,090 --> 00:03:32,570
And the only change I made
is I added explicitly
65
00:03:32,570 --> 00:03:33,860
a regular expression for
66
00:03:33,860 --> 00:03:36,770
n-times. The optional
regular expression
67
00:03:36,770 --> 00:03:39,920
we leave out at the
moment because this a?
68
00:03:39,920 --> 00:03:41,975
is represented as
69
00:03:41,975 --> 00:03:44,645
a + 1 and doesn't
expand too much.
70
00:03:44,645 --> 00:03:47,375
The real culprit
is this n-times.
71
00:03:47,375 --> 00:03:51,095
And instead of expanding
it n-times, we just
72
00:03:51,095 --> 00:03:54,275
take here a regular expression
and a natural number,
73
00:03:54,275 --> 00:03:56,960
which says how many times
it should be repeated.
74
00:03:56,960 --> 00:03:59,165
And in this week we
can keep it small.
75
00:03:59,165 --> 00:04:01,370
But by adding that
regular expression,
76
00:04:01,370 --> 00:04:05,150
we now have to think about
cases for nullable and
77
00:04:05,150 --> 00:04:07,340
derivative. So that we have
78
00:04:07,340 --> 00:04:10,205
to calculate next
what they look like.
79
00:04:10,205 --> 00:04:14,060
We can actually
calculate what the rule
80
00:04:14,060 --> 00:04:17,525
for nullable should be from
how we defined it earlier.
81
00:04:17,525 --> 00:04:19,580
Remember, if you have
a regular expression,
82
00:04:19,580 --> 00:04:21,980
r and we take it
n-times of this r,
83
00:04:21,980 --> 00:04:25,220
then in case if n = 0,
84
00:04:25,220 --> 00:04:28,130
we definded that as the
1 regular expression.
85
00:04:28,130 --> 00:04:30,380
If n = 1,
86
00:04:30,380 --> 00:04:32,495
it will be just a
single copy of r.
87
00:04:32,495 --> 00:04:33,725
If n = 2,
88
00:04:33,725 --> 00:04:35,270
it will be two copies in sequence.
89
00:04:35,270 --> 00:04:39,260
If three, we have
three copies and so on.
90
00:04:39,260 --> 00:04:41,390
Now we have to
somehow characterize
91
00:04:41,390 --> 00:04:44,285
when are these cases all nullable.
92
00:04:44,285 --> 00:04:47,810
Well, if n equals 0,
93
00:04:47,810 --> 00:04:49,850
then this will be defined as 1.
94
00:04:49,850 --> 00:04:52,100
So 1 can match
the empty string.
95
00:04:52,100 --> 00:04:54,785
So that should be
definitely true.
96
00:04:54,785 --> 00:04:57,725
Okay, if n = 1,
97
00:04:57,725 --> 00:05:00,470
when can this regular expression
match the empty string?
98
00:05:00,470 --> 00:05:02,000
So when should this be nullable?
99
00:05:02,000 --> 00:05:07,535
Well, if nullable of r holds,
100
00:05:07,535 --> 00:05:09,860
If it can match
the empty string,
101
00:05:09,860 --> 00:05:12,110
then we can match
the empty string.
102
00:05:12,110 --> 00:05:14,870
When can this regular expression
match the empty string?
103
00:05:14,870 --> 00:05:16,265
So these are now two copies of r.
104
00:05:16,265 --> 00:05:20,690
Well, both copies need
to match the empty string.
105
00:05:20,690 --> 00:05:22,820
So again, we have to ask
106
00:05:22,820 --> 00:05:25,774
both of them need to be nullable.
107
00:05:25,774 --> 00:05:28,520
Similarly, in the third case,
108
00:05:28,520 --> 00:05:30,710
all three copies need to be
109
00:05:30,710 --> 00:05:33,230
able to match the empty
string. When is that the case?
110
00:05:33,230 --> 00:05:38,360
Well, if nullable of
r holds and so on.
111
00:05:38,360 --> 00:05:48,754
So we can actually define that
if n = 0, then true,
112
00:05:48,754 --> 00:05:56,045
else we have to ask whether
nullable of r holds.
113
00:05:56,045 --> 00:05:58,550
So that will be the clause we
114
00:05:58,550 --> 00:06:01,625
have to implement for
n-times and nullable.
115
00:06:01,625 --> 00:06:04,280
Deriving the definition for
116
00:06:04,280 --> 00:06:06,920
the derivative of the n-times
117
00:06:06,920 --> 00:06:10,175
regular expression
is not that easy.
118
00:06:10,175 --> 00:06:12,755
We have to look again
how it was defined
119
00:06:12,755 --> 00:06:16,010
earlier and we somehow
have to spot a pattern.
120
00:06:16,010 --> 00:06:18,380
Otherwise, our
algorithm will be again
121
00:06:18,380 --> 00:06:20,975
quite slow if we don't
spot that pattern.
122
00:06:20,975 --> 00:06:22,550
So let's have a look.
123
00:06:22,550 --> 00:06:26,240
So in the case that
n is equal to 0,
124
00:06:26,240 --> 00:06:29,885
then r n-times was
defined as just 1.
125
00:06:29,885 --> 00:06:32,030
What would be the
derivative of one?
126
00:06:32,030 --> 00:06:36,140
So the derivative of c of one.
127
00:06:36,140 --> 00:06:38,990
Peaches defined as 0.
128
00:06:38,990 --> 00:06:41,465
Okay, fair enough.
129
00:06:41,465 --> 00:06:44,960
If he have any cross one,
130
00:06:44,960 --> 00:06:48,125
then we just have one copy
of this regular expression.
131
00:06:48,125 --> 00:06:50,120
So there's not much as we can do.
132
00:06:50,120 --> 00:06:53,735
We would have to calculate
the derivative of air are.
133
00:06:53,735 --> 00:06:57,395
Now in the n equals two case.
134
00:06:57,395 --> 00:07:00,245
That would mean we
have two copies of
135
00:07:00,245 --> 00:07:03,425
R. And they would be a sequence.
136
00:07:03,425 --> 00:07:07,775
How would be the derivative C of
137
00:07:07,775 --> 00:07:10,340
R four by R be
138
00:07:10,340 --> 00:07:13,265
defined where we would
look up the definition.
139
00:07:13,265 --> 00:07:15,470
And it would say that's
the complicated case
140
00:07:15,470 --> 00:07:16,760
d sequence or
141
00:07:16,760 --> 00:07:21,875
if null a bowl of R,
142
00:07:21,875 --> 00:07:25,250
Then the most complicated case.
143
00:07:25,250 --> 00:07:28,225
Else, That's the easy
case that would be
144
00:07:28,225 --> 00:07:32,660
derivative of c of R four
145
00:07:32,660 --> 00:07:35,540
by R. And then I
just have to copy
146
00:07:35,540 --> 00:07:40,775
that derivative of C
of four by r here.
147
00:07:40,775 --> 00:07:43,130
But in this case I have also
148
00:07:43,130 --> 00:07:51,145
the alternative derivative
of c of r. And from that,
149
00:07:51,145 --> 00:07:55,030
it looks like we can
do much better than
150
00:07:55,030 --> 00:07:58,390
that unless we do
151
00:07:58,390 --> 00:08:02,560
a little bit of magic and the
magic to do with this case.
152
00:08:02,560 --> 00:08:07,420
So let me look at this
case a bit more closely.
153
00:08:07,420 --> 00:08:09,819
Let me go back to slides
154
00:08:09,819 --> 00:08:12,940
because actually the calculation
might be a bit hairy.
155
00:08:12,940 --> 00:08:16,945
So remember we are in this
case where n equals two.
156
00:08:16,945 --> 00:08:18,550
And this was defined as
157
00:08:18,550 --> 00:08:20,680
this sequence work
expression R followed
158
00:08:20,680 --> 00:08:23,080
by r. And the question was,
159
00:08:23,080 --> 00:08:26,365
can we somehow make sense
out of this derivative
160
00:08:26,365 --> 00:08:30,370
where we have to build the
derivative of a sequence.
161
00:08:30,370 --> 00:08:33,020
So features the definition.
162
00:08:33,020 --> 00:08:36,050
We would ask if
the r is nullable,
163
00:08:36,050 --> 00:08:39,095
In this case, we return
this alternative.
164
00:08:39,095 --> 00:08:40,640
And if it's not nullable,
165
00:08:40,640 --> 00:08:44,135
It is just this
record expression.
166
00:08:44,135 --> 00:08:49,550
Now my claim is that
this whole if condition
167
00:08:49,550 --> 00:08:55,895
can be replaced by just this
simple derivative here.
168
00:08:55,895 --> 00:08:58,775
And if that claim is correct,
169
00:08:58,775 --> 00:09:01,520
there would be very nice
because in contrast to
170
00:09:01,520 --> 00:09:04,130
this if condition where
171
00:09:04,130 --> 00:09:06,280
we have to calculate
like nullable,
172
00:09:06,280 --> 00:09:08,330
decide in which branch we are.
173
00:09:08,330 --> 00:09:10,580
We don't have to
calculate your now,
174
00:09:10,580 --> 00:09:13,880
but we can just replace
it by this expression.
175
00:09:13,880 --> 00:09:16,670
So if we can
substantiate that claim,
176
00:09:16,670 --> 00:09:19,860
that will be definitely
good file algorithm.
177
00:09:20,140 --> 00:09:22,775
And to substantiate that,
178
00:09:22,775 --> 00:09:26,795
I will focus on this
record expression here.
179
00:09:26,795 --> 00:09:31,100
And notice that this record
expression the only be
180
00:09:31,100 --> 00:09:35,780
called or only be generated
if r is nullable.
181
00:09:35,780 --> 00:09:38,075
So in any other case,
182
00:09:38,075 --> 00:09:40,060
I will actually not go into it
183
00:09:40,060 --> 00:09:43,850
that if branch and would
be in the other one.
184
00:09:43,850 --> 00:09:45,260
So if we are in this if branch,
185
00:09:45,260 --> 00:09:47,705
we definitely know
that R is nullable.
186
00:09:47,705 --> 00:09:52,955
Okay? Okay, so here's
our regular expression.
187
00:09:52,955 --> 00:09:55,940
And we know it's nullable.
188
00:09:55,940 --> 00:09:57,920
So we have to somehow find
189
00:09:57,920 --> 00:10:00,380
an equivalent expression that
190
00:10:00,380 --> 00:10:04,100
is smaller or simpler
than that one.
191
00:10:04,100 --> 00:10:05,945
Let's see what we can do.
192
00:10:05,945 --> 00:10:10,160
So the first thing
actually is we multiplying
193
00:10:10,160 --> 00:10:16,595
this right hand side of the
alternative is times one.
194
00:10:16,595 --> 00:10:19,700
We can do that
because this does not
195
00:10:19,700 --> 00:10:23,090
change which strings this
work expression can match.
196
00:10:23,090 --> 00:10:25,685
Remember we even had it
as a simplification row,
197
00:10:25,685 --> 00:10:27,425
just in this case B,
198
00:10:27,425 --> 00:10:29,705
don't apply it as a
simplification will
199
00:10:29,705 --> 00:10:31,310
actually make this
work expression
200
00:10:31,310 --> 00:10:32,720
a bit more complicated.
201
00:10:32,720 --> 00:10:34,910
But times one doesn't make
202
00:10:34,910 --> 00:10:37,820
a difference because it
means the end of the string,
203
00:10:37,820 --> 00:10:40,070
we still want to match
the empty string.
204
00:10:40,070 --> 00:10:42,155
Okay, so that is possible.
205
00:10:42,155 --> 00:10:45,740
I can do that. Once
we have done that,
206
00:10:45,740 --> 00:10:48,410
you will notice that this
factor derivative of
207
00:10:48,410 --> 00:10:51,860
stuff are exactly the
same as that one.
208
00:10:51,860 --> 00:10:54,650
And in between is a plus.
209
00:10:54,650 --> 00:10:57,440
So you probably remember the law
210
00:10:57,440 --> 00:11:00,170
from school math
that I can pull out
211
00:11:00,170 --> 00:11:02,735
this factor derivative of c of r.
212
00:11:02,735 --> 00:11:06,320
And I'm inside the parentheses
is left with that.
213
00:11:06,320 --> 00:11:09,245
So now I have r plus one.
214
00:11:09,245 --> 00:11:13,055
Usually we cannot
simplify r plus one.
215
00:11:13,055 --> 00:11:15,530
If it had been R
plus 0, then yes,
216
00:11:15,530 --> 00:11:18,665
we could have got rid of the CRO.
217
00:11:18,665 --> 00:11:21,590
Plus one often
makes a difference,
218
00:11:21,590 --> 00:11:22,970
but not in our case.
219
00:11:22,970 --> 00:11:25,940
Remember, we know that
this R is nullable,
220
00:11:25,940 --> 00:11:29,840
so on its own can already
match the empty string.
221
00:11:29,840 --> 00:11:33,305
So we don't really need this
alternative plus one there,
222
00:11:33,305 --> 00:11:35,300
so we can just get rid of that.
223
00:11:35,300 --> 00:11:37,070
Okay, and so now we have
224
00:11:37,070 --> 00:11:40,535
a much simpler wound
reg expression.
225
00:11:40,535 --> 00:11:44,285
And this actually helps a
lot for our if condition.
226
00:11:44,285 --> 00:11:46,925
Look, this was the
original if condition
227
00:11:46,925 --> 00:11:50,270
and this is direct expression
h. We just simplified.
228
00:11:50,270 --> 00:11:53,105
If we replace it with this one,
229
00:11:53,105 --> 00:11:56,090
then we just end up with this.
230
00:11:56,090 --> 00:11:59,510
And now you will see that
the if condition is actually
231
00:11:59,510 --> 00:12:02,750
pointless because you
check if it's null above,
232
00:12:02,750 --> 00:12:05,060
we return this reg
expression or it's
233
00:12:05,060 --> 00:12:08,210
not nullable and we
return exactly the same.
234
00:12:08,210 --> 00:12:10,025
That doesn't make any difference.
235
00:12:10,025 --> 00:12:11,750
So we can just get rid of
236
00:12:11,750 --> 00:12:14,645
that one and can
replace it by that.
237
00:12:14,645 --> 00:12:16,865
And you see, this is now
238
00:12:16,865 --> 00:12:20,720
a much simpler case than
what we had before.
239
00:12:20,720 --> 00:12:24,170
So let's take stock
what we have so far.
240
00:12:24,170 --> 00:12:26,915
So we know India CEO case,
241
00:12:26,915 --> 00:12:30,920
the derivative needs
to be defined as 0.
242
00:12:30,920 --> 00:12:33,095
So because we define this
243
00:12:33,095 --> 00:12:36,785
and times as one,
the derivative is 0.
244
00:12:36,785 --> 00:12:39,889
For chest r, this will
be the derivative.
245
00:12:39,889 --> 00:12:42,170
And we can't do any
better than that
246
00:12:42,170 --> 00:12:45,620
for our followed by
RB just found out.
247
00:12:45,620 --> 00:12:47,270
Actually it is quite simple.
248
00:12:47,270 --> 00:12:51,410
Reg expression is equivalent
to the derivative.
249
00:12:51,410 --> 00:12:53,870
Now, we have to continue with
250
00:12:53,870 --> 00:12:56,090
that case where n is
equal to three and we
251
00:12:56,090 --> 00:12:58,099
now have three copies
252
00:12:58,099 --> 00:13:02,390
of this or what should
be the derivative?
253
00:13:02,390 --> 00:13:05,330
Well, if you look very carefully
254
00:13:05,330 --> 00:13:08,465
at this emerging pattern,
255
00:13:08,465 --> 00:13:12,410
I have to say then
what would be nice if,
256
00:13:12,410 --> 00:13:16,400
if he could show that in
the n equals three case,
257
00:13:16,400 --> 00:13:18,275
we end up with this.
258
00:13:18,275 --> 00:13:21,290
Because then what we have is.
259
00:13:21,290 --> 00:13:25,370
We can define our
nullable for n times
260
00:13:25,370 --> 00:13:31,025
s. If any cross 0 then
true as nullable.
261
00:13:31,025 --> 00:13:33,875
And for the derivative,
262
00:13:33,875 --> 00:13:37,190
we can save if n is equal to 0,
263
00:13:37,190 --> 00:13:40,235
then we return the
Sierra reg expression.
264
00:13:40,235 --> 00:13:43,295
Otherwise, as you can see
from this pattern here,
265
00:13:43,295 --> 00:13:50,735
it would be derivative of
c r four by r n minus one.
266
00:13:50,735 --> 00:13:54,770
Okay? And the only
267
00:13:54,770 --> 00:13:56,330
thing we have to make choice that
268
00:13:56,330 --> 00:13:58,175
this pattern actually holds.
269
00:13:58,175 --> 00:14:00,470
So that's, I will
show you next in
270
00:14:00,470 --> 00:14:03,735
the case for n equals three.
271
00:14:03,735 --> 00:14:07,810
If we have a wreck expression R
272
00:14:07,810 --> 00:14:09,820
and it's followed
by r and another r,
273
00:14:09,820 --> 00:14:11,275
three copies of it.
274
00:14:11,275 --> 00:14:14,245
We can just unfold
again the definition.
275
00:14:14,245 --> 00:14:16,930
So we would ask if I is nullable,
276
00:14:16,930 --> 00:14:19,765
then we have this if branch.
277
00:14:19,765 --> 00:14:23,905
And if i is not nullable
or we have this as branch.
278
00:14:23,905 --> 00:14:27,010
Okay? What can we do here?
279
00:14:27,010 --> 00:14:30,310
Well, we notice that
this one is just now
280
00:14:30,310 --> 00:14:34,510
the derivative of two
Rs, one after another.
281
00:14:34,510 --> 00:14:37,330
But this we just
calculated a moment ago,
282
00:14:37,330 --> 00:14:40,120
so we can just
replace it by this.
283
00:14:40,120 --> 00:14:43,255
Ok. That's what we
calculated earlier.
284
00:14:43,255 --> 00:14:46,665
But now you will see
again this factor,
285
00:14:46,665 --> 00:14:48,695
and this factor is the same.
286
00:14:48,695 --> 00:14:52,700
So I can pull that
out to be like that.
287
00:14:52,700 --> 00:14:57,380
And I have now followed
by R plus R. Oh,
288
00:14:57,380 --> 00:14:59,030
hey, man, now you probably
289
00:14:59,030 --> 00:15:00,680
remember how we did it earlier.
290
00:15:00,680 --> 00:15:03,080
We can now pull out one copy of
291
00:15:03,080 --> 00:15:06,020
this are to just get
something like this.
292
00:15:06,020 --> 00:15:08,765
We have to add one essentially,
293
00:15:08,765 --> 00:15:11,615
but we now get r plus one.
294
00:15:11,615 --> 00:15:15,065
And this r here is
now just pulled out.
295
00:15:15,065 --> 00:15:18,995
Well, here again kicks
in this reasoning.
296
00:15:18,995 --> 00:15:22,700
We go in this if branch
only if r is nullable,
297
00:15:22,700 --> 00:15:26,150
so on its own can already
match the empty string.
298
00:15:26,150 --> 00:15:28,895
So I don't need
really this plus one.
299
00:15:28,895 --> 00:15:30,695
I can just get rid of it.
300
00:15:30,695 --> 00:15:33,140
And so I now just have
to rearrange the parent,
301
00:15:33,140 --> 00:15:35,450
the thesis which we
said we can also do.
302
00:15:35,450 --> 00:15:37,595
So we have something like that.
303
00:15:37,595 --> 00:15:39,740
And here again, the
same reasoning,
304
00:15:39,740 --> 00:15:41,975
we have a if condition
305
00:15:41,975 --> 00:15:43,310
where it doesn't
really matter what
306
00:15:43,310 --> 00:15:44,405
we're going to return,
307
00:15:44,405 --> 00:15:46,205
it's in both branches the same.
308
00:15:46,205 --> 00:15:48,470
So we can just
replace it by that.
309
00:15:48,470 --> 00:15:51,920
And yes, now we can be
pretty sure they'll
310
00:15:51,920 --> 00:15:55,310
work for all the n times
regular expressions.
311
00:15:55,310 --> 00:15:57,860
And leave that to
the calculation for
312
00:15:57,860 --> 00:16:02,570
maybe R to the four to you.
313
00:16:02,570 --> 00:16:04,490
And the reason why I do it
314
00:16:04,490 --> 00:16:06,605
in such a detail,
this calculation,
315
00:16:06,605 --> 00:16:08,960
this is really meant
to help you with
316
00:16:08,960 --> 00:16:13,200
the coursework which is
coming up this week.
317
00:16:13,210 --> 00:16:16,250
I'm now back in our
implementation.
318
00:16:16,250 --> 00:16:20,660
In this Reto, said We have
this explicit constructor now
319
00:16:20,660 --> 00:16:25,535
for n times b can now fill
in the cases for nullable.
320
00:16:25,535 --> 00:16:27,635
So if we have R in times,
321
00:16:27,635 --> 00:16:30,995
if this n is equal to
0, we return true.
322
00:16:30,995 --> 00:16:34,190
Otherwise we have to test
whether eyes nullable.
323
00:16:34,190 --> 00:16:37,565
And in the derivative case, oi,
324
00:16:37,565 --> 00:16:40,339
if this n is equal to 0,
325
00:16:40,339 --> 00:16:43,564
the return this 0
wreck expression.
326
00:16:43,564 --> 00:16:46,700
Otherwise we return the
sequence of the derivative
327
00:16:46,700 --> 00:16:50,270
of psi of r four by n times of r,
328
00:16:50,270 --> 00:16:54,545
but n minus one times and
everything else stays the same.
329
00:16:54,545 --> 00:16:56,510
Here's the function for strings,
330
00:16:56,510 --> 00:16:58,430
derivative function for strings.
331
00:16:58,430 --> 00:17:01,595
In the main mantra
function as all the same.
332
00:17:01,595 --> 00:17:04,820
We still require this
definition because
333
00:17:04,820 --> 00:17:06,050
we haven't done anything about
334
00:17:06,050 --> 00:17:08,090
the optional record
expression yet.
335
00:17:08,090 --> 00:17:10,670
And we have now at this
336
00:17:10,670 --> 00:17:13,250
either warm and evil
2-break expression.
337
00:17:13,250 --> 00:17:15,290
And let's test it. And let's be
338
00:17:15,290 --> 00:17:17,000
a bit more ambitious.
Be testing it.
339
00:17:17,000 --> 00:17:20,315
The strings between 01000
340
00:17:20,315 --> 00:17:22,580
and let's see what the time say.
341
00:17:22,580 --> 00:17:26,210
I'm testing this again
inside the ammonite rebel.
342
00:17:26,210 --> 00:17:30,180
And you'll see it should
be now much quicker.
343
00:17:30,610 --> 00:17:34,640
Okay, it might slow
down also around 600.
344
00:17:34,640 --> 00:17:40,490
700 needs two seconds,
three seconds, 4800.
345
00:17:40,490 --> 00:17:43,940
Let's see about it
needs 4 thousand.
346
00:17:43,940 --> 00:17:47,435
But you have to remember
Ruby and Python.
347
00:17:47,435 --> 00:17:51,530
They needed half a
minute to just 43 TAs.
348
00:17:51,530 --> 00:17:54,485
We need a little bit
more than six seconds,
349
00:17:54,485 --> 00:17:57,110
but we are processing a string of
350
00:17:57,110 --> 00:18:00,575
1000 days so that success.
351
00:18:00,575 --> 00:18:04,775
So this speed is also explained
if you look at the sizes.
352
00:18:04,775 --> 00:18:08,975
Since we now have this explicit
and times constructor,
353
00:18:08,975 --> 00:18:11,930
it doesn't really matter
how big this n is.
354
00:18:11,930 --> 00:18:14,540
This evil one reg
expression will always be
355
00:18:14,540 --> 00:18:17,195
of this size seven,
the beginning.
356
00:18:17,195 --> 00:18:20,315
And you can also see if you
now build the derivatives,
357
00:18:20,315 --> 00:18:22,550
they still grow in size,
358
00:18:22,550 --> 00:18:24,740
but much more moderately.
359
00:18:24,740 --> 00:18:28,100
And let's try out this
example, this 20 a.
360
00:18:28,100 --> 00:18:31,685
So we build the derivative
according to 20 character A's.
361
00:18:31,685 --> 00:18:33,890
In the earlier example,
362
00:18:33,890 --> 00:18:35,780
we ended up this a
wreck expression which
363
00:18:35,780 --> 00:18:37,895
had like 8 million plus nodes.
364
00:18:37,895 --> 00:18:39,830
And if we do this now,
365
00:18:39,830 --> 00:18:43,205
then we just have a wreck
expression with 211 nodes.
366
00:18:43,205 --> 00:18:44,750
And that is much smaller and
367
00:18:44,750 --> 00:18:47,165
all the calculations
would be much quicker.
368
00:18:47,165 --> 00:18:49,550
So yeah, there's
369
00:18:49,550 --> 00:18:52,205
this push off CKY algorithm
and this improvement.
370
00:18:52,205 --> 00:18:54,890
We're now running
circles around Ruby and
371
00:18:54,890 --> 00:18:58,445
Python because they're just
stuck here at the beginning.
372
00:18:58,445 --> 00:19:00,230
Because they need already
373
00:19:00,230 --> 00:19:02,975
like half a minute
for just 30 a's.
374
00:19:02,975 --> 00:19:05,930
We now can do something
like thousand a's.
375
00:19:05,930 --> 00:19:07,580
And in reasonable time.
376
00:19:07,580 --> 00:19:09,740
I think this must be
timing I obtained with
377
00:19:09,740 --> 00:19:12,635
my older laptop some time ago.
378
00:19:12,635 --> 00:19:14,210
Because remember we
had something like
379
00:19:14,210 --> 00:19:16,670
six seconds here, it says 15.
380
00:19:16,670 --> 00:19:18,080
So you always have to take
381
00:19:18,080 --> 00:19:20,885
these times with
the pinch of salt.
382
00:19:20,885 --> 00:19:22,670
It's essentially only the trend,
383
00:19:22,670 --> 00:19:25,100
but it's clear we are
much, much better.
384
00:19:25,100 --> 00:19:27,065
So we have worked a lot,
385
00:19:27,065 --> 00:19:30,720
but we also got something
for it in return.