updated
authorChristian Urban <christian.urban@kcl.ac.uk>
Mon, 05 Oct 2020 01:42:47 +0100
changeset 773 260e330f7466
parent 772 3bf3f5bb067e
child 774 42733adf2a48
updated
videos/02-algo2.srt
videos/02-algo3.srt
--- a/videos/02-algo2.srt	Sun Oct 04 22:20:25 2020 +0100
+++ b/videos/02-algo2.srt	Mon Oct 05 01:42:47 2020 +0100
@@ -581,11 +581,11 @@
 
 126
 00:06:32,030 --> 00:06:36,140
-So the derivative of c of one.
+So the derivative of c of 1
 
 127
 00:06:36,140 --> 00:06:38,990
-Peaches defined as 0.
+would be defined as 0.
 
 128
 00:06:38,990 --> 00:06:41,465
@@ -593,7 +593,7 @@
 
 129
 00:06:41,465 --> 00:06:44,960
-If he have any cross one,
+If he have n = 1,
 
 130
 00:06:44,960 --> 00:06:48,125
@@ -602,74 +602,73 @@
 
 131
 00:06:48,125 --> 00:06:50,120
-So there's not much as we can do.
+So there's not much else we can do.
 
 132
 00:06:50,120 --> 00:06:53,735
 We would have to calculate
-the derivative of air are.
+the derivative of r.
 
 133
 00:06:53,735 --> 00:06:57,395
-Now in the n equals two case.
+Now in the n = 2 case,
 
 134
 00:06:57,395 --> 00:07:00,245
-That would mean we
-have two copies of
+that would mean we
+have two copies of r.
 
 135
 00:07:00,245 --> 00:07:03,425
-R. And they would be a sequence.
+And they would be a sequence.
 
 136
 00:07:03,425 --> 00:07:07,775
-How would be the derivative C of
+How would be the derivative c of
 
 137
 00:07:07,775 --> 00:07:10,340
-R four by R be
+r followed by r be
 
 138
 00:07:10,340 --> 00:07:13,265
-defined where we would
+defined? Well we would
 look up the definition.
 
 139
 00:07:13,265 --> 00:07:15,470
 And it would say that's
-the complicated case
+the complicated case,
 
 140
 00:07:15,470 --> 00:07:16,760
-d sequence or
+the sequence.
 
 141
 00:07:16,760 --> 00:07:21,875
-if null a bowl of R,
+If nullable of r,
 
 142
 00:07:21,875 --> 00:07:25,250
-Then the most complicated case.
+then the more complicated case,
 
 143
 00:07:25,250 --> 00:07:28,225
-Else, That's the easy
-case that would be
+else, that's the easy
+case, that would be
 
 144
 00:07:28,225 --> 00:07:32,660
-derivative of c of R four
+derivative of c of r, followed
 
 145
 00:07:32,660 --> 00:07:35,540
-by R. And then I
-just have to copy
+by r. And then I just have to copy
 
 146
 00:07:35,540 --> 00:07:40,775
-that derivative of C
-of four by r here.
+that derivative of c
+of r followed by r here.
 
 147
 00:07:40,775 --> 00:07:43,130
@@ -682,17 +681,17 @@
 
 149
 00:07:51,145 --> 00:07:55,030
-it looks like we can
+it looks like we can't
 do much better than
 
 150
 00:07:55,030 --> 00:07:58,390
-that unless we do
+that, unless we do
 
 151
 00:07:58,390 --> 00:08:02,560
 a little bit of magic and the
-magic to do with this case.
+magic has to do with this case.
 
 152
 00:08:02,560 --> 00:08:07,420
@@ -719,12 +718,12 @@
 
 157
 00:08:18,550 --> 00:08:20,680
-this sequence work
-expression R followed
+this sequence regular
+expression r followed by r.
 
 158
 00:08:20,680 --> 00:08:23,080
-by r. And the question was,
+And the question was,
 
 159
 00:08:23,080 --> 00:08:26,365
@@ -738,11 +737,11 @@
 
 161
 00:08:30,370 --> 00:08:33,020
-So features the definition.
+So if we just unfold the definition,
 
 162
 00:08:33,020 --> 00:08:36,050
-We would ask if
+we would ask if
 the r is nullable,
 
 163
@@ -756,8 +755,8 @@
 
 165
 00:08:40,640 --> 00:08:44,135
-It is just this
-record expression.
+it is just this
+regular expression.
 
 166
 00:08:44,135 --> 00:08:49,550
@@ -794,11 +793,11 @@
 173
 00:09:08,330 --> 00:09:10,580
 We don't have to
-calculate your now,
+calculate nullable,
 
 174
 00:09:10,580 --> 00:09:13,880
-but we can just replace
+we can just replace
 it by this expression.
 
 175
@@ -809,7 +808,7 @@
 176
 00:09:16,670 --> 00:09:19,860
 that will be definitely
-good file algorithm.
+good our algorithm.
 
 177
 00:09:20,140 --> 00:09:22,775
@@ -823,7 +822,7 @@
 179
 00:09:26,795 --> 00:09:31,100
 And notice that this record
-expression the only be
+expression will only be
 
 180
 00:09:31,100 --> 00:09:35,780
@@ -836,7 +835,7 @@
 
 182
 00:09:38,075 --> 00:09:40,060
-I will actually not go into it
+I will actually not go into 
 
 183
 00:09:40,060 --> 00:09:43,850
@@ -850,11 +849,11 @@
 185
 00:09:45,260 --> 00:09:47,705
 we definitely know
-that R is nullable.
+that r is nullable.
 
 186
 00:09:47,705 --> 00:09:52,955
-Okay? Okay, so here's
+Okay, so here's
 our regular expression.
 
 187
@@ -881,12 +880,12 @@
 192
 00:10:05,945 --> 00:10:10,160
 So the first thing
-actually is we multiplying
+actually is we're multiplying
 
 193
 00:10:10,160 --> 00:10:16,595
 this right hand side of the
-alternative is times one.
+alternative with times 1.
 
 194
 00:10:16,595 --> 00:10:19,700
@@ -896,26 +895,26 @@
 195
 00:10:19,700 --> 00:10:23,090
 change which strings this
-work expression can match.
+regular expression can match.
 
 196
 00:10:23,090 --> 00:10:25,685
 Remember we even had it
-as a simplification row,
+as a simplification rule,
 
 197
 00:10:25,685 --> 00:10:27,425
-just in this case B,
+just in this case we
 
 198
 00:10:27,425 --> 00:10:29,705
 don't apply it as a
-simplification will
+simplification rule,
 
 199
 00:10:29,705 --> 00:10:31,310
 actually make this
-work expression
+regular expression
 
 200
 00:10:31,310 --> 00:10:32,720
@@ -923,12 +922,12 @@
 
 201
 00:10:32,720 --> 00:10:34,910
-But times one doesn't make
+But times 1 doesn't make
 
 202
 00:10:34,910 --> 00:10:37,820
 a difference because it
-means the end of the string,
+means at the end of a string,
 
 203
 00:10:37,820 --> 00:10:40,070
@@ -951,7 +950,7 @@
 
 207
 00:10:48,410 --> 00:10:51,860
-stuff are exactly the
+r are exactly the
 same as that one.
 
 208
@@ -978,25 +977,24 @@
 
 213
 00:11:06,320 --> 00:11:09,245
-So now I have r plus one.
+So now I have r + 1.
 
 214
 00:11:09,245 --> 00:11:13,055
 Usually we cannot
-simplify r plus one.
+simplify r + 1.
 
 215
 00:11:13,055 --> 00:11:15,530
-If it had been R
-plus 0, then yes,
+If it had been r + 0, then yes,
 
 216
 00:11:15,530 --> 00:11:18,665
-we could have got rid of the CRO.
+we could have got rid of the 0.
 
 217
 00:11:18,665 --> 00:11:21,590
-Plus one often
+But this + 1 often
 makes a difference,
 
 218
@@ -1006,7 +1004,7 @@
 219
 00:11:22,970 --> 00:11:25,940
 Remember, we know that
-this R is nullable,
+this r is nullable,
 
 220
 00:11:25,940 --> 00:11:29,840
@@ -1028,23 +1026,23 @@
 
 224
 00:11:37,070 --> 00:11:40,535
-a much simpler wound
-reg expression.
+a much simpler equivalent
+regular expression.
 
 225
 00:11:40,535 --> 00:11:44,285
 And this actually helps a
-lot for our if condition.
+lot for our if-condition.
 
 226
 00:11:44,285 --> 00:11:46,925
 Look, this was the
-original if condition
+original if-condition
 
 227
 00:11:46,925 --> 00:11:50,270
-and this is direct expression
-h. We just simplified.
+and this is the regular expression
+we just simplified.
 
 228
 00:11:50,270 --> 00:11:53,105
@@ -1062,11 +1060,11 @@
 231
 00:11:59,510 --> 00:12:02,750
 pointless because you
-check if it's null above,
+check if it's nullable,
 
 232
 00:12:02,750 --> 00:12:05,060
-we return this reg
+we return this regular
 expression or it's
 
 233
@@ -1103,7 +1101,7 @@
 
 240
 00:12:24,170 --> 00:12:26,915
-So we know India CEO case,
+So we know in the 0-case,
 
 241
 00:12:26,915 --> 00:12:30,920
@@ -1112,35 +1110,35 @@
 
 242
 00:12:30,920 --> 00:12:33,095
-So because we define this
+Because we define this
 
 243
 00:12:33,095 --> 00:12:36,785
-and times as one,
+n-times as one,
 the derivative is 0.
 
 244
 00:12:36,785 --> 00:12:39,889
-For chest r, this will
+For just r, this will
 be the derivative.
 
 245
 00:12:39,889 --> 00:12:42,170
 And we can't do any
-better than that
+better than that.
 
 246
 00:12:42,170 --> 00:12:45,620
-for our followed by
-RB just found out.
+For r followed by r, as we
+just found out
 
 247
 00:12:45,620 --> 00:12:47,270
-Actually it is quite simple.
+actually it is quite simple
 
 248
 00:12:47,270 --> 00:12:51,410
-Reg expression is equivalent
+regular expression is equivalent
 to the derivative.
 
 249
@@ -1158,7 +1156,7 @@
 
 252
 00:12:58,099 --> 00:13:02,390
-of this or what should
+of this r. What should
 be the derivative?
 
 253
@@ -1185,17 +1183,17 @@
 
 258
 00:13:18,275 --> 00:13:21,290
-Because then what we have is.
+Because then what we have is this.
 
 259
 00:13:21,290 --> 00:13:25,370
 We can define our
-nullable for n times
+nullable for n-times
 
 260
 00:13:25,370 --> 00:13:31,025
-s. If any cross 0 then
-true as nullable.
+as if n = 0 then
+true, else nullable r.
 
 261
 00:13:31,025 --> 00:13:33,875
@@ -1208,7 +1206,7 @@
 263
 00:13:37,190 --> 00:13:40,235
 then we return the
-Sierra reg expression.
+0 regular expression.
 
 264
 00:13:40,235 --> 00:13:43,295
@@ -1218,7 +1216,7 @@
 265
 00:13:43,295 --> 00:13:50,735
 it would be derivative of
-c r four by r n minus one.
+c r followed by r{n-1}.
 
 266
 00:13:50,735 --> 00:13:54,770
@@ -1226,7 +1224,7 @@
 
 267
 00:13:54,770 --> 00:13:56,330
-thing we have to make choice that
+thing we have to make csure is that
 
 268
 00:13:56,330 --> 00:13:58,175
@@ -1243,7 +1241,7 @@
 
 271
 00:14:03,735 --> 00:14:07,810
-If we have a wreck expression R
+If we have a regular expression r
 
 272
 00:14:07,810 --> 00:14:09,820
@@ -1261,7 +1259,7 @@
 
 275
 00:14:14,245 --> 00:14:16,930
-So we would ask if I is nullable,
+So we would ask if r is nullable,
 
 276
 00:14:16,930 --> 00:14:19,765
@@ -1269,8 +1267,8 @@
 
 277
 00:14:19,765 --> 00:14:23,905
-And if i is not nullable
-or we have this as branch.
+And if r is not nullable,
+we have this else branch.
 
 278
 00:14:23,905 --> 00:14:27,010
@@ -1284,7 +1282,7 @@
 280
 00:14:30,310 --> 00:14:34,510
 the derivative of two
-Rs, one after another.
+r's, one after another.
 
 281
 00:14:34,510 --> 00:14:37,330
@@ -1317,12 +1315,12 @@
 
 287
 00:14:52,700 --> 00:14:57,380
-And I have now followed
-by R plus R. Oh,
+And I have now r followed
+by r plus r. 
 
 288
 00:14:57,380 --> 00:14:59,030
-hey, man, now you probably
+But now you probably
 
 289
 00:14:59,030 --> 00:15:00,680
@@ -1358,11 +1356,11 @@
 296
 00:15:18,995 --> 00:15:22,700
 We go in this if branch
-only if r is nullable,
+only if r is nullable.
 
 297
 00:15:22,700 --> 00:15:26,150
-so on its own can already
+So r on its own can already
 match the empty string.
 
 298
@@ -1377,12 +1375,11 @@
 300
 00:15:30,695 --> 00:15:33,140
 And so I now just have
-to rearrange the parent,
+to rearrange the parentheses
 
 301
 00:15:33,140 --> 00:15:35,450
-the thesis which we
-said we can also do.
+which we said we can also do.
 
 302
 00:15:35,450 --> 00:15:37,595
@@ -1395,7 +1392,7 @@
 
 304
 00:15:39,740 --> 00:15:41,975
-we have a if condition
+we have an if condition
 
 305
 00:15:41,975 --> 00:15:43,310
@@ -1422,17 +1419,17 @@
 
 310
 00:15:51,920 --> 00:15:55,310
-work for all the n times
+work for all the n-times
 regular expressions.
 
 311
 00:15:55,310 --> 00:15:57,860
-And leave that to
+And I leave 
 the calculation for
 
 312
 00:15:57,860 --> 00:16:02,570
-maybe R to the four to you.
+maybe r to the four to you.
 
 313
 00:16:02,570 --> 00:16:04,490
@@ -1460,17 +1457,17 @@
 
 318
 00:16:16,250 --> 00:16:20,660
-In this Reto, said We have
-this explicit constructor now
+In this re2.sc, we said we have
+this explicit constructor 
 
 319
 00:16:20,660 --> 00:16:25,535
-for n times b can now fill
+for n-times. We can now fill
 in the cases for nullable.
 
 320
 00:16:25,535 --> 00:16:27,635
-So if we have R in times,
+So if we have r n-times,
 
 321
 00:16:27,635 --> 00:16:30,995
@@ -1480,11 +1477,11 @@
 322
 00:16:30,995 --> 00:16:34,190
 Otherwise we have to test
-whether eyes nullable.
+whether r is nullable.
 
 323
 00:16:34,190 --> 00:16:37,565
-And in the derivative case, oi,
+And in the derivative case, 
 
 324
 00:16:37,565 --> 00:16:40,339
@@ -1493,7 +1490,7 @@
 325
 00:16:40,339 --> 00:16:43,564
 the return this 0
-wreck expression.
+regular expression.
 
 326
 00:16:43,564 --> 00:16:46,700
@@ -1502,11 +1499,11 @@
 
 327
 00:16:46,700 --> 00:16:50,270
-of psi of r four by n times of r,
+of c of r followed by n times of r,
 
 328
 00:16:50,270 --> 00:16:54,545
-but n minus one times and
+but n minus one times, and
 everything else stays the same.
 
 329
@@ -1519,8 +1516,8 @@
 
 331
 00:16:58,430 --> 00:17:01,595
-In the main mantra
-function as all the same.
+In the main matcher
+function is all the same.
 
 332
 00:17:01,595 --> 00:17:04,820
@@ -1533,17 +1530,17 @@
 
 334
 00:17:06,050 --> 00:17:08,090
-the optional record
+the optional regular
 expression yet.
 
 335
 00:17:08,090 --> 00:17:10,670
-And we have now at this
+And we have now this
 
 336
 00:17:10,670 --> 00:17:13,250
-either warm and evil
-2-break expression.
+EVIL1 and EVIL2
+regular expression.
 
 337
 00:17:13,250 --> 00:17:15,290
@@ -1552,11 +1549,11 @@
 338
 00:17:15,290 --> 00:17:17,000
 a bit more ambitious.
-Be testing it.
+We're testing it with
 
 339
 00:17:17,000 --> 00:17:20,315
-The strings between 01000
+strings between 0 and 1000
 
 340
 00:17:20,315 --> 00:17:22,580
@@ -1565,7 +1562,7 @@
 341
 00:17:22,580 --> 00:17:26,210
 I'm testing this again
-inside the ammonite rebel.
+inside the Ammonite REPL.
 
 342
 00:17:26,210 --> 00:17:30,180
@@ -1580,22 +1577,22 @@
 344
 00:17:34,640 --> 00:17:40,490
 700 needs two seconds,
-three seconds, 4800.
+three seconds for 800.
 
 345
 00:17:40,490 --> 00:17:43,940
-Let's see about it
-needs 4 thousand.
+Let's see what it
+needs for one thousand?
 
 346
 00:17:43,940 --> 00:17:47,435
 But you have to remember
-Ruby and Python.
+Ruby and Python
 
 347
 00:17:47,435 --> 00:17:51,530
-They needed half a
-minute to just 43 TAs.
+needed half a
+minute for just 30 a's.
 
 348
 00:17:51,530 --> 00:17:54,485
@@ -1608,17 +1605,17 @@
 
 350
 00:17:57,110 --> 00:18:00,575
-1000 days so that success.
+1000 a's. So that success.
 
 351
 00:18:00,575 --> 00:18:04,775
-So this speed is also explained
+This speed is also explained
 if you look at the sizes.
 
 352
 00:18:04,775 --> 00:18:08,975
 Since we now have this explicit
-and times constructor,
+n-times constructor,
 
 353
 00:18:08,975 --> 00:18:11,930
@@ -1627,12 +1624,12 @@
 
 354
 00:18:11,930 --> 00:18:14,540
-This evil one reg
+This EVIL1 regular
 expression will always be
 
 355
 00:18:14,540 --> 00:18:17,195
-of this size seven,
+of this size seven, at
 the beginning.
 
 356
@@ -1651,12 +1648,12 @@
 359
 00:18:24,740 --> 00:18:28,100
 And let's try out this
-example, this 20 a.
+example with 20 a's.
 
 360
 00:18:28,100 --> 00:18:31,685
 So we build the derivative
-according to 20 character A's.
+according to 20 character a's.
 
 361
 00:18:31,685 --> 00:18:33,890
@@ -1664,8 +1661,8 @@
 
 362
 00:18:33,890 --> 00:18:35,780
-we ended up this a
-wreck expression which
+we ended up with a
+regular expression that
 
 363
 00:18:35,780 --> 00:18:37,895
@@ -1677,7 +1674,7 @@
 
 365
 00:18:39,830 --> 00:18:43,205
-then we just have a wreck
+then we just have a regular
 expression with 211 nodes.
 
 366
@@ -1691,16 +1688,16 @@
 
 368
 00:18:47,165 --> 00:18:49,550
-So yeah, there's
+So yeah, the Brzozowski
 
 369
 00:18:49,550 --> 00:18:52,205
-this push off CKY algorithm
-and this improvement.
+algorithm
+and with this improvement,
 
 370
 00:18:52,205 --> 00:18:54,890
-We're now running
+we're now running
 circles around Ruby and
 
 371
@@ -1720,11 +1717,11 @@
 374
 00:19:02,975 --> 00:19:05,930
 We now can do something
-like thousand a's.
+like thousand a's
 
 375
 00:19:05,930 --> 00:19:07,580
-And in reasonable time.
+in a reasonable time.
 
 376
 00:19:07,580 --> 00:19:09,740
@@ -1742,11 +1739,11 @@
 
 379
 00:19:14,210 --> 00:19:16,670
-six seconds here, it says 15.
+six seconds. Here it says 15.
 
 380
 00:19:16,670 --> 00:19:18,080
-So you always have to take
+You always have to take
 
 381
 00:19:18,080 --> 00:19:20,885
@@ -1764,7 +1761,7 @@
 
 384
 00:19:25,100 --> 00:19:27,065
-So we have worked a lot,
+We have worked a lot,
 
 385
 00:19:27,065 --> 00:19:30,720
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/videos/02-algo3.srt	Mon Oct 05 01:42:47 2020 +0100
@@ -0,0 +1,1754 @@
+1
+00:00:06,350 --> 00:00:10,305
+They come back. I
+can hear you saying,
+
+2
+00:00:10,305 --> 00:00:12,000
+yes, you tried it out on
+
+3
+00:00:12,000 --> 00:00:14,760
+one example and you
+were much better.
+
+4
+00:00:14,760 --> 00:00:17,415
+But how about on other examples?
+
+5
+00:00:17,415 --> 00:00:21,265
+Specifically, we had two
+regular expressions.
+
+6
+00:00:21,265 --> 00:00:23,480
+How about the other case?
+
+7
+00:00:23,480 --> 00:00:27,480
+Where let's have a look at
+that other case in this video.
+
+8
+00:00:29,140 --> 00:00:32,705
+So I'm back here
+in this Reto file.
+
+9
+00:00:32,705 --> 00:00:35,180
+And here's this test
+case which run quite
+
+10
+00:00:35,180 --> 00:00:39,665
+nicely for strings between 01000.
+
+11
+00:00:39,665 --> 00:00:42,725
+Here is the other test case.
+
+12
+00:00:42,725 --> 00:00:44,090
+I still run it only
+
+13
+00:00:44,090 --> 00:00:48,470
+for relatively small
+strings between 020.
+
+14
+00:00:48,470 --> 00:00:50,240
+And let's see what it say.
+
+15
+00:00:50,240 --> 00:00:51,800
+So I'm running the test in
+
+16
+00:00:51,800 --> 00:00:57,320
+the ammonoids repo and
+doesn't look too bad.
+
+17
+00:00:57,320 --> 00:01:01,160
+But this might be because
+20 is not general enough.
+
+18
+00:01:01,160 --> 00:01:03,620
+So let's try it with 30.
+
+19
+00:01:03,620 --> 00:01:06,530
+Let's run this test again.
+
+20
+00:01:06,530 --> 00:01:11,075
+And 20 is quite okay.
+
+21
+00:01:11,075 --> 00:01:15,965
+22, okay, that's now
+almost ten times as much.
+
+22
+00:01:15,965 --> 00:01:18,995
+And then the next
+one would be 24.
+
+23
+00:01:18,995 --> 00:01:23,615
+And we're waiting and waiting.
+
+24
+00:01:23,615 --> 00:01:26,410
+And we are waiting.
+
+25
+00:01:26,410 --> 00:01:29,300
+Actually, it isn't
+calculated at all.
+
+26
+00:01:29,300 --> 00:01:31,399
+It run out of memory.
+
+27
+00:01:31,399 --> 00:01:33,905
+So here is something going on,
+
+28
+00:01:33,905 --> 00:01:37,640
+which is Daphne bad and we
+have to have a look at that.
+
+29
+00:01:37,640 --> 00:01:40,640
+Okay? It's always
+instructive with
+
+30
+00:01:40,640 --> 00:01:43,460
+this algorithm to
+look at the sizes of
+
+31
+00:01:43,460 --> 00:01:45,695
+the record expressions
+we calculate
+
+32
+00:01:45,695 --> 00:01:49,625
+the evil to Is this
+a star, star B.
+
+33
+00:01:49,625 --> 00:01:51,800
+So there's nothing we
+can compress there.
+
+34
+00:01:51,800 --> 00:01:55,490
+It has just stars and
+sequences and nothing else.
+
+35
+00:01:55,490 --> 00:01:58,430
+And so it's not that we
+can play the same trick
+
+36
+00:01:58,430 --> 00:02:01,805
+of trying to introduce
+an explicit constructor.
+
+37
+00:02:01,805 --> 00:02:04,190
+But now let's have a
+look at the derivatives.
+
+38
+00:02:04,190 --> 00:02:05,600
+As you can see here.
+
+39
+00:02:05,600 --> 00:02:08,420
+If I take this evil to wreck
+
+40
+00:02:08,420 --> 00:02:09,935
+expression and they built
+
+41
+00:02:09,935 --> 00:02:12,470
+now longer and
+longer derivatives,
+
+42
+00:02:12,470 --> 00:02:14,090
+you can see this grows.
+
+43
+00:02:14,090 --> 00:02:16,055
+And as you see in this line,
+
+44
+00:02:16,055 --> 00:02:17,270
+if I'm trying to take
+
+45
+00:02:17,270 --> 00:02:20,180
+the derivative of a
+quite large string,
+
+46
+00:02:20,180 --> 00:02:21,680
+it is quite quick.
+
+47
+00:02:21,680 --> 00:02:26,870
+But the size of the
+reg expression,
+
+48
+00:02:26,870 --> 00:02:28,310
+the number of nodes,
+
+49
+00:02:28,310 --> 00:02:30,815
+is also like nearly
+eight millions.
+
+50
+00:02:30,815 --> 00:02:33,845
+And so even if that calculates
+relatively quickly,
+
+51
+00:02:33,845 --> 00:02:37,865
+that is the reason why at 24,
+
+52
+00:02:37,865 --> 00:02:39,560
+it just runs out of memory.
+
+53
+00:02:39,560 --> 00:02:42,785
+This reg expression
+just grew too much.
+
+54
+00:02:42,785 --> 00:02:46,520
+So we somehow need
+to still compress
+
+55
+00:02:46,520 --> 00:02:49,655
+this record expression
+and make it not
+
+56
+00:02:49,655 --> 00:02:52,700
+grow so much when we
+build derivative.
+
+57
+00:02:52,700 --> 00:02:54,020
+So we have to look at
+
+58
+00:02:54,020 --> 00:02:57,050
+where does that grow
+actually come from.
+
+59
+00:02:57,050 --> 00:03:00,170
+Let's look at the
+derivative operation
+
+60
+00:03:00,170 --> 00:03:01,895
+again in more detail.
+
+61
+00:03:01,895 --> 00:03:03,740
+When we introduced
+it, we looked at
+
+62
+00:03:03,740 --> 00:03:06,560
+this example of a
+wreck expression R,
+
+63
+00:03:06,560 --> 00:03:11,465
+which is a star of some
+alternative and some sequence.
+
+64
+00:03:11,465 --> 00:03:13,130
+And we can build now
+
+65
+00:03:13,130 --> 00:03:15,800
+the derivative of this
+r according to a,
+
+66
+00:03:15,800 --> 00:03:18,965
+b, and c and see
+what it calculates.
+
+67
+00:03:18,965 --> 00:03:21,935
+And you see in case of a,
+
+68
+00:03:21,935 --> 00:03:26,570
+it's like one times b plus
+0 and then followed by r,
+
+69
+00:03:26,570 --> 00:03:29,015
+which is this whole
+wreck expression again.
+
+70
+00:03:29,015 --> 00:03:30,935
+So you can also see.
+
+71
+00:03:30,935 --> 00:03:34,745
+Derivative operation
+introduces a lot of junk.
+
+72
+00:03:34,745 --> 00:03:38,165
+This plus 0 isn't
+really necessary.
+
+73
+00:03:38,165 --> 00:03:42,455
+And this times one could
+be also thrown away.
+
+74
+00:03:42,455 --> 00:03:43,685
+So in the first case,
+
+75
+00:03:43,685 --> 00:03:48,110
+actually we could just have
+one times b is b plus 0,
+
+76
+00:03:48,110 --> 00:03:49,580
+it still be a,
+
+77
+00:03:49,580 --> 00:03:54,905
+so it would be B followed by
+R. And in this second case,
+
+78
+00:03:54,905 --> 00:03:57,515
+C0 times b, that will be just 0.
+
+79
+00:03:57,515 --> 00:03:59,270
+Then 0 plus one is
+
+80
+00:03:59,270 --> 00:04:05,330
+11 times r would be just
+r. And in the last case,
+
+81
+00:04:05,330 --> 00:04:12,155
+C0 times B would be 00 plus
+0 is 00 times r would be 0.
+
+82
+00:04:12,155 --> 00:04:17,540
+Okay? So we could throw out
+all these useless zeros and
+
+83
+00:04:17,540 --> 00:04:20,960
+ones because actually
+we have to throw
+
+84
+00:04:20,960 --> 00:04:24,845
+them out because over time
+they just accumulate.
+
+85
+00:04:24,845 --> 00:04:27,035
+And then we build
+the next derivative.
+
+86
+00:04:27,035 --> 00:04:31,130
+0 wouldn't go away by
+building annuity where tests.
+
+87
+00:04:31,130 --> 00:04:34,310
+So we need to somehow
+a mechanism to
+
+88
+00:04:34,310 --> 00:04:39,120
+delete the junk from
+these derivatives.
+
+89
+00:04:39,280 --> 00:04:41,900
+But how to delete junk?
+
+90
+00:04:41,900 --> 00:04:43,370
+We already know we have
+
+91
+00:04:43,370 --> 00:04:47,825
+the simplification rules
+which say like if r plus 0,
+
+92
+00:04:47,825 --> 00:04:53,000
+I can just replace by
+r and the other ones.
+
+93
+00:04:53,000 --> 00:04:55,760
+And so what I now
+need to do is in
+
+94
+00:04:55,760 --> 00:04:58,715
+my algorithm when I
+built the derivative,
+
+95
+00:04:58,715 --> 00:05:01,415
+this might have
+introduced some chunk.
+
+96
+00:05:01,415 --> 00:05:02,960
+And I now have to have
+
+97
+00:05:02,960 --> 00:05:06,590
+essentially a additional function
+
+98
+00:05:06,590 --> 00:05:08,945
+which deletes this junk again.
+
+99
+00:05:08,945 --> 00:05:10,490
+And in doing so,
+
+100
+00:05:10,490 --> 00:05:13,340
+keep steer, Rekha expression,
+
+101
+00:05:13,340 --> 00:05:16,730
+relatively small, pickers debts,
+
+102
+00:05:16,730 --> 00:05:19,535
+the Achilles heel
+of this algorithm.
+
+103
+00:05:19,535 --> 00:05:22,565
+If this regular expression
+grows too much,
+
+104
+00:05:22,565 --> 00:05:25,070
+then all these functions
+will slow down.
+
+105
+00:05:25,070 --> 00:05:26,360
+So the purpose of
+
+106
+00:05:26,360 --> 00:05:30,455
+the simplification function
+is to after the derivative,
+
+107
+00:05:30,455 --> 00:05:33,080
+compress again this
+rec expression
+
+108
+00:05:33,080 --> 00:05:35,570
+and then do the next derivative.
+
+109
+00:05:35,570 --> 00:05:39,815
+So if we introduce that
+additional functions simp,
+
+110
+00:05:39,815 --> 00:05:42,440
+which essentially
+just goes through
+
+111
+00:05:42,440 --> 00:05:46,040
+this reg expression and
+deletes all this junk,
+
+112
+00:05:46,040 --> 00:05:50,045
+then we should be in a
+much better position.
+
+113
+00:05:50,045 --> 00:05:52,490
+As you can see on this slide,
+
+114
+00:05:52,490 --> 00:05:54,440
+the simplification
+function can actually
+
+115
+00:05:54,440 --> 00:05:56,855
+be implemented very easily.
+
+116
+00:05:56,855 --> 00:05:59,750
+However, there are few
+interesting points.
+
+117
+00:05:59,750 --> 00:06:02,720
+First of all, there
+are only two cases.
+
+118
+00:06:02,720 --> 00:06:05,060
+The only simplify when we have
+
+119
+00:06:05,060 --> 00:06:08,255
+an alternative or a sequence.
+
+120
+00:06:08,255 --> 00:06:12,859
+So for example, we will
+never simplify under star.
+
+121
+00:06:12,859 --> 00:06:15,950
+If you look at the
+derivative operation
+
+122
+00:06:15,950 --> 00:06:17,825
+and you scratch your
+head a little bit,
+
+123
+00:06:17,825 --> 00:06:20,180
+we'll find out why
+is that the case.
+
+124
+00:06:20,180 --> 00:06:22,145
+It's essentially a waste of time.
+
+125
+00:06:22,145 --> 00:06:25,505
+So you would not
+simplify under star.
+
+126
+00:06:25,505 --> 00:06:27,650
+You only simplify if you have
+
+127
+00:06:27,650 --> 00:06:30,530
+an alternative and the sequence.
+
+128
+00:06:30,530 --> 00:06:34,880
+Now, however, there
+is one small point.
+
+129
+00:06:34,880 --> 00:06:39,785
+You have to be aware of
+these simplification rules.
+
+130
+00:06:39,785 --> 00:06:42,740
+They need to be essentially
+applied backwards.
+
+131
+00:06:42,740 --> 00:06:45,650
+So I have to first essentially
+go to the leaves of
+
+132
+00:06:45,650 --> 00:06:49,085
+this record expression and
+then have to find out,
+
+133
+00:06:49,085 --> 00:06:51,170
+can I apply simplification?
+
+134
+00:06:51,170 --> 00:06:52,670
+And then if yes,
+
+135
+00:06:52,670 --> 00:06:55,430
+I apply the simplification
+and I look at the next step,
+
+136
+00:06:55,430 --> 00:06:57,815
+can I now simplify
+something more?
+
+137
+00:06:57,815 --> 00:06:59,390
+The reason is how
+
+138
+00:06:59,390 --> 00:07:03,125
+the simplification
+rules are formulated.
+
+139
+00:07:03,125 --> 00:07:05,300
+They might fire in
+
+140
+00:07:05,300 --> 00:07:08,765
+a higher level if something
+simplifies below.
+
+141
+00:07:08,765 --> 00:07:14,315
+So we have to essentially
+simplify from the inside out.
+
+142
+00:07:14,315 --> 00:07:16,850
+And in this
+simplification function,
+
+143
+00:07:16,850 --> 00:07:20,885
+what that means is the
+matching this reg expression.
+
+144
+00:07:20,885 --> 00:07:23,120
+Be test whether it's
+an alternative or
+
+145
+00:07:23,120 --> 00:07:26,345
+a sequence only then we
+actually go into action.
+
+146
+00:07:26,345 --> 00:07:28,385
+Once we know.
+
+147
+00:07:28,385 --> 00:07:30,530
+In case of an alternative,
+
+148
+00:07:30,530 --> 00:07:32,120
+what are the two components?
+
+149
+00:07:32,120 --> 00:07:35,765
+P first, simplify each component,
+
+150
+00:07:35,765 --> 00:07:39,095
+okay, and then we
+get a result back.
+
+151
+00:07:39,095 --> 00:07:41,180
+And we check whether
+
+152
+00:07:41,180 --> 00:07:45,679
+this simplification of
+R1 resulted into a 0.
+
+153
+00:07:45,679 --> 00:07:47,884
+Then because it's an alternative
+
+154
+00:07:47,884 --> 00:07:49,190
+than I can just replace it
+
+155
+00:07:49,190 --> 00:07:53,375
+by r is two a simplified
+version of R2.
+
+156
+00:07:53,375 --> 00:07:58,820
+If it came back r as
+two is actually 0,
+
+157
+00:07:58,820 --> 00:08:00,410
+then I can replace it by
+
+158
+00:08:00,410 --> 00:08:03,785
+the simplified version of a warm.
+
+159
+00:08:03,785 --> 00:08:07,460
+If I simplify both
+alternatives and it
+
+160
+00:08:07,460 --> 00:08:11,180
+happens that the simplified
+versions are the same,
+
+161
+00:08:11,180 --> 00:08:15,170
+next century I can throw away
+one of the alternatives.
+
+162
+00:08:15,170 --> 00:08:18,080
+Otherwise, I just have to
+keep the alternatives,
+
+163
+00:08:18,080 --> 00:08:21,155
+but the simplified components
+
+164
+00:08:21,155 --> 00:08:23,915
+in the sequence is
+pretty much the same.
+
+165
+00:08:23,915 --> 00:08:27,950
+I first have to check what
+does it simplify underneath.
+
+166
+00:08:27,950 --> 00:08:31,385
+So I call first simplify
+and then have a look.
+
+167
+00:08:31,385 --> 00:08:33,020
+Does it matches C or one of
+
+168
+00:08:33,020 --> 00:08:36,035
+the simplification
+damage, just return 0.
+
+169
+00:08:36,035 --> 00:08:38,330
+Or if the other component is 0,
+
+170
+00:08:38,330 --> 00:08:40,535
+I can also return a 0.
+
+171
+00:08:40,535 --> 00:08:43,310
+If it's one, then I keep
+the other component.
+
+172
+00:08:43,310 --> 00:08:45,920
+And if this iss two is one,
+
+173
+00:08:45,920 --> 00:08:47,615
+and I keep the first component,
+
+174
+00:08:47,615 --> 00:08:50,359
+and otherwise it's
+still the sequence.
+
+175
+00:08:50,359 --> 00:08:53,540
+And in all the other cases I
+don't have to do anything.
+
+176
+00:08:53,540 --> 00:08:55,700
+It's just a plain
+0. I leave it in.
+
+177
+00:08:55,700 --> 00:08:57,860
+If it's a plain
+warm, I leave it in.
+
+178
+00:08:57,860 --> 00:09:00,170
+And if something is under a star,
+
+179
+00:09:00,170 --> 00:09:02,840
+I don't open up this
+door and simplify it.
+
+180
+00:09:02,840 --> 00:09:06,680
+I just apply that to be
+as quick as possible.
+
+181
+00:09:06,680 --> 00:09:10,280
+Let's see whether this has
+any effect on our code.
+
+182
+00:09:10,280 --> 00:09:12,980
+So I'm now in the
+file read three,
+
+183
+00:09:12,980 --> 00:09:17,405
+and it's the same as Reto.
+
+184
+00:09:17,405 --> 00:09:20,885
+It still has this
+explicit and Times case,
+
+185
+00:09:20,885 --> 00:09:24,665
+nullable and derivative
+still the same.
+
+186
+00:09:24,665 --> 00:09:28,610
+Except now we have this
+simplification function as well.
+
+187
+00:09:28,610 --> 00:09:29,960
+And when we apply
+
+188
+00:09:29,960 --> 00:09:33,725
+the derivative after
+the apply deteriorated,
+
+189
+00:09:33,725 --> 00:09:35,870
+simplify it to throw away
+
+190
+00:09:35,870 --> 00:09:39,050
+all the junk this
+derivative introduced.
+
+191
+00:09:39,050 --> 00:09:41,510
+Otherwise everything
+stays the same.
+
+192
+00:09:41,510 --> 00:09:43,580
+You still have this expansion
+
+193
+00:09:43,580 --> 00:09:45,515
+of the optional reg expression.
+
+194
+00:09:45,515 --> 00:09:49,670
+Here, our two evil wreck
+expressions we use as a test.
+
+195
+00:09:49,670 --> 00:09:52,460
+And here's now this test case,
+
+196
+00:09:52,460 --> 00:09:55,835
+the first one and the
+actually now trying it
+
+197
+00:09:55,835 --> 00:10:00,515
+out for strings between
+09 thousand a's.
+
+198
+00:10:00,515 --> 00:10:08,285
+So C. So also gets
+simplification makes a,
+
+199
+00:10:08,285 --> 00:10:10,655
+actually this case fast on.
+
+200
+00:10:10,655 --> 00:10:16,260
+So we can now run strings
+up to 9 thousand.
+
+201
+00:10:16,260 --> 00:10:19,360
+And we don't actually
+sweat about this at all.
+
+202
+00:10:19,360 --> 00:10:24,745
+For I have to say my laptop
+is now starting its fan.
+
+203
+00:10:24,745 --> 00:10:28,525
+And also, because the times
+are relatively small,
+
+204
+00:10:28,525 --> 00:10:30,610
+I'm actually running
+each test three
+
+205
+00:10:30,610 --> 00:10:32,785
+times and then take the average,
+
+206
+00:10:32,785 --> 00:10:34,720
+which I didn't do before,
+
+207
+00:10:34,720 --> 00:10:37,720
+just to be a tiny bit
+more accurate map.
+
+208
+00:10:37,720 --> 00:10:42,025
+So three seconds for a
+string of 9 thousand a's.
+
+209
+00:10:42,025 --> 00:10:44,980
+And now the more
+interesting question is,
+
+210
+00:10:44,980 --> 00:10:46,240
+for the second case,
+
+211
+00:10:46,240 --> 00:10:48,625
+this E star, star b.
+
+212
+00:10:48,625 --> 00:10:51,250
+And you can already see
+on these numbers RB.
+
+213
+00:10:51,250 --> 00:10:53,260
+And now you're really ambitious.
+
+214
+00:10:53,260 --> 00:10:57,860
+And let's see how our
+program is coping with that.
+
+215
+00:11:02,610 --> 00:11:07,780
+Again. No sweat, at
+least not on my part.
+
+216
+00:11:07,780 --> 00:11:10,480
+The laptop thefts to
+calculate quite a bit.
+
+217
+00:11:10,480 --> 00:11:12,940
+But this is now a string of
+
+218
+00:11:12,940 --> 00:11:16,539
+3 million A's and
+it needs a second.
+
+219
+00:11:16,539 --> 00:11:20,680
+And let's see how far we get.
+
+220
+00:11:20,680 --> 00:11:25,280
+4 million a around two seconds.
+
+221
+00:11:27,030 --> 00:11:29,050
+So say it again,
+
+222
+00:11:29,050 --> 00:11:31,690
+I'm actually slowing it down.
+
+223
+00:11:31,690 --> 00:11:34,150
+He artificially run the test
+
+224
+00:11:34,150 --> 00:11:36,895
+three times and then
+take the average.
+
+225
+00:11:36,895 --> 00:11:42,749
+But it seems to be something
+less than five seconds.
+
+226
+00:11:42,749 --> 00:11:48,185
+And remember that is a
+string of 6 million A's.
+
+227
+00:11:48,185 --> 00:11:51,170
+Let's just have a
+look at the graphs.
+
+228
+00:11:51,170 --> 00:11:56,105
+So the simplification also
+made of first case faster.
+
+229
+00:11:56,105 --> 00:11:58,880
+So earlier without
+simplification,
+
+230
+00:11:58,880 --> 00:12:00,710
+where we just have
+this explicit and
+
+231
+00:12:00,710 --> 00:12:03,050
+times that at this graph.
+
+232
+00:12:03,050 --> 00:12:05,210
+And now we can go up to
+
+233
+00:12:05,210 --> 00:12:09,620
+10 thousand and be still
+under five seconds.
+
+234
+00:12:09,620 --> 00:12:12,300
+So that is good news.
+
+235
+00:12:13,270 --> 00:12:16,745
+In the other example, remember,
+
+236
+00:12:16,745 --> 00:12:19,220
+the existing regular
+expression matches in
+
+237
+00:12:19,220 --> 00:12:22,880
+Java eight, Python,
+and JavaScript.
+
+238
+00:12:22,880 --> 00:12:26,030
+And thanks to the
+student be Ozma,
+
+239
+00:12:26,030 --> 00:12:27,935
+half a graph for Swift.
+
+240
+00:12:27,935 --> 00:12:29,750
+They're pretty atrocious.
+
+241
+00:12:29,750 --> 00:12:33,320
+They need like for 30 Ace,
+
+242
+00:12:33,320 --> 00:12:37,490
+something like on the
+magnitude of thirty-seconds.
+
+243
+00:12:37,490 --> 00:12:39,740
+As I said already,
+
+244
+00:12:39,740 --> 00:12:42,665
+Java nine is slightly better.
+
+245
+00:12:42,665 --> 00:12:44,870
+Java nine and above this,
+
+246
+00:12:44,870 --> 00:12:46,220
+if you try that example,
+
+247
+00:12:46,220 --> 00:12:48,665
+the same exponential and nine,
+
+248
+00:12:48,665 --> 00:12:51,230
+you would be able to
+process something
+
+249
+00:12:51,230 --> 00:12:54,215
+like 40 thousand A's
+in half a minute.
+
+250
+00:12:54,215 --> 00:12:57,740
+I have to admit I'm not
+a 100% sure what they
+
+251
+00:12:57,740 --> 00:13:03,575
+did to improve the
+performance between Java 89.
+
+252
+00:13:03,575 --> 00:13:05,510
+I know they did something on
+
+253
+00:13:05,510 --> 00:13:07,790
+their regular expression
+matching engine,
+
+254
+00:13:07,790 --> 00:13:09,770
+but I haven't really looked
+
+255
+00:13:09,770 --> 00:13:12,230
+at what precisely they've done.
+
+256
+00:13:12,230 --> 00:13:17,945
+But that's still not
+really competition fas.
+
+257
+00:13:17,945 --> 00:13:20,915
+So our third version,
+
+258
+00:13:20,915 --> 00:13:24,335
+in this example, a star, star b.
+
+259
+00:13:24,335 --> 00:13:28,445
+We say it's something like
+We need 6 million A's.
+
+260
+00:13:28,445 --> 00:13:31,760
+And again, I think these
+are slightly older times,
+
+261
+00:13:31,760 --> 00:13:33,770
+so he had needs 20 seconds.
+
+262
+00:13:33,770 --> 00:13:37,250
+I think we had something
+like below five seconds.
+
+263
+00:13:37,250 --> 00:13:40,865
+But again, you can see
+the trends. We rants.
+
+264
+00:13:40,865 --> 00:13:42,590
+Circles around them.
+
+265
+00:13:42,590 --> 00:13:46,530
+And that's quite nice.
+
+266
+00:13:46,570 --> 00:13:49,774
+So what's good about
+this algorithm?
+
+267
+00:13:49,774 --> 00:13:51,605
+By pressure of ski?
+
+268
+00:13:51,605 --> 00:13:54,500
+Well, first, it extends
+
+269
+00:13:54,500 --> 00:13:57,800
+actually to quite a number
+of regular expressions.
+
+270
+00:13:57,800 --> 00:13:59,900
+So we saw in this example
+
+271
+00:13:59,900 --> 00:14:03,950
+the End Times regular expression
+is a little bit of work.
+
+272
+00:14:03,950 --> 00:14:05,405
+We could extend that.
+
+273
+00:14:05,405 --> 00:14:08,480
+It also extends to the
+not reg expression,
+
+274
+00:14:08,480 --> 00:14:10,820
+which I explain on
+the next slide.
+
+275
+00:14:10,820 --> 00:14:15,290
+It's very easy to implement
+in a functional language.
+
+276
+00:14:15,290 --> 00:14:16,610
+If you don't buy
+
+277
+00:14:16,610 --> 00:14:20,675
+all that functional
+programming language stuff,
+
+278
+00:14:20,675 --> 00:14:22,955
+you still have to agree.
+
+279
+00:14:22,955 --> 00:14:25,715
+Quite beautiful in Scala.
+
+280
+00:14:25,715 --> 00:14:28,160
+And you could also
+easily implemented
+
+281
+00:14:28,160 --> 00:14:31,760
+in C language or in Python.
+
+282
+00:14:31,760 --> 00:14:34,250
+There's really no
+problem with that.
+
+283
+00:14:34,250 --> 00:14:38,390
+Say the algorithm's actually
+quite old already brush-off,
+
+284
+00:14:38,390 --> 00:14:44,899
+ski wrote it down in
+1964 and his PhD thesis.
+
+285
+00:14:44,899 --> 00:14:49,460
+Somehow it was forgotten during
+
+286
+00:14:49,460 --> 00:14:54,095
+the next decades and only
+recently has been rediscovered.
+
+287
+00:14:54,095 --> 00:14:57,065
+At the moment, we are
+only solve the problem
+
+288
+00:14:57,065 --> 00:15:02,240
+of Gmail reg expression
+gamma string deaths,
+
+289
+00:15:02,240 --> 00:15:05,550
+the regular expression
+match the string yes or no.
+
+290
+00:15:05,550 --> 00:15:08,740
+The want to of course, use
+it as part of our lexicon.
+
+291
+00:15:08,740 --> 00:15:12,370
+And there we have to do
+something more complicated.
+
+292
+00:15:12,370 --> 00:15:15,384
+We have to essentially
+generate tokens.
+
+293
+00:15:15,384 --> 00:15:18,670
+And this will still take
+a little bit of work.
+
+294
+00:15:18,670 --> 00:15:22,045
+And that's actually relatively
+recent work by somebody
+
+295
+00:15:22,045 --> 00:15:26,110
+called suits Month and
+the co-worker called Lou.
+
+296
+00:15:26,110 --> 00:15:30,880
+And what I personally
+also find very satisfying
+
+297
+00:15:30,880 --> 00:15:32,695
+about this algorithm is
+
+298
+00:15:32,695 --> 00:15:36,040
+that we can actually
+prove that it's correct.
+
+299
+00:15:36,040 --> 00:15:37,735
+You might remember I did
+
+300
+00:15:37,735 --> 00:15:41,500
+quite some interesting
+transformations
+
+301
+00:15:41,500 --> 00:15:44,830
+when I calculated the derivative.
+
+302
+00:15:44,830 --> 00:15:48,270
+How can I be sure that
+this is all correct?
+
+303
+00:15:48,270 --> 00:15:50,270
+Actually, I can now go away and
+
+304
+00:15:50,270 --> 00:15:52,865
+prove the correctness
+of this algorithm.
+
+305
+00:15:52,865 --> 00:15:56,645
+Does it really satisfy
+the specification?
+
+306
+00:15:56,645 --> 00:15:58,550
+Is really fs string
+
+307
+00:15:58,550 --> 00:16:00,440
+is in the language
+of a reg expression.
+
+308
+00:16:00,440 --> 00:16:03,050
+Does that matter? Vd say yes.
+
+309
+00:16:03,050 --> 00:16:07,070
+However, I leave that
+for another video.
+
+310
+00:16:07,070 --> 00:16:10,580
+What I also like about
+this algorithm that can be
+
+311
+00:16:10,580 --> 00:16:13,775
+actually extended to quite a
+number of rec expressions.
+
+312
+00:16:13,775 --> 00:16:17,810
+So this is t not reg
+expression that is
+
+313
+00:16:17,810 --> 00:16:19,760
+opposed to match strings which
+
+314
+00:16:19,760 --> 00:16:22,235
+this reg expression cannot match.
+
+315
+00:16:22,235 --> 00:16:24,245
+So here's an example.
+
+316
+00:16:24,245 --> 00:16:28,640
+You're in the business of
+trying to find out what
+
+317
+00:16:28,640 --> 00:16:33,530
+our comments in languages like
+Java or C. Then you know,
+
+318
+00:16:33,530 --> 00:16:35,060
+they always start with a slash,
+
+319
+00:16:35,060 --> 00:16:36,245
+then comes a star,
+
+320
+00:16:36,245 --> 00:16:38,240
+then comes an arbitrary string.
+
+321
+00:16:38,240 --> 00:16:41,195
+And then they finish
+with a slash and a star.
+
+322
+00:16:41,195 --> 00:16:44,330
+And how you want to specify
+that is something like this.
+
+323
+00:16:44,330 --> 00:16:45,530
+You want to say, OK,
+
+324
+00:16:45,530 --> 00:16:48,245
+a comment starts with
+a slash and a star.
+
+325
+00:16:48,245 --> 00:16:51,410
+Then it comes any
+string in between.
+
+326
+00:16:51,410 --> 00:16:55,340
+But this string
+in-between cannot contain
+
+327
+00:16:55,340 --> 00:16:58,280
+any star and slash
+because that would then
+
+328
+00:16:58,280 --> 00:17:01,415
+indicate there's the
+end already there.
+
+329
+00:17:01,415 --> 00:17:03,680
+And then after you
+have such a string
+
+330
+00:17:03,680 --> 00:17:06,410
+which doesn't
+contain as standard,
+
+331
+00:17:06,410 --> 00:17:11,180
+then at the end you want to
+match a star and a slash.
+
+332
+00:17:11,180 --> 00:17:13,460
+So for these kind of comments,
+
+333
+00:17:13,460 --> 00:17:15,995
+this reg expression is
+actually quite useful.
+
+334
+00:17:15,995 --> 00:17:19,430
+And if you ever seen
+how to do the negation,
+
+335
+00:17:19,430 --> 00:17:21,995
+this kinda break
+expression with automata?
+
+336
+00:17:21,995 --> 00:17:24,710
+You will know that's
+a bit of a pain,
+
+337
+00:17:24,710 --> 00:17:26,675
+but for the Borowski,
+
+338
+00:17:26,675 --> 00:17:28,370
+it's no pain at all.
+
+339
+00:17:28,370 --> 00:17:30,710
+The meaning of this
+reg expression
+
+340
+00:17:30,710 --> 00:17:32,255
+would be something like that.
+
+341
+00:17:32,255 --> 00:17:35,540
+If I have all the
+strings there are,
+
+342
+00:17:35,540 --> 00:17:38,390
+and I take away all the strings,
+
+343
+00:17:38,390 --> 00:17:40,055
+this arc and match.
+
+344
+00:17:40,055 --> 00:17:42,080
+The remaining strings are
+
+345
+00:17:42,080 --> 00:17:44,630
+what this reg expression
+is supposed to match.
+
+346
+00:17:44,630 --> 00:17:47,165
+So I can specify
+what the meaning is.
+
+347
+00:17:47,165 --> 00:17:49,760
+And then it's also very easy to
+
+348
+00:17:49,760 --> 00:17:52,174
+say what is nullable
+and derivative.
+
+349
+00:17:52,174 --> 00:17:54,140
+So for nullable, it would be
+
+350
+00:17:54,140 --> 00:17:56,570
+just a test where
+the eyes nullable.
+
+351
+00:17:56,570 --> 00:17:58,985
+And I just take the
+negation of that.
+
+352
+00:17:58,985 --> 00:18:00,680
+And men I have to build
+
+353
+00:18:00,680 --> 00:18:03,620
+the derivative of this
+not reg expression.
+
+354
+00:18:03,620 --> 00:18:05,420
+Then I just have to move
+
+355
+00:18:05,420 --> 00:18:07,325
+this permutation does derivative,
+
+356
+00:18:07,325 --> 00:18:10,070
+derivative inside
+the wreck expression
+
+357
+00:18:10,070 --> 00:18:12,575
+and keep the not on
+the outermost level.
+
+358
+00:18:12,575 --> 00:18:15,515
+So there's again no
+pain involved in
+
+359
+00:18:15,515 --> 00:18:19,130
+adding this reg expression
+to the algorithm.
+
+360
+00:18:19,130 --> 00:18:22,100
+We made it to the end of
+this video and we made
+
+361
+00:18:22,100 --> 00:18:24,739
+it also to the end
+of this algorithm.
+
+362
+00:18:24,739 --> 00:18:27,620
+I can see to loose trends.
+
+363
+00:18:27,620 --> 00:18:29,420
+One is we implemented
+
+364
+00:18:29,420 --> 00:18:32,855
+this algorithm for the
+basic regular expressions.
+
+365
+00:18:32,855 --> 00:18:38,705
+And we also added the end
+times out of necessity.
+
+366
+00:18:38,705 --> 00:18:41,120
+And I also showed
+you how to implement
+
+367
+00:18:41,120 --> 00:18:44,840
+the not regular
+expression on a slide.
+
+368
+00:18:44,840 --> 00:18:48,995
+But your task in the
+coursework actually is
+
+369
+00:18:48,995 --> 00:18:52,970
+to extend that to some of
+the extended reg expression.
+
+370
+00:18:52,970 --> 00:18:57,260
+So especially these bounded
+repetitions like from 0 to
+
+371
+00:18:57,260 --> 00:19:01,550
+n times or between n and m times.
+
+372
+00:19:01,550 --> 00:19:04,325
+And I think also
+plus and D option.
+
+373
+00:19:04,325 --> 00:19:07,220
+The other loose trend is,
+
+374
+00:19:07,220 --> 00:19:09,200
+remember I did this while
+
+375
+00:19:09,200 --> 00:19:11,645
+calculations with
+regular expressions.
+
+376
+00:19:11,645 --> 00:19:13,205
+Why on earth?
+
+377
+00:19:13,205 --> 00:19:14,480
+Is that all correct?
+
+378
+00:19:14,480 --> 00:19:16,355
+Why on earth should
+this algorithm
+
+379
+00:19:16,355 --> 00:19:18,575
+actually satisfy
+our specification,
+
+380
+00:19:18,575 --> 00:19:20,450
+which we set out
+at the beginning.
+
+381
+00:19:20,450 --> 00:19:25,410
+So that needs to be looked at
+more closely. Bye for now.