|
1 1 |
|
2 00:00:06,350 --> 00:00:10,305 |
|
3 They come back. I |
|
4 can hear you saying, |
|
5 |
|
6 2 |
|
7 00:00:10,305 --> 00:00:12,000 |
|
8 yes, you tried it out on |
|
9 |
|
10 3 |
|
11 00:00:12,000 --> 00:00:14,760 |
|
12 one example and you |
|
13 were much better. |
|
14 |
|
15 4 |
|
16 00:00:14,760 --> 00:00:17,415 |
|
17 But how about on other examples? |
|
18 |
|
19 5 |
|
20 00:00:17,415 --> 00:00:21,265 |
|
21 Specifically, we had two |
|
22 regular expressions. |
|
23 |
|
24 6 |
|
25 00:00:21,265 --> 00:00:23,480 |
|
26 How about the other case? |
|
27 |
|
28 7 |
|
29 00:00:23,480 --> 00:00:27,480 |
|
30 Where let's have a look at |
|
31 that other case in this video. |
|
32 |
|
33 8 |
|
34 00:00:29,140 --> 00:00:32,705 |
|
35 So I'm back here |
|
36 in this Reto file. |
|
37 |
|
38 9 |
|
39 00:00:32,705 --> 00:00:35,180 |
|
40 And here's this test |
|
41 case which run quite |
|
42 |
|
43 10 |
|
44 00:00:35,180 --> 00:00:39,665 |
|
45 nicely for strings between 01000. |
|
46 |
|
47 11 |
|
48 00:00:39,665 --> 00:00:42,725 |
|
49 Here is the other test case. |
|
50 |
|
51 12 |
|
52 00:00:42,725 --> 00:00:44,090 |
|
53 I still run it only |
|
54 |
|
55 13 |
|
56 00:00:44,090 --> 00:00:48,470 |
|
57 for relatively small |
|
58 strings between 020. |
|
59 |
|
60 14 |
|
61 00:00:48,470 --> 00:00:50,240 |
|
62 And let's see what it say. |
|
63 |
|
64 15 |
|
65 00:00:50,240 --> 00:00:51,800 |
|
66 So I'm running the test in |
|
67 |
|
68 16 |
|
69 00:00:51,800 --> 00:00:57,320 |
|
70 the ammonoids repo and |
|
71 doesn't look too bad. |
|
72 |
|
73 17 |
|
74 00:00:57,320 --> 00:01:01,160 |
|
75 But this might be because |
|
76 20 is not general enough. |
|
77 |
|
78 18 |
|
79 00:01:01,160 --> 00:01:03,620 |
|
80 So let's try it with 30. |
|
81 |
|
82 19 |
|
83 00:01:03,620 --> 00:01:06,530 |
|
84 Let's run this test again. |
|
85 |
|
86 20 |
|
87 00:01:06,530 --> 00:01:11,075 |
|
88 And 20 is quite okay. |
|
89 |
|
90 21 |
|
91 00:01:11,075 --> 00:01:15,965 |
|
92 22, okay, that's now |
|
93 almost ten times as much. |
|
94 |
|
95 22 |
|
96 00:01:15,965 --> 00:01:18,995 |
|
97 And then the next |
|
98 one would be 24. |
|
99 |
|
100 23 |
|
101 00:01:18,995 --> 00:01:23,615 |
|
102 And we're waiting and waiting. |
|
103 |
|
104 24 |
|
105 00:01:23,615 --> 00:01:26,410 |
|
106 And we are waiting. |
|
107 |
|
108 25 |
|
109 00:01:26,410 --> 00:01:29,300 |
|
110 Actually, it isn't |
|
111 calculated at all. |
|
112 |
|
113 26 |
|
114 00:01:29,300 --> 00:01:31,399 |
|
115 It run out of memory. |
|
116 |
|
117 27 |
|
118 00:01:31,399 --> 00:01:33,905 |
|
119 So here is something going on, |
|
120 |
|
121 28 |
|
122 00:01:33,905 --> 00:01:37,640 |
|
123 which is Daphne bad and we |
|
124 have to have a look at that. |
|
125 |
|
126 29 |
|
127 00:01:37,640 --> 00:01:40,640 |
|
128 Okay? It's always |
|
129 instructive with |
|
130 |
|
131 30 |
|
132 00:01:40,640 --> 00:01:43,460 |
|
133 this algorithm to |
|
134 look at the sizes of |
|
135 |
|
136 31 |
|
137 00:01:43,460 --> 00:01:45,695 |
|
138 the record expressions |
|
139 we calculate |
|
140 |
|
141 32 |
|
142 00:01:45,695 --> 00:01:49,625 |
|
143 the evil to Is this |
|
144 a star, star B. |
|
145 |
|
146 33 |
|
147 00:01:49,625 --> 00:01:51,800 |
|
148 So there's nothing we |
|
149 can compress there. |
|
150 |
|
151 34 |
|
152 00:01:51,800 --> 00:01:55,490 |
|
153 It has just stars and |
|
154 sequences and nothing else. |
|
155 |
|
156 35 |
|
157 00:01:55,490 --> 00:01:58,430 |
|
158 And so it's not that we |
|
159 can play the same trick |
|
160 |
|
161 36 |
|
162 00:01:58,430 --> 00:02:01,805 |
|
163 of trying to introduce |
|
164 an explicit constructor. |
|
165 |
|
166 37 |
|
167 00:02:01,805 --> 00:02:04,190 |
|
168 But now let's have a |
|
169 look at the derivatives. |
|
170 |
|
171 38 |
|
172 00:02:04,190 --> 00:02:05,600 |
|
173 As you can see here. |
|
174 |
|
175 39 |
|
176 00:02:05,600 --> 00:02:08,420 |
|
177 If I take this evil to wreck |
|
178 |
|
179 40 |
|
180 00:02:08,420 --> 00:02:09,935 |
|
181 expression and they built |
|
182 |
|
183 41 |
|
184 00:02:09,935 --> 00:02:12,470 |
|
185 now longer and |
|
186 longer derivatives, |
|
187 |
|
188 42 |
|
189 00:02:12,470 --> 00:02:14,090 |
|
190 you can see this grows. |
|
191 |
|
192 43 |
|
193 00:02:14,090 --> 00:02:16,055 |
|
194 And as you see in this line, |
|
195 |
|
196 44 |
|
197 00:02:16,055 --> 00:02:17,270 |
|
198 if I'm trying to take |
|
199 |
|
200 45 |
|
201 00:02:17,270 --> 00:02:20,180 |
|
202 the derivative of a |
|
203 quite large string, |
|
204 |
|
205 46 |
|
206 00:02:20,180 --> 00:02:21,680 |
|
207 it is quite quick. |
|
208 |
|
209 47 |
|
210 00:02:21,680 --> 00:02:26,870 |
|
211 But the size of the |
|
212 reg expression, |
|
213 |
|
214 48 |
|
215 00:02:26,870 --> 00:02:28,310 |
|
216 the number of nodes, |
|
217 |
|
218 49 |
|
219 00:02:28,310 --> 00:02:30,815 |
|
220 is also like nearly |
|
221 eight millions. |
|
222 |
|
223 50 |
|
224 00:02:30,815 --> 00:02:33,845 |
|
225 And so even if that calculates |
|
226 relatively quickly, |
|
227 |
|
228 51 |
|
229 00:02:33,845 --> 00:02:37,865 |
|
230 that is the reason why at 24, |
|
231 |
|
232 52 |
|
233 00:02:37,865 --> 00:02:39,560 |
|
234 it just runs out of memory. |
|
235 |
|
236 53 |
|
237 00:02:39,560 --> 00:02:42,785 |
|
238 This reg expression |
|
239 just grew too much. |
|
240 |
|
241 54 |
|
242 00:02:42,785 --> 00:02:46,520 |
|
243 So we somehow need |
|
244 to still compress |
|
245 |
|
246 55 |
|
247 00:02:46,520 --> 00:02:49,655 |
|
248 this record expression |
|
249 and make it not |
|
250 |
|
251 56 |
|
252 00:02:49,655 --> 00:02:52,700 |
|
253 grow so much when we |
|
254 build derivative. |
|
255 |
|
256 57 |
|
257 00:02:52,700 --> 00:02:54,020 |
|
258 So we have to look at |
|
259 |
|
260 58 |
|
261 00:02:54,020 --> 00:02:57,050 |
|
262 where does that grow |
|
263 actually come from. |
|
264 |
|
265 59 |
|
266 00:02:57,050 --> 00:03:00,170 |
|
267 Let's look at the |
|
268 derivative operation |
|
269 |
|
270 60 |
|
271 00:03:00,170 --> 00:03:01,895 |
|
272 again in more detail. |
|
273 |
|
274 61 |
|
275 00:03:01,895 --> 00:03:03,740 |
|
276 When we introduced |
|
277 it, we looked at |
|
278 |
|
279 62 |
|
280 00:03:03,740 --> 00:03:06,560 |
|
281 this example of a |
|
282 wreck expression R, |
|
283 |
|
284 63 |
|
285 00:03:06,560 --> 00:03:11,465 |
|
286 which is a star of some |
|
287 alternative and some sequence. |
|
288 |
|
289 64 |
|
290 00:03:11,465 --> 00:03:13,130 |
|
291 And we can build now |
|
292 |
|
293 65 |
|
294 00:03:13,130 --> 00:03:15,800 |
|
295 the derivative of this |
|
296 r according to a, |
|
297 |
|
298 66 |
|
299 00:03:15,800 --> 00:03:18,965 |
|
300 b, and c and see |
|
301 what it calculates. |
|
302 |
|
303 67 |
|
304 00:03:18,965 --> 00:03:21,935 |
|
305 And you see in case of a, |
|
306 |
|
307 68 |
|
308 00:03:21,935 --> 00:03:26,570 |
|
309 it's like one times b plus |
|
310 0 and then followed by r, |
|
311 |
|
312 69 |
|
313 00:03:26,570 --> 00:03:29,015 |
|
314 which is this whole |
|
315 wreck expression again. |
|
316 |
|
317 70 |
|
318 00:03:29,015 --> 00:03:30,935 |
|
319 So you can also see. |
|
320 |
|
321 71 |
|
322 00:03:30,935 --> 00:03:34,745 |
|
323 Derivative operation |
|
324 introduces a lot of junk. |
|
325 |
|
326 72 |
|
327 00:03:34,745 --> 00:03:38,165 |
|
328 This plus 0 isn't |
|
329 really necessary. |
|
330 |
|
331 73 |
|
332 00:03:38,165 --> 00:03:42,455 |
|
333 And this times one could |
|
334 be also thrown away. |
|
335 |
|
336 74 |
|
337 00:03:42,455 --> 00:03:43,685 |
|
338 So in the first case, |
|
339 |
|
340 75 |
|
341 00:03:43,685 --> 00:03:48,110 |
|
342 actually we could just have |
|
343 one times b is b plus 0, |
|
344 |
|
345 76 |
|
346 00:03:48,110 --> 00:03:49,580 |
|
347 it still be a, |
|
348 |
|
349 77 |
|
350 00:03:49,580 --> 00:03:54,905 |
|
351 so it would be B followed by |
|
352 R. And in this second case, |
|
353 |
|
354 78 |
|
355 00:03:54,905 --> 00:03:57,515 |
|
356 C0 times b, that will be just 0. |
|
357 |
|
358 79 |
|
359 00:03:57,515 --> 00:03:59,270 |
|
360 Then 0 plus one is |
|
361 |
|
362 80 |
|
363 00:03:59,270 --> 00:04:05,330 |
|
364 11 times r would be just |
|
365 r. And in the last case, |
|
366 |
|
367 81 |
|
368 00:04:05,330 --> 00:04:12,155 |
|
369 C0 times B would be 00 plus |
|
370 0 is 00 times r would be 0. |
|
371 |
|
372 82 |
|
373 00:04:12,155 --> 00:04:17,540 |
|
374 Okay? So we could throw out |
|
375 all these useless zeros and |
|
376 |
|
377 83 |
|
378 00:04:17,540 --> 00:04:20,960 |
|
379 ones because actually |
|
380 we have to throw |
|
381 |
|
382 84 |
|
383 00:04:20,960 --> 00:04:24,845 |
|
384 them out because over time |
|
385 they just accumulate. |
|
386 |
|
387 85 |
|
388 00:04:24,845 --> 00:04:27,035 |
|
389 And then we build |
|
390 the next derivative. |
|
391 |
|
392 86 |
|
393 00:04:27,035 --> 00:04:31,130 |
|
394 0 wouldn't go away by |
|
395 building annuity where tests. |
|
396 |
|
397 87 |
|
398 00:04:31,130 --> 00:04:34,310 |
|
399 So we need to somehow |
|
400 a mechanism to |
|
401 |
|
402 88 |
|
403 00:04:34,310 --> 00:04:39,120 |
|
404 delete the junk from |
|
405 these derivatives. |
|
406 |
|
407 89 |
|
408 00:04:39,280 --> 00:04:41,900 |
|
409 But how to delete junk? |
|
410 |
|
411 90 |
|
412 00:04:41,900 --> 00:04:43,370 |
|
413 We already know we have |
|
414 |
|
415 91 |
|
416 00:04:43,370 --> 00:04:47,825 |
|
417 the simplification rules |
|
418 which say like if r plus 0, |
|
419 |
|
420 92 |
|
421 00:04:47,825 --> 00:04:53,000 |
|
422 I can just replace by |
|
423 r and the other ones. |
|
424 |
|
425 93 |
|
426 00:04:53,000 --> 00:04:55,760 |
|
427 And so what I now |
|
428 need to do is in |
|
429 |
|
430 94 |
|
431 00:04:55,760 --> 00:04:58,715 |
|
432 my algorithm when I |
|
433 built the derivative, |
|
434 |
|
435 95 |
|
436 00:04:58,715 --> 00:05:01,415 |
|
437 this might have |
|
438 introduced some chunk. |
|
439 |
|
440 96 |
|
441 00:05:01,415 --> 00:05:02,960 |
|
442 And I now have to have |
|
443 |
|
444 97 |
|
445 00:05:02,960 --> 00:05:06,590 |
|
446 essentially a additional function |
|
447 |
|
448 98 |
|
449 00:05:06,590 --> 00:05:08,945 |
|
450 which deletes this junk again. |
|
451 |
|
452 99 |
|
453 00:05:08,945 --> 00:05:10,490 |
|
454 And in doing so, |
|
455 |
|
456 100 |
|
457 00:05:10,490 --> 00:05:13,340 |
|
458 keep steer, Rekha expression, |
|
459 |
|
460 101 |
|
461 00:05:13,340 --> 00:05:16,730 |
|
462 relatively small, pickers debts, |
|
463 |
|
464 102 |
|
465 00:05:16,730 --> 00:05:19,535 |
|
466 the Achilles heel |
|
467 of this algorithm. |
|
468 |
|
469 103 |
|
470 00:05:19,535 --> 00:05:22,565 |
|
471 If this regular expression |
|
472 grows too much, |
|
473 |
|
474 104 |
|
475 00:05:22,565 --> 00:05:25,070 |
|
476 then all these functions |
|
477 will slow down. |
|
478 |
|
479 105 |
|
480 00:05:25,070 --> 00:05:26,360 |
|
481 So the purpose of |
|
482 |
|
483 106 |
|
484 00:05:26,360 --> 00:05:30,455 |
|
485 the simplification function |
|
486 is to after the derivative, |
|
487 |
|
488 107 |
|
489 00:05:30,455 --> 00:05:33,080 |
|
490 compress again this |
|
491 rec expression |
|
492 |
|
493 108 |
|
494 00:05:33,080 --> 00:05:35,570 |
|
495 and then do the next derivative. |
|
496 |
|
497 109 |
|
498 00:05:35,570 --> 00:05:39,815 |
|
499 So if we introduce that |
|
500 additional functions simp, |
|
501 |
|
502 110 |
|
503 00:05:39,815 --> 00:05:42,440 |
|
504 which essentially |
|
505 just goes through |
|
506 |
|
507 111 |
|
508 00:05:42,440 --> 00:05:46,040 |
|
509 this reg expression and |
|
510 deletes all this junk, |
|
511 |
|
512 112 |
|
513 00:05:46,040 --> 00:05:50,045 |
|
514 then we should be in a |
|
515 much better position. |
|
516 |
|
517 113 |
|
518 00:05:50,045 --> 00:05:52,490 |
|
519 As you can see on this slide, |
|
520 |
|
521 114 |
|
522 00:05:52,490 --> 00:05:54,440 |
|
523 the simplification |
|
524 function can actually |
|
525 |
|
526 115 |
|
527 00:05:54,440 --> 00:05:56,855 |
|
528 be implemented very easily. |
|
529 |
|
530 116 |
|
531 00:05:56,855 --> 00:05:59,750 |
|
532 However, there are few |
|
533 interesting points. |
|
534 |
|
535 117 |
|
536 00:05:59,750 --> 00:06:02,720 |
|
537 First of all, there |
|
538 are only two cases. |
|
539 |
|
540 118 |
|
541 00:06:02,720 --> 00:06:05,060 |
|
542 The only simplify when we have |
|
543 |
|
544 119 |
|
545 00:06:05,060 --> 00:06:08,255 |
|
546 an alternative or a sequence. |
|
547 |
|
548 120 |
|
549 00:06:08,255 --> 00:06:12,859 |
|
550 So for example, we will |
|
551 never simplify under star. |
|
552 |
|
553 121 |
|
554 00:06:12,859 --> 00:06:15,950 |
|
555 If you look at the |
|
556 derivative operation |
|
557 |
|
558 122 |
|
559 00:06:15,950 --> 00:06:17,825 |
|
560 and you scratch your |
|
561 head a little bit, |
|
562 |
|
563 123 |
|
564 00:06:17,825 --> 00:06:20,180 |
|
565 we'll find out why |
|
566 is that the case. |
|
567 |
|
568 124 |
|
569 00:06:20,180 --> 00:06:22,145 |
|
570 It's essentially a waste of time. |
|
571 |
|
572 125 |
|
573 00:06:22,145 --> 00:06:25,505 |
|
574 So you would not |
|
575 simplify under star. |
|
576 |
|
577 126 |
|
578 00:06:25,505 --> 00:06:27,650 |
|
579 You only simplify if you have |
|
580 |
|
581 127 |
|
582 00:06:27,650 --> 00:06:30,530 |
|
583 an alternative and the sequence. |
|
584 |
|
585 128 |
|
586 00:06:30,530 --> 00:06:34,880 |
|
587 Now, however, there |
|
588 is one small point. |
|
589 |
|
590 129 |
|
591 00:06:34,880 --> 00:06:39,785 |
|
592 You have to be aware of |
|
593 these simplification rules. |
|
594 |
|
595 130 |
|
596 00:06:39,785 --> 00:06:42,740 |
|
597 They need to be essentially |
|
598 applied backwards. |
|
599 |
|
600 131 |
|
601 00:06:42,740 --> 00:06:45,650 |
|
602 So I have to first essentially |
|
603 go to the leaves of |
|
604 |
|
605 132 |
|
606 00:06:45,650 --> 00:06:49,085 |
|
607 this record expression and |
|
608 then have to find out, |
|
609 |
|
610 133 |
|
611 00:06:49,085 --> 00:06:51,170 |
|
612 can I apply simplification? |
|
613 |
|
614 134 |
|
615 00:06:51,170 --> 00:06:52,670 |
|
616 And then if yes, |
|
617 |
|
618 135 |
|
619 00:06:52,670 --> 00:06:55,430 |
|
620 I apply the simplification |
|
621 and I look at the next step, |
|
622 |
|
623 136 |
|
624 00:06:55,430 --> 00:06:57,815 |
|
625 can I now simplify |
|
626 something more? |
|
627 |
|
628 137 |
|
629 00:06:57,815 --> 00:06:59,390 |
|
630 The reason is how |
|
631 |
|
632 138 |
|
633 00:06:59,390 --> 00:07:03,125 |
|
634 the simplification |
|
635 rules are formulated. |
|
636 |
|
637 139 |
|
638 00:07:03,125 --> 00:07:05,300 |
|
639 They might fire in |
|
640 |
|
641 140 |
|
642 00:07:05,300 --> 00:07:08,765 |
|
643 a higher level if something |
|
644 simplifies below. |
|
645 |
|
646 141 |
|
647 00:07:08,765 --> 00:07:14,315 |
|
648 So we have to essentially |
|
649 simplify from the inside out. |
|
650 |
|
651 142 |
|
652 00:07:14,315 --> 00:07:16,850 |
|
653 And in this |
|
654 simplification function, |
|
655 |
|
656 143 |
|
657 00:07:16,850 --> 00:07:20,885 |
|
658 what that means is the |
|
659 matching this reg expression. |
|
660 |
|
661 144 |
|
662 00:07:20,885 --> 00:07:23,120 |
|
663 Be test whether it's |
|
664 an alternative or |
|
665 |
|
666 145 |
|
667 00:07:23,120 --> 00:07:26,345 |
|
668 a sequence only then we |
|
669 actually go into action. |
|
670 |
|
671 146 |
|
672 00:07:26,345 --> 00:07:28,385 |
|
673 Once we know. |
|
674 |
|
675 147 |
|
676 00:07:28,385 --> 00:07:30,530 |
|
677 In case of an alternative, |
|
678 |
|
679 148 |
|
680 00:07:30,530 --> 00:07:32,120 |
|
681 what are the two components? |
|
682 |
|
683 149 |
|
684 00:07:32,120 --> 00:07:35,765 |
|
685 P first, simplify each component, |
|
686 |
|
687 150 |
|
688 00:07:35,765 --> 00:07:39,095 |
|
689 okay, and then we |
|
690 get a result back. |
|
691 |
|
692 151 |
|
693 00:07:39,095 --> 00:07:41,180 |
|
694 And we check whether |
|
695 |
|
696 152 |
|
697 00:07:41,180 --> 00:07:45,679 |
|
698 this simplification of |
|
699 R1 resulted into a 0. |
|
700 |
|
701 153 |
|
702 00:07:45,679 --> 00:07:47,884 |
|
703 Then because it's an alternative |
|
704 |
|
705 154 |
|
706 00:07:47,884 --> 00:07:49,190 |
|
707 than I can just replace it |
|
708 |
|
709 155 |
|
710 00:07:49,190 --> 00:07:53,375 |
|
711 by r is two a simplified |
|
712 version of R2. |
|
713 |
|
714 156 |
|
715 00:07:53,375 --> 00:07:58,820 |
|
716 If it came back r as |
|
717 two is actually 0, |
|
718 |
|
719 157 |
|
720 00:07:58,820 --> 00:08:00,410 |
|
721 then I can replace it by |
|
722 |
|
723 158 |
|
724 00:08:00,410 --> 00:08:03,785 |
|
725 the simplified version of a warm. |
|
726 |
|
727 159 |
|
728 00:08:03,785 --> 00:08:07,460 |
|
729 If I simplify both |
|
730 alternatives and it |
|
731 |
|
732 160 |
|
733 00:08:07,460 --> 00:08:11,180 |
|
734 happens that the simplified |
|
735 versions are the same, |
|
736 |
|
737 161 |
|
738 00:08:11,180 --> 00:08:15,170 |
|
739 next century I can throw away |
|
740 one of the alternatives. |
|
741 |
|
742 162 |
|
743 00:08:15,170 --> 00:08:18,080 |
|
744 Otherwise, I just have to |
|
745 keep the alternatives, |
|
746 |
|
747 163 |
|
748 00:08:18,080 --> 00:08:21,155 |
|
749 but the simplified components |
|
750 |
|
751 164 |
|
752 00:08:21,155 --> 00:08:23,915 |
|
753 in the sequence is |
|
754 pretty much the same. |
|
755 |
|
756 165 |
|
757 00:08:23,915 --> 00:08:27,950 |
|
758 I first have to check what |
|
759 does it simplify underneath. |
|
760 |
|
761 166 |
|
762 00:08:27,950 --> 00:08:31,385 |
|
763 So I call first simplify |
|
764 and then have a look. |
|
765 |
|
766 167 |
|
767 00:08:31,385 --> 00:08:33,020 |
|
768 Does it matches C or one of |
|
769 |
|
770 168 |
|
771 00:08:33,020 --> 00:08:36,035 |
|
772 the simplification |
|
773 damage, just return 0. |
|
774 |
|
775 169 |
|
776 00:08:36,035 --> 00:08:38,330 |
|
777 Or if the other component is 0, |
|
778 |
|
779 170 |
|
780 00:08:38,330 --> 00:08:40,535 |
|
781 I can also return a 0. |
|
782 |
|
783 171 |
|
784 00:08:40,535 --> 00:08:43,310 |
|
785 If it's one, then I keep |
|
786 the other component. |
|
787 |
|
788 172 |
|
789 00:08:43,310 --> 00:08:45,920 |
|
790 And if this iss two is one, |
|
791 |
|
792 173 |
|
793 00:08:45,920 --> 00:08:47,615 |
|
794 and I keep the first component, |
|
795 |
|
796 174 |
|
797 00:08:47,615 --> 00:08:50,359 |
|
798 and otherwise it's |
|
799 still the sequence. |
|
800 |
|
801 175 |
|
802 00:08:50,359 --> 00:08:53,540 |
|
803 And in all the other cases I |
|
804 don't have to do anything. |
|
805 |
|
806 176 |
|
807 00:08:53,540 --> 00:08:55,700 |
|
808 It's just a plain |
|
809 0. I leave it in. |
|
810 |
|
811 177 |
|
812 00:08:55,700 --> 00:08:57,860 |
|
813 If it's a plain |
|
814 warm, I leave it in. |
|
815 |
|
816 178 |
|
817 00:08:57,860 --> 00:09:00,170 |
|
818 And if something is under a star, |
|
819 |
|
820 179 |
|
821 00:09:00,170 --> 00:09:02,840 |
|
822 I don't open up this |
|
823 door and simplify it. |
|
824 |
|
825 180 |
|
826 00:09:02,840 --> 00:09:06,680 |
|
827 I just apply that to be |
|
828 as quick as possible. |
|
829 |
|
830 181 |
|
831 00:09:06,680 --> 00:09:10,280 |
|
832 Let's see whether this has |
|
833 any effect on our code. |
|
834 |
|
835 182 |
|
836 00:09:10,280 --> 00:09:12,980 |
|
837 So I'm now in the |
|
838 file read three, |
|
839 |
|
840 183 |
|
841 00:09:12,980 --> 00:09:17,405 |
|
842 and it's the same as Reto. |
|
843 |
|
844 184 |
|
845 00:09:17,405 --> 00:09:20,885 |
|
846 It still has this |
|
847 explicit and Times case, |
|
848 |
|
849 185 |
|
850 00:09:20,885 --> 00:09:24,665 |
|
851 nullable and derivative |
|
852 still the same. |
|
853 |
|
854 186 |
|
855 00:09:24,665 --> 00:09:28,610 |
|
856 Except now we have this |
|
857 simplification function as well. |
|
858 |
|
859 187 |
|
860 00:09:28,610 --> 00:09:29,960 |
|
861 And when we apply |
|
862 |
|
863 188 |
|
864 00:09:29,960 --> 00:09:33,725 |
|
865 the derivative after |
|
866 the apply deteriorated, |
|
867 |
|
868 189 |
|
869 00:09:33,725 --> 00:09:35,870 |
|
870 simplify it to throw away |
|
871 |
|
872 190 |
|
873 00:09:35,870 --> 00:09:39,050 |
|
874 all the junk this |
|
875 derivative introduced. |
|
876 |
|
877 191 |
|
878 00:09:39,050 --> 00:09:41,510 |
|
879 Otherwise everything |
|
880 stays the same. |
|
881 |
|
882 192 |
|
883 00:09:41,510 --> 00:09:43,580 |
|
884 You still have this expansion |
|
885 |
|
886 193 |
|
887 00:09:43,580 --> 00:09:45,515 |
|
888 of the optional reg expression. |
|
889 |
|
890 194 |
|
891 00:09:45,515 --> 00:09:49,670 |
|
892 Here, our two evil wreck |
|
893 expressions we use as a test. |
|
894 |
|
895 195 |
|
896 00:09:49,670 --> 00:09:52,460 |
|
897 And here's now this test case, |
|
898 |
|
899 196 |
|
900 00:09:52,460 --> 00:09:55,835 |
|
901 the first one and the |
|
902 actually now trying it |
|
903 |
|
904 197 |
|
905 00:09:55,835 --> 00:10:00,515 |
|
906 out for strings between |
|
907 09 thousand a's. |
|
908 |
|
909 198 |
|
910 00:10:00,515 --> 00:10:08,285 |
|
911 So C. So also gets |
|
912 simplification makes a, |
|
913 |
|
914 199 |
|
915 00:10:08,285 --> 00:10:10,655 |
|
916 actually this case fast on. |
|
917 |
|
918 200 |
|
919 00:10:10,655 --> 00:10:16,260 |
|
920 So we can now run strings |
|
921 up to 9 thousand. |
|
922 |
|
923 201 |
|
924 00:10:16,260 --> 00:10:19,360 |
|
925 And we don't actually |
|
926 sweat about this at all. |
|
927 |
|
928 202 |
|
929 00:10:19,360 --> 00:10:24,745 |
|
930 For I have to say my laptop |
|
931 is now starting its fan. |
|
932 |
|
933 203 |
|
934 00:10:24,745 --> 00:10:28,525 |
|
935 And also, because the times |
|
936 are relatively small, |
|
937 |
|
938 204 |
|
939 00:10:28,525 --> 00:10:30,610 |
|
940 I'm actually running |
|
941 each test three |
|
942 |
|
943 205 |
|
944 00:10:30,610 --> 00:10:32,785 |
|
945 times and then take the average, |
|
946 |
|
947 206 |
|
948 00:10:32,785 --> 00:10:34,720 |
|
949 which I didn't do before, |
|
950 |
|
951 207 |
|
952 00:10:34,720 --> 00:10:37,720 |
|
953 just to be a tiny bit |
|
954 more accurate map. |
|
955 |
|
956 208 |
|
957 00:10:37,720 --> 00:10:42,025 |
|
958 So three seconds for a |
|
959 string of 9 thousand a's. |
|
960 |
|
961 209 |
|
962 00:10:42,025 --> 00:10:44,980 |
|
963 And now the more |
|
964 interesting question is, |
|
965 |
|
966 210 |
|
967 00:10:44,980 --> 00:10:46,240 |
|
968 for the second case, |
|
969 |
|
970 211 |
|
971 00:10:46,240 --> 00:10:48,625 |
|
972 this E star, star b. |
|
973 |
|
974 212 |
|
975 00:10:48,625 --> 00:10:51,250 |
|
976 And you can already see |
|
977 on these numbers RB. |
|
978 |
|
979 213 |
|
980 00:10:51,250 --> 00:10:53,260 |
|
981 And now you're really ambitious. |
|
982 |
|
983 214 |
|
984 00:10:53,260 --> 00:10:57,860 |
|
985 And let's see how our |
|
986 program is coping with that. |
|
987 |
|
988 215 |
|
989 00:11:02,610 --> 00:11:07,780 |
|
990 Again. No sweat, at |
|
991 least not on my part. |
|
992 |
|
993 216 |
|
994 00:11:07,780 --> 00:11:10,480 |
|
995 The laptop thefts to |
|
996 calculate quite a bit. |
|
997 |
|
998 217 |
|
999 00:11:10,480 --> 00:11:12,940 |
|
1000 But this is now a string of |
|
1001 |
|
1002 218 |
|
1003 00:11:12,940 --> 00:11:16,539 |
|
1004 3 million A's and |
|
1005 it needs a second. |
|
1006 |
|
1007 219 |
|
1008 00:11:16,539 --> 00:11:20,680 |
|
1009 And let's see how far we get. |
|
1010 |
|
1011 220 |
|
1012 00:11:20,680 --> 00:11:25,280 |
|
1013 4 million a around two seconds. |
|
1014 |
|
1015 221 |
|
1016 00:11:27,030 --> 00:11:29,050 |
|
1017 So say it again, |
|
1018 |
|
1019 222 |
|
1020 00:11:29,050 --> 00:11:31,690 |
|
1021 I'm actually slowing it down. |
|
1022 |
|
1023 223 |
|
1024 00:11:31,690 --> 00:11:34,150 |
|
1025 He artificially run the test |
|
1026 |
|
1027 224 |
|
1028 00:11:34,150 --> 00:11:36,895 |
|
1029 three times and then |
|
1030 take the average. |
|
1031 |
|
1032 225 |
|
1033 00:11:36,895 --> 00:11:42,749 |
|
1034 But it seems to be something |
|
1035 less than five seconds. |
|
1036 |
|
1037 226 |
|
1038 00:11:42,749 --> 00:11:48,185 |
|
1039 And remember that is a |
|
1040 string of 6 million A's. |
|
1041 |
|
1042 227 |
|
1043 00:11:48,185 --> 00:11:51,170 |
|
1044 Let's just have a |
|
1045 look at the graphs. |
|
1046 |
|
1047 228 |
|
1048 00:11:51,170 --> 00:11:56,105 |
|
1049 So the simplification also |
|
1050 made of first case faster. |
|
1051 |
|
1052 229 |
|
1053 00:11:56,105 --> 00:11:58,880 |
|
1054 So earlier without |
|
1055 simplification, |
|
1056 |
|
1057 230 |
|
1058 00:11:58,880 --> 00:12:00,710 |
|
1059 where we just have |
|
1060 this explicit and |
|
1061 |
|
1062 231 |
|
1063 00:12:00,710 --> 00:12:03,050 |
|
1064 times that at this graph. |
|
1065 |
|
1066 232 |
|
1067 00:12:03,050 --> 00:12:05,210 |
|
1068 And now we can go up to |
|
1069 |
|
1070 233 |
|
1071 00:12:05,210 --> 00:12:09,620 |
|
1072 10 thousand and be still |
|
1073 under five seconds. |
|
1074 |
|
1075 234 |
|
1076 00:12:09,620 --> 00:12:12,300 |
|
1077 So that is good news. |
|
1078 |
|
1079 235 |
|
1080 00:12:13,270 --> 00:12:16,745 |
|
1081 In the other example, remember, |
|
1082 |
|
1083 236 |
|
1084 00:12:16,745 --> 00:12:19,220 |
|
1085 the existing regular |
|
1086 expression matches in |
|
1087 |
|
1088 237 |
|
1089 00:12:19,220 --> 00:12:22,880 |
|
1090 Java eight, Python, |
|
1091 and JavaScript. |
|
1092 |
|
1093 238 |
|
1094 00:12:22,880 --> 00:12:26,030 |
|
1095 And thanks to the |
|
1096 student be Ozma, |
|
1097 |
|
1098 239 |
|
1099 00:12:26,030 --> 00:12:27,935 |
|
1100 half a graph for Swift. |
|
1101 |
|
1102 240 |
|
1103 00:12:27,935 --> 00:12:29,750 |
|
1104 They're pretty atrocious. |
|
1105 |
|
1106 241 |
|
1107 00:12:29,750 --> 00:12:33,320 |
|
1108 They need like for 30 Ace, |
|
1109 |
|
1110 242 |
|
1111 00:12:33,320 --> 00:12:37,490 |
|
1112 something like on the |
|
1113 magnitude of thirty-seconds. |
|
1114 |
|
1115 243 |
|
1116 00:12:37,490 --> 00:12:39,740 |
|
1117 As I said already, |
|
1118 |
|
1119 244 |
|
1120 00:12:39,740 --> 00:12:42,665 |
|
1121 Java nine is slightly better. |
|
1122 |
|
1123 245 |
|
1124 00:12:42,665 --> 00:12:44,870 |
|
1125 Java nine and above this, |
|
1126 |
|
1127 246 |
|
1128 00:12:44,870 --> 00:12:46,220 |
|
1129 if you try that example, |
|
1130 |
|
1131 247 |
|
1132 00:12:46,220 --> 00:12:48,665 |
|
1133 the same exponential and nine, |
|
1134 |
|
1135 248 |
|
1136 00:12:48,665 --> 00:12:51,230 |
|
1137 you would be able to |
|
1138 process something |
|
1139 |
|
1140 249 |
|
1141 00:12:51,230 --> 00:12:54,215 |
|
1142 like 40 thousand A's |
|
1143 in half a minute. |
|
1144 |
|
1145 250 |
|
1146 00:12:54,215 --> 00:12:57,740 |
|
1147 I have to admit I'm not |
|
1148 a 100% sure what they |
|
1149 |
|
1150 251 |
|
1151 00:12:57,740 --> 00:13:03,575 |
|
1152 did to improve the |
|
1153 performance between Java 89. |
|
1154 |
|
1155 252 |
|
1156 00:13:03,575 --> 00:13:05,510 |
|
1157 I know they did something on |
|
1158 |
|
1159 253 |
|
1160 00:13:05,510 --> 00:13:07,790 |
|
1161 their regular expression |
|
1162 matching engine, |
|
1163 |
|
1164 254 |
|
1165 00:13:07,790 --> 00:13:09,770 |
|
1166 but I haven't really looked |
|
1167 |
|
1168 255 |
|
1169 00:13:09,770 --> 00:13:12,230 |
|
1170 at what precisely they've done. |
|
1171 |
|
1172 256 |
|
1173 00:13:12,230 --> 00:13:17,945 |
|
1174 But that's still not |
|
1175 really competition fas. |
|
1176 |
|
1177 257 |
|
1178 00:13:17,945 --> 00:13:20,915 |
|
1179 So our third version, |
|
1180 |
|
1181 258 |
|
1182 00:13:20,915 --> 00:13:24,335 |
|
1183 in this example, a star, star b. |
|
1184 |
|
1185 259 |
|
1186 00:13:24,335 --> 00:13:28,445 |
|
1187 We say it's something like |
|
1188 We need 6 million A's. |
|
1189 |
|
1190 260 |
|
1191 00:13:28,445 --> 00:13:31,760 |
|
1192 And again, I think these |
|
1193 are slightly older times, |
|
1194 |
|
1195 261 |
|
1196 00:13:31,760 --> 00:13:33,770 |
|
1197 so he had needs 20 seconds. |
|
1198 |
|
1199 262 |
|
1200 00:13:33,770 --> 00:13:37,250 |
|
1201 I think we had something |
|
1202 like below five seconds. |
|
1203 |
|
1204 263 |
|
1205 00:13:37,250 --> 00:13:40,865 |
|
1206 But again, you can see |
|
1207 the trends. We rants. |
|
1208 |
|
1209 264 |
|
1210 00:13:40,865 --> 00:13:42,590 |
|
1211 Circles around them. |
|
1212 |
|
1213 265 |
|
1214 00:13:42,590 --> 00:13:46,530 |
|
1215 And that's quite nice. |
|
1216 |
|
1217 266 |
|
1218 00:13:46,570 --> 00:13:49,774 |
|
1219 So what's good about |
|
1220 this algorithm? |
|
1221 |
|
1222 267 |
|
1223 00:13:49,774 --> 00:13:51,605 |
|
1224 By pressure of ski? |
|
1225 |
|
1226 268 |
|
1227 00:13:51,605 --> 00:13:54,500 |
|
1228 Well, first, it extends |
|
1229 |
|
1230 269 |
|
1231 00:13:54,500 --> 00:13:57,800 |
|
1232 actually to quite a number |
|
1233 of regular expressions. |
|
1234 |
|
1235 270 |
|
1236 00:13:57,800 --> 00:13:59,900 |
|
1237 So we saw in this example |
|
1238 |
|
1239 271 |
|
1240 00:13:59,900 --> 00:14:03,950 |
|
1241 the End Times regular expression |
|
1242 is a little bit of work. |
|
1243 |
|
1244 272 |
|
1245 00:14:03,950 --> 00:14:05,405 |
|
1246 We could extend that. |
|
1247 |
|
1248 273 |
|
1249 00:14:05,405 --> 00:14:08,480 |
|
1250 It also extends to the |
|
1251 not reg expression, |
|
1252 |
|
1253 274 |
|
1254 00:14:08,480 --> 00:14:10,820 |
|
1255 which I explain on |
|
1256 the next slide. |
|
1257 |
|
1258 275 |
|
1259 00:14:10,820 --> 00:14:15,290 |
|
1260 It's very easy to implement |
|
1261 in a functional language. |
|
1262 |
|
1263 276 |
|
1264 00:14:15,290 --> 00:14:16,610 |
|
1265 If you don't buy |
|
1266 |
|
1267 277 |
|
1268 00:14:16,610 --> 00:14:20,675 |
|
1269 all that functional |
|
1270 programming language stuff, |
|
1271 |
|
1272 278 |
|
1273 00:14:20,675 --> 00:14:22,955 |
|
1274 you still have to agree. |
|
1275 |
|
1276 279 |
|
1277 00:14:22,955 --> 00:14:25,715 |
|
1278 Quite beautiful in Scala. |
|
1279 |
|
1280 280 |
|
1281 00:14:25,715 --> 00:14:28,160 |
|
1282 And you could also |
|
1283 easily implemented |
|
1284 |
|
1285 281 |
|
1286 00:14:28,160 --> 00:14:31,760 |
|
1287 in C language or in Python. |
|
1288 |
|
1289 282 |
|
1290 00:14:31,760 --> 00:14:34,250 |
|
1291 There's really no |
|
1292 problem with that. |
|
1293 |
|
1294 283 |
|
1295 00:14:34,250 --> 00:14:38,390 |
|
1296 Say the algorithm's actually |
|
1297 quite old already brush-off, |
|
1298 |
|
1299 284 |
|
1300 00:14:38,390 --> 00:14:44,899 |
|
1301 ski wrote it down in |
|
1302 1964 and his PhD thesis. |
|
1303 |
|
1304 285 |
|
1305 00:14:44,899 --> 00:14:49,460 |
|
1306 Somehow it was forgotten during |
|
1307 |
|
1308 286 |
|
1309 00:14:49,460 --> 00:14:54,095 |
|
1310 the next decades and only |
|
1311 recently has been rediscovered. |
|
1312 |
|
1313 287 |
|
1314 00:14:54,095 --> 00:14:57,065 |
|
1315 At the moment, we are |
|
1316 only solve the problem |
|
1317 |
|
1318 288 |
|
1319 00:14:57,065 --> 00:15:02,240 |
|
1320 of Gmail reg expression |
|
1321 gamma string deaths, |
|
1322 |
|
1323 289 |
|
1324 00:15:02,240 --> 00:15:05,550 |
|
1325 the regular expression |
|
1326 match the string yes or no. |
|
1327 |
|
1328 290 |
|
1329 00:15:05,550 --> 00:15:08,740 |
|
1330 The want to of course, use |
|
1331 it as part of our lexicon. |
|
1332 |
|
1333 291 |
|
1334 00:15:08,740 --> 00:15:12,370 |
|
1335 And there we have to do |
|
1336 something more complicated. |
|
1337 |
|
1338 292 |
|
1339 00:15:12,370 --> 00:15:15,384 |
|
1340 We have to essentially |
|
1341 generate tokens. |
|
1342 |
|
1343 293 |
|
1344 00:15:15,384 --> 00:15:18,670 |
|
1345 And this will still take |
|
1346 a little bit of work. |
|
1347 |
|
1348 294 |
|
1349 00:15:18,670 --> 00:15:22,045 |
|
1350 And that's actually relatively |
|
1351 recent work by somebody |
|
1352 |
|
1353 295 |
|
1354 00:15:22,045 --> 00:15:26,110 |
|
1355 called suits Month and |
|
1356 the co-worker called Lou. |
|
1357 |
|
1358 296 |
|
1359 00:15:26,110 --> 00:15:30,880 |
|
1360 And what I personally |
|
1361 also find very satisfying |
|
1362 |
|
1363 297 |
|
1364 00:15:30,880 --> 00:15:32,695 |
|
1365 about this algorithm is |
|
1366 |
|
1367 298 |
|
1368 00:15:32,695 --> 00:15:36,040 |
|
1369 that we can actually |
|
1370 prove that it's correct. |
|
1371 |
|
1372 299 |
|
1373 00:15:36,040 --> 00:15:37,735 |
|
1374 You might remember I did |
|
1375 |
|
1376 300 |
|
1377 00:15:37,735 --> 00:15:41,500 |
|
1378 quite some interesting |
|
1379 transformations |
|
1380 |
|
1381 301 |
|
1382 00:15:41,500 --> 00:15:44,830 |
|
1383 when I calculated the derivative. |
|
1384 |
|
1385 302 |
|
1386 00:15:44,830 --> 00:15:48,270 |
|
1387 How can I be sure that |
|
1388 this is all correct? |
|
1389 |
|
1390 303 |
|
1391 00:15:48,270 --> 00:15:50,270 |
|
1392 Actually, I can now go away and |
|
1393 |
|
1394 304 |
|
1395 00:15:50,270 --> 00:15:52,865 |
|
1396 prove the correctness |
|
1397 of this algorithm. |
|
1398 |
|
1399 305 |
|
1400 00:15:52,865 --> 00:15:56,645 |
|
1401 Does it really satisfy |
|
1402 the specification? |
|
1403 |
|
1404 306 |
|
1405 00:15:56,645 --> 00:15:58,550 |
|
1406 Is really fs string |
|
1407 |
|
1408 307 |
|
1409 00:15:58,550 --> 00:16:00,440 |
|
1410 is in the language |
|
1411 of a reg expression. |
|
1412 |
|
1413 308 |
|
1414 00:16:00,440 --> 00:16:03,050 |
|
1415 Does that matter? Vd say yes. |
|
1416 |
|
1417 309 |
|
1418 00:16:03,050 --> 00:16:07,070 |
|
1419 However, I leave that |
|
1420 for another video. |
|
1421 |
|
1422 310 |
|
1423 00:16:07,070 --> 00:16:10,580 |
|
1424 What I also like about |
|
1425 this algorithm that can be |
|
1426 |
|
1427 311 |
|
1428 00:16:10,580 --> 00:16:13,775 |
|
1429 actually extended to quite a |
|
1430 number of rec expressions. |
|
1431 |
|
1432 312 |
|
1433 00:16:13,775 --> 00:16:17,810 |
|
1434 So this is t not reg |
|
1435 expression that is |
|
1436 |
|
1437 313 |
|
1438 00:16:17,810 --> 00:16:19,760 |
|
1439 opposed to match strings which |
|
1440 |
|
1441 314 |
|
1442 00:16:19,760 --> 00:16:22,235 |
|
1443 this reg expression cannot match. |
|
1444 |
|
1445 315 |
|
1446 00:16:22,235 --> 00:16:24,245 |
|
1447 So here's an example. |
|
1448 |
|
1449 316 |
|
1450 00:16:24,245 --> 00:16:28,640 |
|
1451 You're in the business of |
|
1452 trying to find out what |
|
1453 |
|
1454 317 |
|
1455 00:16:28,640 --> 00:16:33,530 |
|
1456 our comments in languages like |
|
1457 Java or C. Then you know, |
|
1458 |
|
1459 318 |
|
1460 00:16:33,530 --> 00:16:35,060 |
|
1461 they always start with a slash, |
|
1462 |
|
1463 319 |
|
1464 00:16:35,060 --> 00:16:36,245 |
|
1465 then comes a star, |
|
1466 |
|
1467 320 |
|
1468 00:16:36,245 --> 00:16:38,240 |
|
1469 then comes an arbitrary string. |
|
1470 |
|
1471 321 |
|
1472 00:16:38,240 --> 00:16:41,195 |
|
1473 And then they finish |
|
1474 with a slash and a star. |
|
1475 |
|
1476 322 |
|
1477 00:16:41,195 --> 00:16:44,330 |
|
1478 And how you want to specify |
|
1479 that is something like this. |
|
1480 |
|
1481 323 |
|
1482 00:16:44,330 --> 00:16:45,530 |
|
1483 You want to say, OK, |
|
1484 |
|
1485 324 |
|
1486 00:16:45,530 --> 00:16:48,245 |
|
1487 a comment starts with |
|
1488 a slash and a star. |
|
1489 |
|
1490 325 |
|
1491 00:16:48,245 --> 00:16:51,410 |
|
1492 Then it comes any |
|
1493 string in between. |
|
1494 |
|
1495 326 |
|
1496 00:16:51,410 --> 00:16:55,340 |
|
1497 But this string |
|
1498 in-between cannot contain |
|
1499 |
|
1500 327 |
|
1501 00:16:55,340 --> 00:16:58,280 |
|
1502 any star and slash |
|
1503 because that would then |
|
1504 |
|
1505 328 |
|
1506 00:16:58,280 --> 00:17:01,415 |
|
1507 indicate there's the |
|
1508 end already there. |
|
1509 |
|
1510 329 |
|
1511 00:17:01,415 --> 00:17:03,680 |
|
1512 And then after you |
|
1513 have such a string |
|
1514 |
|
1515 330 |
|
1516 00:17:03,680 --> 00:17:06,410 |
|
1517 which doesn't |
|
1518 contain as standard, |
|
1519 |
|
1520 331 |
|
1521 00:17:06,410 --> 00:17:11,180 |
|
1522 then at the end you want to |
|
1523 match a star and a slash. |
|
1524 |
|
1525 332 |
|
1526 00:17:11,180 --> 00:17:13,460 |
|
1527 So for these kind of comments, |
|
1528 |
|
1529 333 |
|
1530 00:17:13,460 --> 00:17:15,995 |
|
1531 this reg expression is |
|
1532 actually quite useful. |
|
1533 |
|
1534 334 |
|
1535 00:17:15,995 --> 00:17:19,430 |
|
1536 And if you ever seen |
|
1537 how to do the negation, |
|
1538 |
|
1539 335 |
|
1540 00:17:19,430 --> 00:17:21,995 |
|
1541 this kinda break |
|
1542 expression with automata? |
|
1543 |
|
1544 336 |
|
1545 00:17:21,995 --> 00:17:24,710 |
|
1546 You will know that's |
|
1547 a bit of a pain, |
|
1548 |
|
1549 337 |
|
1550 00:17:24,710 --> 00:17:26,675 |
|
1551 but for the Borowski, |
|
1552 |
|
1553 338 |
|
1554 00:17:26,675 --> 00:17:28,370 |
|
1555 it's no pain at all. |
|
1556 |
|
1557 339 |
|
1558 00:17:28,370 --> 00:17:30,710 |
|
1559 The meaning of this |
|
1560 reg expression |
|
1561 |
|
1562 340 |
|
1563 00:17:30,710 --> 00:17:32,255 |
|
1564 would be something like that. |
|
1565 |
|
1566 341 |
|
1567 00:17:32,255 --> 00:17:35,540 |
|
1568 If I have all the |
|
1569 strings there are, |
|
1570 |
|
1571 342 |
|
1572 00:17:35,540 --> 00:17:38,390 |
|
1573 and I take away all the strings, |
|
1574 |
|
1575 343 |
|
1576 00:17:38,390 --> 00:17:40,055 |
|
1577 this arc and match. |
|
1578 |
|
1579 344 |
|
1580 00:17:40,055 --> 00:17:42,080 |
|
1581 The remaining strings are |
|
1582 |
|
1583 345 |
|
1584 00:17:42,080 --> 00:17:44,630 |
|
1585 what this reg expression |
|
1586 is supposed to match. |
|
1587 |
|
1588 346 |
|
1589 00:17:44,630 --> 00:17:47,165 |
|
1590 So I can specify |
|
1591 what the meaning is. |
|
1592 |
|
1593 347 |
|
1594 00:17:47,165 --> 00:17:49,760 |
|
1595 And then it's also very easy to |
|
1596 |
|
1597 348 |
|
1598 00:17:49,760 --> 00:17:52,174 |
|
1599 say what is nullable |
|
1600 and derivative. |
|
1601 |
|
1602 349 |
|
1603 00:17:52,174 --> 00:17:54,140 |
|
1604 So for nullable, it would be |
|
1605 |
|
1606 350 |
|
1607 00:17:54,140 --> 00:17:56,570 |
|
1608 just a test where |
|
1609 the eyes nullable. |
|
1610 |
|
1611 351 |
|
1612 00:17:56,570 --> 00:17:58,985 |
|
1613 And I just take the |
|
1614 negation of that. |
|
1615 |
|
1616 352 |
|
1617 00:17:58,985 --> 00:18:00,680 |
|
1618 And men I have to build |
|
1619 |
|
1620 353 |
|
1621 00:18:00,680 --> 00:18:03,620 |
|
1622 the derivative of this |
|
1623 not reg expression. |
|
1624 |
|
1625 354 |
|
1626 00:18:03,620 --> 00:18:05,420 |
|
1627 Then I just have to move |
|
1628 |
|
1629 355 |
|
1630 00:18:05,420 --> 00:18:07,325 |
|
1631 this permutation does derivative, |
|
1632 |
|
1633 356 |
|
1634 00:18:07,325 --> 00:18:10,070 |
|
1635 derivative inside |
|
1636 the wreck expression |
|
1637 |
|
1638 357 |
|
1639 00:18:10,070 --> 00:18:12,575 |
|
1640 and keep the not on |
|
1641 the outermost level. |
|
1642 |
|
1643 358 |
|
1644 00:18:12,575 --> 00:18:15,515 |
|
1645 So there's again no |
|
1646 pain involved in |
|
1647 |
|
1648 359 |
|
1649 00:18:15,515 --> 00:18:19,130 |
|
1650 adding this reg expression |
|
1651 to the algorithm. |
|
1652 |
|
1653 360 |
|
1654 00:18:19,130 --> 00:18:22,100 |
|
1655 We made it to the end of |
|
1656 this video and we made |
|
1657 |
|
1658 361 |
|
1659 00:18:22,100 --> 00:18:24,739 |
|
1660 it also to the end |
|
1661 of this algorithm. |
|
1662 |
|
1663 362 |
|
1664 00:18:24,739 --> 00:18:27,620 |
|
1665 I can see to loose trends. |
|
1666 |
|
1667 363 |
|
1668 00:18:27,620 --> 00:18:29,420 |
|
1669 One is we implemented |
|
1670 |
|
1671 364 |
|
1672 00:18:29,420 --> 00:18:32,855 |
|
1673 this algorithm for the |
|
1674 basic regular expressions. |
|
1675 |
|
1676 365 |
|
1677 00:18:32,855 --> 00:18:38,705 |
|
1678 And we also added the end |
|
1679 times out of necessity. |
|
1680 |
|
1681 366 |
|
1682 00:18:38,705 --> 00:18:41,120 |
|
1683 And I also showed |
|
1684 you how to implement |
|
1685 |
|
1686 367 |
|
1687 00:18:41,120 --> 00:18:44,840 |
|
1688 the not regular |
|
1689 expression on a slide. |
|
1690 |
|
1691 368 |
|
1692 00:18:44,840 --> 00:18:48,995 |
|
1693 But your task in the |
|
1694 coursework actually is |
|
1695 |
|
1696 369 |
|
1697 00:18:48,995 --> 00:18:52,970 |
|
1698 to extend that to some of |
|
1699 the extended reg expression. |
|
1700 |
|
1701 370 |
|
1702 00:18:52,970 --> 00:18:57,260 |
|
1703 So especially these bounded |
|
1704 repetitions like from 0 to |
|
1705 |
|
1706 371 |
|
1707 00:18:57,260 --> 00:19:01,550 |
|
1708 n times or between n and m times. |
|
1709 |
|
1710 372 |
|
1711 00:19:01,550 --> 00:19:04,325 |
|
1712 And I think also |
|
1713 plus and D option. |
|
1714 |
|
1715 373 |
|
1716 00:19:04,325 --> 00:19:07,220 |
|
1717 The other loose trend is, |
|
1718 |
|
1719 374 |
|
1720 00:19:07,220 --> 00:19:09,200 |
|
1721 remember I did this while |
|
1722 |
|
1723 375 |
|
1724 00:19:09,200 --> 00:19:11,645 |
|
1725 calculations with |
|
1726 regular expressions. |
|
1727 |
|
1728 376 |
|
1729 00:19:11,645 --> 00:19:13,205 |
|
1730 Why on earth? |
|
1731 |
|
1732 377 |
|
1733 00:19:13,205 --> 00:19:14,480 |
|
1734 Is that all correct? |
|
1735 |
|
1736 378 |
|
1737 00:19:14,480 --> 00:19:16,355 |
|
1738 Why on earth should |
|
1739 this algorithm |
|
1740 |
|
1741 379 |
|
1742 00:19:16,355 --> 00:19:18,575 |
|
1743 actually satisfy |
|
1744 our specification, |
|
1745 |
|
1746 380 |
|
1747 00:19:18,575 --> 00:19:20,450 |
|
1748 which we set out |
|
1749 at the beginning. |
|
1750 |
|
1751 381 |
|
1752 00:19:20,450 --> 00:19:25,410 |
|
1753 So that needs to be looked at |
|
1754 more closely. Bye for now. |