|
1 1 |
|
2 00:00:05,880 --> 00:00:09,700 |
|
3 Welcome back. |
|
4 Remember this slide. |
|
5 |
|
6 2 |
|
7 00:00:09,700 --> 00:00:11,500 |
|
8 This slide said, What is |
|
9 |
|
10 3 |
|
11 00:00:11,500 --> 00:00:14,500 |
|
12 all wreck expression Metro |
|
13 actually supposed to do? |
|
14 |
|
15 4 |
|
16 00:00:14,500 --> 00:00:16,570 |
|
17 It will take two arguments and |
|
18 |
|
19 5 |
|
20 00:00:16,570 --> 00:00:18,670 |
|
21 reg expression R and a string |
|
22 |
|
23 6 |
|
24 00:00:18,670 --> 00:00:21,580 |
|
25 S. And it's supposed to say yes, |
|
26 |
|
27 7 |
|
28 00:00:21,580 --> 00:00:23,440 |
|
29 the wreck expression matches |
|
30 |
|
31 8 |
|
32 00:00:23,440 --> 00:00:26,920 |
|
33 the string if and only |
|
34 if the string is in |
|
35 |
|
36 9 |
|
37 00:00:26,920 --> 00:00:28,855 |
|
38 the language of R. |
|
39 |
|
40 10 |
|
41 00:00:28,855 --> 00:00:32,410 |
|
42 And if the string is not |
|
43 in the language of our, |
|
44 |
|
45 11 |
|
46 00:00:32,410 --> 00:00:35,515 |
|
47 then our algorithm has to say no. |
|
48 |
|
49 12 |
|
50 00:00:35,515 --> 00:00:37,210 |
|
51 And we can't use |
|
52 |
|
53 13 |
|
54 00:00:37,210 --> 00:00:39,565 |
|
55 this specification |
|
56 directly because remember, |
|
57 |
|
58 14 |
|
59 00:00:39,565 --> 00:00:43,305 |
|
60 this l Sometimes |
|
61 produces infinite sets. |
|
62 |
|
63 15 |
|
64 00:00:43,305 --> 00:00:47,585 |
|
65 And so we can test whether a |
|
66 string is an infinite set, |
|
67 |
|
68 16 |
|
69 00:00:47,585 --> 00:00:50,090 |
|
70 at least not easily. |
|
71 |
|
72 17 |
|
73 00:00:50,090 --> 00:00:52,340 |
|
74 And so what we have to do |
|
75 |
|
76 18 |
|
77 00:00:52,340 --> 00:00:54,260 |
|
78 instead is we have |
|
79 to be a bit more |
|
80 |
|
81 19 |
|
82 00:00:54,260 --> 00:00:57,050 |
|
83 clever and implement |
|
84 some operations |
|
85 |
|
86 20 |
|
87 00:00:57,050 --> 00:00:59,284 |
|
88 on Rekha expressions instead. |
|
89 |
|
90 21 |
|
91 00:00:59,284 --> 00:01:03,275 |
|
92 Because Weka expressions |
|
93 are always finite trees. |
|
94 |
|
95 22 |
|
96 00:01:03,275 --> 00:01:05,870 |
|
97 I should say the |
|
98 person who has been |
|
99 |
|
100 23 |
|
101 00:01:05,870 --> 00:01:08,150 |
|
102 clever is called brush-off ski. |
|
103 |
|
104 24 |
|
105 00:01:08,150 --> 00:01:11,435 |
|
106 It's his, I've written, |
|
107 I'm introducing here. |
|
108 |
|
109 25 |
|
110 00:01:11,435 --> 00:01:15,515 |
|
111 And his algorithm consists |
|
112 of two functions. |
|
113 |
|
114 26 |
|
115 00:01:15,515 --> 00:01:17,840 |
|
116 One is called |
|
117 nullable and it takes |
|
118 |
|
119 27 |
|
120 00:01:17,840 --> 00:01:20,104 |
|
121 a regular expression as argument. |
|
122 |
|
123 28 |
|
124 00:01:20,104 --> 00:01:24,155 |
|
125 And the idea is that |
|
126 this function nullable. |
|
127 |
|
128 29 |
|
129 00:01:24,155 --> 00:01:26,450 |
|
130 Testing whether |
|
131 the reg expression |
|
132 |
|
133 30 |
|
134 00:01:26,450 --> 00:01:27,950 |
|
135 can match the empty string. |
|
136 |
|
137 31 |
|
138 00:01:27,950 --> 00:01:30,305 |
|
139 So 0 cannot match any string. |
|
140 |
|
141 32 |
|
142 00:01:30,305 --> 00:01:33,275 |
|
143 So it cannot match the |
|
144 empty string either. |
|
145 |
|
146 33 |
|
147 00:01:33,275 --> 00:01:35,465 |
|
148 So that's defined as false. |
|
149 |
|
150 34 |
|
151 00:01:35,465 --> 00:01:37,775 |
|
152 This reg expression, |
|
153 the whole purpose |
|
154 |
|
155 35 |
|
156 00:01:37,775 --> 00:01:39,680 |
|
157 is that it can match |
|
158 the empty string. |
|
159 |
|
160 36 |
|
161 00:01:39,680 --> 00:01:41,645 |
|
162 So it's defined as true. |
|
163 |
|
164 37 |
|
165 00:01:41,645 --> 00:01:44,599 |
|
166 If a reg expression |
|
167 can match a character, |
|
168 |
|
169 38 |
|
170 00:01:44,599 --> 00:01:47,045 |
|
171 then it cannot match |
|
172 the empty string. |
|
173 |
|
174 39 |
|
175 00:01:47,045 --> 00:01:49,445 |
|
176 So that is defined as false. |
|
177 |
|
178 40 |
|
179 00:01:49,445 --> 00:01:53,180 |
|
180 If an alternative can |
|
181 match the empty string, |
|
182 |
|
183 41 |
|
184 00:01:53,180 --> 00:01:56,960 |
|
185 then either or one can |
|
186 match the empty string. |
|
187 |
|
188 42 |
|
189 00:01:56,960 --> 00:01:59,720 |
|
190 Or R2 can match the empty string. |
|
191 |
|
192 43 |
|
193 00:01:59,720 --> 00:02:03,110 |
|
194 So either nullable |
|
195 of R1 has to hold, |
|
196 |
|
197 44 |
|
198 00:02:03,110 --> 00:02:06,945 |
|
199 or nullable of R2 has to hold. |
|
200 |
|
201 45 |
|
202 00:02:06,945 --> 00:02:09,925 |
|
203 In this sequence, it's |
|
204 the other way around. |
|
205 |
|
206 46 |
|
207 00:02:09,925 --> 00:02:12,790 |
|
208 If this reg expression can |
|
209 match the empty string, |
|
210 |
|
211 47 |
|
212 00:02:12,790 --> 00:02:16,615 |
|
213 then R1 has to be able to |
|
214 match the empty string, |
|
215 |
|
216 48 |
|
217 00:02:16,615 --> 00:02:20,290 |
|
218 and R2 has to be able to |
|
219 match the empty string. |
|
220 |
|
221 49 |
|
222 00:02:20,290 --> 00:02:22,555 |
|
223 So here it's just the opposite. |
|
224 |
|
225 50 |
|
226 00:02:22,555 --> 00:02:25,660 |
|
227 In one case it is o and |
|
228 the other case it's end. |
|
229 |
|
230 51 |
|
231 00:02:25,660 --> 00:02:27,970 |
|
232 In the store record |
|
233 expression can match |
|
234 |
|
235 52 |
|
236 00:02:27,970 --> 00:02:30,445 |
|
237 always the empty string. |
|
238 So that is true. |
|
239 |
|
240 53 |
|
241 00:02:30,445 --> 00:02:33,340 |
|
242 So this is a simple |
|
243 recursive function |
|
244 |
|
245 54 |
|
246 00:02:33,340 --> 00:02:37,179 |
|
247 and should not be too |
|
248 difficult to implement. |
|
249 |
|
250 55 |
|
251 00:02:37,179 --> 00:02:42,025 |
|
252 Okay, this nullable function |
|
253 that is easy-peasy. |
|
254 |
|
255 56 |
|
256 00:02:42,025 --> 00:02:44,604 |
|
257 The second function, however, |
|
258 |
|
259 57 |
|
260 00:02:44,604 --> 00:02:49,155 |
|
261 is a bit more involved and |
|
262 that's just to be expected. |
|
263 |
|
264 58 |
|
265 00:02:49,155 --> 00:02:53,075 |
|
266 Remember people working in |
|
267 this area already for decades. |
|
268 |
|
269 59 |
|
270 00:02:53,075 --> 00:02:57,305 |
|
271 If they have some problems |
|
272 with runtime, for example, |
|
273 |
|
274 60 |
|
275 00:02:57,305 --> 00:02:58,940 |
|
276 we can't expect that as |
|
277 |
|
278 61 |
|
279 00:02:58,940 --> 00:03:03,260 |
|
280 simple fix will solve all |
|
281 the problems in the world. |
|
282 |
|
283 62 |
|
284 00:03:03,260 --> 00:03:06,530 |
|
285 So I admit the second |
|
286 function is a bit more |
|
287 |
|
288 63 |
|
289 00:03:06,530 --> 00:03:10,085 |
|
290 complicated and make sure |
|
291 that you understand it. |
|
292 |
|
293 64 |
|
294 00:03:10,085 --> 00:03:12,140 |
|
295 And it also just |
|
296 chose this preserves |
|
297 |
|
298 65 |
|
299 00:03:12,140 --> 00:03:14,345 |
|
300 the is a very clever guy. |
|
301 |
|
302 66 |
|
303 00:03:14,345 --> 00:03:15,800 |
|
304 Actually, I have to say he was |
|
305 |
|
306 67 |
|
307 00:03:15,800 --> 00:03:17,720 |
|
308 a clever guy because |
|
309 I think he either |
|
310 |
|
311 68 |
|
312 00:03:17,720 --> 00:03:21,650 |
|
313 died last year or |
|
314 the year before. |
|
315 |
|
316 69 |
|
317 00:03:21,650 --> 00:03:25,505 |
|
318 And he came up with this |
|
319 algorithm already in 1964. |
|
320 |
|
321 70 |
|
322 00:03:25,505 --> 00:03:27,260 |
|
323 It somehow got lost, |
|
324 |
|
325 71 |
|
326 00:03:27,260 --> 00:03:30,305 |
|
327 but has been rediscovered |
|
328 in the last ten years. |
|
329 |
|
330 72 |
|
331 00:03:30,305 --> 00:03:34,685 |
|
332 So the idea of the second |
|
333 function is the following. |
|
334 |
|
335 73 |
|
336 00:03:34,685 --> 00:03:38,120 |
|
337 Imagine you have a |
|
338 reexpression and it can |
|
339 |
|
340 74 |
|
341 00:03:38,120 --> 00:03:41,930 |
|
342 match a string of the |
|
343 form C followed by as. |
|
344 |
|
345 75 |
|
346 00:03:41,930 --> 00:03:44,810 |
|
347 So the C is the first |
|
348 character of that string. |
|
349 |
|
350 76 |
|
351 00:03:44,810 --> 00:03:48,605 |
|
352 So I mentioned that can |
|
353 match this kind of string. |
|
354 |
|
355 77 |
|
356 00:03:48,605 --> 00:03:50,330 |
|
357 The question is now, |
|
358 |
|
359 78 |
|
360 00:03:50,330 --> 00:03:52,400 |
|
361 what would a wrecker |
|
362 expression look |
|
363 |
|
364 79 |
|
365 00:03:52,400 --> 00:03:54,695 |
|
366 like that can match chest |
|
367 |
|
368 80 |
|
369 00:03:54,695 --> 00:03:57,140 |
|
370 s. You probably remember |
|
371 |
|
372 81 |
|
373 00:03:57,140 --> 00:03:59,300 |
|
374 that from the semantic |
|
375 that every relative, |
|
376 |
|
377 82 |
|
378 00:03:59,300 --> 00:04:00,860 |
|
379 there was also |
|
380 something of chopping |
|
381 |
|
382 83 |
|
383 00:04:00,860 --> 00:04:02,210 |
|
384 off the first character. |
|
385 |
|
386 84 |
|
387 00:04:02,210 --> 00:04:04,940 |
|
388 Here it's the same. |
|
389 If a reg expression |
|
390 |
|
391 85 |
|
392 00:04:04,940 --> 00:04:07,835 |
|
393 can match a string |
|
394 starting with a C, |
|
395 |
|
396 86 |
|
397 00:04:07,835 --> 00:04:11,090 |
|
398 we're looking for a wreck |
|
399 expression which can match |
|
400 |
|
401 87 |
|
402 00:04:11,090 --> 00:04:15,215 |
|
403 the rest of the string where |
|
404 the c has been chopped off. |
|
405 |
|
406 88 |
|
407 00:04:15,215 --> 00:04:17,690 |
|
408 And this operation will be called |
|
409 |
|
410 89 |
|
411 00:04:17,690 --> 00:04:21,049 |
|
412 the derivative of a |
|
413 wreck expression. |
|
414 |
|
415 90 |
|
416 00:04:21,049 --> 00:04:22,205 |
|
417 And it will also take |
|
418 |
|
419 91 |
|
420 00:04:22,205 --> 00:04:25,460 |
|
421 a character as argument |
|
422 and the rec expression. |
|
423 |
|
424 92 |
|
425 00:04:25,460 --> 00:04:28,730 |
|
426 And in contrast to |
|
427 the semantic records, |
|
428 |
|
429 93 |
|
430 00:04:28,730 --> 00:04:31,310 |
|
431 semantic derivative, which works |
|
432 |
|
433 94 |
|
434 00:04:31,310 --> 00:04:34,430 |
|
435 over languages or |
|
436 sets of strings. |
|
437 |
|
438 95 |
|
439 00:04:34,430 --> 00:04:39,620 |
|
440 This derivative works |
|
441 over regular expressions. |
|
442 |
|
443 96 |
|
444 00:04:39,620 --> 00:04:41,330 |
|
445 Okay, here's this function. |
|
446 |
|
447 97 |
|
448 00:04:41,330 --> 00:04:43,970 |
|
449 It's defined recursively over |
|
450 |
|
451 98 |
|
452 00:04:43,970 --> 00:04:46,370 |
|
453 the structure of rec expressions. |
|
454 |
|
455 99 |
|
456 00:04:46,370 --> 00:04:48,814 |
|
457 The first argument |
|
458 is the character, |
|
459 |
|
460 100 |
|
461 00:04:48,814 --> 00:04:52,700 |
|
462 and the second one is |
|
463 a wreck expression. |
|
464 |
|
465 101 |
|
466 00:04:52,700 --> 00:04:56,510 |
|
467 And remember the idea |
|
468 is we're looking for |
|
469 |
|
470 102 |
|
471 00:04:56,510 --> 00:05:00,680 |
|
472 a wreck expression that |
|
473 can match everything. |
|
474 |
|
475 103 |
|
476 00:05:00,680 --> 00:05:03,125 |
|
477 This reg expression |
|
478 as argument can match |
|
479 |
|
480 104 |
|
481 00:05:03,125 --> 00:05:07,040 |
|
482 except for the C. So now let's |
|
483 look at this first case. |
|
484 |
|
485 105 |
|
486 00:05:07,040 --> 00:05:10,550 |
|
487 So this reg expression |
|
488 cannot match any string. |
|
489 |
|
490 106 |
|
491 00:05:10,550 --> 00:05:14,270 |
|
492 So it certainly cannot |
|
493 match any string starting |
|
494 |
|
495 107 |
|
496 00:05:14,270 --> 00:05:16,910 |
|
497 a C. So we have to look |
|
498 |
|
499 108 |
|
500 00:05:16,910 --> 00:05:20,090 |
|
501 for and reg expression which |
|
502 can not much anything. |
|
503 |
|
504 109 |
|
505 00:05:20,090 --> 00:05:22,700 |
|
506 Well, that's the purpose |
|
507 of this record expression, |
|
508 |
|
509 110 |
|
510 00:05:22,700 --> 00:05:24,815 |
|
511 so we define it as 0. |
|
512 |
|
513 111 |
|
514 00:05:24,815 --> 00:05:28,130 |
|
515 In the next case, |
|
516 this reg expression |
|
517 |
|
518 112 |
|
519 00:05:28,130 --> 00:05:30,440 |
|
520 can match the empty string, |
|
521 |
|
522 113 |
|
523 00:05:30,440 --> 00:05:33,440 |
|
524 but it cannot match |
|
525 any string that starts |
|
526 |
|
527 114 |
|
528 00:05:33,440 --> 00:05:36,350 |
|
529 with C. So also in this case, |
|
530 |
|
531 115 |
|
532 00:05:36,350 --> 00:05:39,560 |
|
533 we just define it as |
|
534 the bracket question, |
|
535 |
|
536 116 |
|
537 00:05:39,560 --> 00:05:41,615 |
|
538 which cannot match anything. |
|
539 |
|
540 117 |
|
541 00:05:41,615 --> 00:05:45,170 |
|
542 In the next case, this |
|
543 C as the argument to |
|
544 |
|
545 118 |
|
546 00:05:45,170 --> 00:05:48,335 |
|
547 the derivative and this d |
|
548 is the Rekha expression. |
|
549 |
|
550 119 |
|
551 00:05:48,335 --> 00:05:50,225 |
|
552 So now there are two cases. |
|
553 |
|
554 120 |
|
555 00:05:50,225 --> 00:05:53,525 |
|
556 If this C and this D |
|
557 is actually equal. |
|
558 |
|
559 121 |
|
560 00:05:53,525 --> 00:05:55,970 |
|
561 That means this record |
|
562 expression can match |
|
563 |
|
564 122 |
|
565 00:05:55,970 --> 00:05:59,240 |
|
566 a string which just contains C0. |
|
567 |
|
568 123 |
|
569 00:05:59,240 --> 00:06:01,505 |
|
570 So if we strip that off, |
|
571 |
|
572 124 |
|
573 00:06:01,505 --> 00:06:04,790 |
|
574 motor remains is |
|
575 the empty string. |
|
576 |
|
577 125 |
|
578 00:06:04,790 --> 00:06:06,890 |
|
579 What is a regular expression |
|
580 |
|
581 126 |
|
582 00:06:06,890 --> 00:06:09,170 |
|
583 look like that can |
|
584 match the empty string. |
|
585 |
|
586 127 |
|
587 00:06:09,170 --> 00:06:11,915 |
|
588 Well, that's the |
|
589 purpose of the warm. |
|
590 |
|
591 128 |
|
592 00:06:11,915 --> 00:06:15,440 |
|
593 And if c is not equal to d, |
|
594 |
|
595 129 |
|
596 00:06:15,440 --> 00:06:17,630 |
|
597 then this reg expression |
|
598 |
|
599 130 |
|
600 00:06:17,630 --> 00:06:19,220 |
|
601 cannot match anything |
|
602 which starts with |
|
603 |
|
604 131 |
|
605 00:06:19,220 --> 00:06:23,780 |
|
606 a C. So again it will |
|
607 be defined as just 0. |
|
608 |
|
609 132 |
|
610 00:06:23,780 --> 00:06:29,390 |
|
611 Okay? Now, the alternative case, |
|
612 |
|
613 133 |
|
614 00:06:29,390 --> 00:06:31,970 |
|
615 if this reg expression can |
|
616 |
|
617 134 |
|
618 00:06:31,970 --> 00:06:36,050 |
|
619 match the strings |
|
620 starting with C, |
|
621 |
|
622 135 |
|
623 00:06:36,050 --> 00:06:40,820 |
|
624 then it can either be |
|
625 matched by the Ongwen. |
|
626 |
|
627 136 |
|
628 00:06:40,820 --> 00:06:44,495 |
|
629 Or it can be much by the R2. |
|
630 |
|
631 137 |
|
632 00:06:44,495 --> 00:06:50,090 |
|
633 If they are one can match C |
|
634 and then following a string, |
|
635 |
|
636 138 |
|
637 00:06:50,090 --> 00:06:53,975 |
|
638 then we just have to recursively |
|
639 call the derivative for |
|
640 |
|
641 139 |
|
642 00:06:53,975 --> 00:06:56,570 |
|
643 R to get a reg expression |
|
644 |
|
645 140 |
|
646 00:06:56,570 --> 00:06:59,135 |
|
647 that can match the |
|
648 rest of the string. |
|
649 |
|
650 141 |
|
651 00:06:59,135 --> 00:07:02,690 |
|
652 And the same we can do with R2. |
|
653 |
|
654 142 |
|
655 00:07:02,690 --> 00:07:06,110 |
|
656 You can find a reg expression |
|
657 which can match everything. |
|
658 |
|
659 143 |
|
660 00:07:06,110 --> 00:07:07,850 |
|
661 This R2 can match, |
|
662 |
|
663 144 |
|
664 00:07:07,850 --> 00:07:09,710 |
|
665 starting with C, bad, |
|
666 |
|
667 145 |
|
668 00:07:09,710 --> 00:07:12,590 |
|
669 which chopping off this C. Okay? |
|
670 |
|
671 146 |
|
672 00:07:12,590 --> 00:07:16,370 |
|
673 So now if you have to find |
|
674 the break expression, |
|
675 |
|
676 147 |
|
677 00:07:16,370 --> 00:07:20,030 |
|
678 which can match all the strings |
|
679 where C is tripped off. |
|
680 |
|
681 148 |
|
682 00:07:20,030 --> 00:07:22,295 |
|
683 Then we just have to |
|
684 take the alternatives |
|
685 |
|
686 149 |
|
687 00:07:22,295 --> 00:07:24,965 |
|
688 of these two derivatives. |
|
689 |
|
690 150 |
|
691 00:07:24,965 --> 00:07:29,390 |
|
692 Okay? Now to sequence case, |
|
693 |
|
694 151 |
|
695 00:07:29,390 --> 00:07:33,005 |
|
696 this sequence case is the |
|
697 most complicated one. |
|
698 |
|
699 152 |
|
700 00:07:33,005 --> 00:07:35,180 |
|
701 And let's look first at |
|
702 |
|
703 153 |
|
704 00:07:35,180 --> 00:07:39,335 |
|
705 the second case where |
|
706 the Earth's brush. |
|
707 |
|
708 154 |
|
709 00:07:39,335 --> 00:07:42,830 |
|
710 Okay? So if this |
|
711 regular expression |
|
712 |
|
713 155 |
|
714 00:07:42,830 --> 00:07:46,145 |
|
715 can match a string |
|
716 starting with C, |
|
717 |
|
718 156 |
|
719 00:07:46,145 --> 00:07:48,155 |
|
720 then the following |
|
721 must have happened. |
|
722 |
|
723 157 |
|
724 00:07:48,155 --> 00:07:51,905 |
|
725 First, the R1 must have matched |
|
726 a string starting with C |
|
727 |
|
728 158 |
|
729 00:07:51,905 --> 00:07:56,420 |
|
730 and then anything coming |
|
731 afterwards with r2. |
|
732 |
|
733 159 |
|
734 00:07:56,420 --> 00:07:59,660 |
|
735 Okay? So in this case I only |
|
736 |
|
737 160 |
|
738 00:07:59,660 --> 00:08:03,260 |
|
739 have to call this |
|
740 derivative for this r one. |
|
741 |
|
742 161 |
|
743 00:08:03,260 --> 00:08:06,830 |
|
744 And find the reg expression |
|
745 which can match everything |
|
746 |
|
747 162 |
|
748 00:08:06,830 --> 00:08:11,555 |
|
749 this R one can match except |
|
750 for this C chopped off. |
|
751 |
|
752 163 |
|
753 00:08:11,555 --> 00:08:15,830 |
|
754 And I have to build that |
|
755 sequence derivative of that. |
|
756 |
|
757 164 |
|
758 00:08:15,830 --> 00:08:18,260 |
|
759 So that's what the As per se. |
|
760 |
|
761 165 |
|
762 00:08:18,260 --> 00:08:21,860 |
|
763 So I take the |
|
764 derivative of R1 and I |
|
765 |
|
766 166 |
|
767 00:08:21,860 --> 00:08:23,480 |
|
768 put the R2 on |
|
769 |
|
770 167 |
|
771 00:08:23,480 --> 00:08:25,865 |
|
772 the back because that's |
|
773 the rest of the string. |
|
774 |
|
775 168 |
|
776 00:08:25,865 --> 00:08:29,240 |
|
777 Ok? So that's the only |
|
778 case we have to consider |
|
779 |
|
780 169 |
|
781 00:08:29,240 --> 00:08:32,750 |
|
782 this sequence case |
|
783 except if not the, |
|
784 |
|
785 170 |
|
786 00:08:32,750 --> 00:08:37,895 |
|
787 how one can match the |
|
788 sea and something else. |
|
789 |
|
790 171 |
|
791 00:08:37,895 --> 00:08:42,965 |
|
792 But if on mismatching |
|
793 actually the empty string, |
|
794 |
|
795 172 |
|
796 00:08:42,965 --> 00:08:48,890 |
|
797 this case actually the R two |
|
798 is in charge of matching |
|
799 |
|
800 173 |
|
801 00:08:48,890 --> 00:08:51,590 |
|
802 the string starting with C. So in |
|
803 |
|
804 174 |
|
805 00:08:51,590 --> 00:08:55,490 |
|
806 this case we have to call |
|
807 the derivative for R2. |
|
808 |
|
809 175 |
|
810 00:08:55,490 --> 00:08:57,875 |
|
811 So that's why we have |
|
812 these two cases. |
|
813 |
|
814 176 |
|
815 00:08:57,875 --> 00:09:00,455 |
|
816 So if R1 is nullable, |
|
817 |
|
818 177 |
|
819 00:09:00,455 --> 00:09:03,245 |
|
820 then it can match |
|
821 the empty string. |
|
822 |
|
823 178 |
|
824 00:09:03,245 --> 00:09:05,330 |
|
825 And we have to |
|
826 consider the case that |
|
827 |
|
828 179 |
|
829 00:09:05,330 --> 00:09:08,045 |
|
830 this R2 is matching a string |
|
831 |
|
832 180 |
|
833 00:09:08,045 --> 00:09:10,700 |
|
834 starting with C. And so we have |
|
835 |
|
836 181 |
|
837 00:09:10,700 --> 00:09:14,210 |
|
838 to call the derivative |
|
839 for this R2 in this case. |
|
840 |
|
841 182 |
|
842 00:09:14,210 --> 00:09:18,680 |
|
843 Otherwise, the R1 will |
|
844 be in charge of matching |
|
845 |
|
846 183 |
|
847 00:09:18,680 --> 00:09:20,840 |
|
848 a part of that |
|
849 string starting with |
|
850 |
|
851 184 |
|
852 00:09:20,840 --> 00:09:24,695 |
|
853 C. And I have to call the |
|
854 derivative on this R1. |
|
855 |
|
856 185 |
|
857 00:09:24,695 --> 00:09:30,670 |
|
858 Okay? I hope this makes sense. |
|
859 |
|
860 186 |
|
861 00:09:30,670 --> 00:09:34,150 |
|
862 Go over that and |
|
863 also the handouts. |
|
864 |
|
865 187 |
|
866 00:09:34,150 --> 00:09:37,465 |
|
867 Again. With that, that's |
|
868 it's really subtle. |
|
869 |
|
870 188 |
|
871 00:09:37,465 --> 00:09:40,945 |
|
872 And how do I remember this case? |
|
873 |
|
874 189 |
|
875 00:09:40,945 --> 00:09:42,430 |
|
876 Well, I know it's |
|
877 |
|
878 190 |
|
879 00:09:42,430 --> 00:09:45,310 |
|
880 an if condition and the |
|
881 condition is nullable, |
|
882 |
|
883 191 |
|
884 00:09:45,310 --> 00:09:48,160 |
|
885 then I will always remember |
|
886 |
|
887 192 |
|
888 00:09:48,160 --> 00:09:53,275 |
|
889 the else branch where pushes |
|
890 NG derivative over the R1. |
|
891 |
|
892 193 |
|
893 00:09:53,275 --> 00:09:55,780 |
|
894 So are usually write |
|
895 down the S punch |
|
896 |
|
897 194 |
|
898 00:09:55,780 --> 00:09:59,650 |
|
899 first and then construct |
|
900 the thin branch by saying, |
|
901 |
|
902 195 |
|
903 00:09:59,650 --> 00:10:01,525 |
|
904 well, I just repeat this. |
|
905 |
|
906 196 |
|
907 00:10:01,525 --> 00:10:03,760 |
|
908 And I have to remember |
|
909 in that case I |
|
910 |
|
911 197 |
|
912 00:10:03,760 --> 00:10:06,580 |
|
913 have to build also |
|
914 derivative of R2. |
|
915 |
|
916 198 |
|
917 00:10:06,580 --> 00:10:12,695 |
|
918 Okay? Finally, the star case. |
|
919 |
|
920 199 |
|
921 00:10:12,695 --> 00:10:15,665 |
|
922 Ok. So here again |
|
923 we're looking for |
|
924 |
|
925 200 |
|
926 00:10:15,665 --> 00:10:17,300 |
|
927 a reg expression which |
|
928 |
|
929 201 |
|
930 00:10:17,300 --> 00:10:19,745 |
|
931 can match the string |
|
932 starting with C, |
|
933 |
|
934 202 |
|
935 00:10:19,745 --> 00:10:22,355 |
|
936 except that the c has |
|
937 been chopped off. |
|
938 |
|
939 203 |
|
940 00:10:22,355 --> 00:10:28,640 |
|
941 So if r star has to match |
|
942 a string starting with C, |
|
943 |
|
944 204 |
|
945 00:10:28,640 --> 00:10:32,735 |
|
946 then at least we need one |
|
947 copy of this r. Okay? |
|
948 |
|
949 205 |
|
950 00:10:32,735 --> 00:10:34,310 |
|
951 So at least one copy of |
|
952 |
|
953 206 |
|
954 00:10:34,310 --> 00:10:37,010 |
|
955 this r has matched |
|
956 something which starts with |
|
957 |
|
958 207 |
|
959 00:10:37,010 --> 00:10:38,870 |
|
960 a C and then afterwards |
|
961 |
|
962 208 |
|
963 00:10:38,870 --> 00:10:41,570 |
|
964 come 0 more copies |
|
965 of this OnStar. |
|
966 |
|
967 209 |
|
968 00:10:41,570 --> 00:10:45,530 |
|
969 Okay? What we do there |
|
970 is we trying to find |
|
971 |
|
972 210 |
|
973 00:10:45,530 --> 00:10:47,960 |
|
974 the Rekha expression |
|
975 which can match |
|
976 |
|
977 211 |
|
978 00:10:47,960 --> 00:10:50,915 |
|
979 this first part of the string. |
|
980 |
|
981 212 |
|
982 00:10:50,915 --> 00:10:53,255 |
|
983 However, where the |
|
984 sea is chopped off. |
|
985 |
|
986 213 |
|
987 00:10:53,255 --> 00:10:55,130 |
|
988 And then we just say, well, |
|
989 |
|
990 214 |
|
991 00:10:55,130 --> 00:10:57,050 |
|
992 the rest has to be |
|
993 matched again with |
|
994 |
|
995 215 |
|
996 00:10:57,050 --> 00:10:59,030 |
|
997 0 or more copies of |
|
998 |
|
999 216 |
|
1000 00:10:59,030 --> 00:11:02,600 |
|
1001 this R. So that's why |
|
1002 it's defined like this. |
|
1003 |
|
1004 217 |
|
1005 00:11:02,600 --> 00:11:09,050 |
|
1006 Okay? S8. Please take care |
|
1007 with this definition. |
|
1008 |
|
1009 218 |
|
1010 00:11:09,050 --> 00:11:11,435 |
|
1011 That's not so simple. |
|
1012 |
|
1013 219 |
|
1014 00:11:11,435 --> 00:11:13,250 |
|
1015 Once you get a hang of it, |
|
1016 |
|
1017 220 |
|
1018 00:11:13,250 --> 00:11:15,170 |
|
1019 however, it makes perfect sense. |
|
1020 |
|
1021 221 |
|
1022 00:11:15,170 --> 00:11:17,210 |
|
1023 So let me explain it in |
|
1024 |
|
1025 222 |
|
1026 00:11:17,210 --> 00:11:20,825 |
|
1027 different ways in |
|
1028 the next slides. |
|
1029 |
|
1030 223 |
|
1031 00:11:20,825 --> 00:11:24,695 |
|
1032 Okay, let's look |
|
1033 first some examples. |
|
1034 |
|
1035 224 |
|
1036 00:11:24,695 --> 00:11:27,140 |
|
1037 So here is a regular expression, |
|
1038 |
|
1039 225 |
|
1040 00:11:27,140 --> 00:11:29,390 |
|
1041 R. And let's have |
|
1042 |
|
1043 226 |
|
1044 00:11:29,390 --> 00:11:32,450 |
|
1045 a look at these three |
|
1046 derivatives according to a, |
|
1047 |
|
1048 227 |
|
1049 00:11:32,450 --> 00:11:38,405 |
|
1050 B, and C. And Vishal do |
|
1051 with d1 for the a. Ok. |
|
1052 |
|
1053 228 |
|
1054 00:11:38,405 --> 00:11:42,379 |
|
1055 So here is our reg expression |
|
1056 |
|
1057 229 |
|
1058 00:11:42,379 --> 00:11:45,334 |
|
1059 and was very generous |
|
1060 with dependent a thesis. |
|
1061 |
|
1062 230 |
|
1063 00:11:45,334 --> 00:11:48,140 |
|
1064 And the outermost is a star. |
|
1065 |
|
1066 231 |
|
1067 00:11:48,140 --> 00:11:52,550 |
|
1068 So if people now the |
|
1069 derivative according to a, |
|
1070 |
|
1071 232 |
|
1072 00:11:52,550 --> 00:11:55,474 |
|
1073 the character a of |
|
1074 that wreck expression. |
|
1075 |
|
1076 233 |
|
1077 00:11:55,474 --> 00:11:57,380 |
|
1078 Okay? So the first thing we |
|
1079 |
|
1080 234 |
|
1081 00:11:57,380 --> 00:11:59,555 |
|
1082 have to analyze is the K star. |
|
1083 |
|
1084 235 |
|
1085 00:11:59,555 --> 00:12:04,370 |
|
1086 Ok? So here's direct expression, |
|
1087 which we are looking at. |
|
1088 |
|
1089 236 |
|
1090 00:12:04,370 --> 00:12:09,170 |
|
1091 This are the outermost |
|
1092 constructor is this star. |
|
1093 |
|
1094 237 |
|
1095 00:12:09,170 --> 00:12:11,510 |
|
1096 If you go back to the definition, |
|
1097 |
|
1098 238 |
|
1099 00:12:11,510 --> 00:12:13,625 |
|
1100 I hope you have it next to you, |
|
1101 |
|
1102 239 |
|
1103 00:12:13,625 --> 00:12:16,340 |
|
1104 then this star case is defined |
|
1105 |
|
1106 240 |
|
1107 00:12:16,340 --> 00:12:20,000 |
|
1108 as u taking just the |
|
1109 inside of the star |
|
1110 |
|
1111 241 |
|
1112 00:12:20,000 --> 00:12:23,030 |
|
1113 and apply this derivative and |
|
1114 |
|
1115 242 |
|
1116 00:12:23,030 --> 00:12:26,765 |
|
1117 leave the are on the |
|
1118 outside at the end. |
|
1119 |
|
1120 243 |
|
1121 00:12:26,765 --> 00:12:29,990 |
|
1122 Ok. So that's the first step. |
|
1123 |
|
1124 244 |
|
1125 00:12:29,990 --> 00:12:32,030 |
|
1126 Now we have to analyze |
|
1127 |
|
1128 245 |
|
1129 00:12:32,030 --> 00:12:36,035 |
|
1130 the derivative according to |
|
1131 a of this record expression, |
|
1132 |
|
1133 246 |
|
1134 00:12:36,035 --> 00:12:38,000 |
|
1135 which is an alternative. |
|
1136 |
|
1137 247 |
|
1138 00:12:38,000 --> 00:12:39,665 |
|
1139 So the outermost is a plus. |
|
1140 |
|
1141 248 |
|
1142 00:12:39,665 --> 00:12:41,375 |
|
1143 Ok, that's very easy again, |
|
1144 |
|
1145 249 |
|
1146 00:12:41,375 --> 00:12:45,185 |
|
1147 we just have to push the |
|
1148 derivative into each component, |
|
1149 |
|
1150 250 |
|
1151 00:12:45,185 --> 00:12:47,705 |
|
1152 into the a, followed by b. |
|
1153 |
|
1154 251 |
|
1155 00:12:47,705 --> 00:12:49,145 |
|
1156 And in this segment, |
|
1157 |
|
1158 252 |
|
1159 00:12:49,145 --> 00:12:51,185 |
|
1160 alternative into b. Ok, |
|
1161 |
|
1162 253 |
|
1163 00:12:51,185 --> 00:12:56,030 |
|
1164 so we take the derivative |
|
1165 of each according to a way. |
|
1166 |
|
1167 254 |
|
1168 00:12:56,030 --> 00:13:00,635 |
|
1169 Now this one is a sequence |
|
1170 break expression. |
|
1171 |
|
1172 255 |
|
1173 00:13:00,635 --> 00:13:02,210 |
|
1174 This most complicated case. |
|
1175 |
|
1176 256 |
|
1177 00:13:02,210 --> 00:13:04,160 |
|
1178 So the first of all |
|
1179 you have to test is, |
|
1180 |
|
1181 257 |
|
1182 00:13:04,160 --> 00:13:07,910 |
|
1183 is the first component |
|
1184 nullable of this sequence? |
|
1185 |
|
1186 258 |
|
1187 00:13:07,910 --> 00:13:09,200 |
|
1188 Well, that is a, |
|
1189 |
|
1190 259 |
|
1191 00:13:09,200 --> 00:13:12,740 |
|
1192 in this case, a on its |
|
1193 own is not nullable. |
|
1194 |
|
1195 260 |
|
1196 00:13:12,740 --> 00:13:14,210 |
|
1197 So vn, the easy case, |
|
1198 |
|
1199 261 |
|
1200 00:13:14,210 --> 00:13:17,000 |
|
1201 we only have a single derivative |
|
1202 |
|
1203 262 |
|
1204 00:13:17,000 --> 00:13:19,370 |
|
1205 pushed in the first component. |
|
1206 |
|
1207 263 |
|
1208 00:13:19,370 --> 00:13:25,160 |
|
1209 So we have the derivative |
|
1210 of a with the character a. |
|
1211 |
|
1212 264 |
|
1213 00:13:25,160 --> 00:13:27,920 |
|
1214 Okay, that's now |
|
1215 the character case. |
|
1216 |
|
1217 265 |
|
1218 00:13:27,920 --> 00:13:29,720 |
|
1219 And in this case the character |
|
1220 |
|
1221 266 |
|
1222 00:13:29,720 --> 00:13:31,715 |
|
1223 in the regular expression agree. |
|
1224 |
|
1225 267 |
|
1226 00:13:31,715 --> 00:13:33,890 |
|
1227 So it's defined as one. |
|
1228 |
|
1229 268 |
|
1230 00:13:33,890 --> 00:13:37,550 |
|
1231 Ok? In the other case, |
|
1232 |
|
1233 269 |
|
1234 00:13:37,550 --> 00:13:39,710 |
|
1235 the break expression is P, |
|
1236 |
|
1237 270 |
|
1238 00:13:39,710 --> 00:13:41,675 |
|
1239 But the characters a. |
|
1240 |
|
1241 271 |
|
1242 00:13:41,675 --> 00:13:46,385 |
|
1243 So in this case |
|
1244 it's defined as 0. |
|
1245 |
|
1246 272 |
|
1247 00:13:46,385 --> 00:13:50,630 |
|
1248 Okay? So that's what the |
|
1249 derivative would be. |
|
1250 |
|
1251 273 |
|
1252 00:13:50,630 --> 00:13:52,160 |
|
1253 This r is there |
|
1254 |
|
1255 274 |
|
1256 00:13:52,160 --> 00:13:55,280 |
|
1257 because originally we |
|
1258 started with a star. |
|
1259 |
|
1260 275 |
|
1261 00:13:55,280 --> 00:13:58,295 |
|
1262 This expression is that |
|
1263 star at expression. |
|
1264 |
|
1265 276 |
|
1266 00:13:58,295 --> 00:14:02,780 |
|
1267 Ok? So the derivative |
|
1268 according to a |
|
1269 |
|
1270 277 |
|
1271 00:14:02,780 --> 00:14:07,610 |
|
1272 up that reg expression |
|
1273 is this expression. |
|
1274 |
|
1275 278 |
|
1276 00:14:07,610 --> 00:14:10,970 |
|
1277 We just have to |
|
1278 substitute this back in. |
|
1279 |
|
1280 279 |
|
1281 00:14:10,970 --> 00:14:13,745 |
|
1282 Just coming back to this slide. |
|
1283 |
|
1284 280 |
|
1285 00:14:13,745 --> 00:14:16,160 |
|
1286 So far, they're only analyze |
|
1287 |
|
1288 281 |
|
1289 00:14:16,160 --> 00:14:19,505 |
|
1290 the derivative according |
|
1291 to a single character. |
|
1292 |
|
1293 282 |
|
1294 00:14:19,505 --> 00:14:23,960 |
|
1295 But we can also very easily |
|
1296 extend that to whole strings. |
|
1297 |
|
1298 283 |
|
1299 00:14:23,960 --> 00:14:26,360 |
|
1300 So if you build the |
|
1301 derivative according |
|
1302 |
|
1303 284 |
|
1304 00:14:26,360 --> 00:14:27,905 |
|
1305 to the empty string, |
|
1306 |
|
1307 285 |
|
1308 00:14:27,905 --> 00:14:30,875 |
|
1309 we just return the |
|
1310 Rekha expression. |
|
1311 |
|
1312 286 |
|
1313 00:14:30,875 --> 00:14:35,585 |
|
1314 If we have a string |
|
1315 starting with character c, |
|
1316 |
|
1317 287 |
|
1318 00:14:35,585 --> 00:14:37,850 |
|
1319 remember that can |
|
1320 be any character. |
|
1321 |
|
1322 288 |
|
1323 00:14:37,850 --> 00:14:42,170 |
|
1324 Then we build first the simple |
|
1325 derivative according to |
|
1326 |
|
1327 289 |
|
1328 00:14:42,170 --> 00:14:44,360 |
|
1329 that first character and |
|
1330 |
|
1331 290 |
|
1332 00:14:44,360 --> 00:14:46,925 |
|
1333 continue with the |
|
1334 rest of the string. |
|
1335 |
|
1336 291 |
|
1337 00:14:46,925 --> 00:14:50,615 |
|
1338 So here you see again, |
|
1339 my personal convention. |
|
1340 |
|
1341 292 |
|
1342 00:14:50,615 --> 00:14:54,365 |
|
1343 Everything which works on |
|
1344 lists has this S at the end. |
|
1345 |
|
1346 293 |
|
1347 00:14:54,365 --> 00:14:57,125 |
|
1348 So this function is |
|
1349 for single characters. |
|
1350 |
|
1351 294 |
|
1352 00:14:57,125 --> 00:14:59,179 |
|
1353 This one is for strings, |
|
1354 |
|
1355 295 |
|
1356 00:14:59,179 --> 00:15:02,450 |
|
1357 but it uses the one |
|
1358 for the character. |
|
1359 |
|
1360 296 |
|
1361 00:15:02,450 --> 00:15:04,025 |
|
1362 Essentially what it does is |
|
1363 |
|
1364 297 |
|
1365 00:15:04,025 --> 00:15:06,185 |
|
1366 it chops off the first character, |
|
1367 |
|
1368 298 |
|
1369 00:15:06,185 --> 00:15:09,800 |
|
1370 builds the derivative, then |
|
1371 chops off the next character, |
|
1372 |
|
1373 299 |
|
1374 00:15:09,800 --> 00:15:13,760 |
|
1375 builds the derivative of |
|
1376 the result, and so on. |
|
1377 |
|
1378 300 |
|
1379 00:15:13,760 --> 00:15:17,000 |
|
1380 Having this function, |
|
1381 we can actually now |
|
1382 |
|
1383 301 |
|
1384 00:15:17,000 --> 00:15:20,600 |
|
1385 state what the algorithm |
|
1386 is, the complete algorithm. |
|
1387 |
|
1388 302 |
|
1389 00:15:20,600 --> 00:15:23,465 |
|
1390 So the pro shops ki mantra |
|
1391 |
|
1392 303 |
|
1393 00:15:23,465 --> 00:15:24,860 |
|
1394 takes a regular expression as |
|
1395 |
|
1396 304 |
|
1397 00:15:24,860 --> 00:15:26,915 |
|
1398 argument and a |
|
1399 string as argument. |
|
1400 |
|
1401 305 |
|
1402 00:15:26,915 --> 00:15:30,920 |
|
1403 And is supposed to say yes if |
|
1404 the reg expression matches |
|
1405 |
|
1406 306 |
|
1407 00:15:30,920 --> 00:15:33,560 |
|
1408 the string or No |
|
1409 |
|
1410 307 |
|
1411 00:15:33,560 --> 00:15:36,065 |
|
1412 if the reg expression does |
|
1413 not match the string. |
|
1414 |
|
1415 308 |
|
1416 00:15:36,065 --> 00:15:37,715 |
|
1417 And how does it do that? |
|
1418 |
|
1419 309 |
|
1420 00:15:37,715 --> 00:15:42,560 |
|
1421 Well, it takes this string |
|
1422 s And this reg expression, |
|
1423 |
|
1424 310 |
|
1425 00:15:42,560 --> 00:15:43,925 |
|
1426 and it first built |
|
1427 |
|
1428 311 |
|
1429 00:15:43,925 --> 00:15:48,845 |
|
1430 successive derivatives until |
|
1431 that string is exhaust that. |
|
1432 |
|
1433 312 |
|
1434 00:15:48,845 --> 00:15:52,115 |
|
1435 Okay? Then you have |
|
1436 a final derivative, |
|
1437 |
|
1438 313 |
|
1439 00:15:52,115 --> 00:15:53,839 |
|
1440 a final reg expression. |
|
1441 |
|
1442 314 |
|
1443 00:15:53,839 --> 00:15:55,370 |
|
1444 And you test whether |
|
1445 |
|
1446 315 |
|
1447 00:15:55,370 --> 00:15:57,920 |
|
1448 this reg expression can |
|
1449 match the empty string. |
|
1450 |
|
1451 316 |
|
1452 00:15:57,920 --> 00:16:01,370 |
|
1453 If yes, then the |
|
1454 original reg expression |
|
1455 |
|
1456 317 |
|
1457 00:16:01,370 --> 00:16:03,245 |
|
1458 is r can match the string. |
|
1459 |
|
1460 318 |
|
1461 00:16:03,245 --> 00:16:05,210 |
|
1462 If no, if it cannot match |
|
1463 |
|
1464 319 |
|
1465 00:16:05,210 --> 00:16:08,030 |
|
1466 the final derivative |
|
1467 with the empty string, |
|
1468 |
|
1469 320 |
|
1470 00:16:08,030 --> 00:16:10,280 |
|
1471 then know this |
|
1472 regular expression, |
|
1473 |
|
1474 321 |
|
1475 00:16:10,280 --> 00:16:12,905 |
|
1476 R cannot match that string. |
|
1477 |
|
1478 322 |
|
1479 00:16:12,905 --> 00:16:16,010 |
|
1480 I know it looks |
|
1481 very anticlimactic, |
|
1482 |
|
1483 323 |
|
1484 00:16:16,010 --> 00:16:19,625 |
|
1485 but that's actually the |
|
1486 beauty of this algorithm, |
|
1487 |
|
1488 324 |
|
1489 00:16:19,625 --> 00:16:22,760 |
|
1490 that it's not that complicated. |
|
1491 |
|
1492 325 |
|
1493 00:16:22,760 --> 00:16:25,340 |
|
1494 So how does the algorithm work? |
|
1495 |
|
1496 326 |
|
1497 00:16:25,340 --> 00:16:27,634 |
|
1498 In a concrete example? |
|
1499 |
|
1500 327 |
|
1501 00:16:27,634 --> 00:16:31,580 |
|
1502 Imagine you have a string, abc |
|
1503 |
|
1504 328 |
|
1505 00:16:31,580 --> 00:16:34,370 |
|
1506 and you have a break |
|
1507 expression, say R1. |
|
1508 |
|
1509 329 |
|
1510 00:16:34,370 --> 00:16:37,520 |
|
1511 And you want to find out |
|
1512 whether this or one can match |
|
1513 |
|
1514 330 |
|
1515 00:16:37,520 --> 00:16:41,300 |
|
1516 that string abc or not. |
|
1517 How do you do that? |
|
1518 |
|
1519 331 |
|
1520 00:16:41,300 --> 00:16:45,140 |
|
1521 Well, you would first |
|
1522 take this R1 and you |
|
1523 |
|
1524 332 |
|
1525 00:16:45,140 --> 00:16:47,150 |
|
1526 build the derivative according |
|
1527 |
|
1528 333 |
|
1529 00:16:47,150 --> 00:16:49,880 |
|
1530 to the first character, D-A. |
|
1531 |
|
1532 334 |
|
1533 00:16:49,880 --> 00:16:53,015 |
|
1534 Okay? You get the derivative, |
|
1535 |
|
1536 335 |
|
1537 00:16:53,015 --> 00:16:55,294 |
|
1538 which I call here R2. |
|
1539 |
|
1540 336 |
|
1541 00:16:55,294 --> 00:16:58,640 |
|
1542 Then you take the next |
|
1543 character, the B. |
|
1544 |
|
1545 337 |
|
1546 00:16:58,640 --> 00:17:04,535 |
|
1547 You now build the derivative |
|
1548 according to b of this R2. |
|
1549 |
|
1550 338 |
|
1551 00:17:04,535 --> 00:17:07,460 |
|
1552 Okay? So you take the |
|
1553 result of the first step, |
|
1554 |
|
1555 339 |
|
1556 00:17:07,460 --> 00:17:09,530 |
|
1557 you feed it into the second step, |
|
1558 |
|
1559 340 |
|
1560 00:17:09,530 --> 00:17:11,810 |
|
1561 and you take the |
|
1562 second character. |
|
1563 |
|
1564 341 |
|
1565 00:17:11,810 --> 00:17:17,075 |
|
1566 Then you do this also with c. |
|
1567 So you get a derivative R3, |
|
1568 |
|
1569 342 |
|
1570 00:17:17,075 --> 00:17:22,460 |
|
1571 and you build the derivative |
|
1572 of R three according to c, |
|
1573 |
|
1574 343 |
|
1575 00:17:22,460 --> 00:17:24,185 |
|
1576 you get an R four. |
|
1577 |
|
1578 344 |
|
1579 00:17:24,185 --> 00:17:26,300 |
|
1580 Okay, so that's the |
|
1581 final derivative. |
|
1582 |
|
1583 345 |
|
1584 00:17:26,300 --> 00:17:27,530 |
|
1585 The string is exhausted. |
|
1586 |
|
1587 346 |
|
1588 00:17:27,530 --> 00:17:29,570 |
|
1589 We build derivatives |
|
1590 according to a, B, |
|
1591 |
|
1592 347 |
|
1593 00:17:29,570 --> 00:17:34,610 |
|
1594 and C. Now we just test whether |
|
1595 this r four is nullable. |
|
1596 |
|
1597 348 |
|
1598 00:17:34,610 --> 00:17:37,175 |
|
1599 If it says yes, |
|
1600 |
|
1601 349 |
|
1602 00:17:37,175 --> 00:17:41,510 |
|
1603 then df break expression metro |
|
1604 will just say true, yes, |
|
1605 |
|
1606 350 |
|
1607 00:17:41,510 --> 00:17:43,340 |
|
1608 this original reg expression, |
|
1609 |
|
1610 351 |
|
1611 00:17:43,340 --> 00:17:47,270 |
|
1612 the R1, will be able to |
|
1613 match that string abc. |
|
1614 |
|
1615 352 |
|
1616 00:17:47,270 --> 00:17:50,585 |
|
1617 And if this test returns false, |
|
1618 |
|
1619 353 |
|
1620 00:17:50,585 --> 00:17:53,015 |
|
1621 then the algorithm says false. |
|
1622 |
|
1623 354 |
|
1624 00:17:53,015 --> 00:17:56,975 |
|
1625 This reg expression will |
|
1626 not match the string. |
|
1627 |
|
1628 355 |
|
1629 00:17:56,975 --> 00:18:00,260 |
|
1630 Ok, you might ask |
|
1631 why on earth does |
|
1632 |
|
1633 356 |
|
1634 00:18:00,260 --> 00:18:02,960 |
|
1635 that algorithm |
|
1636 actually work away? |
|
1637 |
|
1638 357 |
|
1639 00:18:02,960 --> 00:18:06,515 |
|
1640 Here's an explanation |
|
1641 for why it works. |
|
1642 |
|
1643 358 |
|
1644 00:18:06,515 --> 00:18:10,190 |
|
1645 Imagine you have a wreck |
|
1646 expression R1, okay? |
|
1647 |
|
1648 359 |
|
1649 00:18:10,190 --> 00:18:13,220 |
|
1650 And you have a string abc, |
|
1651 |
|
1652 360 |
|
1653 00:18:13,220 --> 00:18:14,270 |
|
1654 and you want to find out |
|
1655 |
|
1656 361 |
|
1657 00:18:14,270 --> 00:18:17,180 |
|
1658 whether one can |
|
1659 match that string. |
|
1660 |
|
1661 362 |
|
1662 00:18:17,180 --> 00:18:18,799 |
|
1663 And for the moment, |
|
1664 |
|
1665 363 |
|
1666 00:18:18,799 --> 00:18:22,610 |
|
1667 let's assume that it |
|
1668 can match that string. |
|
1669 |
|
1670 364 |
|
1671 00:18:22,610 --> 00:18:26,315 |
|
1672 Ok? So the language L of |
|
1673 |
|
1674 365 |
|
1675 00:18:26,315 --> 00:18:30,185 |
|
1676 R will actually |
|
1677 contain that string, |
|
1678 |
|
1679 366 |
|
1680 00:18:30,185 --> 00:18:31,805 |
|
1681 otherwise it wouldn't match that. |
|
1682 |
|
1683 367 |
|
1684 00:18:31,805 --> 00:18:36,710 |
|
1685 Okay? So ABC is in |
|
1686 this language, okay? |
|
1687 |
|
1688 368 |
|
1689 00:18:36,710 --> 00:18:39,785 |
|
1690 If I now take the |
|
1691 semantic derivative, |
|
1692 |
|
1693 369 |
|
1694 00:18:39,785 --> 00:18:43,145 |
|
1695 that means I look at all |
|
1696 the strings in this f, |
|
1697 |
|
1698 370 |
|
1699 00:18:43,145 --> 00:18:46,820 |
|
1700 R1, and further out |
|
1701 |
|
1702 371 |
|
1703 00:18:46,820 --> 00:18:48,740 |
|
1704 all the ones which |
|
1705 do not start with |
|
1706 |
|
1707 372 |
|
1708 00:18:48,740 --> 00:18:51,260 |
|
1709 an a, I discharge them. |
|
1710 |
|
1711 373 |
|
1712 00:18:51,260 --> 00:18:54,545 |
|
1713 And I only look the one |
|
1714 which start with an a. |
|
1715 |
|
1716 374 |
|
1717 00:18:54,545 --> 00:18:56,465 |
|
1718 And of those strings, |
|
1719 |
|
1720 375 |
|
1721 00:18:56,465 --> 00:18:58,475 |
|
1722 I chop off this a. |
|
1723 |
|
1724 376 |
|
1725 00:18:58,475 --> 00:19:01,025 |
|
1726 So after this |
|
1727 romantic derivative, |
|
1728 |
|
1729 377 |
|
1730 00:19:01,025 --> 00:19:05,735 |
|
1731 this set of strings will |
|
1732 contain just B and C. |
|
1733 |
|
1734 378 |
|
1735 00:19:05,735 --> 00:19:12,830 |
|
1736 Ok. Now if I build the next |
|
1737 semantic derivative of that, |
|
1738 |
|
1739 379 |
|
1740 00:19:12,830 --> 00:19:14,345 |
|
1741 then I would look at |
|
1742 |
|
1743 380 |
|
1744 00:19:14,345 --> 00:19:16,850 |
|
1745 all the strings which |
|
1746 start with a P, |
|
1747 |
|
1748 381 |
|
1749 00:19:16,850 --> 00:19:21,350 |
|
1750 and forget about everything |
|
1751 else of the ones. |
|
1752 |
|
1753 382 |
|
1754 00:19:21,350 --> 00:19:27,905 |
|
1755 I know they start with B. |
|
1756 I just chop of the B. Ok. |
|
1757 |
|
1758 383 |
|
1759 00:19:27,905 --> 00:19:31,655 |
|
1760 So in this whole set here, |
|
1761 |
|
1762 384 |
|
1763 00:19:31,655 --> 00:19:33,785 |
|
1764 in this whole set here, |
|
1765 |
|
1766 385 |
|
1767 00:19:33,785 --> 00:19:39,030 |
|
1768 there will be now a string |
|
1769 which is just c. Okay? |
|
1770 |
|
1771 386 |
|
1772 00:19:39,190 --> 00:19:44,420 |
|
1773 Then I built the third |
|
1774 semantic derivative |
|
1775 |
|
1776 387 |
|
1777 00:19:44,420 --> 00:19:47,300 |
|
1778 because I want to find out |
|
1779 whether ABC is involved. |
|
1780 |
|
1781 388 |
|
1782 00:19:47,300 --> 00:19:50,540 |
|
1783 Okay? So now I look |
|
1784 at all the strings in |
|
1785 |
|
1786 389 |
|
1787 00:19:50,540 --> 00:19:52,820 |
|
1788 here and look at |
|
1789 |
|
1790 390 |
|
1791 00:19:52,820 --> 00:19:55,340 |
|
1792 them whether they start |
|
1793 with a C. If yes, |
|
1794 |
|
1795 391 |
|
1796 00:19:55,340 --> 00:19:56,885 |
|
1797 I chop off the sea. |
|
1798 |
|
1799 392 |
|
1800 00:19:56,885 --> 00:19:59,120 |
|
1801 And put in markets remaining. |
|
1802 |
|
1803 393 |
|
1804 00:19:59,120 --> 00:20:00,425 |
|
1805 So in this case, |
|
1806 |
|
1807 394 |
|
1808 00:20:00,425 --> 00:20:02,510 |
|
1809 if I have the string c |
|
1810 |
|
1811 395 |
|
1812 00:20:02,510 --> 00:20:04,550 |
|
1813 in this language and |
|
1814 I chop off this, |
|
1815 |
|
1816 396 |
|
1817 00:20:04,550 --> 00:20:07,700 |
|
1818 see what is remaining |
|
1819 is the empty string. |
|
1820 |
|
1821 397 |
|
1822 00:20:07,700 --> 00:20:09,695 |
|
1823 So we have to check of |
|
1824 |
|
1825 398 |
|
1826 00:20:09,695 --> 00:20:14,510 |
|
1827 that language whether it |
|
1828 contains the empty string. |
|
1829 |
|
1830 399 |
|
1831 00:20:14,510 --> 00:20:18,800 |
|
1832 If yes, then the |
|
1833 original R1 can match |
|
1834 |
|
1835 400 |
|
1836 00:20:18,800 --> 00:20:21,050 |
|
1837 this ABC because this ABC |
|
1838 |
|
1839 401 |
|
1840 00:20:21,050 --> 00:20:24,119 |
|
1841 must have been in this language. |
|
1842 |
|
1843 402 |
|
1844 00:20:24,130 --> 00:20:28,565 |
|
1845 And if in the end there wasn't |
|
1846 the empty string, then, |
|
1847 |
|
1848 403 |
|
1849 00:20:28,565 --> 00:20:33,575 |
|
1850 then this ABC Watson in |
|
1851 this language of one. |
|
1852 |
|
1853 404 |
|
1854 00:20:33,575 --> 00:20:36,260 |
|
1855 And so the electron must have, |
|
1856 |
|
1857 405 |
|
1858 00:20:36,260 --> 00:20:38,880 |
|
1859 or the metro must have failed. |
|
1860 |
|
1861 406 |
|
1862 00:20:39,040 --> 00:20:42,530 |
|
1863 The clever bit is that here |
|
1864 |
|
1865 407 |
|
1866 00:20:42,530 --> 00:20:45,530 |
|
1867 the explanation is for languages. |
|
1868 |
|
1869 408 |
|
1870 00:20:45,530 --> 00:20:49,835 |
|
1871 Remember, this |
|
1872 semantic derivative |
|
1873 |
|
1874 409 |
|
1875 00:20:49,835 --> 00:20:53,450 |
|
1876 works over languages and they |
|
1877 sometimes can be in finite. |
|
1878 |
|
1879 410 |
|
1880 00:20:53,450 --> 00:20:55,730 |
|
1881 So that's not really |
|
1882 an algorithm. |
|
1883 |
|
1884 411 |
|
1885 00:20:55,730 --> 00:20:58,880 |
|
1886 Yeah, that's just |
|
1887 explaining the idea with |
|
1888 |
|
1889 412 |
|
1890 00:20:58,880 --> 00:21:02,525 |
|
1891 preserves key |
|
1892 achieved was that he |
|
1893 |
|
1894 413 |
|
1895 00:21:02,525 --> 00:21:06,440 |
|
1896 now works with this derivative |
|
1897 America expressions and |
|
1898 |
|
1899 414 |
|
1900 00:21:06,440 --> 00:21:10,715 |
|
1901 somehow imitates what |
|
1902 happens on these languages. |
|
1903 |
|
1904 415 |
|
1905 00:21:10,715 --> 00:21:14,135 |
|
1906 Because remember if you |
|
1907 have an wreck expression, |
|
1908 |
|
1909 416 |
|
1910 00:21:14,135 --> 00:21:17,405 |
|
1911 are you want to test |
|
1912 whether can match APC, |
|
1913 |
|
1914 417 |
|
1915 00:21:17,405 --> 00:21:22,550 |
|
1916 then you take first |
|
1917 derivative according to a. |
|
1918 |
|
1919 418 |
|
1920 00:21:22,550 --> 00:21:25,760 |
|
1921 So you will get a wreck |
|
1922 expression which can match b |
|
1923 |
|
1924 419 |
|
1925 00:21:25,760 --> 00:21:29,464 |
|
1926 and c If R could match abc. |
|
1927 |
|
1928 420 |
|
1929 00:21:29,464 --> 00:21:31,430 |
|
1930 So after the first derivative, |
|
1931 |
|
1932 421 |
|
1933 00:21:31,430 --> 00:21:33,620 |
|
1934 you will get a wreck expression |
|
1935 which can match B and |
|
1936 |
|
1937 422 |
|
1938 00:21:33,620 --> 00:21:37,070 |
|
1939 C. If you take the |
|
1940 second derivative, |
|
1941 |
|
1942 423 |
|
1943 00:21:37,070 --> 00:21:41,225 |
|
1944 you will get a reexpression |
|
1945 which can match c alone. |
|
1946 |
|
1947 424 |
|
1948 00:21:41,225 --> 00:21:44,180 |
|
1949 And if you take the |
|
1950 final derivative, |
|
1951 |
|
1952 425 |
|
1953 00:21:44,180 --> 00:21:46,070 |
|
1954 then you will get |
|
1955 |
|
1956 426 |
|
1957 00:21:46,070 --> 00:21:48,260 |
|
1958 rec expression which hopefully |
|
1959 |
|
1960 427 |
|
1961 00:21:48,260 --> 00:21:49,715 |
|
1962 can match the empty string. |
|
1963 |
|
1964 428 |
|
1965 00:21:49,715 --> 00:21:53,780 |
|
1966 If it does, then this |
|
1967 R can match the ABC. |
|
1968 |
|
1969 429 |
|
1970 00:21:53,780 --> 00:21:55,655 |
|
1971 And if it doesn't, then |
|
1972 |
|
1973 430 |
|
1974 00:21:55,655 --> 00:21:58,680 |
|
1975 ABC couldn't be |
|
1976 matched by this on. |
|
1977 |
|
1978 431 |
|
1979 00:21:58,900 --> 00:22:02,990 |
|
1980 Okay, let's have a look |
|
1981 how this pans out in code. |
|
1982 |
|
1983 432 |
|
1984 00:22:02,990 --> 00:22:06,050 |
|
1985 Here's defile RE1. |
|
1986 |
|
1987 433 |
|
1988 00:22:06,050 --> 00:22:07,940 |
|
1989 It's also uploaded on Keith, |
|
1990 |
|
1991 434 |
|
1992 00:22:07,940 --> 00:22:10,625 |
|
1993 so you can see exactly |
|
1994 what I'm doing. |
|
1995 |
|
1996 435 |
|
1997 00:22:10,625 --> 00:22:13,970 |
|
1998 And actually I already saw |
|
1999 that file because I showed you |
|
2000 |
|
2001 436 |
|
2002 00:22:13,970 --> 00:22:15,710 |
|
2003 how my wreck expressions are |
|
2004 |
|
2005 437 |
|
2006 00:22:15,710 --> 00:22:17,960 |
|
2007 defined with the |
|
2008 abstract classes here. |
|
2009 |
|
2010 438 |
|
2011 00:22:17,960 --> 00:22:21,155 |
|
2012 And here, the six cases |
|
2013 for 0-1 character, |
|
2014 |
|
2015 439 |
|
2016 00:22:21,155 --> 00:22:23,540 |
|
2017 I turn a TIF in |
|
2018 sequence and star. |
|
2019 |
|
2020 440 |
|
2021 00:22:23,540 --> 00:22:26,705 |
|
2022 Ok. So the first |
|
2023 function nullable, |
|
2024 |
|
2025 441 |
|
2026 00:22:26,705 --> 00:22:28,760 |
|
2027 the simple one, takes |
|
2028 |
|
2029 442 |
|
2030 00:22:28,760 --> 00:22:32,120 |
|
2031 a regular expression as |
|
2032 argument and returns a boolean. |
|
2033 |
|
2034 443 |
|
2035 00:22:32,120 --> 00:22:34,280 |
|
2036 And then with this |
|
2037 pattern matching, |
|
2038 |
|
2039 444 |
|
2040 00:22:34,280 --> 00:22:37,040 |
|
2041 we just go through |
|
2042 all these six cases |
|
2043 |
|
2044 445 |
|
2045 00:22:37,040 --> 00:22:38,900 |
|
2046 are serious defined as false. |
|
2047 |
|
2048 446 |
|
2049 00:22:38,900 --> 00:22:43,234 |
|
2050 One is defined as true |
|
2051 character for any character, |
|
2052 |
|
2053 447 |
|
2054 00:22:43,234 --> 00:22:45,455 |
|
2055 this null return false. |
|
2056 |
|
2057 448 |
|
2058 00:22:45,455 --> 00:22:47,540 |
|
2059 The alternative is to find here, |
|
2060 |
|
2061 449 |
|
2062 00:22:47,540 --> 00:22:50,000 |
|
2063 so that's the or in Scala. |
|
2064 |
|
2065 450 |
|
2066 00:22:50,000 --> 00:22:52,700 |
|
2067 And for the sequence, |
|
2068 that's the end. |
|
2069 |
|
2070 451 |
|
2071 00:22:52,700 --> 00:22:56,180 |
|
2072 And this star, no matter |
|
2073 what the reg expression is, |
|
2074 |
|
2075 452 |
|
2076 00:22:56,180 --> 00:22:59,540 |
|
2077 it will always match the |
|
2078 empty string, so true. |
|
2079 |
|
2080 453 |
|
2081 00:22:59,540 --> 00:23:02,225 |
|
2082 So nanobots, very easy. |
|
2083 |
|
2084 454 |
|
2085 00:23:02,225 --> 00:23:07,430 |
|
2086 The derivative is also not |
|
2087 so much more complicated. |
|
2088 |
|
2089 455 |
|
2090 00:23:07,430 --> 00:23:08,974 |
|
2091 It takes two arguments, |
|
2092 |
|
2093 456 |
|
2094 00:23:08,974 --> 00:23:11,810 |
|
2095 a character and the |
|
2096 rec expression, |
|
2097 |
|
2098 457 |
|
2099 00:23:11,810 --> 00:23:14,405 |
|
2100 and returns a wreck expression. |
|
2101 |
|
2102 458 |
|
2103 00:23:14,405 --> 00:23:16,340 |
|
2104 So now we just have to analyze |
|
2105 |
|
2106 459 |
|
2107 00:23:16,340 --> 00:23:18,890 |
|
2108 what's that reg |
|
2109 expression looks like. |
|
2110 |
|
2111 460 |
|
2112 00:23:18,890 --> 00:23:23,870 |
|
2113 If it's 0, we return |
|
2114 01, we return 0. |
|
2115 |
|
2116 461 |
|
2117 00:23:23,870 --> 00:23:26,990 |
|
2118 If the character is the |
|
2119 wreck expressions character |
|
2120 |
|
2121 462 |
|
2122 00:23:26,990 --> 00:23:30,260 |
|
2123 d. We test whether it's |
|
2124 equal to this character. |
|
2125 |
|
2126 463 |
|
2127 00:23:30,260 --> 00:23:32,225 |
|
2128 We want to take the |
|
2129 derivative off. |
|
2130 |
|
2131 464 |
|
2132 00:23:32,225 --> 00:23:36,185 |
|
2133 If yes, return one, otherwise 0. |
|
2134 |
|
2135 465 |
|
2136 00:23:36,185 --> 00:23:38,600 |
|
2137 The alternative which has pushed |
|
2138 |
|
2139 466 |
|
2140 00:23:38,600 --> 00:23:39,860 |
|
2141 the derivative under |
|
2142 |
|
2143 467 |
|
2144 00:23:39,860 --> 00:23:42,710 |
|
2145 this alternative, |
|
2146 that's the easy one. |
|
2147 |
|
2148 468 |
|
2149 00:23:42,710 --> 00:23:44,690 |
|
2150 Here's the sequence case where we |
|
2151 |
|
2152 469 |
|
2153 00:23:44,690 --> 00:23:47,015 |
|
2154 first have to test |
|
2155 if it's nullable, |
|
2156 |
|
2157 470 |
|
2158 00:23:47,015 --> 00:23:49,040 |
|
2159 If it's not the have to push |
|
2160 |
|
2161 471 |
|
2162 00:23:49,040 --> 00:23:52,160 |
|
2163 the derivative over |
|
2164 the first DR1. |
|
2165 |
|
2166 472 |
|
2167 00:23:52,160 --> 00:23:56,135 |
|
2168 And otherwise if it is null |
|
2169 above we have two cases. |
|
2170 |
|
2171 473 |
|
2172 00:23:56,135 --> 00:24:00,605 |
|
2173 Either you have to push |
|
2174 it over the R1 or R2. |
|
2175 |
|
2176 474 |
|
2177 00:24:00,605 --> 00:24:03,860 |
|
2178 And a star case, this one. |
|
2179 |
|
2180 475 |
|
2181 00:24:03,860 --> 00:24:07,160 |
|
2182 We just build the sequence |
|
2183 of the derivative of |
|
2184 |
|
2185 476 |
|
2186 00:24:07,160 --> 00:24:11,600 |
|
2187 R1 and append the |
|
2188 or as a sequence, |
|
2189 |
|
2190 477 |
|
2191 00:24:11,600 --> 00:24:14,195 |
|
2192 put the star at the end. |
|
2193 |
|
2194 478 |
|
2195 00:24:14,195 --> 00:24:19,430 |
|
2196 Okay, so here's this |
|
2197 function for strings. |
|
2198 |
|
2199 479 |
|
2200 00:24:19,430 --> 00:24:21,740 |
|
2201 So a list of characters. |
|
2202 |
|
2203 480 |
|
2204 00:24:21,740 --> 00:24:23,870 |
|
2205 This function takes N, |
|
2206 |
|
2207 481 |
|
2208 00:24:23,870 --> 00:24:26,480 |
|
2209 S list of characters argument |
|
2210 and a reg expression |
|
2211 |
|
2212 482 |
|
2213 00:24:26,480 --> 00:24:29,360 |
|
2214 as argument and returns |
|
2215 a wreck expression. |
|
2216 |
|
2217 483 |
|
2218 00:24:29,360 --> 00:24:31,460 |
|
2219 And we just have to |
|
2220 pattern match what |
|
2221 |
|
2222 484 |
|
2223 00:24:31,460 --> 00:24:34,055 |
|
2224 that string looks like |
|
2225 or this list looks like. |
|
2226 |
|
2227 485 |
|
2228 00:24:34,055 --> 00:24:35,360 |
|
2229 If it's the empty list, |
|
2230 |
|
2231 486 |
|
2232 00:24:35,360 --> 00:24:38,030 |
|
2233 it just immediately |
|
2234 return the R. If |
|
2235 |
|
2236 487 |
|
2237 00:24:38,030 --> 00:24:42,020 |
|
2238 it's something of the |
|
2239 form C followed by S, |
|
2240 |
|
2241 488 |
|
2242 00:24:42,020 --> 00:24:45,050 |
|
2243 then we build first the |
|
2244 derivative according |
|
2245 |
|
2246 489 |
|
2247 00:24:45,050 --> 00:24:48,345 |
|
2248 to a C. And then |
|
2249 the result of that, |
|
2250 |
|
2251 490 |
|
2252 00:24:48,345 --> 00:24:50,090 |
|
2253 people recursively call |
|
2254 |
|
2255 491 |
|
2256 00:24:50,090 --> 00:24:53,675 |
|
2257 the derivative |
|
2258 according to s. Okay? |
|
2259 |
|
2260 492 |
|
2261 00:24:53,675 --> 00:24:56,060 |
|
2262 And now the main mantra, |
|
2263 |
|
2264 493 |
|
2265 00:24:56,060 --> 00:24:59,360 |
|
2266 it takes a reg |
|
2267 expression and takes |
|
2268 |
|
2269 494 |
|
2270 00:24:59,360 --> 00:25:02,870 |
|
2271 a string and returns a |
|
2272 boolean, true or false. |
|
2273 |
|
2274 495 |
|
2275 00:25:02,870 --> 00:25:05,705 |
|
2276 And it first builds |
|
2277 the derivative. |
|
2278 |
|
2279 496 |
|
2280 00:25:05,705 --> 00:25:07,430 |
|
2281 The only thing I have to do here |
|
2282 |
|
2283 497 |
|
2284 00:25:07,430 --> 00:25:09,410 |
|
2285 because I'm implementing |
|
2286 it and scholar, |
|
2287 |
|
2288 498 |
|
2289 00:25:09,410 --> 00:25:15,064 |
|
2290 I have to coerce between strings |
|
2291 and lists of characters. |
|
2292 |
|
2293 499 |
|
2294 00:25:15,064 --> 00:25:16,580 |
|
2295 So he, I take first |
|
2296 |
|
2297 500 |
|
2298 00:25:16,580 --> 00:25:20,810 |
|
2299 the two list of the S that |
|
2300 gives me a list of characters. |
|
2301 |
|
2302 501 |
|
2303 00:25:20,810 --> 00:25:23,135 |
|
2304 Then I build this derivative |
|
2305 |
|
2306 502 |
|
2307 00:25:23,135 --> 00:25:26,045 |
|
2308 is ds because I'm |
|
2309 looking at strings. |
|
2310 |
|
2311 503 |
|
2312 00:25:26,045 --> 00:25:28,160 |
|
2313 And then at the end, |
|
2314 |
|
2315 504 |
|
2316 00:25:28,160 --> 00:25:33,050 |
|
2317 built-in nullable of |
|
2318 the final derivative. |
|
2319 |
|
2320 505 |
|
2321 00:25:33,050 --> 00:25:37,310 |
|
2322 The beauty of all this |
|
2323 is that in Scala, |
|
2324 |
|
2325 506 |
|
2326 00:25:37,310 --> 00:25:40,085 |
|
2327 these definition which |
|
2328 I had on the slides |
|
2329 |
|
2330 507 |
|
2331 00:25:40,085 --> 00:25:43,835 |
|
2332 go almost one-to-one into code. |
|
2333 |
|
2334 508 |
|
2335 00:25:43,835 --> 00:25:45,605 |
|
2336 And that's of course, |
|
2337 |
|
2338 509 |
|
2339 00:25:45,605 --> 00:25:47,480 |
|
2340 makes it very easy |
|
2341 to implement in |
|
2342 |
|
2343 510 |
|
2344 00:25:47,480 --> 00:25:49,730 |
|
2345 a functional language like Scala. |
|
2346 |
|
2347 511 |
|
2348 00:25:49,730 --> 00:25:52,865 |
|
2349 We can also now try |
|
2350 out some examples. |
|
2351 |
|
2352 512 |
|
2353 00:25:52,865 --> 00:25:56,960 |
|
2354 This was the example |
|
2355 I had on the slide. |
|
2356 |
|
2357 513 |
|
2358 00:25:56,960 --> 00:25:58,370 |
|
2359 And we could now calculate |
|
2360 |
|
2361 514 |
|
2362 00:25:58,370 --> 00:26:00,305 |
|
2363 what's the derivative |
|
2364 according to a, |
|
2365 |
|
2366 515 |
|
2367 00:26:00,305 --> 00:26:02,720 |
|
2368 B, and C of this |
|
2369 record expression. |
|
2370 |
|
2371 516 |
|
2372 00:26:02,720 --> 00:26:07,040 |
|
2373 And I hope you didn't |
|
2374 make any mistake. |
|
2375 |
|
2376 517 |
|
2377 00:26:07,040 --> 00:26:09,260 |
|
2378 Now of course we want |
|
2379 to see whether B |
|
2380 |
|
2381 518 |
|
2382 00:26:09,260 --> 00:26:11,390 |
|
2383 do any better with |
|
2384 this algorithm. |
|
2385 |
|
2386 519 |
|
2387 00:26:11,390 --> 00:26:13,715 |
|
2388 Then Python and Ruby. |
|
2389 |
|
2390 520 |
|
2391 00:26:13,715 --> 00:26:16,070 |
|
2392 And let's first have a |
|
2393 look at this example. |
|
2394 |
|
2395 521 |
|
2396 00:26:16,070 --> 00:26:18,079 |
|
2397 So this reg expression. |
|
2398 |
|
2399 522 |
|
2400 00:26:18,079 --> 00:26:19,880 |
|
2401 The problem is that in |
|
2402 |
|
2403 523 |
|
2404 00:26:19,880 --> 00:26:22,070 |
|
2405 our reg expression |
|
2406 metro so far we have |
|
2407 |
|
2408 524 |
|
2409 00:26:22,070 --> 00:26:24,530 |
|
2410 the sequence right |
|
2411 expression where we |
|
2412 |
|
2413 525 |
|
2414 00:26:24,530 --> 00:26:27,200 |
|
2415 don't have this optional |
|
2416 wreck expression. |
|
2417 |
|
2418 526 |
|
2419 00:26:27,200 --> 00:26:30,800 |
|
2420 And we don't have this n |
|
2421 times Rekha expression, |
|
2422 |
|
2423 527 |
|
2424 00:26:30,800 --> 00:26:36,605 |
|
2425 but we can quickly implement |
|
2426 that in our implementation. |
|
2427 |
|
2428 528 |
|
2429 00:26:36,605 --> 00:26:40,549 |
|
2430 So if you want to build the |
|
2431 optional reg expression, |
|
2432 |
|
2433 529 |
|
2434 00:26:40,549 --> 00:26:41,870 |
|
2435 which is defined here, |
|
2436 |
|
2437 530 |
|
2438 00:26:41,870 --> 00:26:44,570 |
|
2439 a little function which |
|
2440 takes a reg expression and |
|
2441 |
|
2442 531 |
|
2443 00:26:44,570 --> 00:26:47,360 |
|
2444 returns the alternative of R one. |
|
2445 |
|
2446 532 |
|
2447 00:26:47,360 --> 00:26:49,624 |
|
2448 So it essentially |
|
2449 takes the definition |
|
2450 |
|
2451 533 |
|
2452 00:26:49,624 --> 00:26:53,240 |
|
2453 of optional R and |
|
2454 replaces it by that. |
|
2455 |
|
2456 534 |
|
2457 00:26:53,240 --> 00:26:56,150 |
|
2458 The end times what we |
|
2459 essentially do it, |
|
2460 |
|
2461 535 |
|
2462 00:26:56,150 --> 00:27:01,535 |
|
2463 There's beaches built n |
|
2464 copies of this r. Ok? |
|
2465 |
|
2466 536 |
|
2467 00:27:01,535 --> 00:27:04,745 |
|
2468 So if this n times was 0, |
|
2469 |
|
2470 537 |
|
2471 00:27:04,745 --> 00:27:06,245 |
|
2472 we just return one. |
|
2473 |
|
2474 538 |
|
2475 00:27:06,245 --> 00:27:11,570 |
|
2476 If it's one, then we return |
|
2477 just the reg expression. |
|
2478 |
|
2479 539 |
|
2480 00:27:11,570 --> 00:27:15,575 |
|
2481 And if it's a, something |
|
2482 bigger than one, |
|
2483 |
|
2484 540 |
|
2485 00:27:15,575 --> 00:27:18,560 |
|
2486 then we just built the |
|
2487 sequence of one of |
|
2488 |
|
2489 541 |
|
2490 00:27:18,560 --> 00:27:20,270 |
|
2491 these copies and call |
|
2492 |
|
2493 542 |
|
2494 00:27:20,270 --> 00:27:22,925 |
|
2495 this function again |
|
2496 with n minus one. |
|
2497 |
|
2498 543 |
|
2499 00:27:22,925 --> 00:27:26,330 |
|
2500 So we just now n copies of that. |
|
2501 |
|
2502 544 |
|
2503 00:27:26,330 --> 00:27:30,710 |
|
2504 Okay? Okay, so we can look |
|
2505 at our first example. |
|
2506 |
|
2507 545 |
|
2508 00:27:30,710 --> 00:27:32,629 |
|
2509 This is the work expression, |
|
2510 |
|
2511 546 |
|
2512 00:27:32,629 --> 00:27:36,035 |
|
2513 and I call that here |
|
2514 even rec expression1. |
|
2515 |
|
2516 547 |
|
2517 00:27:36,035 --> 00:27:37,670 |
|
2518 It doesn't look as beautiful |
|
2519 |
|
2520 548 |
|
2521 00:27:37,670 --> 00:27:39,140 |
|
2522 as what we write down on paper. |
|
2523 |
|
2524 549 |
|
2525 00:27:39,140 --> 00:27:41,240 |
|
2526 We will actually look |
|
2527 at this later on |
|
2528 |
|
2529 550 |
|
2530 00:27:41,240 --> 00:27:43,640 |
|
2531 if this can be improved. |
|
2532 But here it is. |
|
2533 |
|
2534 551 |
|
2535 00:27:43,640 --> 00:27:45,724 |
|
2536 Here's the reg expression, |
|
2537 |
|
2538 552 |
|
2539 00:27:45,724 --> 00:27:49,520 |
|
2540 and here's a little function |
|
2541 which measures the time. |
|
2542 |
|
2543 553 |
|
2544 00:27:49,520 --> 00:27:53,180 |
|
2545 And we can now start testing |
|
2546 |
|
2547 554 |
|
2548 00:27:53,180 --> 00:27:57,845 |
|
2549 this reg expression with |
|
2550 strings of just containing A's. |
|
2551 |
|
2552 555 |
|
2553 00:27:57,845 --> 00:28:00,020 |
|
2554 And we first a bit cautious, |
|
2555 |
|
2556 556 |
|
2557 00:28:00,020 --> 00:28:04,985 |
|
2558 be tested between 020 |
|
2559 and be count by two. |
|
2560 |
|
2561 557 |
|
2562 00:28:04,985 --> 00:28:08,330 |
|
2563 And I actually will not |
|
2564 start that with Scala, |
|
2565 |
|
2566 558 |
|
2567 00:28:08,330 --> 00:28:12,965 |
|
2568 but actually the orbital online. |
|
2569 |
|
2570 559 |
|
2571 00:28:12,965 --> 00:28:15,305 |
|
2572 And that's out. |
|
2573 |
|
2574 560 |
|
2575 00:28:15,305 --> 00:28:17,269 |
|
2576 And that correlates. |
|
2577 |
|
2578 561 |
|
2579 00:28:17,269 --> 00:28:20,675 |
|
2580 So for 18 days it's pretty fast. |
|
2581 |
|
2582 562 |
|
2583 00:28:20,675 --> 00:28:24,815 |
|
2584 But for 20 it already |
|
2585 needs to seconds. |
|
2586 |
|
2587 563 |
|
2588 00:28:24,815 --> 00:28:28,265 |
|
2589 And you can see |
|
2590 actually this jump from |
|
2591 |
|
2592 564 |
|
2593 00:28:28,265 --> 00:28:32,825 |
|
2594 18 to 20 in the runtime |
|
2595 is prepared to. |
|
2596 |
|
2597 565 |
|
2598 00:28:32,825 --> 00:28:37,460 |
|
2599 And if we actually measure |
|
2600 it more accurately, |
|
2601 |
|
2602 566 |
|
2603 00:28:37,460 --> 00:28:39,770 |
|
2604 then we get this picture. |
|
2605 |
|
2606 567 |
|
2607 00:28:39,770 --> 00:28:41,255 |
|
2608 And what turns out, |
|
2609 |
|
2610 568 |
|
2611 00:28:41,255 --> 00:28:45,830 |
|
2612 we actually inverse as Python |
|
2613 and Ruby in this example. |
|
2614 |
|
2615 569 |
|
2616 00:28:45,830 --> 00:28:49,230 |
|
2617 So I guess that's a fail. |