videos/02-algo3.srt
changeset 773 260e330f7466
child 774 42733adf2a48
equal deleted inserted replaced
772:3bf3f5bb067e 773:260e330f7466
       
     1 1
       
     2 00:00:06,350 --> 00:00:10,305
       
     3 They come back. I
       
     4 can hear you saying,
       
     5 
       
     6 2
       
     7 00:00:10,305 --> 00:00:12,000
       
     8 yes, you tried it out on
       
     9 
       
    10 3
       
    11 00:00:12,000 --> 00:00:14,760
       
    12 one example and you
       
    13 were much better.
       
    14 
       
    15 4
       
    16 00:00:14,760 --> 00:00:17,415
       
    17 But how about on other examples?
       
    18 
       
    19 5
       
    20 00:00:17,415 --> 00:00:21,265
       
    21 Specifically, we had two
       
    22 regular expressions.
       
    23 
       
    24 6
       
    25 00:00:21,265 --> 00:00:23,480
       
    26 How about the other case?
       
    27 
       
    28 7
       
    29 00:00:23,480 --> 00:00:27,480
       
    30 Where let's have a look at
       
    31 that other case in this video.
       
    32 
       
    33 8
       
    34 00:00:29,140 --> 00:00:32,705
       
    35 So I'm back here
       
    36 in this Reto file.
       
    37 
       
    38 9
       
    39 00:00:32,705 --> 00:00:35,180
       
    40 And here's this test
       
    41 case which run quite
       
    42 
       
    43 10
       
    44 00:00:35,180 --> 00:00:39,665
       
    45 nicely for strings between 01000.
       
    46 
       
    47 11
       
    48 00:00:39,665 --> 00:00:42,725
       
    49 Here is the other test case.
       
    50 
       
    51 12
       
    52 00:00:42,725 --> 00:00:44,090
       
    53 I still run it only
       
    54 
       
    55 13
       
    56 00:00:44,090 --> 00:00:48,470
       
    57 for relatively small
       
    58 strings between 020.
       
    59 
       
    60 14
       
    61 00:00:48,470 --> 00:00:50,240
       
    62 And let's see what it say.
       
    63 
       
    64 15
       
    65 00:00:50,240 --> 00:00:51,800
       
    66 So I'm running the test in
       
    67 
       
    68 16
       
    69 00:00:51,800 --> 00:00:57,320
       
    70 the ammonoids repo and
       
    71 doesn't look too bad.
       
    72 
       
    73 17
       
    74 00:00:57,320 --> 00:01:01,160
       
    75 But this might be because
       
    76 20 is not general enough.
       
    77 
       
    78 18
       
    79 00:01:01,160 --> 00:01:03,620
       
    80 So let's try it with 30.
       
    81 
       
    82 19
       
    83 00:01:03,620 --> 00:01:06,530
       
    84 Let's run this test again.
       
    85 
       
    86 20
       
    87 00:01:06,530 --> 00:01:11,075
       
    88 And 20 is quite okay.
       
    89 
       
    90 21
       
    91 00:01:11,075 --> 00:01:15,965
       
    92 22, okay, that's now
       
    93 almost ten times as much.
       
    94 
       
    95 22
       
    96 00:01:15,965 --> 00:01:18,995
       
    97 And then the next
       
    98 one would be 24.
       
    99 
       
   100 23
       
   101 00:01:18,995 --> 00:01:23,615
       
   102 And we're waiting and waiting.
       
   103 
       
   104 24
       
   105 00:01:23,615 --> 00:01:26,410
       
   106 And we are waiting.
       
   107 
       
   108 25
       
   109 00:01:26,410 --> 00:01:29,300
       
   110 Actually, it isn't
       
   111 calculated at all.
       
   112 
       
   113 26
       
   114 00:01:29,300 --> 00:01:31,399
       
   115 It run out of memory.
       
   116 
       
   117 27
       
   118 00:01:31,399 --> 00:01:33,905
       
   119 So here is something going on,
       
   120 
       
   121 28
       
   122 00:01:33,905 --> 00:01:37,640
       
   123 which is Daphne bad and we
       
   124 have to have a look at that.
       
   125 
       
   126 29
       
   127 00:01:37,640 --> 00:01:40,640
       
   128 Okay? It's always
       
   129 instructive with
       
   130 
       
   131 30
       
   132 00:01:40,640 --> 00:01:43,460
       
   133 this algorithm to
       
   134 look at the sizes of
       
   135 
       
   136 31
       
   137 00:01:43,460 --> 00:01:45,695
       
   138 the record expressions
       
   139 we calculate
       
   140 
       
   141 32
       
   142 00:01:45,695 --> 00:01:49,625
       
   143 the evil to Is this
       
   144 a star, star B.
       
   145 
       
   146 33
       
   147 00:01:49,625 --> 00:01:51,800
       
   148 So there's nothing we
       
   149 can compress there.
       
   150 
       
   151 34
       
   152 00:01:51,800 --> 00:01:55,490
       
   153 It has just stars and
       
   154 sequences and nothing else.
       
   155 
       
   156 35
       
   157 00:01:55,490 --> 00:01:58,430
       
   158 And so it's not that we
       
   159 can play the same trick
       
   160 
       
   161 36
       
   162 00:01:58,430 --> 00:02:01,805
       
   163 of trying to introduce
       
   164 an explicit constructor.
       
   165 
       
   166 37
       
   167 00:02:01,805 --> 00:02:04,190
       
   168 But now let's have a
       
   169 look at the derivatives.
       
   170 
       
   171 38
       
   172 00:02:04,190 --> 00:02:05,600
       
   173 As you can see here.
       
   174 
       
   175 39
       
   176 00:02:05,600 --> 00:02:08,420
       
   177 If I take this evil to wreck
       
   178 
       
   179 40
       
   180 00:02:08,420 --> 00:02:09,935
       
   181 expression and they built
       
   182 
       
   183 41
       
   184 00:02:09,935 --> 00:02:12,470
       
   185 now longer and
       
   186 longer derivatives,
       
   187 
       
   188 42
       
   189 00:02:12,470 --> 00:02:14,090
       
   190 you can see this grows.
       
   191 
       
   192 43
       
   193 00:02:14,090 --> 00:02:16,055
       
   194 And as you see in this line,
       
   195 
       
   196 44
       
   197 00:02:16,055 --> 00:02:17,270
       
   198 if I'm trying to take
       
   199 
       
   200 45
       
   201 00:02:17,270 --> 00:02:20,180
       
   202 the derivative of a
       
   203 quite large string,
       
   204 
       
   205 46
       
   206 00:02:20,180 --> 00:02:21,680
       
   207 it is quite quick.
       
   208 
       
   209 47
       
   210 00:02:21,680 --> 00:02:26,870
       
   211 But the size of the
       
   212 reg expression,
       
   213 
       
   214 48
       
   215 00:02:26,870 --> 00:02:28,310
       
   216 the number of nodes,
       
   217 
       
   218 49
       
   219 00:02:28,310 --> 00:02:30,815
       
   220 is also like nearly
       
   221 eight millions.
       
   222 
       
   223 50
       
   224 00:02:30,815 --> 00:02:33,845
       
   225 And so even if that calculates
       
   226 relatively quickly,
       
   227 
       
   228 51
       
   229 00:02:33,845 --> 00:02:37,865
       
   230 that is the reason why at 24,
       
   231 
       
   232 52
       
   233 00:02:37,865 --> 00:02:39,560
       
   234 it just runs out of memory.
       
   235 
       
   236 53
       
   237 00:02:39,560 --> 00:02:42,785
       
   238 This reg expression
       
   239 just grew too much.
       
   240 
       
   241 54
       
   242 00:02:42,785 --> 00:02:46,520
       
   243 So we somehow need
       
   244 to still compress
       
   245 
       
   246 55
       
   247 00:02:46,520 --> 00:02:49,655
       
   248 this record expression
       
   249 and make it not
       
   250 
       
   251 56
       
   252 00:02:49,655 --> 00:02:52,700
       
   253 grow so much when we
       
   254 build derivative.
       
   255 
       
   256 57
       
   257 00:02:52,700 --> 00:02:54,020
       
   258 So we have to look at
       
   259 
       
   260 58
       
   261 00:02:54,020 --> 00:02:57,050
       
   262 where does that grow
       
   263 actually come from.
       
   264 
       
   265 59
       
   266 00:02:57,050 --> 00:03:00,170
       
   267 Let's look at the
       
   268 derivative operation
       
   269 
       
   270 60
       
   271 00:03:00,170 --> 00:03:01,895
       
   272 again in more detail.
       
   273 
       
   274 61
       
   275 00:03:01,895 --> 00:03:03,740
       
   276 When we introduced
       
   277 it, we looked at
       
   278 
       
   279 62
       
   280 00:03:03,740 --> 00:03:06,560
       
   281 this example of a
       
   282 wreck expression R,
       
   283 
       
   284 63
       
   285 00:03:06,560 --> 00:03:11,465
       
   286 which is a star of some
       
   287 alternative and some sequence.
       
   288 
       
   289 64
       
   290 00:03:11,465 --> 00:03:13,130
       
   291 And we can build now
       
   292 
       
   293 65
       
   294 00:03:13,130 --> 00:03:15,800
       
   295 the derivative of this
       
   296 r according to a,
       
   297 
       
   298 66
       
   299 00:03:15,800 --> 00:03:18,965
       
   300 b, and c and see
       
   301 what it calculates.
       
   302 
       
   303 67
       
   304 00:03:18,965 --> 00:03:21,935
       
   305 And you see in case of a,
       
   306 
       
   307 68
       
   308 00:03:21,935 --> 00:03:26,570
       
   309 it's like one times b plus
       
   310 0 and then followed by r,
       
   311 
       
   312 69
       
   313 00:03:26,570 --> 00:03:29,015
       
   314 which is this whole
       
   315 wreck expression again.
       
   316 
       
   317 70
       
   318 00:03:29,015 --> 00:03:30,935
       
   319 So you can also see.
       
   320 
       
   321 71
       
   322 00:03:30,935 --> 00:03:34,745
       
   323 Derivative operation
       
   324 introduces a lot of junk.
       
   325 
       
   326 72
       
   327 00:03:34,745 --> 00:03:38,165
       
   328 This plus 0 isn't
       
   329 really necessary.
       
   330 
       
   331 73
       
   332 00:03:38,165 --> 00:03:42,455
       
   333 And this times one could
       
   334 be also thrown away.
       
   335 
       
   336 74
       
   337 00:03:42,455 --> 00:03:43,685
       
   338 So in the first case,
       
   339 
       
   340 75
       
   341 00:03:43,685 --> 00:03:48,110
       
   342 actually we could just have
       
   343 one times b is b plus 0,
       
   344 
       
   345 76
       
   346 00:03:48,110 --> 00:03:49,580
       
   347 it still be a,
       
   348 
       
   349 77
       
   350 00:03:49,580 --> 00:03:54,905
       
   351 so it would be B followed by
       
   352 R. And in this second case,
       
   353 
       
   354 78
       
   355 00:03:54,905 --> 00:03:57,515
       
   356 C0 times b, that will be just 0.
       
   357 
       
   358 79
       
   359 00:03:57,515 --> 00:03:59,270
       
   360 Then 0 plus one is
       
   361 
       
   362 80
       
   363 00:03:59,270 --> 00:04:05,330
       
   364 11 times r would be just
       
   365 r. And in the last case,
       
   366 
       
   367 81
       
   368 00:04:05,330 --> 00:04:12,155
       
   369 C0 times B would be 00 plus
       
   370 0 is 00 times r would be 0.
       
   371 
       
   372 82
       
   373 00:04:12,155 --> 00:04:17,540
       
   374 Okay? So we could throw out
       
   375 all these useless zeros and
       
   376 
       
   377 83
       
   378 00:04:17,540 --> 00:04:20,960
       
   379 ones because actually
       
   380 we have to throw
       
   381 
       
   382 84
       
   383 00:04:20,960 --> 00:04:24,845
       
   384 them out because over time
       
   385 they just accumulate.
       
   386 
       
   387 85
       
   388 00:04:24,845 --> 00:04:27,035
       
   389 And then we build
       
   390 the next derivative.
       
   391 
       
   392 86
       
   393 00:04:27,035 --> 00:04:31,130
       
   394 0 wouldn't go away by
       
   395 building annuity where tests.
       
   396 
       
   397 87
       
   398 00:04:31,130 --> 00:04:34,310
       
   399 So we need to somehow
       
   400 a mechanism to
       
   401 
       
   402 88
       
   403 00:04:34,310 --> 00:04:39,120
       
   404 delete the junk from
       
   405 these derivatives.
       
   406 
       
   407 89
       
   408 00:04:39,280 --> 00:04:41,900
       
   409 But how to delete junk?
       
   410 
       
   411 90
       
   412 00:04:41,900 --> 00:04:43,370
       
   413 We already know we have
       
   414 
       
   415 91
       
   416 00:04:43,370 --> 00:04:47,825
       
   417 the simplification rules
       
   418 which say like if r plus 0,
       
   419 
       
   420 92
       
   421 00:04:47,825 --> 00:04:53,000
       
   422 I can just replace by
       
   423 r and the other ones.
       
   424 
       
   425 93
       
   426 00:04:53,000 --> 00:04:55,760
       
   427 And so what I now
       
   428 need to do is in
       
   429 
       
   430 94
       
   431 00:04:55,760 --> 00:04:58,715
       
   432 my algorithm when I
       
   433 built the derivative,
       
   434 
       
   435 95
       
   436 00:04:58,715 --> 00:05:01,415
       
   437 this might have
       
   438 introduced some chunk.
       
   439 
       
   440 96
       
   441 00:05:01,415 --> 00:05:02,960
       
   442 And I now have to have
       
   443 
       
   444 97
       
   445 00:05:02,960 --> 00:05:06,590
       
   446 essentially a additional function
       
   447 
       
   448 98
       
   449 00:05:06,590 --> 00:05:08,945
       
   450 which deletes this junk again.
       
   451 
       
   452 99
       
   453 00:05:08,945 --> 00:05:10,490
       
   454 And in doing so,
       
   455 
       
   456 100
       
   457 00:05:10,490 --> 00:05:13,340
       
   458 keep steer, Rekha expression,
       
   459 
       
   460 101
       
   461 00:05:13,340 --> 00:05:16,730
       
   462 relatively small, pickers debts,
       
   463 
       
   464 102
       
   465 00:05:16,730 --> 00:05:19,535
       
   466 the Achilles heel
       
   467 of this algorithm.
       
   468 
       
   469 103
       
   470 00:05:19,535 --> 00:05:22,565
       
   471 If this regular expression
       
   472 grows too much,
       
   473 
       
   474 104
       
   475 00:05:22,565 --> 00:05:25,070
       
   476 then all these functions
       
   477 will slow down.
       
   478 
       
   479 105
       
   480 00:05:25,070 --> 00:05:26,360
       
   481 So the purpose of
       
   482 
       
   483 106
       
   484 00:05:26,360 --> 00:05:30,455
       
   485 the simplification function
       
   486 is to after the derivative,
       
   487 
       
   488 107
       
   489 00:05:30,455 --> 00:05:33,080
       
   490 compress again this
       
   491 rec expression
       
   492 
       
   493 108
       
   494 00:05:33,080 --> 00:05:35,570
       
   495 and then do the next derivative.
       
   496 
       
   497 109
       
   498 00:05:35,570 --> 00:05:39,815
       
   499 So if we introduce that
       
   500 additional functions simp,
       
   501 
       
   502 110
       
   503 00:05:39,815 --> 00:05:42,440
       
   504 which essentially
       
   505 just goes through
       
   506 
       
   507 111
       
   508 00:05:42,440 --> 00:05:46,040
       
   509 this reg expression and
       
   510 deletes all this junk,
       
   511 
       
   512 112
       
   513 00:05:46,040 --> 00:05:50,045
       
   514 then we should be in a
       
   515 much better position.
       
   516 
       
   517 113
       
   518 00:05:50,045 --> 00:05:52,490
       
   519 As you can see on this slide,
       
   520 
       
   521 114
       
   522 00:05:52,490 --> 00:05:54,440
       
   523 the simplification
       
   524 function can actually
       
   525 
       
   526 115
       
   527 00:05:54,440 --> 00:05:56,855
       
   528 be implemented very easily.
       
   529 
       
   530 116
       
   531 00:05:56,855 --> 00:05:59,750
       
   532 However, there are few
       
   533 interesting points.
       
   534 
       
   535 117
       
   536 00:05:59,750 --> 00:06:02,720
       
   537 First of all, there
       
   538 are only two cases.
       
   539 
       
   540 118
       
   541 00:06:02,720 --> 00:06:05,060
       
   542 The only simplify when we have
       
   543 
       
   544 119
       
   545 00:06:05,060 --> 00:06:08,255
       
   546 an alternative or a sequence.
       
   547 
       
   548 120
       
   549 00:06:08,255 --> 00:06:12,859
       
   550 So for example, we will
       
   551 never simplify under star.
       
   552 
       
   553 121
       
   554 00:06:12,859 --> 00:06:15,950
       
   555 If you look at the
       
   556 derivative operation
       
   557 
       
   558 122
       
   559 00:06:15,950 --> 00:06:17,825
       
   560 and you scratch your
       
   561 head a little bit,
       
   562 
       
   563 123
       
   564 00:06:17,825 --> 00:06:20,180
       
   565 we'll find out why
       
   566 is that the case.
       
   567 
       
   568 124
       
   569 00:06:20,180 --> 00:06:22,145
       
   570 It's essentially a waste of time.
       
   571 
       
   572 125
       
   573 00:06:22,145 --> 00:06:25,505
       
   574 So you would not
       
   575 simplify under star.
       
   576 
       
   577 126
       
   578 00:06:25,505 --> 00:06:27,650
       
   579 You only simplify if you have
       
   580 
       
   581 127
       
   582 00:06:27,650 --> 00:06:30,530
       
   583 an alternative and the sequence.
       
   584 
       
   585 128
       
   586 00:06:30,530 --> 00:06:34,880
       
   587 Now, however, there
       
   588 is one small point.
       
   589 
       
   590 129
       
   591 00:06:34,880 --> 00:06:39,785
       
   592 You have to be aware of
       
   593 these simplification rules.
       
   594 
       
   595 130
       
   596 00:06:39,785 --> 00:06:42,740
       
   597 They need to be essentially
       
   598 applied backwards.
       
   599 
       
   600 131
       
   601 00:06:42,740 --> 00:06:45,650
       
   602 So I have to first essentially
       
   603 go to the leaves of
       
   604 
       
   605 132
       
   606 00:06:45,650 --> 00:06:49,085
       
   607 this record expression and
       
   608 then have to find out,
       
   609 
       
   610 133
       
   611 00:06:49,085 --> 00:06:51,170
       
   612 can I apply simplification?
       
   613 
       
   614 134
       
   615 00:06:51,170 --> 00:06:52,670
       
   616 And then if yes,
       
   617 
       
   618 135
       
   619 00:06:52,670 --> 00:06:55,430
       
   620 I apply the simplification
       
   621 and I look at the next step,
       
   622 
       
   623 136
       
   624 00:06:55,430 --> 00:06:57,815
       
   625 can I now simplify
       
   626 something more?
       
   627 
       
   628 137
       
   629 00:06:57,815 --> 00:06:59,390
       
   630 The reason is how
       
   631 
       
   632 138
       
   633 00:06:59,390 --> 00:07:03,125
       
   634 the simplification
       
   635 rules are formulated.
       
   636 
       
   637 139
       
   638 00:07:03,125 --> 00:07:05,300
       
   639 They might fire in
       
   640 
       
   641 140
       
   642 00:07:05,300 --> 00:07:08,765
       
   643 a higher level if something
       
   644 simplifies below.
       
   645 
       
   646 141
       
   647 00:07:08,765 --> 00:07:14,315
       
   648 So we have to essentially
       
   649 simplify from the inside out.
       
   650 
       
   651 142
       
   652 00:07:14,315 --> 00:07:16,850
       
   653 And in this
       
   654 simplification function,
       
   655 
       
   656 143
       
   657 00:07:16,850 --> 00:07:20,885
       
   658 what that means is the
       
   659 matching this reg expression.
       
   660 
       
   661 144
       
   662 00:07:20,885 --> 00:07:23,120
       
   663 Be test whether it's
       
   664 an alternative or
       
   665 
       
   666 145
       
   667 00:07:23,120 --> 00:07:26,345
       
   668 a sequence only then we
       
   669 actually go into action.
       
   670 
       
   671 146
       
   672 00:07:26,345 --> 00:07:28,385
       
   673 Once we know.
       
   674 
       
   675 147
       
   676 00:07:28,385 --> 00:07:30,530
       
   677 In case of an alternative,
       
   678 
       
   679 148
       
   680 00:07:30,530 --> 00:07:32,120
       
   681 what are the two components?
       
   682 
       
   683 149
       
   684 00:07:32,120 --> 00:07:35,765
       
   685 P first, simplify each component,
       
   686 
       
   687 150
       
   688 00:07:35,765 --> 00:07:39,095
       
   689 okay, and then we
       
   690 get a result back.
       
   691 
       
   692 151
       
   693 00:07:39,095 --> 00:07:41,180
       
   694 And we check whether
       
   695 
       
   696 152
       
   697 00:07:41,180 --> 00:07:45,679
       
   698 this simplification of
       
   699 R1 resulted into a 0.
       
   700 
       
   701 153
       
   702 00:07:45,679 --> 00:07:47,884
       
   703 Then because it's an alternative
       
   704 
       
   705 154
       
   706 00:07:47,884 --> 00:07:49,190
       
   707 than I can just replace it
       
   708 
       
   709 155
       
   710 00:07:49,190 --> 00:07:53,375
       
   711 by r is two a simplified
       
   712 version of R2.
       
   713 
       
   714 156
       
   715 00:07:53,375 --> 00:07:58,820
       
   716 If it came back r as
       
   717 two is actually 0,
       
   718 
       
   719 157
       
   720 00:07:58,820 --> 00:08:00,410
       
   721 then I can replace it by
       
   722 
       
   723 158
       
   724 00:08:00,410 --> 00:08:03,785
       
   725 the simplified version of a warm.
       
   726 
       
   727 159
       
   728 00:08:03,785 --> 00:08:07,460
       
   729 If I simplify both
       
   730 alternatives and it
       
   731 
       
   732 160
       
   733 00:08:07,460 --> 00:08:11,180
       
   734 happens that the simplified
       
   735 versions are the same,
       
   736 
       
   737 161
       
   738 00:08:11,180 --> 00:08:15,170
       
   739 next century I can throw away
       
   740 one of the alternatives.
       
   741 
       
   742 162
       
   743 00:08:15,170 --> 00:08:18,080
       
   744 Otherwise, I just have to
       
   745 keep the alternatives,
       
   746 
       
   747 163
       
   748 00:08:18,080 --> 00:08:21,155
       
   749 but the simplified components
       
   750 
       
   751 164
       
   752 00:08:21,155 --> 00:08:23,915
       
   753 in the sequence is
       
   754 pretty much the same.
       
   755 
       
   756 165
       
   757 00:08:23,915 --> 00:08:27,950
       
   758 I first have to check what
       
   759 does it simplify underneath.
       
   760 
       
   761 166
       
   762 00:08:27,950 --> 00:08:31,385
       
   763 So I call first simplify
       
   764 and then have a look.
       
   765 
       
   766 167
       
   767 00:08:31,385 --> 00:08:33,020
       
   768 Does it matches C or one of
       
   769 
       
   770 168
       
   771 00:08:33,020 --> 00:08:36,035
       
   772 the simplification
       
   773 damage, just return 0.
       
   774 
       
   775 169
       
   776 00:08:36,035 --> 00:08:38,330
       
   777 Or if the other component is 0,
       
   778 
       
   779 170
       
   780 00:08:38,330 --> 00:08:40,535
       
   781 I can also return a 0.
       
   782 
       
   783 171
       
   784 00:08:40,535 --> 00:08:43,310
       
   785 If it's one, then I keep
       
   786 the other component.
       
   787 
       
   788 172
       
   789 00:08:43,310 --> 00:08:45,920
       
   790 And if this iss two is one,
       
   791 
       
   792 173
       
   793 00:08:45,920 --> 00:08:47,615
       
   794 and I keep the first component,
       
   795 
       
   796 174
       
   797 00:08:47,615 --> 00:08:50,359
       
   798 and otherwise it's
       
   799 still the sequence.
       
   800 
       
   801 175
       
   802 00:08:50,359 --> 00:08:53,540
       
   803 And in all the other cases I
       
   804 don't have to do anything.
       
   805 
       
   806 176
       
   807 00:08:53,540 --> 00:08:55,700
       
   808 It's just a plain
       
   809 0. I leave it in.
       
   810 
       
   811 177
       
   812 00:08:55,700 --> 00:08:57,860
       
   813 If it's a plain
       
   814 warm, I leave it in.
       
   815 
       
   816 178
       
   817 00:08:57,860 --> 00:09:00,170
       
   818 And if something is under a star,
       
   819 
       
   820 179
       
   821 00:09:00,170 --> 00:09:02,840
       
   822 I don't open up this
       
   823 door and simplify it.
       
   824 
       
   825 180
       
   826 00:09:02,840 --> 00:09:06,680
       
   827 I just apply that to be
       
   828 as quick as possible.
       
   829 
       
   830 181
       
   831 00:09:06,680 --> 00:09:10,280
       
   832 Let's see whether this has
       
   833 any effect on our code.
       
   834 
       
   835 182
       
   836 00:09:10,280 --> 00:09:12,980
       
   837 So I'm now in the
       
   838 file read three,
       
   839 
       
   840 183
       
   841 00:09:12,980 --> 00:09:17,405
       
   842 and it's the same as Reto.
       
   843 
       
   844 184
       
   845 00:09:17,405 --> 00:09:20,885
       
   846 It still has this
       
   847 explicit and Times case,
       
   848 
       
   849 185
       
   850 00:09:20,885 --> 00:09:24,665
       
   851 nullable and derivative
       
   852 still the same.
       
   853 
       
   854 186
       
   855 00:09:24,665 --> 00:09:28,610
       
   856 Except now we have this
       
   857 simplification function as well.
       
   858 
       
   859 187
       
   860 00:09:28,610 --> 00:09:29,960
       
   861 And when we apply
       
   862 
       
   863 188
       
   864 00:09:29,960 --> 00:09:33,725
       
   865 the derivative after
       
   866 the apply deteriorated,
       
   867 
       
   868 189
       
   869 00:09:33,725 --> 00:09:35,870
       
   870 simplify it to throw away
       
   871 
       
   872 190
       
   873 00:09:35,870 --> 00:09:39,050
       
   874 all the junk this
       
   875 derivative introduced.
       
   876 
       
   877 191
       
   878 00:09:39,050 --> 00:09:41,510
       
   879 Otherwise everything
       
   880 stays the same.
       
   881 
       
   882 192
       
   883 00:09:41,510 --> 00:09:43,580
       
   884 You still have this expansion
       
   885 
       
   886 193
       
   887 00:09:43,580 --> 00:09:45,515
       
   888 of the optional reg expression.
       
   889 
       
   890 194
       
   891 00:09:45,515 --> 00:09:49,670
       
   892 Here, our two evil wreck
       
   893 expressions we use as a test.
       
   894 
       
   895 195
       
   896 00:09:49,670 --> 00:09:52,460
       
   897 And here's now this test case,
       
   898 
       
   899 196
       
   900 00:09:52,460 --> 00:09:55,835
       
   901 the first one and the
       
   902 actually now trying it
       
   903 
       
   904 197
       
   905 00:09:55,835 --> 00:10:00,515
       
   906 out for strings between
       
   907 09 thousand a's.
       
   908 
       
   909 198
       
   910 00:10:00,515 --> 00:10:08,285
       
   911 So C. So also gets
       
   912 simplification makes a,
       
   913 
       
   914 199
       
   915 00:10:08,285 --> 00:10:10,655
       
   916 actually this case fast on.
       
   917 
       
   918 200
       
   919 00:10:10,655 --> 00:10:16,260
       
   920 So we can now run strings
       
   921 up to 9 thousand.
       
   922 
       
   923 201
       
   924 00:10:16,260 --> 00:10:19,360
       
   925 And we don't actually
       
   926 sweat about this at all.
       
   927 
       
   928 202
       
   929 00:10:19,360 --> 00:10:24,745
       
   930 For I have to say my laptop
       
   931 is now starting its fan.
       
   932 
       
   933 203
       
   934 00:10:24,745 --> 00:10:28,525
       
   935 And also, because the times
       
   936 are relatively small,
       
   937 
       
   938 204
       
   939 00:10:28,525 --> 00:10:30,610
       
   940 I'm actually running
       
   941 each test three
       
   942 
       
   943 205
       
   944 00:10:30,610 --> 00:10:32,785
       
   945 times and then take the average,
       
   946 
       
   947 206
       
   948 00:10:32,785 --> 00:10:34,720
       
   949 which I didn't do before,
       
   950 
       
   951 207
       
   952 00:10:34,720 --> 00:10:37,720
       
   953 just to be a tiny bit
       
   954 more accurate map.
       
   955 
       
   956 208
       
   957 00:10:37,720 --> 00:10:42,025
       
   958 So three seconds for a
       
   959 string of 9 thousand a's.
       
   960 
       
   961 209
       
   962 00:10:42,025 --> 00:10:44,980
       
   963 And now the more
       
   964 interesting question is,
       
   965 
       
   966 210
       
   967 00:10:44,980 --> 00:10:46,240
       
   968 for the second case,
       
   969 
       
   970 211
       
   971 00:10:46,240 --> 00:10:48,625
       
   972 this E star, star b.
       
   973 
       
   974 212
       
   975 00:10:48,625 --> 00:10:51,250
       
   976 And you can already see
       
   977 on these numbers RB.
       
   978 
       
   979 213
       
   980 00:10:51,250 --> 00:10:53,260
       
   981 And now you're really ambitious.
       
   982 
       
   983 214
       
   984 00:10:53,260 --> 00:10:57,860
       
   985 And let's see how our
       
   986 program is coping with that.
       
   987 
       
   988 215
       
   989 00:11:02,610 --> 00:11:07,780
       
   990 Again. No sweat, at
       
   991 least not on my part.
       
   992 
       
   993 216
       
   994 00:11:07,780 --> 00:11:10,480
       
   995 The laptop thefts to
       
   996 calculate quite a bit.
       
   997 
       
   998 217
       
   999 00:11:10,480 --> 00:11:12,940
       
  1000 But this is now a string of
       
  1001 
       
  1002 218
       
  1003 00:11:12,940 --> 00:11:16,539
       
  1004 3 million A's and
       
  1005 it needs a second.
       
  1006 
       
  1007 219
       
  1008 00:11:16,539 --> 00:11:20,680
       
  1009 And let's see how far we get.
       
  1010 
       
  1011 220
       
  1012 00:11:20,680 --> 00:11:25,280
       
  1013 4 million a around two seconds.
       
  1014 
       
  1015 221
       
  1016 00:11:27,030 --> 00:11:29,050
       
  1017 So say it again,
       
  1018 
       
  1019 222
       
  1020 00:11:29,050 --> 00:11:31,690
       
  1021 I'm actually slowing it down.
       
  1022 
       
  1023 223
       
  1024 00:11:31,690 --> 00:11:34,150
       
  1025 He artificially run the test
       
  1026 
       
  1027 224
       
  1028 00:11:34,150 --> 00:11:36,895
       
  1029 three times and then
       
  1030 take the average.
       
  1031 
       
  1032 225
       
  1033 00:11:36,895 --> 00:11:42,749
       
  1034 But it seems to be something
       
  1035 less than five seconds.
       
  1036 
       
  1037 226
       
  1038 00:11:42,749 --> 00:11:48,185
       
  1039 And remember that is a
       
  1040 string of 6 million A's.
       
  1041 
       
  1042 227
       
  1043 00:11:48,185 --> 00:11:51,170
       
  1044 Let's just have a
       
  1045 look at the graphs.
       
  1046 
       
  1047 228
       
  1048 00:11:51,170 --> 00:11:56,105
       
  1049 So the simplification also
       
  1050 made of first case faster.
       
  1051 
       
  1052 229
       
  1053 00:11:56,105 --> 00:11:58,880
       
  1054 So earlier without
       
  1055 simplification,
       
  1056 
       
  1057 230
       
  1058 00:11:58,880 --> 00:12:00,710
       
  1059 where we just have
       
  1060 this explicit and
       
  1061 
       
  1062 231
       
  1063 00:12:00,710 --> 00:12:03,050
       
  1064 times that at this graph.
       
  1065 
       
  1066 232
       
  1067 00:12:03,050 --> 00:12:05,210
       
  1068 And now we can go up to
       
  1069 
       
  1070 233
       
  1071 00:12:05,210 --> 00:12:09,620
       
  1072 10 thousand and be still
       
  1073 under five seconds.
       
  1074 
       
  1075 234
       
  1076 00:12:09,620 --> 00:12:12,300
       
  1077 So that is good news.
       
  1078 
       
  1079 235
       
  1080 00:12:13,270 --> 00:12:16,745
       
  1081 In the other example, remember,
       
  1082 
       
  1083 236
       
  1084 00:12:16,745 --> 00:12:19,220
       
  1085 the existing regular
       
  1086 expression matches in
       
  1087 
       
  1088 237
       
  1089 00:12:19,220 --> 00:12:22,880
       
  1090 Java eight, Python,
       
  1091 and JavaScript.
       
  1092 
       
  1093 238
       
  1094 00:12:22,880 --> 00:12:26,030
       
  1095 And thanks to the
       
  1096 student be Ozma,
       
  1097 
       
  1098 239
       
  1099 00:12:26,030 --> 00:12:27,935
       
  1100 half a graph for Swift.
       
  1101 
       
  1102 240
       
  1103 00:12:27,935 --> 00:12:29,750
       
  1104 They're pretty atrocious.
       
  1105 
       
  1106 241
       
  1107 00:12:29,750 --> 00:12:33,320
       
  1108 They need like for 30 Ace,
       
  1109 
       
  1110 242
       
  1111 00:12:33,320 --> 00:12:37,490
       
  1112 something like on the
       
  1113 magnitude of thirty-seconds.
       
  1114 
       
  1115 243
       
  1116 00:12:37,490 --> 00:12:39,740
       
  1117 As I said already,
       
  1118 
       
  1119 244
       
  1120 00:12:39,740 --> 00:12:42,665
       
  1121 Java nine is slightly better.
       
  1122 
       
  1123 245
       
  1124 00:12:42,665 --> 00:12:44,870
       
  1125 Java nine and above this,
       
  1126 
       
  1127 246
       
  1128 00:12:44,870 --> 00:12:46,220
       
  1129 if you try that example,
       
  1130 
       
  1131 247
       
  1132 00:12:46,220 --> 00:12:48,665
       
  1133 the same exponential and nine,
       
  1134 
       
  1135 248
       
  1136 00:12:48,665 --> 00:12:51,230
       
  1137 you would be able to
       
  1138 process something
       
  1139 
       
  1140 249
       
  1141 00:12:51,230 --> 00:12:54,215
       
  1142 like 40 thousand A's
       
  1143 in half a minute.
       
  1144 
       
  1145 250
       
  1146 00:12:54,215 --> 00:12:57,740
       
  1147 I have to admit I'm not
       
  1148 a 100% sure what they
       
  1149 
       
  1150 251
       
  1151 00:12:57,740 --> 00:13:03,575
       
  1152 did to improve the
       
  1153 performance between Java 89.
       
  1154 
       
  1155 252
       
  1156 00:13:03,575 --> 00:13:05,510
       
  1157 I know they did something on
       
  1158 
       
  1159 253
       
  1160 00:13:05,510 --> 00:13:07,790
       
  1161 their regular expression
       
  1162 matching engine,
       
  1163 
       
  1164 254
       
  1165 00:13:07,790 --> 00:13:09,770
       
  1166 but I haven't really looked
       
  1167 
       
  1168 255
       
  1169 00:13:09,770 --> 00:13:12,230
       
  1170 at what precisely they've done.
       
  1171 
       
  1172 256
       
  1173 00:13:12,230 --> 00:13:17,945
       
  1174 But that's still not
       
  1175 really competition fas.
       
  1176 
       
  1177 257
       
  1178 00:13:17,945 --> 00:13:20,915
       
  1179 So our third version,
       
  1180 
       
  1181 258
       
  1182 00:13:20,915 --> 00:13:24,335
       
  1183 in this example, a star, star b.
       
  1184 
       
  1185 259
       
  1186 00:13:24,335 --> 00:13:28,445
       
  1187 We say it's something like
       
  1188 We need 6 million A's.
       
  1189 
       
  1190 260
       
  1191 00:13:28,445 --> 00:13:31,760
       
  1192 And again, I think these
       
  1193 are slightly older times,
       
  1194 
       
  1195 261
       
  1196 00:13:31,760 --> 00:13:33,770
       
  1197 so he had needs 20 seconds.
       
  1198 
       
  1199 262
       
  1200 00:13:33,770 --> 00:13:37,250
       
  1201 I think we had something
       
  1202 like below five seconds.
       
  1203 
       
  1204 263
       
  1205 00:13:37,250 --> 00:13:40,865
       
  1206 But again, you can see
       
  1207 the trends. We rants.
       
  1208 
       
  1209 264
       
  1210 00:13:40,865 --> 00:13:42,590
       
  1211 Circles around them.
       
  1212 
       
  1213 265
       
  1214 00:13:42,590 --> 00:13:46,530
       
  1215 And that's quite nice.
       
  1216 
       
  1217 266
       
  1218 00:13:46,570 --> 00:13:49,774
       
  1219 So what's good about
       
  1220 this algorithm?
       
  1221 
       
  1222 267
       
  1223 00:13:49,774 --> 00:13:51,605
       
  1224 By pressure of ski?
       
  1225 
       
  1226 268
       
  1227 00:13:51,605 --> 00:13:54,500
       
  1228 Well, first, it extends
       
  1229 
       
  1230 269
       
  1231 00:13:54,500 --> 00:13:57,800
       
  1232 actually to quite a number
       
  1233 of regular expressions.
       
  1234 
       
  1235 270
       
  1236 00:13:57,800 --> 00:13:59,900
       
  1237 So we saw in this example
       
  1238 
       
  1239 271
       
  1240 00:13:59,900 --> 00:14:03,950
       
  1241 the End Times regular expression
       
  1242 is a little bit of work.
       
  1243 
       
  1244 272
       
  1245 00:14:03,950 --> 00:14:05,405
       
  1246 We could extend that.
       
  1247 
       
  1248 273
       
  1249 00:14:05,405 --> 00:14:08,480
       
  1250 It also extends to the
       
  1251 not reg expression,
       
  1252 
       
  1253 274
       
  1254 00:14:08,480 --> 00:14:10,820
       
  1255 which I explain on
       
  1256 the next slide.
       
  1257 
       
  1258 275
       
  1259 00:14:10,820 --> 00:14:15,290
       
  1260 It's very easy to implement
       
  1261 in a functional language.
       
  1262 
       
  1263 276
       
  1264 00:14:15,290 --> 00:14:16,610
       
  1265 If you don't buy
       
  1266 
       
  1267 277
       
  1268 00:14:16,610 --> 00:14:20,675
       
  1269 all that functional
       
  1270 programming language stuff,
       
  1271 
       
  1272 278
       
  1273 00:14:20,675 --> 00:14:22,955
       
  1274 you still have to agree.
       
  1275 
       
  1276 279
       
  1277 00:14:22,955 --> 00:14:25,715
       
  1278 Quite beautiful in Scala.
       
  1279 
       
  1280 280
       
  1281 00:14:25,715 --> 00:14:28,160
       
  1282 And you could also
       
  1283 easily implemented
       
  1284 
       
  1285 281
       
  1286 00:14:28,160 --> 00:14:31,760
       
  1287 in C language or in Python.
       
  1288 
       
  1289 282
       
  1290 00:14:31,760 --> 00:14:34,250
       
  1291 There's really no
       
  1292 problem with that.
       
  1293 
       
  1294 283
       
  1295 00:14:34,250 --> 00:14:38,390
       
  1296 Say the algorithm's actually
       
  1297 quite old already brush-off,
       
  1298 
       
  1299 284
       
  1300 00:14:38,390 --> 00:14:44,899
       
  1301 ski wrote it down in
       
  1302 1964 and his PhD thesis.
       
  1303 
       
  1304 285
       
  1305 00:14:44,899 --> 00:14:49,460
       
  1306 Somehow it was forgotten during
       
  1307 
       
  1308 286
       
  1309 00:14:49,460 --> 00:14:54,095
       
  1310 the next decades and only
       
  1311 recently has been rediscovered.
       
  1312 
       
  1313 287
       
  1314 00:14:54,095 --> 00:14:57,065
       
  1315 At the moment, we are
       
  1316 only solve the problem
       
  1317 
       
  1318 288
       
  1319 00:14:57,065 --> 00:15:02,240
       
  1320 of Gmail reg expression
       
  1321 gamma string deaths,
       
  1322 
       
  1323 289
       
  1324 00:15:02,240 --> 00:15:05,550
       
  1325 the regular expression
       
  1326 match the string yes or no.
       
  1327 
       
  1328 290
       
  1329 00:15:05,550 --> 00:15:08,740
       
  1330 The want to of course, use
       
  1331 it as part of our lexicon.
       
  1332 
       
  1333 291
       
  1334 00:15:08,740 --> 00:15:12,370
       
  1335 And there we have to do
       
  1336 something more complicated.
       
  1337 
       
  1338 292
       
  1339 00:15:12,370 --> 00:15:15,384
       
  1340 We have to essentially
       
  1341 generate tokens.
       
  1342 
       
  1343 293
       
  1344 00:15:15,384 --> 00:15:18,670
       
  1345 And this will still take
       
  1346 a little bit of work.
       
  1347 
       
  1348 294
       
  1349 00:15:18,670 --> 00:15:22,045
       
  1350 And that's actually relatively
       
  1351 recent work by somebody
       
  1352 
       
  1353 295
       
  1354 00:15:22,045 --> 00:15:26,110
       
  1355 called suits Month and
       
  1356 the co-worker called Lou.
       
  1357 
       
  1358 296
       
  1359 00:15:26,110 --> 00:15:30,880
       
  1360 And what I personally
       
  1361 also find very satisfying
       
  1362 
       
  1363 297
       
  1364 00:15:30,880 --> 00:15:32,695
       
  1365 about this algorithm is
       
  1366 
       
  1367 298
       
  1368 00:15:32,695 --> 00:15:36,040
       
  1369 that we can actually
       
  1370 prove that it's correct.
       
  1371 
       
  1372 299
       
  1373 00:15:36,040 --> 00:15:37,735
       
  1374 You might remember I did
       
  1375 
       
  1376 300
       
  1377 00:15:37,735 --> 00:15:41,500
       
  1378 quite some interesting
       
  1379 transformations
       
  1380 
       
  1381 301
       
  1382 00:15:41,500 --> 00:15:44,830
       
  1383 when I calculated the derivative.
       
  1384 
       
  1385 302
       
  1386 00:15:44,830 --> 00:15:48,270
       
  1387 How can I be sure that
       
  1388 this is all correct?
       
  1389 
       
  1390 303
       
  1391 00:15:48,270 --> 00:15:50,270
       
  1392 Actually, I can now go away and
       
  1393 
       
  1394 304
       
  1395 00:15:50,270 --> 00:15:52,865
       
  1396 prove the correctness
       
  1397 of this algorithm.
       
  1398 
       
  1399 305
       
  1400 00:15:52,865 --> 00:15:56,645
       
  1401 Does it really satisfy
       
  1402 the specification?
       
  1403 
       
  1404 306
       
  1405 00:15:56,645 --> 00:15:58,550
       
  1406 Is really fs string
       
  1407 
       
  1408 307
       
  1409 00:15:58,550 --> 00:16:00,440
       
  1410 is in the language
       
  1411 of a reg expression.
       
  1412 
       
  1413 308
       
  1414 00:16:00,440 --> 00:16:03,050
       
  1415 Does that matter? Vd say yes.
       
  1416 
       
  1417 309
       
  1418 00:16:03,050 --> 00:16:07,070
       
  1419 However, I leave that
       
  1420 for another video.
       
  1421 
       
  1422 310
       
  1423 00:16:07,070 --> 00:16:10,580
       
  1424 What I also like about
       
  1425 this algorithm that can be
       
  1426 
       
  1427 311
       
  1428 00:16:10,580 --> 00:16:13,775
       
  1429 actually extended to quite a
       
  1430 number of rec expressions.
       
  1431 
       
  1432 312
       
  1433 00:16:13,775 --> 00:16:17,810
       
  1434 So this is t not reg
       
  1435 expression that is
       
  1436 
       
  1437 313
       
  1438 00:16:17,810 --> 00:16:19,760
       
  1439 opposed to match strings which
       
  1440 
       
  1441 314
       
  1442 00:16:19,760 --> 00:16:22,235
       
  1443 this reg expression cannot match.
       
  1444 
       
  1445 315
       
  1446 00:16:22,235 --> 00:16:24,245
       
  1447 So here's an example.
       
  1448 
       
  1449 316
       
  1450 00:16:24,245 --> 00:16:28,640
       
  1451 You're in the business of
       
  1452 trying to find out what
       
  1453 
       
  1454 317
       
  1455 00:16:28,640 --> 00:16:33,530
       
  1456 our comments in languages like
       
  1457 Java or C. Then you know,
       
  1458 
       
  1459 318
       
  1460 00:16:33,530 --> 00:16:35,060
       
  1461 they always start with a slash,
       
  1462 
       
  1463 319
       
  1464 00:16:35,060 --> 00:16:36,245
       
  1465 then comes a star,
       
  1466 
       
  1467 320
       
  1468 00:16:36,245 --> 00:16:38,240
       
  1469 then comes an arbitrary string.
       
  1470 
       
  1471 321
       
  1472 00:16:38,240 --> 00:16:41,195
       
  1473 And then they finish
       
  1474 with a slash and a star.
       
  1475 
       
  1476 322
       
  1477 00:16:41,195 --> 00:16:44,330
       
  1478 And how you want to specify
       
  1479 that is something like this.
       
  1480 
       
  1481 323
       
  1482 00:16:44,330 --> 00:16:45,530
       
  1483 You want to say, OK,
       
  1484 
       
  1485 324
       
  1486 00:16:45,530 --> 00:16:48,245
       
  1487 a comment starts with
       
  1488 a slash and a star.
       
  1489 
       
  1490 325
       
  1491 00:16:48,245 --> 00:16:51,410
       
  1492 Then it comes any
       
  1493 string in between.
       
  1494 
       
  1495 326
       
  1496 00:16:51,410 --> 00:16:55,340
       
  1497 But this string
       
  1498 in-between cannot contain
       
  1499 
       
  1500 327
       
  1501 00:16:55,340 --> 00:16:58,280
       
  1502 any star and slash
       
  1503 because that would then
       
  1504 
       
  1505 328
       
  1506 00:16:58,280 --> 00:17:01,415
       
  1507 indicate there's the
       
  1508 end already there.
       
  1509 
       
  1510 329
       
  1511 00:17:01,415 --> 00:17:03,680
       
  1512 And then after you
       
  1513 have such a string
       
  1514 
       
  1515 330
       
  1516 00:17:03,680 --> 00:17:06,410
       
  1517 which doesn't
       
  1518 contain as standard,
       
  1519 
       
  1520 331
       
  1521 00:17:06,410 --> 00:17:11,180
       
  1522 then at the end you want to
       
  1523 match a star and a slash.
       
  1524 
       
  1525 332
       
  1526 00:17:11,180 --> 00:17:13,460
       
  1527 So for these kind of comments,
       
  1528 
       
  1529 333
       
  1530 00:17:13,460 --> 00:17:15,995
       
  1531 this reg expression is
       
  1532 actually quite useful.
       
  1533 
       
  1534 334
       
  1535 00:17:15,995 --> 00:17:19,430
       
  1536 And if you ever seen
       
  1537 how to do the negation,
       
  1538 
       
  1539 335
       
  1540 00:17:19,430 --> 00:17:21,995
       
  1541 this kinda break
       
  1542 expression with automata?
       
  1543 
       
  1544 336
       
  1545 00:17:21,995 --> 00:17:24,710
       
  1546 You will know that's
       
  1547 a bit of a pain,
       
  1548 
       
  1549 337
       
  1550 00:17:24,710 --> 00:17:26,675
       
  1551 but for the Borowski,
       
  1552 
       
  1553 338
       
  1554 00:17:26,675 --> 00:17:28,370
       
  1555 it's no pain at all.
       
  1556 
       
  1557 339
       
  1558 00:17:28,370 --> 00:17:30,710
       
  1559 The meaning of this
       
  1560 reg expression
       
  1561 
       
  1562 340
       
  1563 00:17:30,710 --> 00:17:32,255
       
  1564 would be something like that.
       
  1565 
       
  1566 341
       
  1567 00:17:32,255 --> 00:17:35,540
       
  1568 If I have all the
       
  1569 strings there are,
       
  1570 
       
  1571 342
       
  1572 00:17:35,540 --> 00:17:38,390
       
  1573 and I take away all the strings,
       
  1574 
       
  1575 343
       
  1576 00:17:38,390 --> 00:17:40,055
       
  1577 this arc and match.
       
  1578 
       
  1579 344
       
  1580 00:17:40,055 --> 00:17:42,080
       
  1581 The remaining strings are
       
  1582 
       
  1583 345
       
  1584 00:17:42,080 --> 00:17:44,630
       
  1585 what this reg expression
       
  1586 is supposed to match.
       
  1587 
       
  1588 346
       
  1589 00:17:44,630 --> 00:17:47,165
       
  1590 So I can specify
       
  1591 what the meaning is.
       
  1592 
       
  1593 347
       
  1594 00:17:47,165 --> 00:17:49,760
       
  1595 And then it's also very easy to
       
  1596 
       
  1597 348
       
  1598 00:17:49,760 --> 00:17:52,174
       
  1599 say what is nullable
       
  1600 and derivative.
       
  1601 
       
  1602 349
       
  1603 00:17:52,174 --> 00:17:54,140
       
  1604 So for nullable, it would be
       
  1605 
       
  1606 350
       
  1607 00:17:54,140 --> 00:17:56,570
       
  1608 just a test where
       
  1609 the eyes nullable.
       
  1610 
       
  1611 351
       
  1612 00:17:56,570 --> 00:17:58,985
       
  1613 And I just take the
       
  1614 negation of that.
       
  1615 
       
  1616 352
       
  1617 00:17:58,985 --> 00:18:00,680
       
  1618 And men I have to build
       
  1619 
       
  1620 353
       
  1621 00:18:00,680 --> 00:18:03,620
       
  1622 the derivative of this
       
  1623 not reg expression.
       
  1624 
       
  1625 354
       
  1626 00:18:03,620 --> 00:18:05,420
       
  1627 Then I just have to move
       
  1628 
       
  1629 355
       
  1630 00:18:05,420 --> 00:18:07,325
       
  1631 this permutation does derivative,
       
  1632 
       
  1633 356
       
  1634 00:18:07,325 --> 00:18:10,070
       
  1635 derivative inside
       
  1636 the wreck expression
       
  1637 
       
  1638 357
       
  1639 00:18:10,070 --> 00:18:12,575
       
  1640 and keep the not on
       
  1641 the outermost level.
       
  1642 
       
  1643 358
       
  1644 00:18:12,575 --> 00:18:15,515
       
  1645 So there's again no
       
  1646 pain involved in
       
  1647 
       
  1648 359
       
  1649 00:18:15,515 --> 00:18:19,130
       
  1650 adding this reg expression
       
  1651 to the algorithm.
       
  1652 
       
  1653 360
       
  1654 00:18:19,130 --> 00:18:22,100
       
  1655 We made it to the end of
       
  1656 this video and we made
       
  1657 
       
  1658 361
       
  1659 00:18:22,100 --> 00:18:24,739
       
  1660 it also to the end
       
  1661 of this algorithm.
       
  1662 
       
  1663 362
       
  1664 00:18:24,739 --> 00:18:27,620
       
  1665 I can see to loose trends.
       
  1666 
       
  1667 363
       
  1668 00:18:27,620 --> 00:18:29,420
       
  1669 One is we implemented
       
  1670 
       
  1671 364
       
  1672 00:18:29,420 --> 00:18:32,855
       
  1673 this algorithm for the
       
  1674 basic regular expressions.
       
  1675 
       
  1676 365
       
  1677 00:18:32,855 --> 00:18:38,705
       
  1678 And we also added the end
       
  1679 times out of necessity.
       
  1680 
       
  1681 366
       
  1682 00:18:38,705 --> 00:18:41,120
       
  1683 And I also showed
       
  1684 you how to implement
       
  1685 
       
  1686 367
       
  1687 00:18:41,120 --> 00:18:44,840
       
  1688 the not regular
       
  1689 expression on a slide.
       
  1690 
       
  1691 368
       
  1692 00:18:44,840 --> 00:18:48,995
       
  1693 But your task in the
       
  1694 coursework actually is
       
  1695 
       
  1696 369
       
  1697 00:18:48,995 --> 00:18:52,970
       
  1698 to extend that to some of
       
  1699 the extended reg expression.
       
  1700 
       
  1701 370
       
  1702 00:18:52,970 --> 00:18:57,260
       
  1703 So especially these bounded
       
  1704 repetitions like from 0 to
       
  1705 
       
  1706 371
       
  1707 00:18:57,260 --> 00:19:01,550
       
  1708 n times or between n and m times.
       
  1709 
       
  1710 372
       
  1711 00:19:01,550 --> 00:19:04,325
       
  1712 And I think also
       
  1713 plus and D option.
       
  1714 
       
  1715 373
       
  1716 00:19:04,325 --> 00:19:07,220
       
  1717 The other loose trend is,
       
  1718 
       
  1719 374
       
  1720 00:19:07,220 --> 00:19:09,200
       
  1721 remember I did this while
       
  1722 
       
  1723 375
       
  1724 00:19:09,200 --> 00:19:11,645
       
  1725 calculations with
       
  1726 regular expressions.
       
  1727 
       
  1728 376
       
  1729 00:19:11,645 --> 00:19:13,205
       
  1730 Why on earth?
       
  1731 
       
  1732 377
       
  1733 00:19:13,205 --> 00:19:14,480
       
  1734 Is that all correct?
       
  1735 
       
  1736 378
       
  1737 00:19:14,480 --> 00:19:16,355
       
  1738 Why on earth should
       
  1739 this algorithm
       
  1740 
       
  1741 379
       
  1742 00:19:16,355 --> 00:19:18,575
       
  1743 actually satisfy
       
  1744 our specification,
       
  1745 
       
  1746 380
       
  1747 00:19:18,575 --> 00:19:20,450
       
  1748 which we set out
       
  1749 at the beginning.
       
  1750 
       
  1751 381
       
  1752 00:19:20,450 --> 00:19:25,410
       
  1753 So that needs to be looked at
       
  1754 more closely. Bye for now.