videos/02-algo1.srt
changeset 766 e8402d8ec8e6
child 769 f9686b22db7e
equal deleted inserted replaced
765:b294cfbb5c01 766:e8402d8ec8e6
       
     1 1
       
     2 00:00:05,880 --> 00:00:09,700
       
     3 Welcome back.
       
     4 Remember this slide.
       
     5 
       
     6 2
       
     7 00:00:09,700 --> 00:00:11,500
       
     8 This slide said, What is
       
     9 
       
    10 3
       
    11 00:00:11,500 --> 00:00:14,500
       
    12 all wreck expression Metro
       
    13 actually supposed to do?
       
    14 
       
    15 4
       
    16 00:00:14,500 --> 00:00:16,570
       
    17 It will take two arguments and
       
    18 
       
    19 5
       
    20 00:00:16,570 --> 00:00:18,670
       
    21 reg expression R and a string
       
    22 
       
    23 6
       
    24 00:00:18,670 --> 00:00:21,580
       
    25 S. And it's supposed to say yes,
       
    26 
       
    27 7
       
    28 00:00:21,580 --> 00:00:23,440
       
    29 the wreck expression matches
       
    30 
       
    31 8
       
    32 00:00:23,440 --> 00:00:26,920
       
    33 the string if and only
       
    34 if the string is in
       
    35 
       
    36 9
       
    37 00:00:26,920 --> 00:00:28,855
       
    38 the language of R.
       
    39 
       
    40 10
       
    41 00:00:28,855 --> 00:00:32,410
       
    42 And if the string is not
       
    43 in the language of our,
       
    44 
       
    45 11
       
    46 00:00:32,410 --> 00:00:35,515
       
    47 then our algorithm has to say no.
       
    48 
       
    49 12
       
    50 00:00:35,515 --> 00:00:37,210
       
    51 And we can't use
       
    52 
       
    53 13
       
    54 00:00:37,210 --> 00:00:39,565
       
    55 this specification
       
    56 directly because remember,
       
    57 
       
    58 14
       
    59 00:00:39,565 --> 00:00:43,305
       
    60 this l Sometimes
       
    61 produces infinite sets.
       
    62 
       
    63 15
       
    64 00:00:43,305 --> 00:00:47,585
       
    65 And so we can test whether a
       
    66 string is an infinite set,
       
    67 
       
    68 16
       
    69 00:00:47,585 --> 00:00:50,090
       
    70 at least not easily.
       
    71 
       
    72 17
       
    73 00:00:50,090 --> 00:00:52,340
       
    74 And so what we have to do
       
    75 
       
    76 18
       
    77 00:00:52,340 --> 00:00:54,260
       
    78 instead is we have
       
    79 to be a bit more
       
    80 
       
    81 19
       
    82 00:00:54,260 --> 00:00:57,050
       
    83 clever and implement
       
    84 some operations
       
    85 
       
    86 20
       
    87 00:00:57,050 --> 00:00:59,284
       
    88 on Rekha expressions instead.
       
    89 
       
    90 21
       
    91 00:00:59,284 --> 00:01:03,275
       
    92 Because Weka expressions
       
    93 are always finite trees.
       
    94 
       
    95 22
       
    96 00:01:03,275 --> 00:01:05,870
       
    97 I should say the
       
    98 person who has been
       
    99 
       
   100 23
       
   101 00:01:05,870 --> 00:01:08,150
       
   102 clever is called brush-off ski.
       
   103 
       
   104 24
       
   105 00:01:08,150 --> 00:01:11,435
       
   106 It's his, I've written,
       
   107 I'm introducing here.
       
   108 
       
   109 25
       
   110 00:01:11,435 --> 00:01:15,515
       
   111 And his algorithm consists
       
   112 of two functions.
       
   113 
       
   114 26
       
   115 00:01:15,515 --> 00:01:17,840
       
   116 One is called
       
   117 nullable and it takes
       
   118 
       
   119 27
       
   120 00:01:17,840 --> 00:01:20,104
       
   121 a regular expression as argument.
       
   122 
       
   123 28
       
   124 00:01:20,104 --> 00:01:24,155
       
   125 And the idea is that
       
   126 this function nullable.
       
   127 
       
   128 29
       
   129 00:01:24,155 --> 00:01:26,450
       
   130 Testing whether
       
   131 the reg expression
       
   132 
       
   133 30
       
   134 00:01:26,450 --> 00:01:27,950
       
   135 can match the empty string.
       
   136 
       
   137 31
       
   138 00:01:27,950 --> 00:01:30,305
       
   139 So 0 cannot match any string.
       
   140 
       
   141 32
       
   142 00:01:30,305 --> 00:01:33,275
       
   143 So it cannot match the
       
   144 empty string either.
       
   145 
       
   146 33
       
   147 00:01:33,275 --> 00:01:35,465
       
   148 So that's defined as false.
       
   149 
       
   150 34
       
   151 00:01:35,465 --> 00:01:37,775
       
   152 This reg expression,
       
   153 the whole purpose
       
   154 
       
   155 35
       
   156 00:01:37,775 --> 00:01:39,680
       
   157 is that it can match
       
   158 the empty string.
       
   159 
       
   160 36
       
   161 00:01:39,680 --> 00:01:41,645
       
   162 So it's defined as true.
       
   163 
       
   164 37
       
   165 00:01:41,645 --> 00:01:44,599
       
   166 If a reg expression
       
   167 can match a character,
       
   168 
       
   169 38
       
   170 00:01:44,599 --> 00:01:47,045
       
   171 then it cannot match
       
   172 the empty string.
       
   173 
       
   174 39
       
   175 00:01:47,045 --> 00:01:49,445
       
   176 So that is defined as false.
       
   177 
       
   178 40
       
   179 00:01:49,445 --> 00:01:53,180
       
   180 If an alternative can
       
   181 match the empty string,
       
   182 
       
   183 41
       
   184 00:01:53,180 --> 00:01:56,960
       
   185 then either or one can
       
   186 match the empty string.
       
   187 
       
   188 42
       
   189 00:01:56,960 --> 00:01:59,720
       
   190 Or R2 can match the empty string.
       
   191 
       
   192 43
       
   193 00:01:59,720 --> 00:02:03,110
       
   194 So either nullable
       
   195 of R1 has to hold,
       
   196 
       
   197 44
       
   198 00:02:03,110 --> 00:02:06,945
       
   199 or nullable of R2 has to hold.
       
   200 
       
   201 45
       
   202 00:02:06,945 --> 00:02:09,925
       
   203 In this sequence, it's
       
   204 the other way around.
       
   205 
       
   206 46
       
   207 00:02:09,925 --> 00:02:12,790
       
   208 If this reg expression can
       
   209 match the empty string,
       
   210 
       
   211 47
       
   212 00:02:12,790 --> 00:02:16,615
       
   213 then R1 has to be able to
       
   214 match the empty string,
       
   215 
       
   216 48
       
   217 00:02:16,615 --> 00:02:20,290
       
   218 and R2 has to be able to
       
   219 match the empty string.
       
   220 
       
   221 49
       
   222 00:02:20,290 --> 00:02:22,555
       
   223 So here it's just the opposite.
       
   224 
       
   225 50
       
   226 00:02:22,555 --> 00:02:25,660
       
   227 In one case it is o and
       
   228 the other case it's end.
       
   229 
       
   230 51
       
   231 00:02:25,660 --> 00:02:27,970
       
   232 In the store record
       
   233 expression can match
       
   234 
       
   235 52
       
   236 00:02:27,970 --> 00:02:30,445
       
   237 always the empty string.
       
   238 So that is true.
       
   239 
       
   240 53
       
   241 00:02:30,445 --> 00:02:33,340
       
   242 So this is a simple
       
   243 recursive function
       
   244 
       
   245 54
       
   246 00:02:33,340 --> 00:02:37,179
       
   247 and should not be too
       
   248 difficult to implement.
       
   249 
       
   250 55
       
   251 00:02:37,179 --> 00:02:42,025
       
   252 Okay, this nullable function
       
   253 that is easy-peasy.
       
   254 
       
   255 56
       
   256 00:02:42,025 --> 00:02:44,604
       
   257 The second function, however,
       
   258 
       
   259 57
       
   260 00:02:44,604 --> 00:02:49,155
       
   261 is a bit more involved and
       
   262 that's just to be expected.
       
   263 
       
   264 58
       
   265 00:02:49,155 --> 00:02:53,075
       
   266 Remember people working in
       
   267 this area already for decades.
       
   268 
       
   269 59
       
   270 00:02:53,075 --> 00:02:57,305
       
   271 If they have some problems
       
   272 with runtime, for example,
       
   273 
       
   274 60
       
   275 00:02:57,305 --> 00:02:58,940
       
   276 we can't expect that as
       
   277 
       
   278 61
       
   279 00:02:58,940 --> 00:03:03,260
       
   280 simple fix will solve all
       
   281 the problems in the world.
       
   282 
       
   283 62
       
   284 00:03:03,260 --> 00:03:06,530
       
   285 So I admit the second
       
   286 function is a bit more
       
   287 
       
   288 63
       
   289 00:03:06,530 --> 00:03:10,085
       
   290 complicated and make sure
       
   291 that you understand it.
       
   292 
       
   293 64
       
   294 00:03:10,085 --> 00:03:12,140
       
   295 And it also just
       
   296 chose this preserves
       
   297 
       
   298 65
       
   299 00:03:12,140 --> 00:03:14,345
       
   300 the is a very clever guy.
       
   301 
       
   302 66
       
   303 00:03:14,345 --> 00:03:15,800
       
   304 Actually, I have to say he was
       
   305 
       
   306 67
       
   307 00:03:15,800 --> 00:03:17,720
       
   308 a clever guy because
       
   309 I think he either
       
   310 
       
   311 68
       
   312 00:03:17,720 --> 00:03:21,650
       
   313 died last year or
       
   314 the year before.
       
   315 
       
   316 69
       
   317 00:03:21,650 --> 00:03:25,505
       
   318 And he came up with this
       
   319 algorithm already in 1964.
       
   320 
       
   321 70
       
   322 00:03:25,505 --> 00:03:27,260
       
   323 It somehow got lost,
       
   324 
       
   325 71
       
   326 00:03:27,260 --> 00:03:30,305
       
   327 but has been rediscovered
       
   328 in the last ten years.
       
   329 
       
   330 72
       
   331 00:03:30,305 --> 00:03:34,685
       
   332 So the idea of the second
       
   333 function is the following.
       
   334 
       
   335 73
       
   336 00:03:34,685 --> 00:03:38,120
       
   337 Imagine you have a
       
   338 reexpression and it can
       
   339 
       
   340 74
       
   341 00:03:38,120 --> 00:03:41,930
       
   342 match a string of the
       
   343 form C followed by as.
       
   344 
       
   345 75
       
   346 00:03:41,930 --> 00:03:44,810
       
   347 So the C is the first
       
   348 character of that string.
       
   349 
       
   350 76
       
   351 00:03:44,810 --> 00:03:48,605
       
   352 So I mentioned that can
       
   353 match this kind of string.
       
   354 
       
   355 77
       
   356 00:03:48,605 --> 00:03:50,330
       
   357 The question is now,
       
   358 
       
   359 78
       
   360 00:03:50,330 --> 00:03:52,400
       
   361 what would a wrecker
       
   362 expression look
       
   363 
       
   364 79
       
   365 00:03:52,400 --> 00:03:54,695
       
   366 like that can match chest
       
   367 
       
   368 80
       
   369 00:03:54,695 --> 00:03:57,140
       
   370 s. You probably remember
       
   371 
       
   372 81
       
   373 00:03:57,140 --> 00:03:59,300
       
   374 that from the semantic
       
   375 that every relative,
       
   376 
       
   377 82
       
   378 00:03:59,300 --> 00:04:00,860
       
   379 there was also
       
   380 something of chopping
       
   381 
       
   382 83
       
   383 00:04:00,860 --> 00:04:02,210
       
   384 off the first character.
       
   385 
       
   386 84
       
   387 00:04:02,210 --> 00:04:04,940
       
   388 Here it's the same.
       
   389 If a reg expression
       
   390 
       
   391 85
       
   392 00:04:04,940 --> 00:04:07,835
       
   393 can match a string
       
   394 starting with a C,
       
   395 
       
   396 86
       
   397 00:04:07,835 --> 00:04:11,090
       
   398 we're looking for a wreck
       
   399 expression which can match
       
   400 
       
   401 87
       
   402 00:04:11,090 --> 00:04:15,215
       
   403 the rest of the string where
       
   404 the c has been chopped off.
       
   405 
       
   406 88
       
   407 00:04:15,215 --> 00:04:17,690
       
   408 And this operation will be called
       
   409 
       
   410 89
       
   411 00:04:17,690 --> 00:04:21,049
       
   412 the derivative of a
       
   413 wreck expression.
       
   414 
       
   415 90
       
   416 00:04:21,049 --> 00:04:22,205
       
   417 And it will also take
       
   418 
       
   419 91
       
   420 00:04:22,205 --> 00:04:25,460
       
   421 a character as argument
       
   422 and the rec expression.
       
   423 
       
   424 92
       
   425 00:04:25,460 --> 00:04:28,730
       
   426 And in contrast to
       
   427 the semantic records,
       
   428 
       
   429 93
       
   430 00:04:28,730 --> 00:04:31,310
       
   431 semantic derivative, which works
       
   432 
       
   433 94
       
   434 00:04:31,310 --> 00:04:34,430
       
   435 over languages or
       
   436 sets of strings.
       
   437 
       
   438 95
       
   439 00:04:34,430 --> 00:04:39,620
       
   440 This derivative works
       
   441 over regular expressions.
       
   442 
       
   443 96
       
   444 00:04:39,620 --> 00:04:41,330
       
   445 Okay, here's this function.
       
   446 
       
   447 97
       
   448 00:04:41,330 --> 00:04:43,970
       
   449 It's defined recursively over
       
   450 
       
   451 98
       
   452 00:04:43,970 --> 00:04:46,370
       
   453 the structure of rec expressions.
       
   454 
       
   455 99
       
   456 00:04:46,370 --> 00:04:48,814
       
   457 The first argument
       
   458 is the character,
       
   459 
       
   460 100
       
   461 00:04:48,814 --> 00:04:52,700
       
   462 and the second one is
       
   463 a wreck expression.
       
   464 
       
   465 101
       
   466 00:04:52,700 --> 00:04:56,510
       
   467 And remember the idea
       
   468 is we're looking for
       
   469 
       
   470 102
       
   471 00:04:56,510 --> 00:05:00,680
       
   472 a wreck expression that
       
   473 can match everything.
       
   474 
       
   475 103
       
   476 00:05:00,680 --> 00:05:03,125
       
   477 This reg expression
       
   478 as argument can match
       
   479 
       
   480 104
       
   481 00:05:03,125 --> 00:05:07,040
       
   482 except for the C. So now let's
       
   483 look at this first case.
       
   484 
       
   485 105
       
   486 00:05:07,040 --> 00:05:10,550
       
   487 So this reg expression
       
   488 cannot match any string.
       
   489 
       
   490 106
       
   491 00:05:10,550 --> 00:05:14,270
       
   492 So it certainly cannot
       
   493 match any string starting
       
   494 
       
   495 107
       
   496 00:05:14,270 --> 00:05:16,910
       
   497 a C. So we have to look
       
   498 
       
   499 108
       
   500 00:05:16,910 --> 00:05:20,090
       
   501 for and reg expression which
       
   502 can not much anything.
       
   503 
       
   504 109
       
   505 00:05:20,090 --> 00:05:22,700
       
   506 Well, that's the purpose
       
   507 of this record expression,
       
   508 
       
   509 110
       
   510 00:05:22,700 --> 00:05:24,815
       
   511 so we define it as 0.
       
   512 
       
   513 111
       
   514 00:05:24,815 --> 00:05:28,130
       
   515 In the next case,
       
   516 this reg expression
       
   517 
       
   518 112
       
   519 00:05:28,130 --> 00:05:30,440
       
   520 can match the empty string,
       
   521 
       
   522 113
       
   523 00:05:30,440 --> 00:05:33,440
       
   524 but it cannot match
       
   525 any string that starts
       
   526 
       
   527 114
       
   528 00:05:33,440 --> 00:05:36,350
       
   529 with C. So also in this case,
       
   530 
       
   531 115
       
   532 00:05:36,350 --> 00:05:39,560
       
   533 we just define it as
       
   534 the bracket question,
       
   535 
       
   536 116
       
   537 00:05:39,560 --> 00:05:41,615
       
   538 which cannot match anything.
       
   539 
       
   540 117
       
   541 00:05:41,615 --> 00:05:45,170
       
   542 In the next case, this
       
   543 C as the argument to
       
   544 
       
   545 118
       
   546 00:05:45,170 --> 00:05:48,335
       
   547 the derivative and this d
       
   548 is the Rekha expression.
       
   549 
       
   550 119
       
   551 00:05:48,335 --> 00:05:50,225
       
   552 So now there are two cases.
       
   553 
       
   554 120
       
   555 00:05:50,225 --> 00:05:53,525
       
   556 If this C and this D
       
   557 is actually equal.
       
   558 
       
   559 121
       
   560 00:05:53,525 --> 00:05:55,970
       
   561 That means this record
       
   562 expression can match
       
   563 
       
   564 122
       
   565 00:05:55,970 --> 00:05:59,240
       
   566 a string which just contains C0.
       
   567 
       
   568 123
       
   569 00:05:59,240 --> 00:06:01,505
       
   570 So if we strip that off,
       
   571 
       
   572 124
       
   573 00:06:01,505 --> 00:06:04,790
       
   574 motor remains is
       
   575 the empty string.
       
   576 
       
   577 125
       
   578 00:06:04,790 --> 00:06:06,890
       
   579 What is a regular expression
       
   580 
       
   581 126
       
   582 00:06:06,890 --> 00:06:09,170
       
   583 look like that can
       
   584 match the empty string.
       
   585 
       
   586 127
       
   587 00:06:09,170 --> 00:06:11,915
       
   588 Well, that's the
       
   589 purpose of the warm.
       
   590 
       
   591 128
       
   592 00:06:11,915 --> 00:06:15,440
       
   593 And if c is not equal to d,
       
   594 
       
   595 129
       
   596 00:06:15,440 --> 00:06:17,630
       
   597 then this reg expression
       
   598 
       
   599 130
       
   600 00:06:17,630 --> 00:06:19,220
       
   601 cannot match anything
       
   602 which starts with
       
   603 
       
   604 131
       
   605 00:06:19,220 --> 00:06:23,780
       
   606 a C. So again it will
       
   607 be defined as just 0.
       
   608 
       
   609 132
       
   610 00:06:23,780 --> 00:06:29,390
       
   611 Okay? Now, the alternative case,
       
   612 
       
   613 133
       
   614 00:06:29,390 --> 00:06:31,970
       
   615 if this reg expression can
       
   616 
       
   617 134
       
   618 00:06:31,970 --> 00:06:36,050
       
   619 match the strings
       
   620 starting with C,
       
   621 
       
   622 135
       
   623 00:06:36,050 --> 00:06:40,820
       
   624 then it can either be
       
   625 matched by the Ongwen.
       
   626 
       
   627 136
       
   628 00:06:40,820 --> 00:06:44,495
       
   629 Or it can be much by the R2.
       
   630 
       
   631 137
       
   632 00:06:44,495 --> 00:06:50,090
       
   633 If they are one can match C
       
   634 and then following a string,
       
   635 
       
   636 138
       
   637 00:06:50,090 --> 00:06:53,975
       
   638 then we just have to recursively
       
   639 call the derivative for
       
   640 
       
   641 139
       
   642 00:06:53,975 --> 00:06:56,570
       
   643 R to get a reg expression
       
   644 
       
   645 140
       
   646 00:06:56,570 --> 00:06:59,135
       
   647 that can match the
       
   648 rest of the string.
       
   649 
       
   650 141
       
   651 00:06:59,135 --> 00:07:02,690
       
   652 And the same we can do with R2.
       
   653 
       
   654 142
       
   655 00:07:02,690 --> 00:07:06,110
       
   656 You can find a reg expression
       
   657 which can match everything.
       
   658 
       
   659 143
       
   660 00:07:06,110 --> 00:07:07,850
       
   661 This R2 can match,
       
   662 
       
   663 144
       
   664 00:07:07,850 --> 00:07:09,710
       
   665 starting with C, bad,
       
   666 
       
   667 145
       
   668 00:07:09,710 --> 00:07:12,590
       
   669 which chopping off this C. Okay?
       
   670 
       
   671 146
       
   672 00:07:12,590 --> 00:07:16,370
       
   673 So now if you have to find
       
   674 the break expression,
       
   675 
       
   676 147
       
   677 00:07:16,370 --> 00:07:20,030
       
   678 which can match all the strings
       
   679 where C is tripped off.
       
   680 
       
   681 148
       
   682 00:07:20,030 --> 00:07:22,295
       
   683 Then we just have to
       
   684 take the alternatives
       
   685 
       
   686 149
       
   687 00:07:22,295 --> 00:07:24,965
       
   688 of these two derivatives.
       
   689 
       
   690 150
       
   691 00:07:24,965 --> 00:07:29,390
       
   692 Okay? Now to sequence case,
       
   693 
       
   694 151
       
   695 00:07:29,390 --> 00:07:33,005
       
   696 this sequence case is the
       
   697 most complicated one.
       
   698 
       
   699 152
       
   700 00:07:33,005 --> 00:07:35,180
       
   701 And let's look first at
       
   702 
       
   703 153
       
   704 00:07:35,180 --> 00:07:39,335
       
   705 the second case where
       
   706 the Earth's brush.
       
   707 
       
   708 154
       
   709 00:07:39,335 --> 00:07:42,830
       
   710 Okay? So if this
       
   711 regular expression
       
   712 
       
   713 155
       
   714 00:07:42,830 --> 00:07:46,145
       
   715 can match a string
       
   716 starting with C,
       
   717 
       
   718 156
       
   719 00:07:46,145 --> 00:07:48,155
       
   720 then the following
       
   721 must have happened.
       
   722 
       
   723 157
       
   724 00:07:48,155 --> 00:07:51,905
       
   725 First, the R1 must have matched
       
   726 a string starting with C
       
   727 
       
   728 158
       
   729 00:07:51,905 --> 00:07:56,420
       
   730 and then anything coming
       
   731 afterwards with r2.
       
   732 
       
   733 159
       
   734 00:07:56,420 --> 00:07:59,660
       
   735 Okay? So in this case I only
       
   736 
       
   737 160
       
   738 00:07:59,660 --> 00:08:03,260
       
   739 have to call this
       
   740 derivative for this r one.
       
   741 
       
   742 161
       
   743 00:08:03,260 --> 00:08:06,830
       
   744 And find the reg expression
       
   745 which can match everything
       
   746 
       
   747 162
       
   748 00:08:06,830 --> 00:08:11,555
       
   749 this R one can match except
       
   750 for this C chopped off.
       
   751 
       
   752 163
       
   753 00:08:11,555 --> 00:08:15,830
       
   754 And I have to build that
       
   755 sequence derivative of that.
       
   756 
       
   757 164
       
   758 00:08:15,830 --> 00:08:18,260
       
   759 So that's what the As per se.
       
   760 
       
   761 165
       
   762 00:08:18,260 --> 00:08:21,860
       
   763 So I take the
       
   764 derivative of R1 and I
       
   765 
       
   766 166
       
   767 00:08:21,860 --> 00:08:23,480
       
   768 put the R2 on
       
   769 
       
   770 167
       
   771 00:08:23,480 --> 00:08:25,865
       
   772 the back because that's
       
   773 the rest of the string.
       
   774 
       
   775 168
       
   776 00:08:25,865 --> 00:08:29,240
       
   777 Ok? So that's the only
       
   778 case we have to consider
       
   779 
       
   780 169
       
   781 00:08:29,240 --> 00:08:32,750
       
   782 this sequence case
       
   783 except if not the,
       
   784 
       
   785 170
       
   786 00:08:32,750 --> 00:08:37,895
       
   787 how one can match the
       
   788 sea and something else.
       
   789 
       
   790 171
       
   791 00:08:37,895 --> 00:08:42,965
       
   792 But if on mismatching
       
   793 actually the empty string,
       
   794 
       
   795 172
       
   796 00:08:42,965 --> 00:08:48,890
       
   797 this case actually the R two
       
   798 is in charge of matching
       
   799 
       
   800 173
       
   801 00:08:48,890 --> 00:08:51,590
       
   802 the string starting with C. So in
       
   803 
       
   804 174
       
   805 00:08:51,590 --> 00:08:55,490
       
   806 this case we have to call
       
   807 the derivative for R2.
       
   808 
       
   809 175
       
   810 00:08:55,490 --> 00:08:57,875
       
   811 So that's why we have
       
   812 these two cases.
       
   813 
       
   814 176
       
   815 00:08:57,875 --> 00:09:00,455
       
   816 So if R1 is nullable,
       
   817 
       
   818 177
       
   819 00:09:00,455 --> 00:09:03,245
       
   820 then it can match
       
   821 the empty string.
       
   822 
       
   823 178
       
   824 00:09:03,245 --> 00:09:05,330
       
   825 And we have to
       
   826 consider the case that
       
   827 
       
   828 179
       
   829 00:09:05,330 --> 00:09:08,045
       
   830 this R2 is matching a string
       
   831 
       
   832 180
       
   833 00:09:08,045 --> 00:09:10,700
       
   834 starting with C. And so we have
       
   835 
       
   836 181
       
   837 00:09:10,700 --> 00:09:14,210
       
   838 to call the derivative
       
   839 for this R2 in this case.
       
   840 
       
   841 182
       
   842 00:09:14,210 --> 00:09:18,680
       
   843 Otherwise, the R1 will
       
   844 be in charge of matching
       
   845 
       
   846 183
       
   847 00:09:18,680 --> 00:09:20,840
       
   848 a part of that
       
   849 string starting with
       
   850 
       
   851 184
       
   852 00:09:20,840 --> 00:09:24,695
       
   853 C. And I have to call the
       
   854 derivative on this R1.
       
   855 
       
   856 185
       
   857 00:09:24,695 --> 00:09:30,670
       
   858 Okay? I hope this makes sense.
       
   859 
       
   860 186
       
   861 00:09:30,670 --> 00:09:34,150
       
   862 Go over that and
       
   863 also the handouts.
       
   864 
       
   865 187
       
   866 00:09:34,150 --> 00:09:37,465
       
   867 Again. With that, that's
       
   868 it's really subtle.
       
   869 
       
   870 188
       
   871 00:09:37,465 --> 00:09:40,945
       
   872 And how do I remember this case?
       
   873 
       
   874 189
       
   875 00:09:40,945 --> 00:09:42,430
       
   876 Well, I know it's
       
   877 
       
   878 190
       
   879 00:09:42,430 --> 00:09:45,310
       
   880 an if condition and the
       
   881 condition is nullable,
       
   882 
       
   883 191
       
   884 00:09:45,310 --> 00:09:48,160
       
   885 then I will always remember
       
   886 
       
   887 192
       
   888 00:09:48,160 --> 00:09:53,275
       
   889 the else branch where pushes
       
   890 NG derivative over the R1.
       
   891 
       
   892 193
       
   893 00:09:53,275 --> 00:09:55,780
       
   894 So are usually write
       
   895 down the S punch
       
   896 
       
   897 194
       
   898 00:09:55,780 --> 00:09:59,650
       
   899 first and then construct
       
   900 the thin branch by saying,
       
   901 
       
   902 195
       
   903 00:09:59,650 --> 00:10:01,525
       
   904 well, I just repeat this.
       
   905 
       
   906 196
       
   907 00:10:01,525 --> 00:10:03,760
       
   908 And I have to remember
       
   909 in that case I
       
   910 
       
   911 197
       
   912 00:10:03,760 --> 00:10:06,580
       
   913 have to build also
       
   914 derivative of R2.
       
   915 
       
   916 198
       
   917 00:10:06,580 --> 00:10:12,695
       
   918 Okay? Finally, the star case.
       
   919 
       
   920 199
       
   921 00:10:12,695 --> 00:10:15,665
       
   922 Ok. So here again
       
   923 we're looking for
       
   924 
       
   925 200
       
   926 00:10:15,665 --> 00:10:17,300
       
   927 a reg expression which
       
   928 
       
   929 201
       
   930 00:10:17,300 --> 00:10:19,745
       
   931 can match the string
       
   932 starting with C,
       
   933 
       
   934 202
       
   935 00:10:19,745 --> 00:10:22,355
       
   936 except that the c has
       
   937 been chopped off.
       
   938 
       
   939 203
       
   940 00:10:22,355 --> 00:10:28,640
       
   941 So if r star has to match
       
   942 a string starting with C,
       
   943 
       
   944 204
       
   945 00:10:28,640 --> 00:10:32,735
       
   946 then at least we need one
       
   947 copy of this r. Okay?
       
   948 
       
   949 205
       
   950 00:10:32,735 --> 00:10:34,310
       
   951 So at least one copy of
       
   952 
       
   953 206
       
   954 00:10:34,310 --> 00:10:37,010
       
   955 this r has matched
       
   956 something which starts with
       
   957 
       
   958 207
       
   959 00:10:37,010 --> 00:10:38,870
       
   960 a C and then afterwards
       
   961 
       
   962 208
       
   963 00:10:38,870 --> 00:10:41,570
       
   964 come 0 more copies
       
   965 of this OnStar.
       
   966 
       
   967 209
       
   968 00:10:41,570 --> 00:10:45,530
       
   969 Okay? What we do there
       
   970 is we trying to find
       
   971 
       
   972 210
       
   973 00:10:45,530 --> 00:10:47,960
       
   974 the Rekha expression
       
   975 which can match
       
   976 
       
   977 211
       
   978 00:10:47,960 --> 00:10:50,915
       
   979 this first part of the string.
       
   980 
       
   981 212
       
   982 00:10:50,915 --> 00:10:53,255
       
   983 However, where the
       
   984 sea is chopped off.
       
   985 
       
   986 213
       
   987 00:10:53,255 --> 00:10:55,130
       
   988 And then we just say, well,
       
   989 
       
   990 214
       
   991 00:10:55,130 --> 00:10:57,050
       
   992 the rest has to be
       
   993 matched again with
       
   994 
       
   995 215
       
   996 00:10:57,050 --> 00:10:59,030
       
   997 0 or more copies of
       
   998 
       
   999 216
       
  1000 00:10:59,030 --> 00:11:02,600
       
  1001 this R. So that's why
       
  1002 it's defined like this.
       
  1003 
       
  1004 217
       
  1005 00:11:02,600 --> 00:11:09,050
       
  1006 Okay? S8. Please take care
       
  1007 with this definition.
       
  1008 
       
  1009 218
       
  1010 00:11:09,050 --> 00:11:11,435
       
  1011 That's not so simple.
       
  1012 
       
  1013 219
       
  1014 00:11:11,435 --> 00:11:13,250
       
  1015 Once you get a hang of it,
       
  1016 
       
  1017 220
       
  1018 00:11:13,250 --> 00:11:15,170
       
  1019 however, it makes perfect sense.
       
  1020 
       
  1021 221
       
  1022 00:11:15,170 --> 00:11:17,210
       
  1023 So let me explain it in
       
  1024 
       
  1025 222
       
  1026 00:11:17,210 --> 00:11:20,825
       
  1027 different ways in
       
  1028 the next slides.
       
  1029 
       
  1030 223
       
  1031 00:11:20,825 --> 00:11:24,695
       
  1032 Okay, let's look
       
  1033 first some examples.
       
  1034 
       
  1035 224
       
  1036 00:11:24,695 --> 00:11:27,140
       
  1037 So here is a regular expression,
       
  1038 
       
  1039 225
       
  1040 00:11:27,140 --> 00:11:29,390
       
  1041 R. And let's have
       
  1042 
       
  1043 226
       
  1044 00:11:29,390 --> 00:11:32,450
       
  1045 a look at these three
       
  1046 derivatives according to a,
       
  1047 
       
  1048 227
       
  1049 00:11:32,450 --> 00:11:38,405
       
  1050 B, and C. And Vishal do
       
  1051 with d1 for the a. Ok.
       
  1052 
       
  1053 228
       
  1054 00:11:38,405 --> 00:11:42,379
       
  1055 So here is our reg expression
       
  1056 
       
  1057 229
       
  1058 00:11:42,379 --> 00:11:45,334
       
  1059 and was very generous
       
  1060 with dependent a thesis.
       
  1061 
       
  1062 230
       
  1063 00:11:45,334 --> 00:11:48,140
       
  1064 And the outermost is a star.
       
  1065 
       
  1066 231
       
  1067 00:11:48,140 --> 00:11:52,550
       
  1068 So if people now the
       
  1069 derivative according to a,
       
  1070 
       
  1071 232
       
  1072 00:11:52,550 --> 00:11:55,474
       
  1073 the character a of
       
  1074 that wreck expression.
       
  1075 
       
  1076 233
       
  1077 00:11:55,474 --> 00:11:57,380
       
  1078 Okay? So the first thing we
       
  1079 
       
  1080 234
       
  1081 00:11:57,380 --> 00:11:59,555
       
  1082 have to analyze is the K star.
       
  1083 
       
  1084 235
       
  1085 00:11:59,555 --> 00:12:04,370
       
  1086 Ok? So here's direct expression,
       
  1087 which we are looking at.
       
  1088 
       
  1089 236
       
  1090 00:12:04,370 --> 00:12:09,170
       
  1091 This are the outermost
       
  1092 constructor is this star.
       
  1093 
       
  1094 237
       
  1095 00:12:09,170 --> 00:12:11,510
       
  1096 If you go back to the definition,
       
  1097 
       
  1098 238
       
  1099 00:12:11,510 --> 00:12:13,625
       
  1100 I hope you have it next to you,
       
  1101 
       
  1102 239
       
  1103 00:12:13,625 --> 00:12:16,340
       
  1104 then this star case is defined
       
  1105 
       
  1106 240
       
  1107 00:12:16,340 --> 00:12:20,000
       
  1108 as u taking just the
       
  1109 inside of the star
       
  1110 
       
  1111 241
       
  1112 00:12:20,000 --> 00:12:23,030
       
  1113 and apply this derivative and
       
  1114 
       
  1115 242
       
  1116 00:12:23,030 --> 00:12:26,765
       
  1117 leave the are on the
       
  1118 outside at the end.
       
  1119 
       
  1120 243
       
  1121 00:12:26,765 --> 00:12:29,990
       
  1122 Ok. So that's the first step.
       
  1123 
       
  1124 244
       
  1125 00:12:29,990 --> 00:12:32,030
       
  1126 Now we have to analyze
       
  1127 
       
  1128 245
       
  1129 00:12:32,030 --> 00:12:36,035
       
  1130 the derivative according to
       
  1131 a of this record expression,
       
  1132 
       
  1133 246
       
  1134 00:12:36,035 --> 00:12:38,000
       
  1135 which is an alternative.
       
  1136 
       
  1137 247
       
  1138 00:12:38,000 --> 00:12:39,665
       
  1139 So the outermost is a plus.
       
  1140 
       
  1141 248
       
  1142 00:12:39,665 --> 00:12:41,375
       
  1143 Ok, that's very easy again,
       
  1144 
       
  1145 249
       
  1146 00:12:41,375 --> 00:12:45,185
       
  1147 we just have to push the
       
  1148 derivative into each component,
       
  1149 
       
  1150 250
       
  1151 00:12:45,185 --> 00:12:47,705
       
  1152 into the a, followed by b.
       
  1153 
       
  1154 251
       
  1155 00:12:47,705 --> 00:12:49,145
       
  1156 And in this segment,
       
  1157 
       
  1158 252
       
  1159 00:12:49,145 --> 00:12:51,185
       
  1160 alternative into b. Ok,
       
  1161 
       
  1162 253
       
  1163 00:12:51,185 --> 00:12:56,030
       
  1164 so we take the derivative
       
  1165 of each according to a way.
       
  1166 
       
  1167 254
       
  1168 00:12:56,030 --> 00:13:00,635
       
  1169 Now this one is a sequence
       
  1170 break expression.
       
  1171 
       
  1172 255
       
  1173 00:13:00,635 --> 00:13:02,210
       
  1174 This most complicated case.
       
  1175 
       
  1176 256
       
  1177 00:13:02,210 --> 00:13:04,160
       
  1178 So the first of all
       
  1179 you have to test is,
       
  1180 
       
  1181 257
       
  1182 00:13:04,160 --> 00:13:07,910
       
  1183 is the first component
       
  1184 nullable of this sequence?
       
  1185 
       
  1186 258
       
  1187 00:13:07,910 --> 00:13:09,200
       
  1188 Well, that is a,
       
  1189 
       
  1190 259
       
  1191 00:13:09,200 --> 00:13:12,740
       
  1192 in this case, a on its
       
  1193 own is not nullable.
       
  1194 
       
  1195 260
       
  1196 00:13:12,740 --> 00:13:14,210
       
  1197 So vn, the easy case,
       
  1198 
       
  1199 261
       
  1200 00:13:14,210 --> 00:13:17,000
       
  1201 we only have a single derivative
       
  1202 
       
  1203 262
       
  1204 00:13:17,000 --> 00:13:19,370
       
  1205 pushed in the first component.
       
  1206 
       
  1207 263
       
  1208 00:13:19,370 --> 00:13:25,160
       
  1209 So we have the derivative
       
  1210 of a with the character a.
       
  1211 
       
  1212 264
       
  1213 00:13:25,160 --> 00:13:27,920
       
  1214 Okay, that's now
       
  1215 the character case.
       
  1216 
       
  1217 265
       
  1218 00:13:27,920 --> 00:13:29,720
       
  1219 And in this case the character
       
  1220 
       
  1221 266
       
  1222 00:13:29,720 --> 00:13:31,715
       
  1223 in the regular expression agree.
       
  1224 
       
  1225 267
       
  1226 00:13:31,715 --> 00:13:33,890
       
  1227 So it's defined as one.
       
  1228 
       
  1229 268
       
  1230 00:13:33,890 --> 00:13:37,550
       
  1231 Ok? In the other case,
       
  1232 
       
  1233 269
       
  1234 00:13:37,550 --> 00:13:39,710
       
  1235 the break expression is P,
       
  1236 
       
  1237 270
       
  1238 00:13:39,710 --> 00:13:41,675
       
  1239 But the characters a.
       
  1240 
       
  1241 271
       
  1242 00:13:41,675 --> 00:13:46,385
       
  1243 So in this case
       
  1244 it's defined as 0.
       
  1245 
       
  1246 272
       
  1247 00:13:46,385 --> 00:13:50,630
       
  1248 Okay? So that's what the
       
  1249 derivative would be.
       
  1250 
       
  1251 273
       
  1252 00:13:50,630 --> 00:13:52,160
       
  1253 This r is there
       
  1254 
       
  1255 274
       
  1256 00:13:52,160 --> 00:13:55,280
       
  1257 because originally we
       
  1258 started with a star.
       
  1259 
       
  1260 275
       
  1261 00:13:55,280 --> 00:13:58,295
       
  1262 This expression is that
       
  1263 star at expression.
       
  1264 
       
  1265 276
       
  1266 00:13:58,295 --> 00:14:02,780
       
  1267 Ok? So the derivative
       
  1268 according to a
       
  1269 
       
  1270 277
       
  1271 00:14:02,780 --> 00:14:07,610
       
  1272 up that reg expression
       
  1273 is this expression.
       
  1274 
       
  1275 278
       
  1276 00:14:07,610 --> 00:14:10,970
       
  1277 We just have to
       
  1278 substitute this back in.
       
  1279 
       
  1280 279
       
  1281 00:14:10,970 --> 00:14:13,745
       
  1282 Just coming back to this slide.
       
  1283 
       
  1284 280
       
  1285 00:14:13,745 --> 00:14:16,160
       
  1286 So far, they're only analyze
       
  1287 
       
  1288 281
       
  1289 00:14:16,160 --> 00:14:19,505
       
  1290 the derivative according
       
  1291 to a single character.
       
  1292 
       
  1293 282
       
  1294 00:14:19,505 --> 00:14:23,960
       
  1295 But we can also very easily
       
  1296 extend that to whole strings.
       
  1297 
       
  1298 283
       
  1299 00:14:23,960 --> 00:14:26,360
       
  1300 So if you build the
       
  1301 derivative according
       
  1302 
       
  1303 284
       
  1304 00:14:26,360 --> 00:14:27,905
       
  1305 to the empty string,
       
  1306 
       
  1307 285
       
  1308 00:14:27,905 --> 00:14:30,875
       
  1309 we just return the
       
  1310 Rekha expression.
       
  1311 
       
  1312 286
       
  1313 00:14:30,875 --> 00:14:35,585
       
  1314 If we have a string
       
  1315 starting with character c,
       
  1316 
       
  1317 287
       
  1318 00:14:35,585 --> 00:14:37,850
       
  1319 remember that can
       
  1320 be any character.
       
  1321 
       
  1322 288
       
  1323 00:14:37,850 --> 00:14:42,170
       
  1324 Then we build first the simple
       
  1325 derivative according to
       
  1326 
       
  1327 289
       
  1328 00:14:42,170 --> 00:14:44,360
       
  1329 that first character and
       
  1330 
       
  1331 290
       
  1332 00:14:44,360 --> 00:14:46,925
       
  1333 continue with the
       
  1334 rest of the string.
       
  1335 
       
  1336 291
       
  1337 00:14:46,925 --> 00:14:50,615
       
  1338 So here you see again,
       
  1339 my personal convention.
       
  1340 
       
  1341 292
       
  1342 00:14:50,615 --> 00:14:54,365
       
  1343 Everything which works on
       
  1344 lists has this S at the end.
       
  1345 
       
  1346 293
       
  1347 00:14:54,365 --> 00:14:57,125
       
  1348 So this function is
       
  1349 for single characters.
       
  1350 
       
  1351 294
       
  1352 00:14:57,125 --> 00:14:59,179
       
  1353 This one is for strings,
       
  1354 
       
  1355 295
       
  1356 00:14:59,179 --> 00:15:02,450
       
  1357 but it uses the one
       
  1358 for the character.
       
  1359 
       
  1360 296
       
  1361 00:15:02,450 --> 00:15:04,025
       
  1362 Essentially what it does is
       
  1363 
       
  1364 297
       
  1365 00:15:04,025 --> 00:15:06,185
       
  1366 it chops off the first character,
       
  1367 
       
  1368 298
       
  1369 00:15:06,185 --> 00:15:09,800
       
  1370 builds the derivative, then
       
  1371 chops off the next character,
       
  1372 
       
  1373 299
       
  1374 00:15:09,800 --> 00:15:13,760
       
  1375 builds the derivative of
       
  1376 the result, and so on.
       
  1377 
       
  1378 300
       
  1379 00:15:13,760 --> 00:15:17,000
       
  1380 Having this function,
       
  1381 we can actually now
       
  1382 
       
  1383 301
       
  1384 00:15:17,000 --> 00:15:20,600
       
  1385 state what the algorithm
       
  1386 is, the complete algorithm.
       
  1387 
       
  1388 302
       
  1389 00:15:20,600 --> 00:15:23,465
       
  1390 So the pro shops ki mantra
       
  1391 
       
  1392 303
       
  1393 00:15:23,465 --> 00:15:24,860
       
  1394 takes a regular expression as
       
  1395 
       
  1396 304
       
  1397 00:15:24,860 --> 00:15:26,915
       
  1398 argument and a
       
  1399 string as argument.
       
  1400 
       
  1401 305
       
  1402 00:15:26,915 --> 00:15:30,920
       
  1403 And is supposed to say yes if
       
  1404 the reg expression matches
       
  1405 
       
  1406 306
       
  1407 00:15:30,920 --> 00:15:33,560
       
  1408 the string or No
       
  1409 
       
  1410 307
       
  1411 00:15:33,560 --> 00:15:36,065
       
  1412 if the reg expression does
       
  1413 not match the string.
       
  1414 
       
  1415 308
       
  1416 00:15:36,065 --> 00:15:37,715
       
  1417 And how does it do that?
       
  1418 
       
  1419 309
       
  1420 00:15:37,715 --> 00:15:42,560
       
  1421 Well, it takes this string
       
  1422 s And this reg expression,
       
  1423 
       
  1424 310
       
  1425 00:15:42,560 --> 00:15:43,925
       
  1426 and it first built
       
  1427 
       
  1428 311
       
  1429 00:15:43,925 --> 00:15:48,845
       
  1430 successive derivatives until
       
  1431 that string is exhaust that.
       
  1432 
       
  1433 312
       
  1434 00:15:48,845 --> 00:15:52,115
       
  1435 Okay? Then you have
       
  1436 a final derivative,
       
  1437 
       
  1438 313
       
  1439 00:15:52,115 --> 00:15:53,839
       
  1440 a final reg expression.
       
  1441 
       
  1442 314
       
  1443 00:15:53,839 --> 00:15:55,370
       
  1444 And you test whether
       
  1445 
       
  1446 315
       
  1447 00:15:55,370 --> 00:15:57,920
       
  1448 this reg expression can
       
  1449 match the empty string.
       
  1450 
       
  1451 316
       
  1452 00:15:57,920 --> 00:16:01,370
       
  1453 If yes, then the
       
  1454 original reg expression
       
  1455 
       
  1456 317
       
  1457 00:16:01,370 --> 00:16:03,245
       
  1458 is r can match the string.
       
  1459 
       
  1460 318
       
  1461 00:16:03,245 --> 00:16:05,210
       
  1462 If no, if it cannot match
       
  1463 
       
  1464 319
       
  1465 00:16:05,210 --> 00:16:08,030
       
  1466 the final derivative
       
  1467 with the empty string,
       
  1468 
       
  1469 320
       
  1470 00:16:08,030 --> 00:16:10,280
       
  1471 then know this
       
  1472 regular expression,
       
  1473 
       
  1474 321
       
  1475 00:16:10,280 --> 00:16:12,905
       
  1476 R cannot match that string.
       
  1477 
       
  1478 322
       
  1479 00:16:12,905 --> 00:16:16,010
       
  1480 I know it looks
       
  1481 very anticlimactic,
       
  1482 
       
  1483 323
       
  1484 00:16:16,010 --> 00:16:19,625
       
  1485 but that's actually the
       
  1486 beauty of this algorithm,
       
  1487 
       
  1488 324
       
  1489 00:16:19,625 --> 00:16:22,760
       
  1490 that it's not that complicated.
       
  1491 
       
  1492 325
       
  1493 00:16:22,760 --> 00:16:25,340
       
  1494 So how does the algorithm work?
       
  1495 
       
  1496 326
       
  1497 00:16:25,340 --> 00:16:27,634
       
  1498 In a concrete example?
       
  1499 
       
  1500 327
       
  1501 00:16:27,634 --> 00:16:31,580
       
  1502 Imagine you have a string, abc
       
  1503 
       
  1504 328
       
  1505 00:16:31,580 --> 00:16:34,370
       
  1506 and you have a break
       
  1507 expression, say R1.
       
  1508 
       
  1509 329
       
  1510 00:16:34,370 --> 00:16:37,520
       
  1511 And you want to find out
       
  1512 whether this or one can match
       
  1513 
       
  1514 330
       
  1515 00:16:37,520 --> 00:16:41,300
       
  1516 that string abc or not.
       
  1517 How do you do that?
       
  1518 
       
  1519 331
       
  1520 00:16:41,300 --> 00:16:45,140
       
  1521 Well, you would first
       
  1522 take this R1 and you
       
  1523 
       
  1524 332
       
  1525 00:16:45,140 --> 00:16:47,150
       
  1526 build the derivative according
       
  1527 
       
  1528 333
       
  1529 00:16:47,150 --> 00:16:49,880
       
  1530 to the first character, D-A.
       
  1531 
       
  1532 334
       
  1533 00:16:49,880 --> 00:16:53,015
       
  1534 Okay? You get the derivative,
       
  1535 
       
  1536 335
       
  1537 00:16:53,015 --> 00:16:55,294
       
  1538 which I call here R2.
       
  1539 
       
  1540 336
       
  1541 00:16:55,294 --> 00:16:58,640
       
  1542 Then you take the next
       
  1543 character, the B.
       
  1544 
       
  1545 337
       
  1546 00:16:58,640 --> 00:17:04,535
       
  1547 You now build the derivative
       
  1548 according to b of this R2.
       
  1549 
       
  1550 338
       
  1551 00:17:04,535 --> 00:17:07,460
       
  1552 Okay? So you take the
       
  1553 result of the first step,
       
  1554 
       
  1555 339
       
  1556 00:17:07,460 --> 00:17:09,530
       
  1557 you feed it into the second step,
       
  1558 
       
  1559 340
       
  1560 00:17:09,530 --> 00:17:11,810
       
  1561 and you take the
       
  1562 second character.
       
  1563 
       
  1564 341
       
  1565 00:17:11,810 --> 00:17:17,075
       
  1566 Then you do this also with c.
       
  1567 So you get a derivative R3,
       
  1568 
       
  1569 342
       
  1570 00:17:17,075 --> 00:17:22,460
       
  1571 and you build the derivative
       
  1572 of R three according to c,
       
  1573 
       
  1574 343
       
  1575 00:17:22,460 --> 00:17:24,185
       
  1576 you get an R four.
       
  1577 
       
  1578 344
       
  1579 00:17:24,185 --> 00:17:26,300
       
  1580 Okay, so that's the
       
  1581 final derivative.
       
  1582 
       
  1583 345
       
  1584 00:17:26,300 --> 00:17:27,530
       
  1585 The string is exhausted.
       
  1586 
       
  1587 346
       
  1588 00:17:27,530 --> 00:17:29,570
       
  1589 We build derivatives
       
  1590 according to a, B,
       
  1591 
       
  1592 347
       
  1593 00:17:29,570 --> 00:17:34,610
       
  1594 and C. Now we just test whether
       
  1595 this r four is nullable.
       
  1596 
       
  1597 348
       
  1598 00:17:34,610 --> 00:17:37,175
       
  1599 If it says yes,
       
  1600 
       
  1601 349
       
  1602 00:17:37,175 --> 00:17:41,510
       
  1603 then df break expression metro
       
  1604 will just say true, yes,
       
  1605 
       
  1606 350
       
  1607 00:17:41,510 --> 00:17:43,340
       
  1608 this original reg expression,
       
  1609 
       
  1610 351
       
  1611 00:17:43,340 --> 00:17:47,270
       
  1612 the R1, will be able to
       
  1613 match that string abc.
       
  1614 
       
  1615 352
       
  1616 00:17:47,270 --> 00:17:50,585
       
  1617 And if this test returns false,
       
  1618 
       
  1619 353
       
  1620 00:17:50,585 --> 00:17:53,015
       
  1621 then the algorithm says false.
       
  1622 
       
  1623 354
       
  1624 00:17:53,015 --> 00:17:56,975
       
  1625 This reg expression will
       
  1626 not match the string.
       
  1627 
       
  1628 355
       
  1629 00:17:56,975 --> 00:18:00,260
       
  1630 Ok, you might ask
       
  1631 why on earth does
       
  1632 
       
  1633 356
       
  1634 00:18:00,260 --> 00:18:02,960
       
  1635 that algorithm
       
  1636 actually work away?
       
  1637 
       
  1638 357
       
  1639 00:18:02,960 --> 00:18:06,515
       
  1640 Here's an explanation
       
  1641 for why it works.
       
  1642 
       
  1643 358
       
  1644 00:18:06,515 --> 00:18:10,190
       
  1645 Imagine you have a wreck
       
  1646 expression R1, okay?
       
  1647 
       
  1648 359
       
  1649 00:18:10,190 --> 00:18:13,220
       
  1650 And you have a string abc,
       
  1651 
       
  1652 360
       
  1653 00:18:13,220 --> 00:18:14,270
       
  1654 and you want to find out
       
  1655 
       
  1656 361
       
  1657 00:18:14,270 --> 00:18:17,180
       
  1658 whether one can
       
  1659 match that string.
       
  1660 
       
  1661 362
       
  1662 00:18:17,180 --> 00:18:18,799
       
  1663 And for the moment,
       
  1664 
       
  1665 363
       
  1666 00:18:18,799 --> 00:18:22,610
       
  1667 let's assume that it
       
  1668 can match that string.
       
  1669 
       
  1670 364
       
  1671 00:18:22,610 --> 00:18:26,315
       
  1672 Ok? So the language L of
       
  1673 
       
  1674 365
       
  1675 00:18:26,315 --> 00:18:30,185
       
  1676 R will actually
       
  1677 contain that string,
       
  1678 
       
  1679 366
       
  1680 00:18:30,185 --> 00:18:31,805
       
  1681 otherwise it wouldn't match that.
       
  1682 
       
  1683 367
       
  1684 00:18:31,805 --> 00:18:36,710
       
  1685 Okay? So ABC is in
       
  1686 this language, okay?
       
  1687 
       
  1688 368
       
  1689 00:18:36,710 --> 00:18:39,785
       
  1690 If I now take the
       
  1691 semantic derivative,
       
  1692 
       
  1693 369
       
  1694 00:18:39,785 --> 00:18:43,145
       
  1695 that means I look at all
       
  1696 the strings in this f,
       
  1697 
       
  1698 370
       
  1699 00:18:43,145 --> 00:18:46,820
       
  1700 R1, and further out
       
  1701 
       
  1702 371
       
  1703 00:18:46,820 --> 00:18:48,740
       
  1704 all the ones which
       
  1705 do not start with
       
  1706 
       
  1707 372
       
  1708 00:18:48,740 --> 00:18:51,260
       
  1709 an a, I discharge them.
       
  1710 
       
  1711 373
       
  1712 00:18:51,260 --> 00:18:54,545
       
  1713 And I only look the one
       
  1714 which start with an a.
       
  1715 
       
  1716 374
       
  1717 00:18:54,545 --> 00:18:56,465
       
  1718 And of those strings,
       
  1719 
       
  1720 375
       
  1721 00:18:56,465 --> 00:18:58,475
       
  1722 I chop off this a.
       
  1723 
       
  1724 376
       
  1725 00:18:58,475 --> 00:19:01,025
       
  1726 So after this
       
  1727 romantic derivative,
       
  1728 
       
  1729 377
       
  1730 00:19:01,025 --> 00:19:05,735
       
  1731 this set of strings will
       
  1732 contain just B and C.
       
  1733 
       
  1734 378
       
  1735 00:19:05,735 --> 00:19:12,830
       
  1736 Ok. Now if I build the next
       
  1737 semantic derivative of that,
       
  1738 
       
  1739 379
       
  1740 00:19:12,830 --> 00:19:14,345
       
  1741 then I would look at
       
  1742 
       
  1743 380
       
  1744 00:19:14,345 --> 00:19:16,850
       
  1745 all the strings which
       
  1746 start with a P,
       
  1747 
       
  1748 381
       
  1749 00:19:16,850 --> 00:19:21,350
       
  1750 and forget about everything
       
  1751 else of the ones.
       
  1752 
       
  1753 382
       
  1754 00:19:21,350 --> 00:19:27,905
       
  1755 I know they start with B.
       
  1756 I just chop of the B. Ok.
       
  1757 
       
  1758 383
       
  1759 00:19:27,905 --> 00:19:31,655
       
  1760 So in this whole set here,
       
  1761 
       
  1762 384
       
  1763 00:19:31,655 --> 00:19:33,785
       
  1764 in this whole set here,
       
  1765 
       
  1766 385
       
  1767 00:19:33,785 --> 00:19:39,030
       
  1768 there will be now a string
       
  1769 which is just c. Okay?
       
  1770 
       
  1771 386
       
  1772 00:19:39,190 --> 00:19:44,420
       
  1773 Then I built the third
       
  1774 semantic derivative
       
  1775 
       
  1776 387
       
  1777 00:19:44,420 --> 00:19:47,300
       
  1778 because I want to find out
       
  1779 whether ABC is involved.
       
  1780 
       
  1781 388
       
  1782 00:19:47,300 --> 00:19:50,540
       
  1783 Okay? So now I look
       
  1784 at all the strings in
       
  1785 
       
  1786 389
       
  1787 00:19:50,540 --> 00:19:52,820
       
  1788 here and look at
       
  1789 
       
  1790 390
       
  1791 00:19:52,820 --> 00:19:55,340
       
  1792 them whether they start
       
  1793 with a C. If yes,
       
  1794 
       
  1795 391
       
  1796 00:19:55,340 --> 00:19:56,885
       
  1797 I chop off the sea.
       
  1798 
       
  1799 392
       
  1800 00:19:56,885 --> 00:19:59,120
       
  1801 And put in markets remaining.
       
  1802 
       
  1803 393
       
  1804 00:19:59,120 --> 00:20:00,425
       
  1805 So in this case,
       
  1806 
       
  1807 394
       
  1808 00:20:00,425 --> 00:20:02,510
       
  1809 if I have the string c
       
  1810 
       
  1811 395
       
  1812 00:20:02,510 --> 00:20:04,550
       
  1813 in this language and
       
  1814 I chop off this,
       
  1815 
       
  1816 396
       
  1817 00:20:04,550 --> 00:20:07,700
       
  1818 see what is remaining
       
  1819 is the empty string.
       
  1820 
       
  1821 397
       
  1822 00:20:07,700 --> 00:20:09,695
       
  1823 So we have to check of
       
  1824 
       
  1825 398
       
  1826 00:20:09,695 --> 00:20:14,510
       
  1827 that language whether it
       
  1828 contains the empty string.
       
  1829 
       
  1830 399
       
  1831 00:20:14,510 --> 00:20:18,800
       
  1832 If yes, then the
       
  1833 original R1 can match
       
  1834 
       
  1835 400
       
  1836 00:20:18,800 --> 00:20:21,050
       
  1837 this ABC because this ABC
       
  1838 
       
  1839 401
       
  1840 00:20:21,050 --> 00:20:24,119
       
  1841 must have been in this language.
       
  1842 
       
  1843 402
       
  1844 00:20:24,130 --> 00:20:28,565
       
  1845 And if in the end there wasn't
       
  1846 the empty string, then,
       
  1847 
       
  1848 403
       
  1849 00:20:28,565 --> 00:20:33,575
       
  1850 then this ABC Watson in
       
  1851 this language of one.
       
  1852 
       
  1853 404
       
  1854 00:20:33,575 --> 00:20:36,260
       
  1855 And so the electron must have,
       
  1856 
       
  1857 405
       
  1858 00:20:36,260 --> 00:20:38,880
       
  1859 or the metro must have failed.
       
  1860 
       
  1861 406
       
  1862 00:20:39,040 --> 00:20:42,530
       
  1863 The clever bit is that here
       
  1864 
       
  1865 407
       
  1866 00:20:42,530 --> 00:20:45,530
       
  1867 the explanation is for languages.
       
  1868 
       
  1869 408
       
  1870 00:20:45,530 --> 00:20:49,835
       
  1871 Remember, this
       
  1872 semantic derivative
       
  1873 
       
  1874 409
       
  1875 00:20:49,835 --> 00:20:53,450
       
  1876 works over languages and they
       
  1877 sometimes can be in finite.
       
  1878 
       
  1879 410
       
  1880 00:20:53,450 --> 00:20:55,730
       
  1881 So that's not really
       
  1882 an algorithm.
       
  1883 
       
  1884 411
       
  1885 00:20:55,730 --> 00:20:58,880
       
  1886 Yeah, that's just
       
  1887 explaining the idea with
       
  1888 
       
  1889 412
       
  1890 00:20:58,880 --> 00:21:02,525
       
  1891 preserves key
       
  1892 achieved was that he
       
  1893 
       
  1894 413
       
  1895 00:21:02,525 --> 00:21:06,440
       
  1896 now works with this derivative
       
  1897 America expressions and
       
  1898 
       
  1899 414
       
  1900 00:21:06,440 --> 00:21:10,715
       
  1901 somehow imitates what
       
  1902 happens on these languages.
       
  1903 
       
  1904 415
       
  1905 00:21:10,715 --> 00:21:14,135
       
  1906 Because remember if you
       
  1907 have an wreck expression,
       
  1908 
       
  1909 416
       
  1910 00:21:14,135 --> 00:21:17,405
       
  1911 are you want to test
       
  1912 whether can match APC,
       
  1913 
       
  1914 417
       
  1915 00:21:17,405 --> 00:21:22,550
       
  1916 then you take first
       
  1917 derivative according to a.
       
  1918 
       
  1919 418
       
  1920 00:21:22,550 --> 00:21:25,760
       
  1921 So you will get a wreck
       
  1922 expression which can match b
       
  1923 
       
  1924 419
       
  1925 00:21:25,760 --> 00:21:29,464
       
  1926 and c If R could match abc.
       
  1927 
       
  1928 420
       
  1929 00:21:29,464 --> 00:21:31,430
       
  1930 So after the first derivative,
       
  1931 
       
  1932 421
       
  1933 00:21:31,430 --> 00:21:33,620
       
  1934 you will get a wreck expression
       
  1935 which can match B and
       
  1936 
       
  1937 422
       
  1938 00:21:33,620 --> 00:21:37,070
       
  1939 C. If you take the
       
  1940 second derivative,
       
  1941 
       
  1942 423
       
  1943 00:21:37,070 --> 00:21:41,225
       
  1944 you will get a reexpression
       
  1945 which can match c alone.
       
  1946 
       
  1947 424
       
  1948 00:21:41,225 --> 00:21:44,180
       
  1949 And if you take the
       
  1950 final derivative,
       
  1951 
       
  1952 425
       
  1953 00:21:44,180 --> 00:21:46,070
       
  1954 then you will get
       
  1955 
       
  1956 426
       
  1957 00:21:46,070 --> 00:21:48,260
       
  1958 rec expression which hopefully
       
  1959 
       
  1960 427
       
  1961 00:21:48,260 --> 00:21:49,715
       
  1962 can match the empty string.
       
  1963 
       
  1964 428
       
  1965 00:21:49,715 --> 00:21:53,780
       
  1966 If it does, then this
       
  1967 R can match the ABC.
       
  1968 
       
  1969 429
       
  1970 00:21:53,780 --> 00:21:55,655
       
  1971 And if it doesn't, then
       
  1972 
       
  1973 430
       
  1974 00:21:55,655 --> 00:21:58,680
       
  1975 ABC couldn't be
       
  1976 matched by this on.
       
  1977 
       
  1978 431
       
  1979 00:21:58,900 --> 00:22:02,990
       
  1980 Okay, let's have a look
       
  1981 how this pans out in code.
       
  1982 
       
  1983 432
       
  1984 00:22:02,990 --> 00:22:06,050
       
  1985 Here's defile RE1.
       
  1986 
       
  1987 433
       
  1988 00:22:06,050 --> 00:22:07,940
       
  1989 It's also uploaded on Keith,
       
  1990 
       
  1991 434
       
  1992 00:22:07,940 --> 00:22:10,625
       
  1993 so you can see exactly
       
  1994 what I'm doing.
       
  1995 
       
  1996 435
       
  1997 00:22:10,625 --> 00:22:13,970
       
  1998 And actually I already saw
       
  1999 that file because I showed you
       
  2000 
       
  2001 436
       
  2002 00:22:13,970 --> 00:22:15,710
       
  2003 how my wreck expressions are
       
  2004 
       
  2005 437
       
  2006 00:22:15,710 --> 00:22:17,960
       
  2007 defined with the
       
  2008 abstract classes here.
       
  2009 
       
  2010 438
       
  2011 00:22:17,960 --> 00:22:21,155
       
  2012 And here, the six cases
       
  2013 for 0-1 character,
       
  2014 
       
  2015 439
       
  2016 00:22:21,155 --> 00:22:23,540
       
  2017 I turn a TIF in
       
  2018 sequence and star.
       
  2019 
       
  2020 440
       
  2021 00:22:23,540 --> 00:22:26,705
       
  2022 Ok. So the first
       
  2023 function nullable,
       
  2024 
       
  2025 441
       
  2026 00:22:26,705 --> 00:22:28,760
       
  2027 the simple one, takes
       
  2028 
       
  2029 442
       
  2030 00:22:28,760 --> 00:22:32,120
       
  2031 a regular expression as
       
  2032 argument and returns a boolean.
       
  2033 
       
  2034 443
       
  2035 00:22:32,120 --> 00:22:34,280
       
  2036 And then with this
       
  2037 pattern matching,
       
  2038 
       
  2039 444
       
  2040 00:22:34,280 --> 00:22:37,040
       
  2041 we just go through
       
  2042 all these six cases
       
  2043 
       
  2044 445
       
  2045 00:22:37,040 --> 00:22:38,900
       
  2046 are serious defined as false.
       
  2047 
       
  2048 446
       
  2049 00:22:38,900 --> 00:22:43,234
       
  2050 One is defined as true
       
  2051 character for any character,
       
  2052 
       
  2053 447
       
  2054 00:22:43,234 --> 00:22:45,455
       
  2055 this null return false.
       
  2056 
       
  2057 448
       
  2058 00:22:45,455 --> 00:22:47,540
       
  2059 The alternative is to find here,
       
  2060 
       
  2061 449
       
  2062 00:22:47,540 --> 00:22:50,000
       
  2063 so that's the or in Scala.
       
  2064 
       
  2065 450
       
  2066 00:22:50,000 --> 00:22:52,700
       
  2067 And for the sequence,
       
  2068 that's the end.
       
  2069 
       
  2070 451
       
  2071 00:22:52,700 --> 00:22:56,180
       
  2072 And this star, no matter
       
  2073 what the reg expression is,
       
  2074 
       
  2075 452
       
  2076 00:22:56,180 --> 00:22:59,540
       
  2077 it will always match the
       
  2078 empty string, so true.
       
  2079 
       
  2080 453
       
  2081 00:22:59,540 --> 00:23:02,225
       
  2082 So nanobots, very easy.
       
  2083 
       
  2084 454
       
  2085 00:23:02,225 --> 00:23:07,430
       
  2086 The derivative is also not
       
  2087 so much more complicated.
       
  2088 
       
  2089 455
       
  2090 00:23:07,430 --> 00:23:08,974
       
  2091 It takes two arguments,
       
  2092 
       
  2093 456
       
  2094 00:23:08,974 --> 00:23:11,810
       
  2095 a character and the
       
  2096 rec expression,
       
  2097 
       
  2098 457
       
  2099 00:23:11,810 --> 00:23:14,405
       
  2100 and returns a wreck expression.
       
  2101 
       
  2102 458
       
  2103 00:23:14,405 --> 00:23:16,340
       
  2104 So now we just have to analyze
       
  2105 
       
  2106 459
       
  2107 00:23:16,340 --> 00:23:18,890
       
  2108 what's that reg
       
  2109 expression looks like.
       
  2110 
       
  2111 460
       
  2112 00:23:18,890 --> 00:23:23,870
       
  2113 If it's 0, we return
       
  2114 01, we return 0.
       
  2115 
       
  2116 461
       
  2117 00:23:23,870 --> 00:23:26,990
       
  2118 If the character is the
       
  2119 wreck expressions character
       
  2120 
       
  2121 462
       
  2122 00:23:26,990 --> 00:23:30,260
       
  2123 d. We test whether it's
       
  2124 equal to this character.
       
  2125 
       
  2126 463
       
  2127 00:23:30,260 --> 00:23:32,225
       
  2128 We want to take the
       
  2129 derivative off.
       
  2130 
       
  2131 464
       
  2132 00:23:32,225 --> 00:23:36,185
       
  2133 If yes, return one, otherwise 0.
       
  2134 
       
  2135 465
       
  2136 00:23:36,185 --> 00:23:38,600
       
  2137 The alternative which has pushed
       
  2138 
       
  2139 466
       
  2140 00:23:38,600 --> 00:23:39,860
       
  2141 the derivative under
       
  2142 
       
  2143 467
       
  2144 00:23:39,860 --> 00:23:42,710
       
  2145 this alternative,
       
  2146 that's the easy one.
       
  2147 
       
  2148 468
       
  2149 00:23:42,710 --> 00:23:44,690
       
  2150 Here's the sequence case where we
       
  2151 
       
  2152 469
       
  2153 00:23:44,690 --> 00:23:47,015
       
  2154 first have to test
       
  2155 if it's nullable,
       
  2156 
       
  2157 470
       
  2158 00:23:47,015 --> 00:23:49,040
       
  2159 If it's not the have to push
       
  2160 
       
  2161 471
       
  2162 00:23:49,040 --> 00:23:52,160
       
  2163 the derivative over
       
  2164 the first DR1.
       
  2165 
       
  2166 472
       
  2167 00:23:52,160 --> 00:23:56,135
       
  2168 And otherwise if it is null
       
  2169 above we have two cases.
       
  2170 
       
  2171 473
       
  2172 00:23:56,135 --> 00:24:00,605
       
  2173 Either you have to push
       
  2174 it over the R1 or R2.
       
  2175 
       
  2176 474
       
  2177 00:24:00,605 --> 00:24:03,860
       
  2178 And a star case, this one.
       
  2179 
       
  2180 475
       
  2181 00:24:03,860 --> 00:24:07,160
       
  2182 We just build the sequence
       
  2183 of the derivative of
       
  2184 
       
  2185 476
       
  2186 00:24:07,160 --> 00:24:11,600
       
  2187 R1 and append the
       
  2188 or as a sequence,
       
  2189 
       
  2190 477
       
  2191 00:24:11,600 --> 00:24:14,195
       
  2192 put the star at the end.
       
  2193 
       
  2194 478
       
  2195 00:24:14,195 --> 00:24:19,430
       
  2196 Okay, so here's this
       
  2197 function for strings.
       
  2198 
       
  2199 479
       
  2200 00:24:19,430 --> 00:24:21,740
       
  2201 So a list of characters.
       
  2202 
       
  2203 480
       
  2204 00:24:21,740 --> 00:24:23,870
       
  2205 This function takes N,
       
  2206 
       
  2207 481
       
  2208 00:24:23,870 --> 00:24:26,480
       
  2209 S list of characters argument
       
  2210 and a reg expression
       
  2211 
       
  2212 482
       
  2213 00:24:26,480 --> 00:24:29,360
       
  2214 as argument and returns
       
  2215 a wreck expression.
       
  2216 
       
  2217 483
       
  2218 00:24:29,360 --> 00:24:31,460
       
  2219 And we just have to
       
  2220 pattern match what
       
  2221 
       
  2222 484
       
  2223 00:24:31,460 --> 00:24:34,055
       
  2224 that string looks like
       
  2225 or this list looks like.
       
  2226 
       
  2227 485
       
  2228 00:24:34,055 --> 00:24:35,360
       
  2229 If it's the empty list,
       
  2230 
       
  2231 486
       
  2232 00:24:35,360 --> 00:24:38,030
       
  2233 it just immediately
       
  2234 return the R. If
       
  2235 
       
  2236 487
       
  2237 00:24:38,030 --> 00:24:42,020
       
  2238 it's something of the
       
  2239 form C followed by S,
       
  2240 
       
  2241 488
       
  2242 00:24:42,020 --> 00:24:45,050
       
  2243 then we build first the
       
  2244 derivative according
       
  2245 
       
  2246 489
       
  2247 00:24:45,050 --> 00:24:48,345
       
  2248 to a C. And then
       
  2249 the result of that,
       
  2250 
       
  2251 490
       
  2252 00:24:48,345 --> 00:24:50,090
       
  2253 people recursively call
       
  2254 
       
  2255 491
       
  2256 00:24:50,090 --> 00:24:53,675
       
  2257 the derivative
       
  2258 according to s. Okay?
       
  2259 
       
  2260 492
       
  2261 00:24:53,675 --> 00:24:56,060
       
  2262 And now the main mantra,
       
  2263 
       
  2264 493
       
  2265 00:24:56,060 --> 00:24:59,360
       
  2266 it takes a reg
       
  2267 expression and takes
       
  2268 
       
  2269 494
       
  2270 00:24:59,360 --> 00:25:02,870
       
  2271 a string and returns a
       
  2272 boolean, true or false.
       
  2273 
       
  2274 495
       
  2275 00:25:02,870 --> 00:25:05,705
       
  2276 And it first builds
       
  2277 the derivative.
       
  2278 
       
  2279 496
       
  2280 00:25:05,705 --> 00:25:07,430
       
  2281 The only thing I have to do here
       
  2282 
       
  2283 497
       
  2284 00:25:07,430 --> 00:25:09,410
       
  2285 because I'm implementing
       
  2286 it and scholar,
       
  2287 
       
  2288 498
       
  2289 00:25:09,410 --> 00:25:15,064
       
  2290 I have to coerce between strings
       
  2291 and lists of characters.
       
  2292 
       
  2293 499
       
  2294 00:25:15,064 --> 00:25:16,580
       
  2295 So he, I take first
       
  2296 
       
  2297 500
       
  2298 00:25:16,580 --> 00:25:20,810
       
  2299 the two list of the S that
       
  2300 gives me a list of characters.
       
  2301 
       
  2302 501
       
  2303 00:25:20,810 --> 00:25:23,135
       
  2304 Then I build this derivative
       
  2305 
       
  2306 502
       
  2307 00:25:23,135 --> 00:25:26,045
       
  2308 is ds because I'm
       
  2309 looking at strings.
       
  2310 
       
  2311 503
       
  2312 00:25:26,045 --> 00:25:28,160
       
  2313 And then at the end,
       
  2314 
       
  2315 504
       
  2316 00:25:28,160 --> 00:25:33,050
       
  2317 built-in nullable of
       
  2318 the final derivative.
       
  2319 
       
  2320 505
       
  2321 00:25:33,050 --> 00:25:37,310
       
  2322 The beauty of all this
       
  2323 is that in Scala,
       
  2324 
       
  2325 506
       
  2326 00:25:37,310 --> 00:25:40,085
       
  2327 these definition which
       
  2328 I had on the slides
       
  2329 
       
  2330 507
       
  2331 00:25:40,085 --> 00:25:43,835
       
  2332 go almost one-to-one into code.
       
  2333 
       
  2334 508
       
  2335 00:25:43,835 --> 00:25:45,605
       
  2336 And that's of course,
       
  2337 
       
  2338 509
       
  2339 00:25:45,605 --> 00:25:47,480
       
  2340 makes it very easy
       
  2341 to implement in
       
  2342 
       
  2343 510
       
  2344 00:25:47,480 --> 00:25:49,730
       
  2345 a functional language like Scala.
       
  2346 
       
  2347 511
       
  2348 00:25:49,730 --> 00:25:52,865
       
  2349 We can also now try
       
  2350 out some examples.
       
  2351 
       
  2352 512
       
  2353 00:25:52,865 --> 00:25:56,960
       
  2354 This was the example
       
  2355 I had on the slide.
       
  2356 
       
  2357 513
       
  2358 00:25:56,960 --> 00:25:58,370
       
  2359 And we could now calculate
       
  2360 
       
  2361 514
       
  2362 00:25:58,370 --> 00:26:00,305
       
  2363 what's the derivative
       
  2364 according to a,
       
  2365 
       
  2366 515
       
  2367 00:26:00,305 --> 00:26:02,720
       
  2368 B, and C of this
       
  2369 record expression.
       
  2370 
       
  2371 516
       
  2372 00:26:02,720 --> 00:26:07,040
       
  2373 And I hope you didn't
       
  2374 make any mistake.
       
  2375 
       
  2376 517
       
  2377 00:26:07,040 --> 00:26:09,260
       
  2378 Now of course we want
       
  2379 to see whether B
       
  2380 
       
  2381 518
       
  2382 00:26:09,260 --> 00:26:11,390
       
  2383 do any better with
       
  2384 this algorithm.
       
  2385 
       
  2386 519
       
  2387 00:26:11,390 --> 00:26:13,715
       
  2388 Then Python and Ruby.
       
  2389 
       
  2390 520
       
  2391 00:26:13,715 --> 00:26:16,070
       
  2392 And let's first have a
       
  2393 look at this example.
       
  2394 
       
  2395 521
       
  2396 00:26:16,070 --> 00:26:18,079
       
  2397 So this reg expression.
       
  2398 
       
  2399 522
       
  2400 00:26:18,079 --> 00:26:19,880
       
  2401 The problem is that in
       
  2402 
       
  2403 523
       
  2404 00:26:19,880 --> 00:26:22,070
       
  2405 our reg expression
       
  2406 metro so far we have
       
  2407 
       
  2408 524
       
  2409 00:26:22,070 --> 00:26:24,530
       
  2410 the sequence right
       
  2411 expression where we
       
  2412 
       
  2413 525
       
  2414 00:26:24,530 --> 00:26:27,200
       
  2415 don't have this optional
       
  2416 wreck expression.
       
  2417 
       
  2418 526
       
  2419 00:26:27,200 --> 00:26:30,800
       
  2420 And we don't have this n
       
  2421 times Rekha expression,
       
  2422 
       
  2423 527
       
  2424 00:26:30,800 --> 00:26:36,605
       
  2425 but we can quickly implement
       
  2426 that in our implementation.
       
  2427 
       
  2428 528
       
  2429 00:26:36,605 --> 00:26:40,549
       
  2430 So if you want to build the
       
  2431 optional reg expression,
       
  2432 
       
  2433 529
       
  2434 00:26:40,549 --> 00:26:41,870
       
  2435 which is defined here,
       
  2436 
       
  2437 530
       
  2438 00:26:41,870 --> 00:26:44,570
       
  2439 a little function which
       
  2440 takes a reg expression and
       
  2441 
       
  2442 531
       
  2443 00:26:44,570 --> 00:26:47,360
       
  2444 returns the alternative of R one.
       
  2445 
       
  2446 532
       
  2447 00:26:47,360 --> 00:26:49,624
       
  2448 So it essentially
       
  2449 takes the definition
       
  2450 
       
  2451 533
       
  2452 00:26:49,624 --> 00:26:53,240
       
  2453 of optional R and
       
  2454 replaces it by that.
       
  2455 
       
  2456 534
       
  2457 00:26:53,240 --> 00:26:56,150
       
  2458 The end times what we
       
  2459 essentially do it,
       
  2460 
       
  2461 535
       
  2462 00:26:56,150 --> 00:27:01,535
       
  2463 There's beaches built n
       
  2464 copies of this r. Ok?
       
  2465 
       
  2466 536
       
  2467 00:27:01,535 --> 00:27:04,745
       
  2468 So if this n times was 0,
       
  2469 
       
  2470 537
       
  2471 00:27:04,745 --> 00:27:06,245
       
  2472 we just return one.
       
  2473 
       
  2474 538
       
  2475 00:27:06,245 --> 00:27:11,570
       
  2476 If it's one, then we return
       
  2477 just the reg expression.
       
  2478 
       
  2479 539
       
  2480 00:27:11,570 --> 00:27:15,575
       
  2481 And if it's a, something
       
  2482 bigger than one,
       
  2483 
       
  2484 540
       
  2485 00:27:15,575 --> 00:27:18,560
       
  2486 then we just built the
       
  2487 sequence of one of
       
  2488 
       
  2489 541
       
  2490 00:27:18,560 --> 00:27:20,270
       
  2491 these copies and call
       
  2492 
       
  2493 542
       
  2494 00:27:20,270 --> 00:27:22,925
       
  2495 this function again
       
  2496 with n minus one.
       
  2497 
       
  2498 543
       
  2499 00:27:22,925 --> 00:27:26,330
       
  2500 So we just now n copies of that.
       
  2501 
       
  2502 544
       
  2503 00:27:26,330 --> 00:27:30,710
       
  2504 Okay? Okay, so we can look
       
  2505 at our first example.
       
  2506 
       
  2507 545
       
  2508 00:27:30,710 --> 00:27:32,629
       
  2509 This is the work expression,
       
  2510 
       
  2511 546
       
  2512 00:27:32,629 --> 00:27:36,035
       
  2513 and I call that here
       
  2514 even rec expression1.
       
  2515 
       
  2516 547
       
  2517 00:27:36,035 --> 00:27:37,670
       
  2518 It doesn't look as beautiful
       
  2519 
       
  2520 548
       
  2521 00:27:37,670 --> 00:27:39,140
       
  2522 as what we write down on paper.
       
  2523 
       
  2524 549
       
  2525 00:27:39,140 --> 00:27:41,240
       
  2526 We will actually look
       
  2527 at this later on
       
  2528 
       
  2529 550
       
  2530 00:27:41,240 --> 00:27:43,640
       
  2531 if this can be improved.
       
  2532 But here it is.
       
  2533 
       
  2534 551
       
  2535 00:27:43,640 --> 00:27:45,724
       
  2536 Here's the reg expression,
       
  2537 
       
  2538 552
       
  2539 00:27:45,724 --> 00:27:49,520
       
  2540 and here's a little function
       
  2541 which measures the time.
       
  2542 
       
  2543 553
       
  2544 00:27:49,520 --> 00:27:53,180
       
  2545 And we can now start testing
       
  2546 
       
  2547 554
       
  2548 00:27:53,180 --> 00:27:57,845
       
  2549 this reg expression with
       
  2550 strings of just containing A's.
       
  2551 
       
  2552 555
       
  2553 00:27:57,845 --> 00:28:00,020
       
  2554 And we first a bit cautious,
       
  2555 
       
  2556 556
       
  2557 00:28:00,020 --> 00:28:04,985
       
  2558 be tested between 020
       
  2559 and be count by two.
       
  2560 
       
  2561 557
       
  2562 00:28:04,985 --> 00:28:08,330
       
  2563 And I actually will not
       
  2564 start that with Scala,
       
  2565 
       
  2566 558
       
  2567 00:28:08,330 --> 00:28:12,965
       
  2568 but actually the orbital online.
       
  2569 
       
  2570 559
       
  2571 00:28:12,965 --> 00:28:15,305
       
  2572 And that's out.
       
  2573 
       
  2574 560
       
  2575 00:28:15,305 --> 00:28:17,269
       
  2576 And that correlates.
       
  2577 
       
  2578 561
       
  2579 00:28:17,269 --> 00:28:20,675
       
  2580 So for 18 days it's pretty fast.
       
  2581 
       
  2582 562
       
  2583 00:28:20,675 --> 00:28:24,815
       
  2584 But for 20 it already
       
  2585 needs to seconds.
       
  2586 
       
  2587 563
       
  2588 00:28:24,815 --> 00:28:28,265
       
  2589 And you can see
       
  2590 actually this jump from
       
  2591 
       
  2592 564
       
  2593 00:28:28,265 --> 00:28:32,825
       
  2594 18 to 20 in the runtime
       
  2595 is prepared to.
       
  2596 
       
  2597 565
       
  2598 00:28:32,825 --> 00:28:37,460
       
  2599 And if we actually measure
       
  2600 it more accurately,
       
  2601 
       
  2602 566
       
  2603 00:28:37,460 --> 00:28:39,770
       
  2604 then we get this picture.
       
  2605 
       
  2606 567
       
  2607 00:28:39,770 --> 00:28:41,255
       
  2608 And what turns out,
       
  2609 
       
  2610 568
       
  2611 00:28:41,255 --> 00:28:45,830
       
  2612 we actually inverse as Python
       
  2613 and Ruby in this example.
       
  2614 
       
  2615 569
       
  2616 00:28:45,830 --> 00:28:49,230
       
  2617 So I guess that's a fail.