100:00:06,020 --> 00:00:09,945Welcome back. Afterthis disappointment200:00:09,945 --> 00:00:14,115and case of over-promisingand under-achieving,300:00:14,115 --> 00:00:17,295let's have a look whetherwe can recover from that.400:00:17,295 --> 00:00:20,535So here's one problem.500:00:20,535 --> 00:00:23,790When we looked at this n-times regular expression, we600:00:23,790 --> 00:00:27,330were not able to representthat directly.700:00:27,330 --> 00:00:31,275We had to represent it as asequence regular expression.800:00:31,275 --> 00:00:32,850So we expanded it.900:00:32,850 --> 00:00:36,510So times-one would be just a,1000:00:36,510 --> 00:00:40,595times-two wouldbe a o a, and so on.1100:00:40,595 --> 00:00:43,535And so you can see ifyou go already to 13,1200:00:43,535 --> 00:00:47,960n equals 13, the regularexpression becomes quite large.1300:00:47,960 --> 00:00:52,865And now the functionsnullable and also derivative,1400:00:52,865 --> 00:00:56,360they need to traversethis regular expression.1500:00:56,360 --> 00:00:59,060Remember this tree, sometimes,1600:00:59,060 --> 00:01:01,820they have to traversethat even several times.1700:01:01,820 --> 00:01:04,535So that will slowdown the algorithm.1800:01:04,535 --> 00:01:08,150And in particular, becauseour first regular expression was1900:01:08,150 --> 00:01:11,840actually not just a, buta + 1. So we had2000:01:11,840 --> 00:01:14,330in the case of 13,we had a + 1, a + 1,...2100:01:14,330 --> 00:01:17,330a + 1... 13 times.2200:01:17,330 --> 00:01:20,150And this regularexpression just grows2300:01:20,150 --> 00:01:25,475with the number n. Just toshow you this also in code,2400:01:25,475 --> 00:01:28,115I'm in the file re1.sc,2500:01:28,115 --> 00:01:30,920and in the end I have a size function.2600:01:30,920 --> 00:01:33,140The size function takesa regular expression as2700:01:33,140 --> 00:01:36,215argument and produces an integer.2800:01:36,215 --> 00:01:39,980And essentially it takesthis regular expression, seen as2900:01:39,980 --> 00:01:44,045a tree and counts how manynodes there are in this tree.3000:01:44,045 --> 00:01:49,490And so if I take this EVIL1regular expression, this one,3100:01:49,490 --> 00:01:52,160and increase now this n. So you3200:01:52,160 --> 00:01:54,680can already seefor n = 1,3300:01:54,680 --> 00:01:56,540the size of thisrecord expression is 53400:01:56,540 --> 00:01:59,615and you see itslowly increases.3500:01:59,615 --> 00:02:02,150And this 20, i.e. n = 20,3600:02:02,150 --> 00:02:07,100the size of this regularexpression is now already 119.3700:02:07,100 --> 00:02:10,145The interestingline is this one.3800:02:10,145 --> 00:02:13,295Remember, when webuilt the derivative,3900:02:13,295 --> 00:02:17,150very often parts of a regularexpression gets copied.4000:02:17,150 --> 00:02:19,280So if you have like EVIL14100:02:19,280 --> 00:02:22,325of 20, and you have 119 nodes,4200:02:22,325 --> 00:02:25,475now this will be often copied.4300:02:25,475 --> 00:02:31,475And we want to see what's theresulting tree look like, i.e.4400:02:31,475 --> 00:02:32,780how many nodes does it have.4500:02:32,780 --> 00:02:34,985So I take here EVIL1 of 204600:02:34,985 --> 00:02:38,405and the derivativeaccording to 20 a's.4700:02:38,405 --> 00:02:41,820And just have a lookwhat the size is.4800:02:43,210 --> 00:02:45,680Okay, that takes a while.4900:02:45,680 --> 00:02:48,410You see now this recular expression,5000:02:48,410 --> 00:02:52,280the derivative has already8 million plus nodes.5100:02:52,280 --> 00:02:55,400And now nullable andderivative have to5200:02:55,400 --> 00:02:58,430traverse these trees with8 million plus nodes.5300:02:58,430 --> 00:03:01,400So it's no wonderthat this is slow.5400:03:01,400 --> 00:03:03,860The first thing wecan try to mitigate5500:03:03,860 --> 00:03:06,350this explosion problem is to5600:03:06,350 --> 00:03:09,050have an explicit n-times regular expression.5700:03:09,050 --> 00:03:11,600So instead of expanding5800:03:11,600 --> 00:03:13,415it to the sequenceregular expression,5900:03:13,415 --> 00:03:17,510let's just have a singleregular expression n-times,6000:03:17,510 --> 00:03:20,540which takes an expression and6100:03:20,540 --> 00:03:24,470a number, and keeps thatregular expression small.6200:03:24,470 --> 00:03:27,095I'm here in the file re2.sc,6300:03:27,095 --> 00:03:29,090which is also on KEATS.6400:03:29,090 --> 00:03:32,570And the only change I madeis I added explicitly6500:03:32,570 --> 00:03:33,860a regular expression for6600:03:33,860 --> 00:03:36,770n-times. The optionalregular expression6700:03:36,770 --> 00:03:39,920we leave out at themoment because this a?6800:03:39,920 --> 00:03:41,975is represented as6900:03:41,975 --> 00:03:44,645a + 1 and doesn'texpand too much.7000:03:44,645 --> 00:03:47,375The real culpritis this n-times.7100:03:47,375 --> 00:03:51,095And instead of expandingit n-times, we just7200:03:51,095 --> 00:03:54,275take here a regular expressionand a natural number,7300:03:54,275 --> 00:03:56,960which says how many timesit should be repeated.7400:03:56,960 --> 00:03:59,165And in this week wecan keep it small.7500:03:59,165 --> 00:04:01,370But by adding thatregular expression,7600:04:01,370 --> 00:04:05,150we now have to think aboutcases for nullable and7700:04:05,150 --> 00:04:07,340derivative. So that we have7800:04:07,340 --> 00:04:10,205to calculate nextwhat they look like.7900:04:10,205 --> 00:04:14,060We can actuallycalculate what the rule8000:04:14,060 --> 00:04:17,525for nullable should be fromhow we defined it earlier.8100:04:17,525 --> 00:04:19,580Remember, if you havea regular expression,8200:04:19,580 --> 00:04:21,980r and we take itn-times of this r,8300:04:21,980 --> 00:04:25,220then in case if n = 0,8400:04:25,220 --> 00:04:28,130we definded that as the1 regular expression.8500:04:28,130 --> 00:04:30,380If n = 1,8600:04:30,380 --> 00:04:32,495it will be just asingle copy of r.8700:04:32,495 --> 00:04:33,725If n = 2,8800:04:33,725 --> 00:04:35,270it will be two copies in sequence.8900:04:35,270 --> 00:04:39,260If three, we havethree copies and so on.9000:04:39,260 --> 00:04:41,390Now we have tosomehow characterize9100:04:41,390 --> 00:04:44,285when are these cases all nullable.9200:04:44,285 --> 00:04:47,810Well, if n equals 0,9300:04:47,810 --> 00:04:49,850then this will be defined as 1.9400:04:49,850 --> 00:04:52,100So 1 can matchthe empty string.9500:04:52,100 --> 00:04:54,785So that should bedefinitely true.9600:04:54,785 --> 00:04:57,725Okay, if n = 1,9700:04:57,725 --> 00:05:00,470when can this regular expressionmatch the empty string?9800:05:00,470 --> 00:05:02,000So when should this be nullable?9900:05:02,000 --> 00:05:07,535Well, if nullable of r holds,10000:05:07,535 --> 00:05:09,860If it can matchthe empty string,10100:05:09,860 --> 00:05:12,110then we can matchthe empty string.10200:05:12,110 --> 00:05:14,870When can this regular expressionmatch the empty string?10300:05:14,870 --> 00:05:16,265So these are now two copies of r.10400:05:16,265 --> 00:05:20,690Well, both copies needto match the empty string.10500:05:20,690 --> 00:05:22,820So again, we have to ask10600:05:22,820 --> 00:05:25,774both of them need to be nullable.10700:05:25,774 --> 00:05:28,520Similarly, in the third case,10800:05:28,520 --> 00:05:30,710all three copies need to be10900:05:30,710 --> 00:05:33,230able to match the emptystring. When is that the case?11000:05:33,230 --> 00:05:38,360Well, if nullable ofr holds and so on.11100:05:38,360 --> 00:05:48,754So we can actually define thatif n = 0, then true,11200:05:48,754 --> 00:05:56,045else we have to ask whethernullable of r holds.11300:05:56,045 --> 00:05:58,550So that will be the clause we11400:05:58,550 --> 00:06:01,625have to implement forn-times and nullable.11500:06:01,625 --> 00:06:04,280Deriving the definition for11600:06:04,280 --> 00:06:06,920the derivative of the n-times11700:06:06,920 --> 00:06:10,175regular expressionis not that easy.11800:06:10,175 --> 00:06:12,755We have to look againhow it was defined11900:06:12,755 --> 00:06:16,010earlier and we somehowhave to spot a pattern.12000:06:16,010 --> 00:06:18,380Otherwise, ouralgorithm will be again12100:06:18,380 --> 00:06:20,975quite slow if we don'tspot that pattern.12200:06:20,975 --> 00:06:22,550So let's have a look.12300:06:22,550 --> 00:06:26,240So in the case thatn is equal to 0,12400:06:26,240 --> 00:06:29,885then r n-times wasdefined as just 1.12500:06:29,885 --> 00:06:32,030What would be thederivative of one?12600:06:32,030 --> 00:06:36,140So the derivative of c of 112700:06:36,140 --> 00:06:38,990would be defined as 0.12800:06:38,990 --> 00:06:41,465Okay, fair enough.12900:06:41,465 --> 00:06:44,960If he have n = 1,13000:06:44,960 --> 00:06:48,125then we just have one copyof this regular expression.13100:06:48,125 --> 00:06:50,120So there's not much else we can do.13200:06:50,120 --> 00:06:53,735We would have to calculatethe derivative of r.13300:06:53,735 --> 00:06:57,395Now in the n = 2 case,13400:06:57,395 --> 00:07:00,245that would mean wehave two copies of r.13500:07:00,245 --> 00:07:03,425And they would be a sequence.13600:07:03,425 --> 00:07:07,775How would be the derivative c of13700:07:07,775 --> 00:07:10,340r followed by r be13800:07:10,340 --> 00:07:13,265defined? Well we wouldlook up the definition.13900:07:13,265 --> 00:07:15,470And it would say that'sthe complicated case,14000:07:15,470 --> 00:07:16,760the sequence.14100:07:16,760 --> 00:07:21,875If nullable of r,14200:07:21,875 --> 00:07:25,250then the more complicated case,14300:07:25,250 --> 00:07:28,225else, that's the easycase, that would be14400:07:28,225 --> 00:07:32,660derivative of c of r, followed14500:07:32,660 --> 00:07:35,540by r. And then I just have to copy14600:07:35,540 --> 00:07:40,775that derivative of cof r followed by r here.14700:07:40,775 --> 00:07:43,130But in this case I have also14800:07:43,130 --> 00:07:51,145the alternative derivativeof c of r. And from that,14900:07:51,145 --> 00:07:55,030it looks like we can'tdo much better than15000:07:55,030 --> 00:07:58,390that, unless we do15100:07:58,390 --> 00:08:02,560a little bit of magic and themagic has to do with this case.15200:08:02,560 --> 00:08:07,420So let me look at thiscase a bit more closely.15300:08:07,420 --> 00:08:09,819Let me go back to slides15400:08:09,819 --> 00:08:12,940because actually the calculationmight be a bit hairy.15500:08:12,940 --> 00:08:16,945So remember we are in thiscase where n equals two.15600:08:16,945 --> 00:08:18,550And this was defined as15700:08:18,550 --> 00:08:20,680this sequence regularexpression r followed by r.15800:08:20,680 --> 00:08:23,080And the question was,15900:08:23,080 --> 00:08:26,365can we somehow make senseout of this derivative16000:08:26,365 --> 00:08:30,370where we have to build thederivative of a sequence.16100:08:30,370 --> 00:08:33,020So if we just unfold the definition,16200:08:33,020 --> 00:08:36,050we would ask ifthe r is nullable,16300:08:36,050 --> 00:08:39,095In this case, we returnthis alternative.16400:08:39,095 --> 00:08:40,640And if it's not nullable,16500:08:40,640 --> 00:08:44,135it is just thisregular expression.16600:08:44,135 --> 00:08:49,550Now my claim is thatthis whole if condition16700:08:49,550 --> 00:08:55,895can be replaced by just thissimple derivative here.16800:08:55,895 --> 00:08:58,775And if that claim is correct,16900:08:58,775 --> 00:09:01,520there would be very nicebecause in contrast to17000:09:01,520 --> 00:09:04,130this if condition where17100:09:04,130 --> 00:09:06,280we have to calculatelike nullable,17200:09:06,280 --> 00:09:08,330decide in which branch we are.17300:09:08,330 --> 00:09:10,580We don't have tocalculate nullable,17400:09:10,580 --> 00:09:13,880we can just replaceit by this expression.17500:09:13,880 --> 00:09:16,670So if we cansubstantiate that claim,17600:09:16,670 --> 00:09:19,860that will be definitelygood our algorithm.17700:09:20,140 --> 00:09:22,775And to substantiate that,17800:09:22,775 --> 00:09:26,795I will focus on thisrecord expression here.17900:09:26,795 --> 00:09:31,100And notice that this recordexpression will only be18000:09:31,100 --> 00:09:35,780called or only be generatedif r is nullable.18100:09:35,780 --> 00:09:38,075So in any other case,18200:09:38,075 --> 00:09:40,060I will actually not go into 18300:09:40,060 --> 00:09:43,850that if branch and wouldbe in the other one.18400:09:43,850 --> 00:09:45,260So if we are in this if branch,18500:09:45,260 --> 00:09:47,705we definitely knowthat r is nullable.18600:09:47,705 --> 00:09:52,955Okay, so here'sour regular expression.18700:09:52,955 --> 00:09:55,940And we know it's nullable.18800:09:55,940 --> 00:09:57,920So we have to somehow find18900:09:57,920 --> 00:10:00,380an equivalent expression that19000:10:00,380 --> 00:10:04,100is smaller or simplerthan that one.19100:10:04,100 --> 00:10:05,945Let's see what we can do.19200:10:05,945 --> 00:10:10,160So the first thingactually is we're multiplying19300:10:10,160 --> 00:10:16,595this right hand side of thealternative with times 1.19400:10:16,595 --> 00:10:19,700We can do thatbecause this does not19500:10:19,700 --> 00:10:23,090change which strings thisregular expression can match.19600:10:23,090 --> 00:10:25,685Remember we even had itas a simplification rule,19700:10:25,685 --> 00:10:27,425just in this case we19800:10:27,425 --> 00:10:29,705don't apply it as asimplification rule,19900:10:29,705 --> 00:10:31,310actually make thisregular expression20000:10:31,310 --> 00:10:32,720a bit more complicated.20100:10:32,720 --> 00:10:34,910But times 1 doesn't make20200:10:34,910 --> 00:10:37,820a difference because itmeans at the end of a string,20300:10:37,820 --> 00:10:40,070we still want to matchthe empty string.20400:10:40,070 --> 00:10:42,155Okay, so that is possible.20500:10:42,155 --> 00:10:45,740I can do that. Oncewe have done that,20600:10:45,740 --> 00:10:48,410you will notice that thisfactor derivative of20700:10:48,410 --> 00:10:51,860r are exactly thesame as that one.20800:10:51,860 --> 00:10:54,650And in between is a plus.20900:10:54,650 --> 00:10:57,440So you probably remember the law21000:10:57,440 --> 00:11:00,170from school maththat I can pull out21100:11:00,170 --> 00:11:02,735this factor derivative of c of r.21200:11:02,735 --> 00:11:06,320And I'm inside the parenthesesis left with that.21300:11:06,320 --> 00:11:09,245So now I have r + 1.21400:11:09,245 --> 00:11:13,055Usually we cannotsimplify r + 1.21500:11:13,055 --> 00:11:15,530If it had been r + 0, then yes,21600:11:15,530 --> 00:11:18,665we could have got rid of the 0.21700:11:18,665 --> 00:11:21,590But this + 1 oftenmakes a difference,21800:11:21,590 --> 00:11:22,970but not in our case.21900:11:22,970 --> 00:11:25,940Remember, we know thatthis r is nullable,22000:11:25,940 --> 00:11:29,840so on its own can alreadymatch the empty string.22100:11:29,840 --> 00:11:33,305So we don't really need thisalternative plus one there,22200:11:33,305 --> 00:11:35,300so we can just get rid of that.22300:11:35,300 --> 00:11:37,070Okay, and so now we have22400:11:37,070 --> 00:11:40,535a much simpler equivalentregular expression.22500:11:40,535 --> 00:11:44,285And this actually helps alot for our if-condition.22600:11:44,285 --> 00:11:46,925Look, this was theoriginal if-condition22700:11:46,925 --> 00:11:50,270and this is the regular expressionwe just simplified.22800:11:50,270 --> 00:11:53,105If we replace it with this one,22900:11:53,105 --> 00:11:56,090then we just end up with this.23000:11:56,090 --> 00:11:59,510And now you will see thatthe if condition is actually23100:11:59,510 --> 00:12:02,750pointless because youcheck if it's nullable,23200:12:02,750 --> 00:12:05,060we return this regularexpression or it's23300:12:05,060 --> 00:12:08,210not nullable and wereturn exactly the same.23400:12:08,210 --> 00:12:10,025That doesn't make any difference.23500:12:10,025 --> 00:12:11,750So we can just get rid of23600:12:11,750 --> 00:12:14,645that one and canreplace it by that.23700:12:14,645 --> 00:12:16,865And you see, this is now23800:12:16,865 --> 00:12:20,720a much simpler case thanwhat we had before.23900:12:20,720 --> 00:12:24,170So let's take stockwhat we have so far.24000:12:24,170 --> 00:12:26,915So we know in the 0-case,24100:12:26,915 --> 00:12:30,920the derivative needsto be defined as 0.24200:12:30,920 --> 00:12:33,095Because we define this24300:12:33,095 --> 00:12:36,785n-times as one,the derivative is 0.24400:12:36,785 --> 00:12:39,889For just r, this willbe the derivative.24500:12:39,889 --> 00:12:42,170And we can't do anybetter than that.24600:12:42,170 --> 00:12:45,620For r followed by r, as wejust found out24700:12:45,620 --> 00:12:47,270actually it is quite simple24800:12:47,270 --> 00:12:51,410regular expression is equivalentto the derivative.24900:12:51,410 --> 00:12:53,870Now, we have to continue with25000:12:53,870 --> 00:12:56,090that case where n isequal to three and we25100:12:56,090 --> 00:12:58,099now have three copies25200:12:58,099 --> 00:13:02,390of this r. What shouldbe the derivative?25300:13:02,390 --> 00:13:05,330Well, if you look very carefully25400:13:05,330 --> 00:13:08,465at this emerging pattern,25500:13:08,465 --> 00:13:12,410I have to say thenwhat would be nice if,25600:13:12,410 --> 00:13:16,400if he could show that inthe n equals three case,25700:13:16,400 --> 00:13:18,275we end up with this.25800:13:18,275 --> 00:13:21,290Because then what we have is this.25900:13:21,290 --> 00:13:25,370We can define ournullable for n-times26000:13:25,370 --> 00:13:31,025as if n = 0 thentrue, else nullable r.26100:13:31,025 --> 00:13:33,875And for the derivative,26200:13:33,875 --> 00:13:37,190we can save if n is equal to 0,26300:13:37,190 --> 00:13:40,235then we return the0 regular expression.26400:13:40,235 --> 00:13:43,295Otherwise, as you can seefrom this pattern here,26500:13:43,295 --> 00:13:50,735it would be derivative ofc r followed by r{n-1}.26600:13:50,735 --> 00:13:54,770Okay? And the only26700:13:54,770 --> 00:13:56,330thing we have to make csure is that26800:13:56,330 --> 00:13:58,175this pattern actually holds.26900:13:58,175 --> 00:14:00,470So that's, I willshow you next in27000:14:00,470 --> 00:14:03,735the case for n equals three.27100:14:03,735 --> 00:14:07,810If we have a regular expression r27200:14:07,810 --> 00:14:09,820and it's followedby r and another r,27300:14:09,820 --> 00:14:11,275three copies of it.27400:14:11,275 --> 00:14:14,245We can just unfoldagain the definition.27500:14:14,245 --> 00:14:16,930So we would ask if r is nullable,27600:14:16,930 --> 00:14:19,765then we have this if branch.27700:14:19,765 --> 00:14:23,905And if r is not nullable,we have this else branch.27800:14:23,905 --> 00:14:27,010Okay? What can we do here?27900:14:27,010 --> 00:14:30,310Well, we notice thatthis one is just now28000:14:30,310 --> 00:14:34,510the derivative of twor's, one after another.28100:14:34,510 --> 00:14:37,330But this we justcalculated a moment ago,28200:14:37,330 --> 00:14:40,120so we can justreplace it by this.28300:14:40,120 --> 00:14:43,255Ok. That's what wecalculated earlier.28400:14:43,255 --> 00:14:46,665But now you will seeagain this factor,28500:14:46,665 --> 00:14:48,695and this factor is the same.28600:14:48,695 --> 00:14:52,700So I can pull thatout to be like that.28700:14:52,700 --> 00:14:57,380And I have now r followedby r plus r. 28800:14:57,380 --> 00:14:59,030But now you probably28900:14:59,030 --> 00:15:00,680remember how we did it earlier.29000:15:00,680 --> 00:15:03,080We can now pull out one copy of29100:15:03,080 --> 00:15:06,020this are to just getsomething like this.29200:15:06,020 --> 00:15:08,765We have to add one essentially,29300:15:08,765 --> 00:15:11,615but we now get r plus one.29400:15:11,615 --> 00:15:15,065And this r here isnow just pulled out.29500:15:15,065 --> 00:15:18,995Well, here again kicksin this reasoning.29600:15:18,995 --> 00:15:22,700We go in this if branchonly if r is nullable.29700:15:22,700 --> 00:15:26,150So r on its own can alreadymatch the empty string.29800:15:26,150 --> 00:15:28,895So I don't needreally this plus one.29900:15:28,895 --> 00:15:30,695I can just get rid of it.30000:15:30,695 --> 00:15:33,140And so I now just haveto rearrange the parentheses30100:15:33,140 --> 00:15:35,450which we said we can also do.30200:15:35,450 --> 00:15:37,595So we have something like that.30300:15:37,595 --> 00:15:39,740And here again, thesame reasoning,30400:15:39,740 --> 00:15:41,975we have an if condition30500:15:41,975 --> 00:15:43,310where it doesn'treally matter what30600:15:43,310 --> 00:15:44,405we're going to return,30700:15:44,405 --> 00:15:46,205it's in both branches the same.30800:15:46,205 --> 00:15:48,470So we can justreplace it by that.30900:15:48,470 --> 00:15:51,920And yes, now we can bepretty sure they'll31000:15:51,920 --> 00:15:55,310work for all the n-timesregular expressions.31100:15:55,310 --> 00:15:57,860And I leave the calculation for31200:15:57,860 --> 00:16:02,570maybe r to the four to you.31300:16:02,570 --> 00:16:04,490And the reason why I do it31400:16:04,490 --> 00:16:06,605in such a detail,this calculation,31500:16:06,605 --> 00:16:08,960this is really meantto help you with31600:16:08,960 --> 00:16:13,200the coursework which iscoming up this week.31700:16:13,210 --> 00:16:16,250I'm now back in ourimplementation.31800:16:16,250 --> 00:16:20,660In this re2.sc, we said we havethis explicit constructor 31900:16:20,660 --> 00:16:25,535for n-times. We can now fillin the cases for nullable.32000:16:25,535 --> 00:16:27,635So if we have r n-times,32100:16:27,635 --> 00:16:30,995if this n is equal to0, we return true.32200:16:30,995 --> 00:16:34,190Otherwise we have to testwhether r is nullable.32300:16:34,190 --> 00:16:37,565And in the derivative case, 32400:16:37,565 --> 00:16:40,339if this n is equal to 0,32500:16:40,339 --> 00:16:43,564the return this 0regular expression.32600:16:43,564 --> 00:16:46,700Otherwise we return thesequence of the derivative32700:16:46,700 --> 00:16:50,270of c of r followed by n times of r,32800:16:50,270 --> 00:16:54,545but n minus one times, andeverything else stays the same.32900:16:54,545 --> 00:16:56,510Here's the function for strings,33000:16:56,510 --> 00:16:58,430derivative function for strings.33100:16:58,430 --> 00:17:01,595In the main matcherfunction is all the same.33200:17:01,595 --> 00:17:04,820We still require thisdefinition because33300:17:04,820 --> 00:17:06,050we haven't done anything about33400:17:06,050 --> 00:17:08,090the optional regularexpression yet.33500:17:08,090 --> 00:17:10,670And we have now this33600:17:10,670 --> 00:17:13,250EVIL1 and EVIL2regular expression.33700:17:13,250 --> 00:17:15,290And let's test it. And let's be33800:17:15,290 --> 00:17:17,000a bit more ambitious.We're testing it with33900:17:17,000 --> 00:17:20,315strings between 0 and 100034000:17:20,315 --> 00:17:22,580and let's see what the time say.34100:17:22,580 --> 00:17:26,210I'm testing this againinside the Ammonite REPL.34200:17:26,210 --> 00:17:30,180And you'll see it shouldbe now much quicker.34300:17:30,610 --> 00:17:34,640Okay, it might slowdown also around 600.34400:17:34,640 --> 00:17:40,490700 needs two seconds,three seconds for 800.34500:17:40,490 --> 00:17:43,940Let's see what itneeds for one thousand?34600:17:43,940 --> 00:17:47,435But you have to rememberRuby and Python34700:17:47,435 --> 00:17:51,530needed half aminute for just 30 a's.34800:17:51,530 --> 00:17:54,485We need a little bitmore than six seconds,34900:17:54,485 --> 00:17:57,110but we are processing a string of35000:17:57,110 --> 00:18:00,5751000 a's. So that success.35100:18:00,575 --> 00:18:04,775This speed is also explainedif you look at the sizes.35200:18:04,775 --> 00:18:08,975Since we now have this explicitn-times constructor,35300:18:08,975 --> 00:18:11,930it doesn't really matterhow big this n is.35400:18:11,930 --> 00:18:14,540This EVIL1 regularexpression will always be35500:18:14,540 --> 00:18:17,195of this size seven, atthe beginning.35600:18:17,195 --> 00:18:20,315And you can also see if younow build the derivatives,35700:18:20,315 --> 00:18:22,550they still grow in size,35800:18:22,550 --> 00:18:24,740but much more moderately.35900:18:24,740 --> 00:18:28,100And let's try out thisexample with 20 a's.36000:18:28,100 --> 00:18:31,685So we build the derivativeaccording to 20 character a's.36100:18:31,685 --> 00:18:33,890In the earlier example,36200:18:33,890 --> 00:18:35,780we ended up with aregular expression that36300:18:35,780 --> 00:18:37,895had like 8 million plus nodes.36400:18:37,895 --> 00:18:39,830And if we do this now,36500:18:39,830 --> 00:18:43,205then we just have a regularexpression with 211 nodes.36600:18:43,205 --> 00:18:44,750And that is much smaller and36700:18:44,750 --> 00:18:47,165all the calculationswould be much quicker.36800:18:47,165 --> 00:18:49,550So yeah, the Brzozowski36900:18:49,550 --> 00:18:52,205algorithmand with this improvement,37000:18:52,205 --> 00:18:54,890we're now runningcircles around Ruby and37100:18:54,890 --> 00:18:58,445Python because they're juststuck here at the beginning.37200:18:58,445 --> 00:19:00,230Because they need already37300:19:00,230 --> 00:19:02,975like half a minutefor just 30 a's.37400:19:02,975 --> 00:19:05,930We now can do somethinglike thousand a's37500:19:05,930 --> 00:19:07,580in a reasonable time.37600:19:07,580 --> 00:19:09,740I think this must betiming I obtained with37700:19:09,740 --> 00:19:12,635my older laptop some time ago.37800:19:12,635 --> 00:19:14,210Because remember wehad something like37900:19:14,210 --> 00:19:16,670six seconds. Here it says 15.38000:19:16,670 --> 00:19:18,080You always have to take38100:19:18,080 --> 00:19:20,885these times withthe pinch of salt.38200:19:20,885 --> 00:19:22,670It's essentially only the trend,38300:19:22,670 --> 00:19:25,100but it's clear we aremuch, much better.38400:19:25,100 --> 00:19:27,065We have worked a lot,38500:19:27,065 --> 00:19:30,720but we also got somethingfor it in return.