100:00:06,350 --> 00:00:10,305Welcome come back. Ican hear you saying,200:00:10,305 --> 00:00:12,000yes, you tried it out on300:00:12,000 --> 00:00:14,760one example and youwere much better.400:00:14,760 --> 00:00:17,415But how about on other examples?500:00:17,415 --> 00:00:21,265Specifically, we had twoevil regular expressions.600:00:21,265 --> 00:00:23,480How about the other case?700:00:23,480 --> 00:00:27,480Well, let's have a look atthat other case in this video.800:00:29,140 --> 00:00:32,705So I'm back herein this re2.sc file.900:00:32,705 --> 00:00:35,180And here's this testcase which run quite1000:00:35,180 --> 00:00:39,665nicely for strings between 0 and 1000.1100:00:39,665 --> 00:00:42,725Here is the other test case.1200:00:42,725 --> 00:00:44,090I still run it only1300:00:44,090 --> 00:00:48,470for relatively smallstrings between 0 and 20.1400:00:48,470 --> 00:00:50,240And let's see what it says.1500:00:50,240 --> 00:00:51,800So I'm running the test in1600:00:51,800 --> 00:00:57,320the Amoonite REPL and it doesn't look too bad.1700:00:57,320 --> 00:01:01,160But this might be because20 is not generous enough.1800:01:01,160 --> 00:01:03,620So let's try it with 30.1900:01:03,620 --> 00:01:06,530Let's run this test again.2000:01:06,530 --> 00:01:11,075And 20 is quite okay.2100:01:11,075 --> 00:01:15,96522, okay, that's nowalmost ten times as much.2200:01:15,965 --> 00:01:18,995And then the nextone would be 24.2300:01:18,995 --> 00:01:23,615And we're waiting and waiting.2400:01:23,615 --> 00:01:26,410And we are waiting.2500:01:26,410 --> 00:01:29,300Actually, it isn'tcalculated at all.2600:01:29,300 --> 00:01:31,399It run out of memory.2700:01:31,399 --> 00:01:33,905So here is something going on,2800:01:33,905 --> 00:01:37,640which is definitely bad and wehave to have a look at that.2900:01:37,640 --> 00:01:40,640It's alwaysinstructive with3000:01:40,640 --> 00:01:43,460this algorithm tolook at the sizes of3100:01:43,460 --> 00:01:45,695the regular expressionswe calculate.3200:01:45,695 --> 00:01:49,625The EVIL2 is thisa star, star b.3300:01:49,625 --> 00:01:51,800So there's nothing wecan compress there.3400:01:51,800 --> 00:01:55,490It has just stars andsequences and nothing else.3500:01:55,490 --> 00:01:58,430And so it's not that wecan play the same trick3600:01:58,430 --> 00:02:01,805of trying to introducean explicit constructor.3700:02:01,805 --> 00:02:04,190But now let's have alook at the derivatives.3800:02:04,190 --> 00:02:05,600As you can see here.3900:02:05,600 --> 00:02:08,420If I take this EVIL2 regular4000:02:08,420 --> 00:02:09,935expression and then build4100:02:09,935 --> 00:02:12,470longer andlonger derivatives,4200:02:12,470 --> 00:02:14,090you can see this grows.4300:02:14,090 --> 00:02:16,055And as you see in this line,4400:02:16,055 --> 00:02:17,270if I'm trying to take4500:02:17,270 --> 00:02:20,180the derivative of aquite large string,4600:02:20,180 --> 00:02:21,680it is quite quick.4700:02:21,680 --> 00:02:26,870But the size of theregular expression,4800:02:26,870 --> 00:02:28,310the number of nodes,4900:02:28,310 --> 00:02:30,815is also like nearlyeight millions.5000:02:30,815 --> 00:02:33,845And so even if that calculatesrelatively quickly,5100:02:33,845 --> 00:02:37,865that is the reason why at 24,5200:02:37,865 --> 00:02:39,560it just runs out of memory.5300:02:39,560 --> 00:02:42,785This regular expressionjust grew too much.5400:02:42,785 --> 00:02:46,520So we somehow needto still compress5500:02:46,520 --> 00:02:49,655this regular expressionand make it not5600:02:49,655 --> 00:02:52,700grow so much when webuild the derivative.5700:02:52,700 --> 00:02:54,020So we have to look at5800:02:54,020 --> 00:02:57,050where does that growactually come from.5900:02:57,050 --> 00:03:00,170Let's look at thederivative operation6000:03:00,170 --> 00:03:01,895again in more detail.6100:03:01,895 --> 00:03:03,740When we introducedit, we looked at6200:03:03,740 --> 00:03:06,560this example of aregulat expression r,6300:03:06,560 --> 00:03:11,465which is a star of somealternative and some sequence.6400:03:11,465 --> 00:03:13,130And we can build now6500:03:13,130 --> 00:03:15,800the derivative of thisr according to a,6600:03:15,800 --> 00:03:18,965b, and c and seewhat it calculates.6700:03:18,965 --> 00:03:21,935And you see in case of a,6800:03:21,935 --> 00:03:26,570it's like one times b plus0 and then followed by r,6900:03:26,570 --> 00:03:29,015which is this wholeregular expression again.7000:03:29,015 --> 00:03:30,935So you can also see7100:03:30,935 --> 00:03:34,745the derivative operationintroduces a lot of junk.7200:03:34,745 --> 00:03:38,165This plus 0 isn'treally necessary.7300:03:38,165 --> 00:03:42,455And this times 1 couldbe also thrown away.7400:03:42,455 --> 00:03:43,685So in the first case,7500:03:43,685 --> 00:03:48,110actually we could just haveone times b is b plus 0,7600:03:48,110 --> 00:03:49,580it still b,7700:03:49,580 --> 00:03:54,905so it would be b followed byr. And in this second case,7800:03:54,905 --> 00:03:57,5150 times b, that will be just 0.7900:03:57,515 --> 00:03:59,270Then 0 plus one is8000:03:59,270 --> 00:04:05,3301 ... times r would be justr. And in the last case,8100:04:05,330 --> 00:04:12,1550 times b would be 0. 0 plus0 is 0. 0 times r would be 0.8200:04:12,155 --> 00:04:17,540Okay? So we could throw outall these useless zeros and8300:04:17,540 --> 00:04:20,960ones because actuallywe have to throw8400:04:20,960 --> 00:04:24,845them out because over timethey just accumulate.8500:04:24,845 --> 00:04:27,035And then we buildthe next derivative.8600:04:27,035 --> 00:04:31,130And the 0s wouldn't go away bybuilding a new derivative.8700:04:31,130 --> 00:04:34,310So we need to somehowa mechanism to8800:04:34,310 --> 00:04:39,120delete the junk fromthese derivatives.8900:04:39,280 --> 00:04:41,900But how to delete junk?9000:04:41,900 --> 00:04:43,370We already know we have9100:04:43,370 --> 00:04:47,825the simplification ruleswhich say like if r plus 0,9200:04:47,825 --> 00:04:53,000I can just replace byr and the other ones.9300:04:53,000 --> 00:04:55,760And so what I nowneed to do is in9400:04:55,760 --> 00:04:58,715my algorithm when Ibuilt the derivative,9500:04:58,715 --> 00:05:01,415this might haveintroduced some junk.9600:05:01,415 --> 00:05:02,960And I now have to have9700:05:02,960 --> 00:05:06,590essentially an additional function9800:05:06,590 --> 00:05:08,945which deletes this junk again.9900:05:08,945 --> 00:05:10,490And in doing so,10000:05:10,490 --> 00:05:13,340keeps the regular expression,10100:05:13,340 --> 00:05:16,730relatively small, because that10200:05:16,730 --> 00:05:19,535is the Achilles' heelof this algorithm.10300:05:19,535 --> 00:05:22,565If this regular expressiongrows too much,10400:05:22,565 --> 00:05:25,070then all these functionswill slow down.10500:05:25,070 --> 00:05:26,360So the purpose of10600:05:26,360 --> 00:05:30,455the simplification functionis to after the derivative,10700:05:30,455 --> 00:05:33,080compress again thisregular expression10800:05:33,080 --> 00:05:35,570and then do the next derivative.10900:05:35,570 --> 00:05:39,815So if we introduce thatadditional functions simp,11000:05:39,815 --> 00:05:42,440which essentiallyjust goes through11100:05:42,440 --> 00:05:46,040this regular expression anddeletes all this junk,11200:05:46,040 --> 00:05:50,045then we should be in amuch better position.11300:05:50,045 --> 00:05:52,490As you can see on this slide,11400:05:52,490 --> 00:05:54,440the simplificationfunction can actually11500:05:54,440 --> 00:05:56,855be implemented very easily.11600:05:56,855 --> 00:05:59,750However, there are fewinteresting points.11700:05:59,750 --> 00:06:02,720First of all, thereare only two cases.11800:06:02,720 --> 00:06:05,060We only simplify when we have11900:06:05,060 --> 00:06:08,255an alternative or a sequence.12000:06:08,255 --> 00:06:12,859So for example, we willnever simplify under star.12100:06:12,859 --> 00:06:15,950If you look at thederivative operation12200:06:15,950 --> 00:06:17,825and you scratch yourhead a little bit,12300:06:17,825 --> 00:06:20,180we'll find out whyis that the case.12400:06:20,180 --> 00:06:22,145It's essentially a waste of time.12500:06:22,145 --> 00:06:25,505So you would notsimplify under star.12600:06:25,505 --> 00:06:27,650You only simplify if you have12700:06:27,650 --> 00:06:30,530an alternative and the sequence.12800:06:30,530 --> 00:06:34,880Now, however, thereis one small point12900:06:34,880 --> 00:06:39,785you have to be aware of.These simplification rules13000:06:39,785 --> 00:06:42,740they need to be essentiallyapplied backwards.13100:06:42,740 --> 00:06:45,650So I have to first essentiallygo to the leaves of13200:06:45,650 --> 00:06:49,085this regular expression andthen have to find out,13300:06:49,085 --> 00:06:51,170can I apply the simplification?13400:06:51,170 --> 00:06:52,670And then if yes,13500:06:52,670 --> 00:06:55,430I apply the simplificationand I look at the next step,13600:06:55,430 --> 00:06:57,815can I now simplifysomething more?13700:06:57,815 --> 00:06:59,390The reason is how13800:06:59,390 --> 00:07:03,125the simplificationrules are formulated.13900:07:03,125 --> 00:07:05,300They might fire in14000:07:05,300 --> 00:07:08,765a higher level if somethingsimplifies below.14100:07:08,765 --> 00:07:14,315So we have to essentiallysimplify from the inside out.14200:07:14,315 --> 00:07:16,850And in thissimplification function,14300:07:16,850 --> 00:07:20,885what that means we'rematching this regular expression.14400:07:20,885 --> 00:07:23,120We test whether it'san alternative or14500:07:23,120 --> 00:07:26,345a sequence. Only then weactually go into action.14600:07:26,345 --> 00:07:28,385Once we know14700:07:28,385 --> 00:07:30,530in case of an alternative,14800:07:30,530 --> 00:07:32,120what are the two components?14900:07:32,120 --> 00:07:35,765We first, simplify each component,15000:07:35,765 --> 00:07:39,095okay, and then weget a result back.15100:07:39,095 --> 00:07:41,180And we check whether15200:07:41,180 --> 00:07:45,679this simplification ofr1 resulted into a 0.15300:07:45,679 --> 00:07:47,884Then because it's an alternative15400:07:47,884 --> 00:07:49,190then I can just replace it15500:07:49,190 --> 00:07:53,375by r2s, which a simplifiedversion of r2.15600:07:53,375 --> 00:07:58,820If it came back r2sis actually 0,15700:07:58,820 --> 00:08:00,410then I can replace it by15800:08:00,410 --> 00:08:03,785the simplified version of a r1.15900:08:03,785 --> 00:08:07,460If I simplify bothalternatives and it16000:08:07,460 --> 00:08:11,180happens that the simplifiedversions are the same,16100:08:11,180 --> 00:08:15,170then I can throw awayone of the alternatives.16200:08:15,170 --> 00:08:18,080Otherwise, I just have tokeep the alternatives,16300:08:18,080 --> 00:08:21,155but the simplified components.16400:08:21,155 --> 00:08:23,915In the sequence case it ispretty much the same.16500:08:23,915 --> 00:08:27,950I first have to check whatdoes it simplify underneath.16600:08:27,950 --> 00:08:31,385So I call first simplifyand then have a look...16700:08:31,385 --> 00:08:33,020Does it matches 0 one of 16800:08:33,020 --> 00:08:36,035the simplifications,then I just return 0.16900:08:36,035 --> 00:08:38,330Or if the other component is 0,17000:08:38,330 --> 00:08:40,535I can also return a 0.17100:08:40,535 --> 00:08:43,310If it's one, then I keepthe other component.17200:08:43,310 --> 00:08:45,920And if the rs2 is one,17300:08:45,920 --> 00:08:47,615and I keep the first component.17400:08:47,615 --> 00:08:50,359And otherwise it'sstill the sequence.17500:08:50,359 --> 00:08:53,540And in all the other cases Idon't have to do anything.17600:08:53,540 --> 00:08:55,700If it's just a plain0. I leave it in.17700:08:55,700 --> 00:08:57,860If it's a plain1, I leave it in.17800:08:57,860 --> 00:09:00,170And if something is under a star,17900:09:00,170 --> 00:09:02,840I don't open up thisstar and simplify it.18000:09:02,840 --> 00:09:06,680I just apply that to beas quick as possible.18100:09:06,680 --> 00:09:10,280Let's see whether this hasany effect on our code.18200:09:10,280 --> 00:09:12,980So I'm now in thefile re3.sc,18300:09:12,980 --> 00:09:17,405and it's the same as re2.sc.18400:09:17,405 --> 00:09:20,885It still has thisexplicit and n-times case,18500:09:20,885 --> 00:09:24,665nullable and derivative arestill the same.18600:09:24,665 --> 00:09:28,610Except now we have thissimplification function as well.18700:09:28,610 --> 00:09:29,960And when we apply18800:09:29,960 --> 00:09:33,725the derivative and afterwe apply the derivative,18900:09:33,725 --> 00:09:35,870we simplify it to throw away19000:09:35,870 --> 00:09:39,050all the junk thisderivative introduced.19100:09:39,050 --> 00:09:41,510Otherwise everythingstays the same.19200:09:41,510 --> 00:09:43,580You still have this expansion19300:09:43,580 --> 00:09:45,515of the optional regular expression.19400:09:45,515 --> 00:09:49,670Here are our two EVIL regularexpressions we use as a test.19500:09:49,670 --> 00:09:52,460And here's now this test case,19600:09:52,460 --> 00:09:55,835the first one and we'reactually now trying it19700:09:55,835 --> 00:10:00,515out for strings between0 and 9000 a's.19800:10:00,515 --> 00:10:08,285So let's see. So also the simplification makes 19900:10:08,285 --> 00:10:10,655actually this case faster.20000:10:10,655 --> 00:10:16,260So we can now run stringsup to 9000.20100:10:16,260 --> 00:10:19,360And we don't actuallysweat about this at all.20200:10:19,360 --> 00:10:24,745For I have to say my laptopis now starting to run its fan.20300:10:24,745 --> 00:10:28,525And also, because the timesare relatively small,20400:10:28,525 --> 00:10:30,610I'm actually runningeach test three20500:10:30,610 --> 00:10:32,785times and then take the average,20600:10:32,785 --> 00:10:34,720which I didn't do before,20700:10:34,720 --> 00:10:37,720just to be a tiny bitmore accurate.20800:10:37,720 --> 00:10:42,025So three seconds for astring of 9000 a's.20900:10:42,025 --> 00:10:44,980And now the moreinteresting question is,21000:10:44,980 --> 00:10:46,240for the second case,21100:10:46,240 --> 00:10:48,625this a star, star, b.21200:10:48,625 --> 00:10:51,250And you can already seeon these numbers...21300:10:51,250 --> 00:10:53,260we are really ambitious.21400:10:53,260 --> 00:10:57,860And let's see how ourprogram is coping with that.21500:11:02,610 --> 00:11:07,780Again. No sweat, atleast not on my part.21600:11:07,780 --> 00:11:10,480The laptop has tocalculate quite a bit.21700:11:10,480 --> 00:11:12,940But this is now a string of21800:11:12,940 --> 00:11:16,5393 million a's andit needs a second.21900:11:16,539 --> 00:11:20,680And let's see how far we get.22000:11:20,680 --> 00:11:25,2804 million a's around two seconds.22100:11:27,030 --> 00:11:29,050So say it again,22200:11:29,050 --> 00:11:31,690I'm actually slowing it down22300:11:31,690 --> 00:11:34,150here artificially with running the test22400:11:34,150 --> 00:11:36,895three times and thentake the average.22500:11:36,895 --> 00:11:42,749But it seems to be somethingless than five seconds.22600:11:42,749 --> 00:11:48,185And remember that is astring of 6 million a's.22700:11:48,185 --> 00:11:51,170Let's just have alook at the graphs.22800:11:51,170 --> 00:11:56,105So the simplification alsomade our first case faster.22900:11:56,105 --> 00:11:58,880So earlier withoutsimplification,23000:11:58,880 --> 00:12:00,710where we just havethis explicit 23100:12:00,710 --> 00:12:03,050n-times. That is this graph.23200:12:03,050 --> 00:12:05,210And now we can go up to23300:12:05,210 --> 00:12:09,62010 thousand and be stillunder five seconds.23400:12:09,620 --> 00:12:12,300So that is good news.23500:12:13,270 --> 00:12:16,745In the other example, remember,23600:12:16,745 --> 00:12:19,220the existing regularexpression matchers in23700:12:19,220 --> 00:12:22,880Java 8, Python,and JavaScript.23800:12:22,880 --> 00:12:26,030And thanks to thestudent we also 23900:12:26,030 --> 00:12:27,935have a graph for Swift.24000:12:27,935 --> 00:12:29,750They're pretty atrocious.24100:12:29,750 --> 00:12:33,320They need like for 30 a's,24200:12:33,320 --> 00:12:37,490something like on themagnitude of 30 seconds.24300:12:37,490 --> 00:12:39,740As I said already,24400:12:39,740 --> 00:12:42,665Java 9 is slightly better.24500:12:42,665 --> 00:12:44,870Java 9 and above, 24600:12:44,870 --> 00:12:46,220if you try that example,24700:12:46,220 --> 00:12:48,665the same example in Java 9,24800:12:48,665 --> 00:12:51,230you would be able toprocess something24900:12:51,230 --> 00:12:54,215like 40 thousand a'sin half a minute.25000:12:54,215 --> 00:12:57,740I have to admit I'm not100% sure what they25100:12:57,740 --> 00:13:03,575did to improve theperformance between Java 8 and 9.25200:13:03,575 --> 00:13:05,510I know they did something on25300:13:05,510 --> 00:13:07,790their regular expressionmatching engine,25400:13:07,790 --> 00:13:09,770but I haven't really looked25500:13:09,770 --> 00:13:12,230at what precisely they've done.25600:13:12,230 --> 00:13:17,945But that's still notreally a competition for us.25700:13:17,945 --> 00:13:20,915So our third version,25800:13:20,915 --> 00:13:24,335in this example, a star, star b.25900:13:24,335 --> 00:13:28,445We said for something like6 million a's we need 15 secs.26000:13:28,445 --> 00:13:31,760And again, I think theseare slightly older times,26100:13:31,760 --> 00:13:33,770so here it needs 20 seconds.26200:13:33,770 --> 00:13:37,250I think we had somethinglike below five seconds.26300:13:37,250 --> 00:13:40,865But again, you can seethe trends. We run.26400:13:40,865 --> 00:13:42,590circles around them.26500:13:42,590 --> 00:13:46,530And that's quite nice.26600:13:46,570 --> 00:13:49,774So what's good aboutthis algorithm26700:13:49,774 --> 00:13:51,605by Brzozowski?26800:13:51,605 --> 00:13:54,500Well, first, it extends26900:13:54,500 --> 00:13:57,800actually to quite a numberof regular expressions.27000:13:57,800 --> 00:13:59,900So we saw in this example27100:13:59,900 --> 00:14:03,950the n-times regular expression.Is a little bit of work, but27200:14:03,950 --> 00:14:05,405we could extend that.27300:14:05,405 --> 00:14:08,480It also extends to thenot regular expression,27400:14:08,480 --> 00:14:10,820which I explain onthe next slide.27500:14:10,820 --> 00:14:15,290It's very easy to implementin a functional language.27600:14:15,290 --> 00:14:16,610If you don't buy27700:14:16,610 --> 00:14:20,675all that functionalprogramming language stuff,27800:14:20,675 --> 00:14:22,955you still have to agree.27900:14:22,955 --> 00:14:25,715It's quite beautiful in Scala.28000:14:25,715 --> 00:14:28,160And you could alsoeasily implemented28100:14:28,160 --> 00:14:31,760in the C language or in Python.28200:14:31,760 --> 00:14:34,250There's really noproblem with that.28300:14:34,250 --> 00:14:38,390The algorithm is actuallyquite old already. 28400:14:38,390 --> 00:14:44,899Brzozowski wrote it down in1964 and his PhD thesis.28500:14:44,899 --> 00:14:49,460Somehow it was forgotten during28600:14:49,460 --> 00:14:54,095the next decades and onlyrecently has been rediscovered.28700:14:54,095 --> 00:14:57,065At the moment, we areonly solved the problem28800:14:57,065 --> 00:15:02,240of given a regular expression,givven a string, does28900:15:02,240 --> 00:15:05,550the regular expressionmatch the string yes or no.29000:15:05,550 --> 00:15:08,740We want to of course, useit as part of our lexer.29100:15:08,740 --> 00:15:12,370And there we have to dosomething more complicated.29200:15:12,370 --> 00:15:15,384We have to essentiallygenerate tokens.29300:15:15,384 --> 00:15:18,670And this will still takea little bit of work.29400:15:18,670 --> 00:15:22,045And that's actually relativelyrecent work by somebody29500:15:22,045 --> 00:15:26,110called Sulzmann andthe co-worker called Lu.29600:15:26,110 --> 00:15:30,880And what I personallyalso find very satisfying29700:15:30,880 --> 00:15:32,695about this algorithm is29800:15:32,695 --> 00:15:36,040that we can actuallyprove that it's correct.29900:15:36,040 --> 00:15:37,735You might remember I did30000:15:37,735 --> 00:15:41,500quite some interestingtransformations30100:15:41,500 --> 00:15:44,830when I calculated the derivative.30200:15:44,830 --> 00:15:48,270How can I be sure thatthis is all correct?30300:15:48,270 --> 00:15:50,270Actually, I can now go away and30400:15:50,270 --> 00:15:52,865prove the correctnessof this algorithm.30500:15:52,865 --> 00:15:56,645Does it really satisfythe specification?30600:15:56,645 --> 00:15:58,550Is it really true that if a string30700:15:58,550 --> 00:16:00,440is in the languageof a regular expression.30800:16:00,440 --> 00:16:03,050Does that matter? I would say yes.30900:16:03,050 --> 00:16:07,070However, I leave thatfor another video.31000:16:07,070 --> 00:16:10,580What I also like aboutthis algorithm that can be31100:16:10,580 --> 00:16:13,775actually extended to quite anumber of regular expressions.31200:16:13,775 --> 00:16:17,810So this is the NOT regularexpression that is31300:16:17,810 --> 00:16:19,760supposed to match strings which31400:16:19,760 --> 00:16:22,235this regular expression cannot match.31500:16:22,235 --> 00:16:24,245So here's an example.31600:16:24,245 --> 00:16:28,640If you're in the business oftrying to find out what31700:16:28,640 --> 00:16:33,530are comments in languages likeJava or C. Then you know,31800:16:33,530 --> 00:16:35,060they always start with a slash,31900:16:35,060 --> 00:16:36,245then comes a star,32000:16:36,245 --> 00:16:38,240then comes an arbitrary string.32100:16:38,240 --> 00:16:41,195And then they finishwith a slash and a star.32200:16:41,195 --> 00:16:44,330And how you want to specifythat is something like this.32300:16:44,330 --> 00:16:45,530You want to say, OK,32400:16:45,530 --> 00:16:48,245a comment starts witha slash and a star.32500:16:48,245 --> 00:16:51,410Then it comes anystring in between.32600:16:51,410 --> 00:16:55,340But this stringin-between cannot contain32700:16:55,340 --> 00:16:58,280any star and slashbecause that would then32800:16:58,280 --> 00:17:01,415indicate there's theend already there.32900:17:01,415 --> 00:17:03,680And then after youhave such a string33000:17:03,680 --> 00:17:06,410which doesn'tcontain a star and a slash,33100:17:06,410 --> 00:17:11,180then at the end you want tomatch a star and a slash.33200:17:11,180 --> 00:17:13,460So for these kind of comments,33300:17:13,460 --> 00:17:15,995this regular expression isactually quite useful.33400:17:15,995 --> 00:17:19,430And if you ever seenhow to do the negation,33500:17:19,430 --> 00:17:21,995for this kind of regularexpression with automata,33600:17:21,995 --> 00:17:24,710you will know that'sa bit of a pain.33700:17:24,710 --> 00:17:26,675But for the Brzozowski,33800:17:26,675 --> 00:17:28,370it's no pain at all.33900:17:28,370 --> 00:17:30,710The meaning of thisregular expression34000:17:30,710 --> 00:17:32,255would be something like that.34100:17:32,255 --> 00:17:35,540If I have all thestrings there are,34200:17:35,540 --> 00:17:38,390and I take away all the strings,34300:17:38,390 --> 00:17:40,055this r can match.34400:17:40,055 --> 00:17:42,080The remaining strings are34500:17:42,080 --> 00:17:44,630what this regular expressionis supposed to match.34600:17:44,630 --> 00:17:47,165So I can specifywhat the meaning is.34700:17:47,165 --> 00:17:49,760And then it's also very easy to34800:17:49,760 --> 00:17:52,174say what is nullableand derivative.34900:17:52,174 --> 00:17:54,140So for nullable, it would be35000:17:54,140 --> 00:17:56,570just a test whether ris nullable.35100:17:56,570 --> 00:17:58,985And I just take thenegation of that.35200:17:58,985 --> 00:18:00,680And then I have to build35300:18:00,680 --> 00:18:03,620the derivative of thisNOT regular expression.35400:18:03,620 --> 00:18:05,420Then I just have to move35500:18:05,420 --> 00:18:07,325this ....35600:18:07,325 --> 00:18:10,070derivative insidethe regular expression35700:18:10,070 --> 00:18:12,575and keep the NOT onthe outermost level.35800:18:12,575 --> 00:18:15,515So there's again nopain involved in35900:18:15,515 --> 00:18:19,130adding this regular expressionto the algorithm.36000:18:19,130 --> 00:18:22,100We made it to the end ofthis video and we made36100:18:22,100 --> 00:18:24,739it also to the endof this algorithm.36200:18:24,739 --> 00:18:27,620I can see to loose strands.36300:18:27,620 --> 00:18:29,420One is we implemented36400:18:29,420 --> 00:18:32,855this algorithm for thebasic regular expressions.36500:18:32,855 --> 00:18:38,705And we also added the n-times out of necessity.36600:18:38,705 --> 00:18:41,120And I also showedyou how to implement36700:18:41,120 --> 00:18:44,840the NOT regularexpression on a slide.36800:18:44,840 --> 00:18:48,995But your task in thecoursework actually is36900:18:48,995 --> 00:18:52,970to extend that to some ofthe extended regular expressions.37000:18:52,970 --> 00:18:57,260So especially these boundedrepetitions like from 0 to37100:18:57,260 --> 00:19:01,550n-times or between n and m times.37200:19:01,550 --> 00:19:04,325And I think alsoplus and option.37300:19:04,325 --> 00:19:07,220The other loose strand is,37400:19:07,220 --> 00:19:09,200remember I did this wild37500:19:09,200 --> 00:19:11,645calculations withregular expressions.37600:19:11,645 --> 00:19:13,205Why on earth37700:19:13,205 --> 00:19:14,480is that all correct?37800:19:14,480 --> 00:19:16,355Why on earth shouldthis algorithm37900:19:16,355 --> 00:19:18,575actually satisfyour specification,38000:19:18,575 --> 00:19:20,450which we set outat the beginning.38100:19:20,450 --> 00:19:25,410So that needs to be looked atmore closely. Bye for now.