100:00:06,410 --> 00:00:09,390The final video for this week.200:00:09,390 --> 00:00:12,465And let me say somethingabout the coursework.300:00:12,465 --> 00:00:15,300First. You can solve400:00:15,300 --> 00:00:17,745the coursework in any programminglanguage you're like,500:00:17,745 --> 00:00:22,440I already said that. Youhave to submit a PDF file.600:00:22,440 --> 00:00:23,850There will be some questions,700:00:23,850 --> 00:00:26,250so you have to write down thesolution to this questions.800:00:26,250 --> 00:00:30,315Please use a PDF and you haveto submit your source code.900:00:30,315 --> 00:00:35,580And yes, if you do use alanguage which isn't Scala,1000:00:35,580 --> 00:00:39,450it might help to tell mehow I can run your code.1100:00:39,450 --> 00:00:42,550If I can't run your code,I will ask you anyway.1200:00:42,550 --> 00:00:50,044Also, please submit both thePDF and the code in a file,1300:00:50,044 --> 00:00:52,730in a zip file, which generates1400:00:52,730 --> 00:00:55,835a subdirectory with yourname and your family name.1500:00:55,835 --> 00:00:58,970That we'll just save alot of time for me to try1600:00:58,970 --> 00:01:02,030to figure out whohas submitted what.1700:01:02,030 --> 00:01:04,445Please do that.1800:01:04,445 --> 00:01:07,789So what is the taskinto coursework?1900:01:07,789 --> 00:01:09,885I essentially showed you how2000:01:09,885 --> 00:01:11,995the Brzozowski matcher2100:01:11,995 --> 00:01:14,645works for the basicregular expressions.2200:01:14,645 --> 00:01:16,295I also showed you actually how it2300:01:16,295 --> 00:01:18,110works for the n-times.2400:01:18,110 --> 00:01:20,300And the task in coursework2500:01:20,300 --> 00:01:22,970is that you extend theBrzozowski matcher to2600:01:22,970 --> 00:01:25,820the other regular expressions2700:01:25,820 --> 00:01:27,800from the extendedregular expressions.2800:01:27,800 --> 00:01:30,245So you're supposedto extended with2900:01:30,245 --> 00:01:34,490r-plus, optional r, for n-times.3000:01:34,490 --> 00:01:38,420You've already seen that.For 0 or more times,3100:01:38,420 --> 00:01:41,120but not more than mtimes regular expression.3200:01:41,120 --> 00:01:45,890For at least n-times regularexpression and for between3300:01:45,890 --> 00:01:47,480n times and m times3400:01:47,480 --> 00:01:50,810regular expression and also thisNOT regular expression.3500:01:50,810 --> 00:01:52,790So your task is to3600:01:52,790 --> 00:01:55,310essentially find the definition3700:01:55,310 --> 00:01:57,155for nullable in these cases.3800:01:57,155 --> 00:02:00,215Find a definition for derivative,3900:02:00,215 --> 00:02:02,480implement them,write them down in4000:02:02,480 --> 00:02:06,065a PDF and then do someexperiments with them.4100:02:06,065 --> 00:02:08,875Well, how can you do that?4200:02:08,875 --> 00:02:10,370Well I've given you4300:02:10,370 --> 00:02:13,565the meaning of these additionalregular expressions.4400:02:13,565 --> 00:02:16,730So here's, for example, themeaning of this r-plus.4500:02:16,730 --> 00:02:18,290So that would be, I have4600:02:18,290 --> 00:02:22,115at least one copy up to infinity.4700:02:22,115 --> 00:02:25,070And the optional-r would be it's4800:02:25,070 --> 00:02:28,370the language of r plusthe empty string.4900:02:28,370 --> 00:02:30,440If I have it exactly n times,5000:02:30,440 --> 00:02:31,985then it would bejust the language5100:02:31,985 --> 00:02:34,010of r exactly n-times.5200:02:34,010 --> 00:02:38,105And here you have unionfrom 0 to m and so on.5300:02:38,105 --> 00:02:42,560This might be a slightlyinteresting regular expression.5400:02:42,560 --> 00:02:46,580So that's supposed to be thecharacter set that is very5500:02:46,580 --> 00:02:48,410similar to character ranges like5600:02:48,410 --> 00:02:51,005in the existing regularexpression matchers.5700:02:51,005 --> 00:02:52,820So this just says5800:02:52,820 --> 00:02:55,565...this regularexpression can match5900:02:55,565 --> 00:03:00,860either the character c1 orthe character c2 up to cn.6000:03:00,860 --> 00:03:03,620Why do I includethat regular expression?6100:03:03,620 --> 00:03:09,050I could also just sayc1 plus c2 plus c3...6200:03:09,050 --> 00:03:12,889a big alternative.Well that's possible.6300:03:12,889 --> 00:03:16,595But remember the Achilles'heel of this algorithm is,6400:03:16,595 --> 00:03:18,425if the regular expression is large,6500:03:18,425 --> 00:03:20,825then the computationtake always a long time.6600:03:20,825 --> 00:03:23,840So I'm trying to compressthat as much as I can.6700:03:23,840 --> 00:03:25,370So instead of giving a big6800:03:25,370 --> 00:03:29,134alternative, I am trying togive one regular expression,6900:03:29,134 --> 00:03:31,340which can not just matcha single character,7000:03:31,340 --> 00:03:34,230but a set of characters.7100:03:34,630 --> 00:03:36,980How can you be sure that7200:03:36,980 --> 00:03:39,215definition you comeup with are correct?7300:03:39,215 --> 00:03:41,450Well, I showed you which are7400:03:41,450 --> 00:03:45,170the properties these twofunctions need to satisfy.7500:03:45,170 --> 00:03:47,060I've given you here what7600:03:47,060 --> 00:03:49,325the meaning of theseexpressions are.7700:03:49,325 --> 00:03:52,700So you will always know what'son the right-hand side.7800:03:52,700 --> 00:03:54,515This is completely determined.7900:03:54,515 --> 00:03:56,720You essentiallyhave to fill something8000:03:56,720 --> 00:03:58,910equivalent on the left-hand side.8100:03:58,910 --> 00:04:02,105That's the main taskof the coursework.8200:04:02,105 --> 00:04:08,090And you can start from thefiles I provided on KEATS.8300:04:08,090 --> 00:04:12,125So you would just have toadd these additional cases.8400:04:12,125 --> 00:04:15,020When you add theseadditional cases8500:04:15,020 --> 00:04:17,330and you're using the Scala language,8600:04:17,330 --> 00:04:18,980you can do me a favour.8700:04:18,980 --> 00:04:22,550You can call theseconstructors for8800:04:22,550 --> 00:04:25,400these different regularexpressions as RANGE,8900:04:25,400 --> 00:04:28,985PLUS, OPTIONAL and NTIMES,UPTO, FROM and BETWEEN.9000:04:28,985 --> 00:04:31,370Remember I have thisconvention to always9100:04:31,370 --> 00:04:34,025use capital lettersfor regular expressions.9200:04:34,025 --> 00:04:36,680It would be nice if you could use9300:04:36,680 --> 00:04:39,260these names becausethen it will be9400:04:39,260 --> 00:04:42,335very easy for meto test your code.9500:04:42,335 --> 00:04:44,690If you're using a differentprogramming language9600:04:44,690 --> 00:04:46,370like let's say Rust,9700:04:46,370 --> 00:04:48,860I expect some people will use that, where I9800:04:48,860 --> 00:04:51,380have no idea what are theconventions in your language,9900:04:51,380 --> 00:04:53,420how you have to callthese constructors,10000:04:53,420 --> 00:04:56,420we will see whatever yousubmit. I'll have a look.10100:04:56,420 --> 00:04:59,360There's one moreconstraint I have to10200:04:59,360 --> 00:05:02,194impose to make thiscoursework interesting.10300:05:02,194 --> 00:05:04,730I do not want you that you just take10400:05:04,730 --> 00:05:07,370these extended regularexpressions and that you10500:05:07,370 --> 00:05:10,550expand them usingtheir definition.10600:05:10,550 --> 00:05:12,320Because that would make the regular10700:05:12,320 --> 00:05:13,955expression matcher very slow.10800:05:13,955 --> 00:05:16,160So for example, ifyou want to define10900:05:16,160 --> 00:05:18,785what's the derivative for r-plus,11000:05:18,785 --> 00:05:21,560then what does notcount as a solution:11100:05:21,560 --> 00:05:24,770if you just expand thatto the definition that r11200:05:24,770 --> 00:05:28,935plus is equivalent tor followed by r-star.11300:05:28,935 --> 00:05:32,845So in code, what Iwould like to not see,11400:05:32,845 --> 00:05:35,680I would not give anymarks for that is,11500:05:35,680 --> 00:05:37,780if you add the plus as follows,11600:05:37,780 --> 00:05:39,910so that is still perfectly fine.11700:05:39,910 --> 00:05:42,160There's nothing youcan do differently.11800:05:42,160 --> 00:05:44,065That would be the constructor.11900:05:44,065 --> 00:05:46,480But when you define nullable,12000:05:46,480 --> 00:05:49,180I do not want to seethat you defined12100:05:49,180 --> 00:05:51,790this plus r as nullable12200:05:51,790 --> 00:05:54,385of the sequence of r and star-r,12300:05:54,385 --> 00:05:58,075just unfoldingthe definition.12400:05:58,075 --> 00:06:00,415As you can see, when you come12500:06:00,415 --> 00:06:02,815up with a much bettersolution for that.12600:06:02,815 --> 00:06:05,110This is a very inefficient way12700:06:05,110 --> 00:06:07,090for how to calculate nullable.12800:06:07,090 --> 00:06:10,825And so the same for derivativewould not be allowed.12900:06:10,825 --> 00:06:13,895If you, I have the plus r here,13000:06:13,895 --> 00:06:16,685you can't just unfoldthe definition,13100:06:16,685 --> 00:06:19,790with r-plusbeing defined as r13200:06:19,790 --> 00:06:21,350followed by r star and13300:06:21,350 --> 00:06:23,375then just build thederivative of that.13400:06:23,375 --> 00:06:25,280You have to find something more13500:06:25,280 --> 00:06:28,730primitive that involvesonly the derivative of r,13600:06:28,730 --> 00:06:31,790essentially, nothingmore involved. The same13700:06:31,790 --> 00:06:33,815if you have n-times, for example,13800:06:33,815 --> 00:06:39,935you can't just unfold thisn-times in this sequence13900:06:39,935 --> 00:06:43,310of .... n-copies14000:06:43,310 --> 00:06:47,285of this r. You have to getsomething more primitive.14100:06:47,285 --> 00:06:49,760Otherwise, as you'veseen earlier,14200:06:49,760 --> 00:06:53,135your regular expression matcherwould be totally slow.14300:06:53,135 --> 00:06:55,475When you submit your solution,14400:06:55,475 --> 00:06:58,445please submit thissolution in the PDF.14500:06:58,445 --> 00:07:01,580So give the cases of your definition14600:07:01,580 --> 00:07:06,004in a form like that fornullable and derivative.14700:07:06,004 --> 00:07:08,405And also implement it in code.14800:07:08,405 --> 00:07:10,910That is just helping me to14900:07:10,910 --> 00:07:13,400find out whether you havethe correct solution or not.15000:07:13,400 --> 00:07:15,710So you needed a kind ofmathematical notation of15100:07:15,710 --> 00:07:18,695your solution, andan actual code.15200:07:18,695 --> 00:07:22,414And then once youhave your definition,15300:07:22,414 --> 00:07:25,699also make sure you tryit out on some examples.15400:07:25,699 --> 00:07:28,970These will be the examplesI will definitely try out,15500:07:28,970 --> 00:07:30,560but probably also more15600:07:30,560 --> 00:07:33,215depending on whatdefinitions you give me.15700:07:33,215 --> 00:07:38,300There's one more question Iask you to do. You've seen15800:07:38,300 --> 00:07:40,130there's some regularexpressions that15900:07:40,130 --> 00:07:42,215are involved for characters,16000:07:42,215 --> 00:07:46,925for character ranges orcharacter sets essentially.16100:07:46,925 --> 00:07:48,665And sometimes I also want to have16200:07:48,665 --> 00:07:51,665just a reg expression whichcan match any character!!16300:07:51,665 --> 00:07:56,195And I could have for all ofthem separate definitions.16400:07:56,195 --> 00:07:57,710And that would mean I also need16500:07:57,710 --> 00:07:59,645separate definitions for nullable,16600:07:59,645 --> 00:08:02,405separate definitionsfor derivative.16700:08:02,405 --> 00:08:05,825But actually they can begeneralized to just one constructor.16800:08:05,825 --> 00:08:08,210And that is if we introduce16900:08:08,210 --> 00:08:11,630a constructor for regular expressions,17000:08:11,630 --> 00:08:13,760which not just takesa single character17100:08:13,760 --> 00:08:15,469or set of characters.17200:08:15,469 --> 00:08:18,245But, which I call here CFUN.17300:08:18,245 --> 00:08:23,165And it takes a function fromcharacters to booleans.17400:08:23,165 --> 00:08:24,470So if I want to match17500:08:24,470 --> 00:08:26,900just a single characterthen this function f17600:08:26,900 --> 00:08:29,060would only say true17700:08:29,060 --> 00:08:32,225if it gets as argumentthe single character.17800:08:32,225 --> 00:08:34,850If I have a character set,17900:08:34,850 --> 00:08:36,515then I would say, well,18000:08:36,515 --> 00:08:38,300I need a functionwhich says true18100:08:38,300 --> 00:08:40,970for all the charactersin this set.18200:08:40,970 --> 00:08:43,460And otherwise it's false.18300:08:43,460 --> 00:08:46,205And if I want tomatch any character!!,18400:08:46,205 --> 00:08:48,500then they should get a function18500:08:48,500 --> 00:08:53,450which says true forall characters.18600:08:53,450 --> 00:08:56,630Okay? If I havesuch a constructor18700:08:56,630 --> 00:09:00,140that actually I can savemyself a bit of work.18800:09:00,140 --> 00:09:03,200And I can just have one case18900:09:03,200 --> 00:09:06,920for nullable and onecase for CFUNS.19000:09:06,920 --> 00:09:09,800So that would then subsumethe character ranges and19100:09:09,800 --> 00:09:13,385the character and alsothis ALL regular expression.19200:09:13,385 --> 00:09:15,380Ok, this was not explicitlyincluded at the beginning,19300:09:15,380 --> 00:09:17,510but that's what I can now define.19400:09:17,510 --> 00:09:21,740And then I can definethis for characters,19500:09:21,740 --> 00:09:23,885just as special casesfor these functions.19600:09:23,885 --> 00:09:25,610And I told you alreadywhat this function19700:09:25,610 --> 00:09:28,025should look like inthese three cases.19800:09:28,025 --> 00:09:30,350So I let ...19900:09:30,350 --> 00:09:33,515you decide how you'regoing to implement that.20000:09:33,515 --> 00:09:35,450If you first define20100:09:35,450 --> 00:09:38,495your implementation isall these explicit cases.20200:09:38,495 --> 00:09:41,900Because in these cases it'sactually more intuitive,20300:09:41,900 --> 00:09:45,980what nullable andderivative should be.20400:09:45,980 --> 00:09:48,035And then in a second step,20500:09:48,035 --> 00:09:51,140you implement thesemore general cases.20600:09:51,140 --> 00:09:53,360And just keep the original ones20700:09:53,360 --> 00:09:54,500in your implementation if you20800:09:54,500 --> 00:09:57,710want, or if you feeladventurous enough,20900:09:57,710 --> 00:09:59,225and want to be lazy,21000:09:59,225 --> 00:10:01,115that you just implement that.21100:10:01,115 --> 00:10:02,660And then you have already done21200:10:02,660 --> 00:10:06,530at least two constructorsby just implementing one.21300:10:06,530 --> 00:10:08,915But as said that requires a bit skill21400:10:08,915 --> 00:10:11,450of generalizing howthat should be.21500:10:11,450 --> 00:10:14,180And the other questionsare just then21600:10:14,180 --> 00:10:16,745trying out yourexpression matcher on21700:10:16,745 --> 00:10:19,580some examples, morepower examples,21800:10:19,580 --> 00:10:22,400and they should bevery easy to solve.21900:10:22,400 --> 00:10:24,605So I hope it's nottoo much asked for22000:10:24,605 --> 00:10:26,615and I hope you have fun.22100:10:26,615 --> 00:10:29,675It is not too much askbecause you can,22200:10:29,675 --> 00:10:32,420I hope it's not too muchbecause you can start from22300:10:32,420 --> 00:10:35,615my definitions ormy implementation.22400:10:35,615 --> 00:10:39,425It's really essentiallymostly is brain work,22500:10:39,425 --> 00:10:42,170how these nullable andderivative should work.22600:10:42,170 --> 00:10:44,510If you're in alanguage like Scala,22700:10:44,510 --> 00:10:48,875the implementation shouldthen be easy-peasy.22800:10:48,875 --> 00:10:51,365If you are in a different language22900:10:51,365 --> 00:10:52,610I assume you also23000:10:52,610 --> 00:10:54,890dedicated to thatlanguage to start with,23100:10:54,890 --> 00:10:58,475so it should be also veryeasy for you to translate23200:10:58,475 --> 00:11:01,100my Scala code into whateverlanguage you want to23300:11:01,100 --> 00:11:04,280do, first and thenstart from there.23400:11:04,280 --> 00:11:06,230If there are any questions,23500:11:06,230 --> 00:11:08,360please ask me or the TAs.23600:11:08,360 --> 00:11:10,040We are trying to be as helpful23700:11:10,040 --> 00:11:12,900as possible with the coursework.23800:11:13,240 --> 00:11:17,885Bear in mind, this is thefirst step in our compiler.23900:11:17,885 --> 00:11:21,005The coursework 2 will thenbuild on top of that.24000:11:21,005 --> 00:11:25,770So it's better to get thiscorrect first. Thanks.