diff -r 260e330f7466 -r 42733adf2a48 videos/02-cw1.srt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/videos/02-cw1.srt Mon Oct 05 17:46:12 2020 +0100 @@ -0,0 +1,1107 @@ +1 +00:00:06,410 --> 00:00:09,390 +The final video for this week. + +2 +00:00:09,390 --> 00:00:12,465 +And let me say something +about the coursework. + +3 +00:00:12,465 --> 00:00:15,300 +First. You can solve + +4 +00:00:15,300 --> 00:00:17,745 +the coursework in any programming +language you're like, + +5 +00:00:17,745 --> 00:00:22,440 +I already said that. You +have to submit a PDF file. + +6 +00:00:22,440 --> 00:00:23,850 +There will be some questions, + +7 +00:00:23,850 --> 00:00:26,250 +so you have to write down the +solution to this questions. + +8 +00:00:26,250 --> 00:00:30,315 +Please use a PDF and you have +to submit your source code. + +9 +00:00:30,315 --> 00:00:35,580 +And yes, if you do use a +language which isn't Scala, + +10 +00:00:35,580 --> 00:00:39,450 +it might help to tell me +how I can run your code. + +11 +00:00:39,450 --> 00:00:42,550 +If I can't run your code, +I will ask you anyway. + +12 +00:00:42,550 --> 00:00:50,044 +Also, please submit both the +PDF and the code in a file, + +13 +00:00:50,044 --> 00:00:52,730 +in a zip file, which generates + +14 +00:00:52,730 --> 00:00:55,835 +a subdirectory with your +name and your family name. + +15 +00:00:55,835 --> 00:00:58,970 +That we'll just save a +lot of time for me to try + +16 +00:00:58,970 --> 00:01:02,030 +to figure out who +has submitted what. + +17 +00:01:02,030 --> 00:01:04,445 +Please do that. + +18 +00:01:04,445 --> 00:01:07,789 +So what is the task +into coursework? + +19 +00:01:07,789 --> 00:01:09,885 +I essentially showed you how + +20 +00:01:09,885 --> 00:01:11,995 +the Brzozowski matcher + +21 +00:01:11,995 --> 00:01:14,645 +works for the basic +regular expressions. + +22 +00:01:14,645 --> 00:01:16,295 +I also showed you actually how it + +23 +00:01:16,295 --> 00:01:18,110 +works for the n-times. + +24 +00:01:18,110 --> 00:01:20,300 +And the task in coursework + +25 +00:01:20,300 --> 00:01:22,970 +is that you extend the +Brzozowski matcher to + +26 +00:01:22,970 --> 00:01:25,820 +the other regular expressions + +27 +00:01:25,820 --> 00:01:27,800 +from the extended +regular expressions. + +28 +00:01:27,800 --> 00:01:30,245 +So you're supposed +to extended with + +29 +00:01:30,245 --> 00:01:34,490 +r-plus, optional r, for n-times. + +30 +00:01:34,490 --> 00:01:38,420 +You've already seen that. +For 0 or more times, + +31 +00:01:38,420 --> 00:01:41,120 +but not more than m +times regular expression. + +32 +00:01:41,120 --> 00:01:45,890 +For at least n-times regular +expression and for between + +33 +00:01:45,890 --> 00:01:47,480 +n times and m times + +34 +00:01:47,480 --> 00:01:50,810 +regular expression and also this +NOT regular expression. + +35 +00:01:50,810 --> 00:01:52,790 +So your task is to + +36 +00:01:52,790 --> 00:01:55,310 +essentially find the definition + +37 +00:01:55,310 --> 00:01:57,155 +for nullable in these cases. + +38 +00:01:57,155 --> 00:02:00,215 +Find a definition for derivative, + +39 +00:02:00,215 --> 00:02:02,480 +implement them, +write them down in + +40 +00:02:02,480 --> 00:02:06,065 +a PDF and then do some +experiments with them. + +41 +00:02:06,065 --> 00:02:08,875 +Well, how can you do that? + +42 +00:02:08,875 --> 00:02:10,370 +Well I've given you + +43 +00:02:10,370 --> 00:02:13,565 +the meaning of these additional +regular expressions. + +44 +00:02:13,565 --> 00:02:16,730 +So here's, for example, the +meaning of this r-plus. + +45 +00:02:16,730 --> 00:02:18,290 +So that would be, I have + +46 +00:02:18,290 --> 00:02:22,115 +at least one copy up to infinity. + +47 +00:02:22,115 --> 00:02:25,070 +And the optional-r would be it's + +48 +00:02:25,070 --> 00:02:28,370 +the language of r plus +the empty string. + +49 +00:02:28,370 --> 00:02:30,440 +If I have it exactly n times, + +50 +00:02:30,440 --> 00:02:31,985 +then it would be +just the language + +51 +00:02:31,985 --> 00:02:34,010 +of r exactly n-times. + +52 +00:02:34,010 --> 00:02:38,105 +And here you have union +from 0 to m and so on. + +53 +00:02:38,105 --> 00:02:42,560 +This might be a slightly +interesting regular expression. + +54 +00:02:42,560 --> 00:02:46,580 +So that's supposed to be the +character set that is very + +55 +00:02:46,580 --> 00:02:48,410 +similar to character ranges like + +56 +00:02:48,410 --> 00:02:51,005 +in the existing regular +expression matchers. + +57 +00:02:51,005 --> 00:02:52,820 +So this just says + +58 +00:02:52,820 --> 00:02:55,565 +...this regular +expression can match + +59 +00:02:55,565 --> 00:03:00,860 +either the character c1 or +the character c2 up to cn. + +60 +00:03:00,860 --> 00:03:03,620 +Why do I include +that regular expression? + +61 +00:03:03,620 --> 00:03:09,050 +I could also just say +c1 plus c2 plus c3... + +62 +00:03:09,050 --> 00:03:12,889 +a big alternative. +Well that's possible. + +63 +00:03:12,889 --> 00:03:16,595 +But remember the Achilles' +heel of this algorithm is, + +64 +00:03:16,595 --> 00:03:18,425 +if the regular expression is large, + +65 +00:03:18,425 --> 00:03:20,825 +then the computation +take always a long time. + +66 +00:03:20,825 --> 00:03:23,840 +So I'm trying to compress +that as much as I can. + +67 +00:03:23,840 --> 00:03:25,370 +So instead of giving a big + +68 +00:03:25,370 --> 00:03:29,134 +alternative, I am trying to +give one regular expression, + +69 +00:03:29,134 --> 00:03:31,340 +which can not just match +a single character, + +70 +00:03:31,340 --> 00:03:34,230 +but a set of characters. + +71 +00:03:34,630 --> 00:03:36,980 +How can you be sure that + +72 +00:03:36,980 --> 00:03:39,215 +definition you come +up with are correct? + +73 +00:03:39,215 --> 00:03:41,450 +Well, I showed you which are + +74 +00:03:41,450 --> 00:03:45,170 +the properties these two +functions need to satisfy. + +75 +00:03:45,170 --> 00:03:47,060 +I've given you here what + +76 +00:03:47,060 --> 00:03:49,325 +the meaning of these +expressions are. + +77 +00:03:49,325 --> 00:03:52,700 +So you will always know what's +on the right-hand side. + +78 +00:03:52,700 --> 00:03:54,515 +This is completely determined. + +79 +00:03:54,515 --> 00:03:56,720 +You essentially +have to fill something + +80 +00:03:56,720 --> 00:03:58,910 +equivalent on the left-hand side. + +81 +00:03:58,910 --> 00:04:02,105 +That's the main task +of the coursework. + +82 +00:04:02,105 --> 00:04:08,090 +And you can start from the +files I provided on KEATS. + +83 +00:04:08,090 --> 00:04:12,125 +So you would just have to +add these additional cases. + +84 +00:04:12,125 --> 00:04:15,020 +When you add these +additional cases + +85 +00:04:15,020 --> 00:04:17,330 +and you're using the Scala language, + +86 +00:04:17,330 --> 00:04:18,980 +you can do me a favour. + +87 +00:04:18,980 --> 00:04:22,550 +You can call these +constructors for + +88 +00:04:22,550 --> 00:04:25,400 +these different regular +expressions as RANGE, + +89 +00:04:25,400 --> 00:04:28,985 +PLUS, OPTIONAL and NTIMES, +UPTO, FROM and BETWEEN. + +90 +00:04:28,985 --> 00:04:31,370 +Remember I have this +convention to always + +91 +00:04:31,370 --> 00:04:34,025 +use capital letters +for regular expressions. + +92 +00:04:34,025 --> 00:04:36,680 +It would be nice if you could use + +93 +00:04:36,680 --> 00:04:39,260 +these names because +then it will be + +94 +00:04:39,260 --> 00:04:42,335 +very easy for me +to test your code. + +95 +00:04:42,335 --> 00:04:44,690 +If you're using a different +programming language + +96 +00:04:44,690 --> 00:04:46,370 +like let's say Rust, + +97 +00:04:46,370 --> 00:04:48,860 +I expect some people will use that, where I + +98 +00:04:48,860 --> 00:04:51,380 +have no idea what are the +conventions in your language, + +99 +00:04:51,380 --> 00:04:53,420 +how you have to call +these constructors, + +100 +00:04:53,420 --> 00:04:56,420 +we will see whatever you +submit. I'll have a look. + +101 +00:04:56,420 --> 00:04:59,360 +There's one more +constraint I have to + +102 +00:04:59,360 --> 00:05:02,194 +impose to make this +coursework interesting. + +103 +00:05:02,194 --> 00:05:04,730 +I do not want you +that you just take + +104 +00:05:04,730 --> 00:05:07,370 +these extended regular +expressions and that you + +105 +00:05:07,370 --> 00:05:10,550 +expand them using +their definition. + +106 +00:05:10,550 --> 00:05:12,320 +Because that would make the regular + +107 +00:05:12,320 --> 00:05:13,955 +expression matcher very slow. + +108 +00:05:13,955 --> 00:05:16,160 +So for example, if +you want to define + +109 +00:05:16,160 --> 00:05:18,785 +what's the derivative for r-plus, + +110 +00:05:18,785 --> 00:05:21,560 +then what does not +count as a solution: + +111 +00:05:21,560 --> 00:05:24,770 +if you just expand that +to the definition that r + +112 +00:05:24,770 --> 00:05:28,935 +plus is equivalent to +r followed by r-star. + +113 +00:05:28,935 --> 00:05:32,845 +So in code, what I +would like to not see, + +114 +00:05:32,845 --> 00:05:35,680 +I would not give any +marks for that is, + +115 +00:05:35,680 --> 00:05:37,780 +if you add the plus as follows, + +116 +00:05:37,780 --> 00:05:39,910 +so that is still perfectly fine. + +117 +00:05:39,910 --> 00:05:42,160 +There's nothing you +can do differently. + +118 +00:05:42,160 --> 00:05:44,065 +That would be the constructor. + +119 +00:05:44,065 --> 00:05:46,480 +But when you define nullable, + +120 +00:05:46,480 --> 00:05:49,180 +I do not want to see +that you defined + +121 +00:05:49,180 --> 00:05:51,790 +this plus r as nullable + +122 +00:05:51,790 --> 00:05:54,385 +of the sequence of r and star-r, + +123 +00:05:54,385 --> 00:05:58,075 +just unfolding +the definition. + +124 +00:05:58,075 --> 00:06:00,415 +As you can see, when you come + +125 +00:06:00,415 --> 00:06:02,815 +up with a much better +solution for that. + +126 +00:06:02,815 --> 00:06:05,110 +This is a very inefficient way + +127 +00:06:05,110 --> 00:06:07,090 +for how to calculate nullable. + +128 +00:06:07,090 --> 00:06:10,825 +And so the same for derivative +would not be allowed. + +129 +00:06:10,825 --> 00:06:13,895 +If you, I have the plus r here, + +130 +00:06:13,895 --> 00:06:16,685 +you can't just unfold +the definition, + +131 +00:06:16,685 --> 00:06:19,790 +with r-plus +being defined as r + +132 +00:06:19,790 --> 00:06:21,350 +followed by r star and + +133 +00:06:21,350 --> 00:06:23,375 +then just build the +derivative of that. + +134 +00:06:23,375 --> 00:06:25,280 +You have to find something more + +135 +00:06:25,280 --> 00:06:28,730 +primitive that involves +only the derivative of r, + +136 +00:06:28,730 --> 00:06:31,790 +essentially, nothing +more involved. The same + +137 +00:06:31,790 --> 00:06:33,815 +if you have n-times, for example, + +138 +00:06:33,815 --> 00:06:39,935 +you can't just unfold this +n-times in this sequence + +139 +00:06:39,935 --> 00:06:43,310 +of .... n-copies + +140 +00:06:43,310 --> 00:06:47,285 +of this r. You have to get +something more primitive. + +141 +00:06:47,285 --> 00:06:49,760 +Otherwise, as you've +seen earlier, + +142 +00:06:49,760 --> 00:06:53,135 +your regular expression matcher +would be totally slow. + +143 +00:06:53,135 --> 00:06:55,475 +When you submit your solution, + +144 +00:06:55,475 --> 00:06:58,445 +please submit this +solution in the PDF. + +145 +00:06:58,445 --> 00:07:01,580 +So give the cases of your definition + +146 +00:07:01,580 --> 00:07:06,004 +in a form like that for +nullable and derivative. + +147 +00:07:06,004 --> 00:07:08,405 +And also implement it in code. + +148 +00:07:08,405 --> 00:07:10,910 +That is just helping me to + +149 +00:07:10,910 --> 00:07:13,400 +find out whether you have +the correct solution or not. + +150 +00:07:13,400 --> 00:07:15,710 +So you needed a kind of +mathematical notation of + +151 +00:07:15,710 --> 00:07:18,695 +your solution, and +an actual code. + +152 +00:07:18,695 --> 00:07:22,414 +And then once you +have your definition, + +153 +00:07:22,414 --> 00:07:25,699 +also make sure you try +it out on some examples. + +154 +00:07:25,699 --> 00:07:28,970 +These will be the examples +I will definitely try out, + +155 +00:07:28,970 --> 00:07:30,560 +but probably also more + +156 +00:07:30,560 --> 00:07:33,215 +depending on what +definitions you give me. + +157 +00:07:33,215 --> 00:07:38,300 +There's one more question I +ask you to do. You've seen + +158 +00:07:38,300 --> 00:07:40,130 +there's some regular +expressions that + +159 +00:07:40,130 --> 00:07:42,215 +are involved for characters, + +160 +00:07:42,215 --> 00:07:46,925 +for character ranges or +character sets essentially. + +161 +00:07:46,925 --> 00:07:48,665 +And sometimes I also want to have + +162 +00:07:48,665 --> 00:07:51,665 +just a reg expression which +can match any character!! + +163 +00:07:51,665 --> 00:07:56,195 +And I could have for all of +them separate definitions. + +164 +00:07:56,195 --> 00:07:57,710 +And that would mean I also need + +165 +00:07:57,710 --> 00:07:59,645 +separate definitions for nullable, + +166 +00:07:59,645 --> 00:08:02,405 +separate definitions +for derivative. + +167 +00:08:02,405 --> 00:08:05,825 +But actually they can be +generalized to just one constructor. + +168 +00:08:05,825 --> 00:08:08,210 +And that is if we introduce + +169 +00:08:08,210 --> 00:08:11,630 +a constructor for regular expressions, + +170 +00:08:11,630 --> 00:08:13,760 +which not just takes +a single character + +171 +00:08:13,760 --> 00:08:15,469 +or set of characters. + +172 +00:08:15,469 --> 00:08:18,245 +But, which I call here CFUN. + +173 +00:08:18,245 --> 00:08:23,165 +And it takes a function from +characters to booleans. + +174 +00:08:23,165 --> 00:08:24,470 +So if I want to match + +175 +00:08:24,470 --> 00:08:26,900 +just a single character +then this function f + +176 +00:08:26,900 --> 00:08:29,060 +would only say true + +177 +00:08:29,060 --> 00:08:32,225 +if it gets as argument +the single character. + +178 +00:08:32,225 --> 00:08:34,850 +If I have a character set, + +179 +00:08:34,850 --> 00:08:36,515 +then I would say, well, + +180 +00:08:36,515 --> 00:08:38,300 +I need a function +which says true + +181 +00:08:38,300 --> 00:08:40,970 +for all the characters +in this set. + +182 +00:08:40,970 --> 00:08:43,460 +And otherwise it's false. + +183 +00:08:43,460 --> 00:08:46,205 +And if I want to +match any character!!, + +184 +00:08:46,205 --> 00:08:48,500 +then they should get a function + +185 +00:08:48,500 --> 00:08:53,450 +which says true for +all characters. + +186 +00:08:53,450 --> 00:08:56,630 +Okay? If I have +such a constructor + +187 +00:08:56,630 --> 00:09:00,140 +that actually I can save +myself a bit of work. + +188 +00:09:00,140 --> 00:09:03,200 +And I can just have one case + +189 +00:09:03,200 --> 00:09:06,920 +for nullable and one +case for CFUNS. + +190 +00:09:06,920 --> 00:09:09,800 +So that would then subsume +the character ranges and + +191 +00:09:09,800 --> 00:09:13,385 +the character and also +this ALL regular expression. + +192 +00:09:13,385 --> 00:09:15,380 +Ok, this was not explicitly +included at the beginning, + +193 +00:09:15,380 --> 00:09:17,510 +but that's what I can now define. + +194 +00:09:17,510 --> 00:09:21,740 +And then I can define +this for characters, + +195 +00:09:21,740 --> 00:09:23,885 +just as special cases +for these functions. + +196 +00:09:23,885 --> 00:09:25,610 +And I told you already +what this function + +197 +00:09:25,610 --> 00:09:28,025 +should look like in +these three cases. + +198 +00:09:28,025 --> 00:09:30,350 +So I let ... + +199 +00:09:30,350 --> 00:09:33,515 +you decide how you're +going to implement that. + +200 +00:09:33,515 --> 00:09:35,450 +If you first define + +201 +00:09:35,450 --> 00:09:38,495 +your implementation is +all these explicit cases. + +202 +00:09:38,495 --> 00:09:41,900 +Because in these cases it's +actually more intuitive, + +203 +00:09:41,900 --> 00:09:45,980 +what nullable and +derivative should be. + +204 +00:09:45,980 --> 00:09:48,035 +And then in a second step, + +205 +00:09:48,035 --> 00:09:51,140 +you implement these +more general cases. + +206 +00:09:51,140 --> 00:09:53,360 +And just keep the original ones + +207 +00:09:53,360 --> 00:09:54,500 +in your implementation if you + +208 +00:09:54,500 --> 00:09:57,710 +want, or if you feel +adventurous enough, + +209 +00:09:57,710 --> 00:09:59,225 +and want to be lazy, + +210 +00:09:59,225 --> 00:10:01,115 +that you just implement that. + +211 +00:10:01,115 --> 00:10:02,660 +And then you have already done + +212 +00:10:02,660 --> 00:10:06,530 +at least two constructors +by just implementing one. + +213 +00:10:06,530 --> 00:10:08,915 +But as said that +requires a bit skill + +214 +00:10:08,915 --> 00:10:11,450 +of generalizing how +that should be. + +215 +00:10:11,450 --> 00:10:14,180 +And the other questions +are just then + +216 +00:10:14,180 --> 00:10:16,745 +trying out your +expression matcher on + +217 +00:10:16,745 --> 00:10:19,580 +some examples, more +power examples, + +218 +00:10:19,580 --> 00:10:22,400 +and they should be +very easy to solve. + +219 +00:10:22,400 --> 00:10:24,605 +So I hope it's not +too much asked for + +220 +00:10:24,605 --> 00:10:26,615 +and I hope you have fun. + +221 +00:10:26,615 --> 00:10:29,675 +It is not too much ask +because you can, + +222 +00:10:29,675 --> 00:10:32,420 +I hope it's not too much +because you can start from + +223 +00:10:32,420 --> 00:10:35,615 +my definitions or +my implementation. + +224 +00:10:35,615 --> 00:10:39,425 +It's really essentially +mostly is brain work, + +225 +00:10:39,425 --> 00:10:42,170 +how these nullable and +derivative should work. + +226 +00:10:42,170 --> 00:10:44,510 +If you're in a +language like Scala, + +227 +00:10:44,510 --> 00:10:48,875 +the implementation should +then be easy-peasy. + +228 +00:10:48,875 --> 00:10:51,365 +If you are in a different language + +229 +00:10:51,365 --> 00:10:52,610 +I assume you also + +230 +00:10:52,610 --> 00:10:54,890 +dedicated to that +language to start with, + +231 +00:10:54,890 --> 00:10:58,475 +so it should be also very +easy for you to translate + +232 +00:10:58,475 --> 00:11:01,100 +my Scala code into whatever +language you want to + +233 +00:11:01,100 --> 00:11:04,280 +do, first and then +start from there. + +234 +00:11:04,280 --> 00:11:06,230 +If there are any questions, + +235 +00:11:06,230 --> 00:11:08,360 +please ask me or the TAs. + +236 +00:11:08,360 --> 00:11:10,040 +We are trying to be as helpful + +237 +00:11:10,040 --> 00:11:12,900 +as possible with the coursework. + +238 +00:11:13,240 --> 00:11:17,885 +Bear in mind, this is the +first step in our compiler. + +239 +00:11:17,885 --> 00:11:21,005 +The coursework 2 will then +build on top of that. + +240 +00:11:21,005 --> 00:11:25,770 +So it's better to get this +correct first. Thanks.