# HG changeset patch # User update # Date 1601916372 -3600 # Node ID 42733adf2a48319157c60d3b5982dc5a1d230d22 # Parent 260e330f74661c35473e85d049bbabb36adf478f updated diff -r 260e330f7466 -r 42733adf2a48 videos/02-algo3.srt --- a/videos/02-algo3.srt Mon Oct 05 01:42:47 2020 +0100 +++ b/videos/02-algo3.srt Mon Oct 05 17:46:12 2020 +0100 @@ -1,6 +1,6 @@ 1 00:00:06,350 --> 00:00:10,305 -They come back. I +Welcome come back. I can hear you saying, 2 @@ -19,7 +19,7 @@ 5 00:00:17,415 --> 00:00:21,265 Specifically, we had two -regular expressions. +evil regular expressions. 6 00:00:21,265 --> 00:00:23,480 @@ -27,13 +27,13 @@ 7 00:00:23,480 --> 00:00:27,480 -Where let's have a look at +Well, let's have a look at that other case in this video. 8 00:00:29,140 --> 00:00:32,705 So I'm back here -in this Reto file. +in this re2.sc file. 9 00:00:32,705 --> 00:00:35,180 @@ -42,7 +42,7 @@ 10 00:00:35,180 --> 00:00:39,665 -nicely for strings between 01000. +nicely for strings between 0 and 1000. 11 00:00:39,665 --> 00:00:42,725 @@ -55,11 +55,11 @@ 13 00:00:44,090 --> 00:00:48,470 for relatively small -strings between 020. +strings between 0 and 20. 14 00:00:48,470 --> 00:00:50,240 -And let's see what it say. +And let's see what it says. 15 00:00:50,240 --> 00:00:51,800 @@ -67,13 +67,13 @@ 16 00:00:51,800 --> 00:00:57,320 -the ammonoids repo and +the Amoonite REPL and it doesn't look too bad. 17 00:00:57,320 --> 00:01:01,160 But this might be because -20 is not general enough. +20 is not generous enough. 18 00:01:01,160 --> 00:01:03,620 @@ -120,12 +120,12 @@ 28 00:01:33,905 --> 00:01:37,640 -which is Daphne bad and we +which is definitely bad and we have to have a look at that. 29 00:01:37,640 --> 00:01:40,640 -Okay? It's always +It's always instructive with 30 @@ -135,13 +135,13 @@ 31 00:01:43,460 --> 00:01:45,695 -the record expressions -we calculate +the regular expressions +we calculate. 32 00:01:45,695 --> 00:01:49,625 -the evil to Is this -a star, star B. +The EVIL2 is this +a star, star b. 33 00:01:49,625 --> 00:01:51,800 @@ -174,15 +174,15 @@ 39 00:02:05,600 --> 00:02:08,420 -If I take this evil to wreck +If I take this EVIL2 regular 40 00:02:08,420 --> 00:02:09,935 -expression and they built +expression and then build 41 00:02:09,935 --> 00:02:12,470 -now longer and +longer and longer derivatives, 42 @@ -209,7 +209,7 @@ 47 00:02:21,680 --> 00:02:26,870 But the size of the -reg expression, +regular expression, 48 00:02:26,870 --> 00:02:28,310 @@ -235,7 +235,7 @@ 53 00:02:39,560 --> 00:02:42,785 -This reg expression +This regular expression just grew too much. 54 @@ -245,13 +245,13 @@ 55 00:02:46,520 --> 00:02:49,655 -this record expression +this regular expression and make it not 56 00:02:49,655 --> 00:02:52,700 grow so much when we -build derivative. +build the derivative. 57 00:02:52,700 --> 00:02:54,020 @@ -279,7 +279,7 @@ 62 00:03:03,740 --> 00:03:06,560 this example of a -wreck expression R, +regulat expression r, 63 00:03:06,560 --> 00:03:11,465 @@ -312,15 +312,15 @@ 69 00:03:26,570 --> 00:03:29,015 which is this whole -wreck expression again. +regular expression again. 70 00:03:29,015 --> 00:03:30,935 -So you can also see. +So you can also see 71 00:03:30,935 --> 00:03:34,745 -Derivative operation +the derivative operation introduces a lot of junk. 72 @@ -330,7 +330,7 @@ 73 00:03:38,165 --> 00:03:42,455 -And this times one could +And this times 1 could be also thrown away. 74 @@ -344,16 +344,16 @@ 76 00:03:48,110 --> 00:03:49,580 -it still be a, +it still b, 77 00:03:49,580 --> 00:03:54,905 -so it would be B followed by -R. And in this second case, +so it would be b followed by +r. And in this second case, 78 00:03:54,905 --> 00:03:57,515 -C0 times b, that will be just 0. +0 times b, that will be just 0. 79 00:03:57,515 --> 00:03:59,270 @@ -361,13 +361,13 @@ 80 00:03:59,270 --> 00:04:05,330 -11 times r would be just +1 ... times r would be just r. And in the last case, 81 00:04:05,330 --> 00:04:12,155 -C0 times B would be 00 plus -0 is 00 times r would be 0. +0 times b would be 0. 0 plus +0 is 0. 0 times r would be 0. 82 00:04:12,155 --> 00:04:17,540 @@ -391,8 +391,8 @@ 86 00:04:27,035 --> 00:04:31,130 -0 wouldn't go away by -building annuity where tests. +And the 0s wouldn't go away by +building a new derivative. 87 00:04:31,130 --> 00:04:34,310 @@ -435,7 +435,7 @@ 95 00:04:58,715 --> 00:05:01,415 this might have -introduced some chunk. +introduced some junk. 96 00:05:01,415 --> 00:05:02,960 @@ -443,7 +443,7 @@ 97 00:05:02,960 --> 00:05:06,590 -essentially a additional function +essentially an additional function 98 00:05:06,590 --> 00:05:08,945 @@ -455,15 +455,15 @@ 100 00:05:10,490 --> 00:05:13,340 -keep steer, Rekha expression, +keeps the regular expression, 101 00:05:13,340 --> 00:05:16,730 -relatively small, pickers debts, +relatively small, because that 102 00:05:16,730 --> 00:05:19,535 -the Achilles heel +is the Achilles' heel of this algorithm. 103 @@ -488,7 +488,7 @@ 107 00:05:30,455 --> 00:05:33,080 compress again this -rec expression +regular expression 108 00:05:33,080 --> 00:05:35,570 @@ -506,7 +506,7 @@ 111 00:05:42,440 --> 00:05:46,040 -this reg expression and +this regular expression and deletes all this junk, 112 @@ -539,7 +539,7 @@ 118 00:06:02,720 --> 00:06:05,060 -The only simplify when we have +We only simplify when we have 119 00:06:05,060 --> 00:06:08,255 @@ -585,16 +585,16 @@ 128 00:06:30,530 --> 00:06:34,880 Now, however, there -is one small point. +is one small point 129 00:06:34,880 --> 00:06:39,785 -You have to be aware of -these simplification rules. +you have to be aware of. +These simplification rules 130 00:06:39,785 --> 00:06:42,740 -They need to be essentially +they need to be essentially applied backwards. 131 @@ -604,12 +604,12 @@ 132 00:06:45,650 --> 00:06:49,085 -this record expression and +this regular expression and then have to find out, 133 00:06:49,085 --> 00:06:51,170 -can I apply simplification? +can I apply the simplification? 134 00:06:51,170 --> 00:06:52,670 @@ -655,26 +655,26 @@ 143 00:07:16,850 --> 00:07:20,885 -what that means is the -matching this reg expression. +what that means we're +matching this regular expression. 144 00:07:20,885 --> 00:07:23,120 -Be test whether it's +We test whether it's an alternative or 145 00:07:23,120 --> 00:07:26,345 -a sequence only then we +a sequence. Only then we actually go into action. 146 00:07:26,345 --> 00:07:28,385 -Once we know. +Once we know 147 00:07:28,385 --> 00:07:30,530 -In case of an alternative, +in case of an alternative, 148 00:07:30,530 --> 00:07:32,120 @@ -682,7 +682,7 @@ 149 00:07:32,120 --> 00:07:35,765 -P first, simplify each component, +We first, simplify each component, 150 00:07:35,765 --> 00:07:39,095 @@ -696,7 +696,7 @@ 152 00:07:41,180 --> 00:07:45,679 this simplification of -R1 resulted into a 0. +r1 resulted into a 0. 153 00:07:45,679 --> 00:07:47,884 @@ -704,17 +704,17 @@ 154 00:07:47,884 --> 00:07:49,190 -than I can just replace it +then I can just replace it 155 00:07:49,190 --> 00:07:53,375 -by r is two a simplified -version of R2. +by r2s, which a simplified +version of r2. 156 00:07:53,375 --> 00:07:58,820 -If it came back r as -two is actually 0, +If it came back r2s +is actually 0, 157 00:07:58,820 --> 00:08:00,410 @@ -722,7 +722,7 @@ 158 00:08:00,410 --> 00:08:03,785 -the simplified version of a warm. +the simplified version of a r1. 159 00:08:03,785 --> 00:08:07,460 @@ -736,7 +736,7 @@ 161 00:08:11,180 --> 00:08:15,170 -next century I can throw away +then I can throw away one of the alternatives. 162 @@ -746,11 +746,11 @@ 163 00:08:18,080 --> 00:08:21,155 -but the simplified components +but the simplified components. 164 00:08:21,155 --> 00:08:23,915 -in the sequence is +In the sequence case it is pretty much the same. 165 @@ -761,16 +761,16 @@ 166 00:08:27,950 --> 00:08:31,385 So I call first simplify -and then have a look. +and then have a look... 167 00:08:31,385 --> 00:08:33,020 -Does it matches C or one of +Does it matches 0 one of 168 00:08:33,020 --> 00:08:36,035 -the simplification -damage, just return 0. +the simplifications, +then I just return 0. 169 00:08:36,035 --> 00:08:38,330 @@ -787,15 +787,15 @@ 172 00:08:43,310 --> 00:08:45,920 -And if this iss two is one, +And if the rs2 is one, 173 00:08:45,920 --> 00:08:47,615 -and I keep the first component, +and I keep the first component. 174 00:08:47,615 --> 00:08:50,359 -and otherwise it's +And otherwise it's still the sequence. 175 @@ -805,13 +805,13 @@ 176 00:08:53,540 --> 00:08:55,700 -It's just a plain +If it's just a plain 0. I leave it in. 177 00:08:55,700 --> 00:08:57,860 If it's a plain -warm, I leave it in. +1, I leave it in. 178 00:08:57,860 --> 00:09:00,170 @@ -820,7 +820,7 @@ 179 00:09:00,170 --> 00:09:02,840 I don't open up this -door and simplify it. +star and simplify it. 180 00:09:02,840 --> 00:09:06,680 @@ -835,20 +835,20 @@ 182 00:09:10,280 --> 00:09:12,980 So I'm now in the -file read three, +file re3.sc, 183 00:09:12,980 --> 00:09:17,405 -and it's the same as Reto. +and it's the same as re2.sc. 184 00:09:17,405 --> 00:09:20,885 It still has this -explicit and Times case, +explicit and n-times case, 185 00:09:20,885 --> 00:09:24,665 -nullable and derivative +nullable and derivative are still the same. 186 @@ -862,12 +862,12 @@ 188 00:09:29,960 --> 00:09:33,725 -the derivative after -the apply deteriorated, +the derivative and after +we apply the derivative, 189 00:09:33,725 --> 00:09:35,870 -simplify it to throw away +we simplify it to throw away 190 00:09:35,870 --> 00:09:39,050 @@ -885,11 +885,11 @@ 193 00:09:43,580 --> 00:09:45,515 -of the optional reg expression. +of the optional regular expression. 194 00:09:45,515 --> 00:09:49,670 -Here, our two evil wreck +Here are our two EVIL regular expressions we use as a test. 195 @@ -898,27 +898,27 @@ 196 00:09:52,460 --> 00:09:55,835 -the first one and the +the first one and we're actually now trying it 197 00:09:55,835 --> 00:10:00,515 out for strings between -09 thousand a's. +0 and 9000 a's. 198 00:10:00,515 --> 00:10:08,285 -So C. So also gets -simplification makes a, +So let's see. So also the +simplification makes 199 00:10:08,285 --> 00:10:10,655 -actually this case fast on. +actually this case faster. 200 00:10:10,655 --> 00:10:16,260 So we can now run strings -up to 9 thousand. +up to 9000. 201 00:10:16,260 --> 00:10:19,360 @@ -928,7 +928,7 @@ 202 00:10:19,360 --> 00:10:24,745 For I have to say my laptop -is now starting its fan. +is now starting to run its fan. 203 00:10:24,745 --> 00:10:28,525 @@ -951,12 +951,12 @@ 207 00:10:34,720 --> 00:10:37,720 just to be a tiny bit -more accurate map. +more accurate. 208 00:10:37,720 --> 00:10:42,025 So three seconds for a -string of 9 thousand a's. +string of 9000 a's. 209 00:10:42,025 --> 00:10:44,980 @@ -969,16 +969,16 @@ 211 00:10:46,240 --> 00:10:48,625 -this E star, star b. +this a star, star, b. 212 00:10:48,625 --> 00:10:51,250 And you can already see -on these numbers RB. +on these numbers... 213 00:10:51,250 --> 00:10:53,260 -And now you're really ambitious. +we are really ambitious. 214 00:10:53,260 --> 00:10:57,860 @@ -992,7 +992,7 @@ 216 00:11:07,780 --> 00:11:10,480 -The laptop thefts to +The laptop has to calculate quite a bit. 217 @@ -1001,7 +1001,7 @@ 218 00:11:12,940 --> 00:11:16,539 -3 million A's and +3 million a's and it needs a second. 219 @@ -1010,7 +1010,7 @@ 220 00:11:20,680 --> 00:11:25,280 -4 million a around two seconds. +4 million a's around two seconds. 221 00:11:27,030 --> 00:11:29,050 @@ -1018,11 +1018,11 @@ 222 00:11:29,050 --> 00:11:31,690 -I'm actually slowing it down. +I'm actually slowing it down 223 00:11:31,690 --> 00:11:34,150 -He artificially run the test +here artificially with running the test 224 00:11:34,150 --> 00:11:36,895 @@ -1037,7 +1037,7 @@ 226 00:11:42,749 --> 00:11:48,185 And remember that is a -string of 6 million A's. +string of 6 million a's. 227 00:11:48,185 --> 00:11:51,170 @@ -1047,7 +1047,7 @@ 228 00:11:51,170 --> 00:11:56,105 So the simplification also -made of first case faster. +made our first case faster. 229 00:11:56,105 --> 00:11:58,880 @@ -1057,11 +1057,11 @@ 230 00:11:58,880 --> 00:12:00,710 where we just have -this explicit and +this explicit 231 00:12:00,710 --> 00:12:03,050 -times that at this graph. +n-times. That is this graph. 232 00:12:03,050 --> 00:12:05,210 @@ -1083,21 +1083,21 @@ 236 00:12:16,745 --> 00:12:19,220 the existing regular -expression matches in +expression matchers in 237 00:12:19,220 --> 00:12:22,880 -Java eight, Python, +Java 8, Python, and JavaScript. 238 00:12:22,880 --> 00:12:26,030 And thanks to the -student be Ozma, +student we also 239 00:12:26,030 --> 00:12:27,935 -half a graph for Swift. +have a graph for Swift. 240 00:12:27,935 --> 00:12:29,750 @@ -1105,12 +1105,12 @@ 241 00:12:29,750 --> 00:12:33,320 -They need like for 30 Ace, +They need like for 30 a's, 242 00:12:33,320 --> 00:12:37,490 something like on the -magnitude of thirty-seconds. +magnitude of 30 seconds. 243 00:12:37,490 --> 00:12:39,740 @@ -1118,11 +1118,11 @@ 244 00:12:39,740 --> 00:12:42,665 -Java nine is slightly better. +Java 9 is slightly better. 245 00:12:42,665 --> 00:12:44,870 -Java nine and above this, +Java 9 and above, 246 00:12:44,870 --> 00:12:46,220 @@ -1130,7 +1130,7 @@ 247 00:12:46,220 --> 00:12:48,665 -the same exponential and nine, +the same example in Java 9, 248 00:12:48,665 --> 00:12:51,230 @@ -1139,18 +1139,18 @@ 249 00:12:51,230 --> 00:12:54,215 -like 40 thousand A's +like 40 thousand a's in half a minute. 250 00:12:54,215 --> 00:12:57,740 I have to admit I'm not -a 100% sure what they +100% sure what they 251 00:12:57,740 --> 00:13:03,575 did to improve the -performance between Java 89. +performance between Java 8 and 9. 252 00:13:03,575 --> 00:13:05,510 @@ -1172,7 +1172,7 @@ 256 00:13:12,230 --> 00:13:17,945 But that's still not -really competition fas. +really a competition for us. 257 00:13:17,945 --> 00:13:20,915 @@ -1184,8 +1184,8 @@ 259 00:13:24,335 --> 00:13:28,445 -We say it's something like -We need 6 million A's. +We said for something like +6 million a's we need 15 secs. 260 00:13:28,445 --> 00:13:31,760 @@ -1194,7 +1194,7 @@ 261 00:13:31,760 --> 00:13:33,770 -so he had needs 20 seconds. +so here it needs 20 seconds. 262 00:13:33,770 --> 00:13:37,250 @@ -1204,11 +1204,11 @@ 263 00:13:37,250 --> 00:13:40,865 But again, you can see -the trends. We rants. +the trends. We run. 264 00:13:40,865 --> 00:13:42,590 -Circles around them. +circles around them. 265 00:13:42,590 --> 00:13:46,530 @@ -1217,11 +1217,11 @@ 266 00:13:46,570 --> 00:13:49,774 So what's good about -this algorithm? +this algorithm 267 00:13:49,774 --> 00:13:51,605 -By pressure of ski? +by Brzozowski? 268 00:13:51,605 --> 00:13:54,500 @@ -1238,17 +1238,17 @@ 271 00:13:59,900 --> 00:14:03,950 -the End Times regular expression -is a little bit of work. +the n-times regular expression. +Is a little bit of work, but 272 00:14:03,950 --> 00:14:05,405 -We could extend that. +we could extend that. 273 00:14:05,405 --> 00:14:08,480 It also extends to the -not reg expression, +not regular expression, 274 00:14:08,480 --> 00:14:10,820 @@ -1275,7 +1275,7 @@ 279 00:14:22,955 --> 00:14:25,715 -Quite beautiful in Scala. +It's quite beautiful in Scala. 280 00:14:25,715 --> 00:14:28,160 @@ -1284,7 +1284,7 @@ 281 00:14:28,160 --> 00:14:31,760 -in C language or in Python. +in the C language or in Python. 282 00:14:31,760 --> 00:14:34,250 @@ -1293,12 +1293,12 @@ 283 00:14:34,250 --> 00:14:38,390 -Say the algorithm's actually -quite old already brush-off, +The algorithm is actually +quite old already. 284 00:14:38,390 --> 00:14:44,899 -ski wrote it down in +Brzozowski wrote it down in 1964 and his PhD thesis. 285 @@ -1313,12 +1313,12 @@ 287 00:14:54,095 --> 00:14:57,065 At the moment, we are -only solve the problem +only solved the problem 288 00:14:57,065 --> 00:15:02,240 -of Gmail reg expression -gamma string deaths, +of given a regular expression, +givven a string, does 289 00:15:02,240 --> 00:15:05,550 @@ -1327,8 +1327,8 @@ 290 00:15:05,550 --> 00:15:08,740 -The want to of course, use -it as part of our lexicon. +We want to of course, use +it as part of our lexer. 291 00:15:08,740 --> 00:15:12,370 @@ -1352,8 +1352,8 @@ 295 00:15:22,045 --> 00:15:26,110 -called suits Month and -the co-worker called Lou. +called Sulzmann and +the co-worker called Lu. 296 00:15:26,110 --> 00:15:30,880 @@ -1403,16 +1403,16 @@ 306 00:15:56,645 --> 00:15:58,550 -Is really fs string +Is it really true that if a string 307 00:15:58,550 --> 00:16:00,440 is in the language -of a reg expression. +of a regular expression. 308 00:16:00,440 --> 00:16:03,050 -Does that matter? Vd say yes. +Does that matter? I would say yes. 309 00:16:03,050 --> 00:16:07,070 @@ -1427,20 +1427,20 @@ 311 00:16:10,580 --> 00:16:13,775 actually extended to quite a -number of rec expressions. +number of regular expressions. 312 00:16:13,775 --> 00:16:17,810 -So this is t not reg +So this is the NOT regular expression that is 313 00:16:17,810 --> 00:16:19,760 -opposed to match strings which +supposed to match strings which 314 00:16:19,760 --> 00:16:22,235 -this reg expression cannot match. +this regular expression cannot match. 315 00:16:22,235 --> 00:16:24,245 @@ -1448,12 +1448,12 @@ 316 00:16:24,245 --> 00:16:28,640 -You're in the business of +If you're in the business of trying to find out what 317 00:16:28,640 --> 00:16:33,530 -our comments in languages like +are comments in languages like Java or C. Then you know, 318 @@ -1515,7 +1515,7 @@ 330 00:17:03,680 --> 00:17:06,410 which doesn't -contain as standard, +contain a star and a slash, 331 00:17:06,410 --> 00:17:11,180 @@ -1528,7 +1528,7 @@ 333 00:17:13,460 --> 00:17:15,995 -this reg expression is +this regular expression is actually quite useful. 334 @@ -1538,17 +1538,17 @@ 335 00:17:19,430 --> 00:17:21,995 -this kinda break -expression with automata? +for this kind of regular +expression with automata, 336 00:17:21,995 --> 00:17:24,710 -You will know that's -a bit of a pain, +you will know that's +a bit of a pain. 337 00:17:24,710 --> 00:17:26,675 -but for the Borowski, +But for the Brzozowski, 338 00:17:26,675 --> 00:17:28,370 @@ -1557,7 +1557,7 @@ 339 00:17:28,370 --> 00:17:30,710 The meaning of this -reg expression +regular expression 340 00:17:30,710 --> 00:17:32,255 @@ -1574,7 +1574,7 @@ 343 00:17:38,390 --> 00:17:40,055 -this arc and match. +this r can match. 344 00:17:40,055 --> 00:17:42,080 @@ -1582,7 +1582,7 @@ 345 00:17:42,080 --> 00:17:44,630 -what this reg expression +what this regular expression is supposed to match. 346 @@ -1605,8 +1605,8 @@ 350 00:17:54,140 --> 00:17:56,570 -just a test where -the eyes nullable. +just a test whether r +is nullable. 351 00:17:56,570 --> 00:17:58,985 @@ -1615,12 +1615,12 @@ 352 00:17:58,985 --> 00:18:00,680 -And men I have to build +And then I have to build 353 00:18:00,680 --> 00:18:03,620 the derivative of this -not reg expression. +NOT regular expression. 354 00:18:03,620 --> 00:18:05,420 @@ -1628,16 +1628,16 @@ 355 00:18:05,420 --> 00:18:07,325 -this permutation does derivative, +this .... 356 00:18:07,325 --> 00:18:10,070 derivative inside -the wreck expression +the regular expression 357 00:18:10,070 --> 00:18:12,575 -and keep the not on +and keep the NOT on the outermost level. 358 @@ -1647,7 +1647,7 @@ 359 00:18:15,515 --> 00:18:19,130 -adding this reg expression +adding this regular expression to the algorithm. 360 @@ -1662,7 +1662,7 @@ 362 00:18:24,739 --> 00:18:27,620 -I can see to loose trends. +I can see to loose strands. 363 00:18:27,620 --> 00:18:29,420 @@ -1675,8 +1675,8 @@ 365 00:18:32,855 --> 00:18:38,705 -And we also added the end -times out of necessity. +And we also added the +n-times out of necessity. 366 00:18:38,705 --> 00:18:41,120 @@ -1685,7 +1685,7 @@ 367 00:18:41,120 --> 00:18:44,840 -the not regular +the NOT regular expression on a slide. 368 @@ -1696,7 +1696,7 @@ 369 00:18:48,995 --> 00:18:52,970 to extend that to some of -the extended reg expression. +the extended regular expressions. 370 00:18:52,970 --> 00:18:57,260 @@ -1705,20 +1705,20 @@ 371 00:18:57,260 --> 00:19:01,550 -n times or between n and m times. +n-times or between n and m times. 372 00:19:01,550 --> 00:19:04,325 And I think also -plus and D option. +plus and option. 373 00:19:04,325 --> 00:19:07,220 -The other loose trend is, +The other loose strand is, 374 00:19:07,220 --> 00:19:09,200 -remember I did this while +remember I did this wild 375 00:19:09,200 --> 00:19:11,645 @@ -1727,11 +1727,11 @@ 376 00:19:11,645 --> 00:19:13,205 -Why on earth? +Why on earth 377 00:19:13,205 --> 00:19:14,480 -Is that all correct? +is that all correct? 378 00:19:14,480 --> 00:19:16,355 diff -r 260e330f7466 -r 42733adf2a48 videos/02-cw1.srt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/videos/02-cw1.srt Mon Oct 05 17:46:12 2020 +0100 @@ -0,0 +1,1107 @@ +1 +00:00:06,410 --> 00:00:09,390 +The final video for this week. + +2 +00:00:09,390 --> 00:00:12,465 +And let me say something +about the coursework. + +3 +00:00:12,465 --> 00:00:15,300 +First. You can solve + +4 +00:00:15,300 --> 00:00:17,745 +the coursework in any programming +language you're like, + +5 +00:00:17,745 --> 00:00:22,440 +I already said that. You +have to submit a PDF file. + +6 +00:00:22,440 --> 00:00:23,850 +There will be some questions, + +7 +00:00:23,850 --> 00:00:26,250 +so you have to write down the +solution to this questions. + +8 +00:00:26,250 --> 00:00:30,315 +Please use a PDF and you have +to submit your source code. + +9 +00:00:30,315 --> 00:00:35,580 +And yes, if you do use a +language which isn't Scala, + +10 +00:00:35,580 --> 00:00:39,450 +it might help to tell me +how I can run your code. + +11 +00:00:39,450 --> 00:00:42,550 +If I can't run your code, +I will ask you anyway. + +12 +00:00:42,550 --> 00:00:50,044 +Also, please submit both the +PDF and the code in a file, + +13 +00:00:50,044 --> 00:00:52,730 +in a zip file, which generates + +14 +00:00:52,730 --> 00:00:55,835 +a subdirectory with your +name and your family name. + +15 +00:00:55,835 --> 00:00:58,970 +That we'll just save a +lot of time for me to try + +16 +00:00:58,970 --> 00:01:02,030 +to figure out who +has submitted what. + +17 +00:01:02,030 --> 00:01:04,445 +Please do that. + +18 +00:01:04,445 --> 00:01:07,789 +So what is the task +into coursework? + +19 +00:01:07,789 --> 00:01:09,885 +I essentially showed you how + +20 +00:01:09,885 --> 00:01:11,995 +the Brzozowski matcher + +21 +00:01:11,995 --> 00:01:14,645 +works for the basic +regular expressions. + +22 +00:01:14,645 --> 00:01:16,295 +I also showed you actually how it + +23 +00:01:16,295 --> 00:01:18,110 +works for the n-times. + +24 +00:01:18,110 --> 00:01:20,300 +And the task in coursework + +25 +00:01:20,300 --> 00:01:22,970 +is that you extend the +Brzozowski matcher to + +26 +00:01:22,970 --> 00:01:25,820 +the other regular expressions + +27 +00:01:25,820 --> 00:01:27,800 +from the extended +regular expressions. + +28 +00:01:27,800 --> 00:01:30,245 +So you're supposed +to extended with + +29 +00:01:30,245 --> 00:01:34,490 +r-plus, optional r, for n-times. + +30 +00:01:34,490 --> 00:01:38,420 +You've already seen that. +For 0 or more times, + +31 +00:01:38,420 --> 00:01:41,120 +but not more than m +times regular expression. + +32 +00:01:41,120 --> 00:01:45,890 +For at least n-times regular +expression and for between + +33 +00:01:45,890 --> 00:01:47,480 +n times and m times + +34 +00:01:47,480 --> 00:01:50,810 +regular expression and also this +NOT regular expression. + +35 +00:01:50,810 --> 00:01:52,790 +So your task is to + +36 +00:01:52,790 --> 00:01:55,310 +essentially find the definition + +37 +00:01:55,310 --> 00:01:57,155 +for nullable in these cases. + +38 +00:01:57,155 --> 00:02:00,215 +Find a definition for derivative, + +39 +00:02:00,215 --> 00:02:02,480 +implement them, +write them down in + +40 +00:02:02,480 --> 00:02:06,065 +a PDF and then do some +experiments with them. + +41 +00:02:06,065 --> 00:02:08,875 +Well, how can you do that? + +42 +00:02:08,875 --> 00:02:10,370 +Well I've given you + +43 +00:02:10,370 --> 00:02:13,565 +the meaning of these additional +regular expressions. + +44 +00:02:13,565 --> 00:02:16,730 +So here's, for example, the +meaning of this r-plus. + +45 +00:02:16,730 --> 00:02:18,290 +So that would be, I have + +46 +00:02:18,290 --> 00:02:22,115 +at least one copy up to infinity. + +47 +00:02:22,115 --> 00:02:25,070 +And the optional-r would be it's + +48 +00:02:25,070 --> 00:02:28,370 +the language of r plus +the empty string. + +49 +00:02:28,370 --> 00:02:30,440 +If I have it exactly n times, + +50 +00:02:30,440 --> 00:02:31,985 +then it would be +just the language + +51 +00:02:31,985 --> 00:02:34,010 +of r exactly n-times. + +52 +00:02:34,010 --> 00:02:38,105 +And here you have union +from 0 to m and so on. + +53 +00:02:38,105 --> 00:02:42,560 +This might be a slightly +interesting regular expression. + +54 +00:02:42,560 --> 00:02:46,580 +So that's supposed to be the +character set that is very + +55 +00:02:46,580 --> 00:02:48,410 +similar to character ranges like + +56 +00:02:48,410 --> 00:02:51,005 +in the existing regular +expression matchers. + +57 +00:02:51,005 --> 00:02:52,820 +So this just says + +58 +00:02:52,820 --> 00:02:55,565 +...this regular +expression can match + +59 +00:02:55,565 --> 00:03:00,860 +either the character c1 or +the character c2 up to cn. + +60 +00:03:00,860 --> 00:03:03,620 +Why do I include +that regular expression? + +61 +00:03:03,620 --> 00:03:09,050 +I could also just say +c1 plus c2 plus c3... + +62 +00:03:09,050 --> 00:03:12,889 +a big alternative. +Well that's possible. + +63 +00:03:12,889 --> 00:03:16,595 +But remember the Achilles' +heel of this algorithm is, + +64 +00:03:16,595 --> 00:03:18,425 +if the regular expression is large, + +65 +00:03:18,425 --> 00:03:20,825 +then the computation +take always a long time. + +66 +00:03:20,825 --> 00:03:23,840 +So I'm trying to compress +that as much as I can. + +67 +00:03:23,840 --> 00:03:25,370 +So instead of giving a big + +68 +00:03:25,370 --> 00:03:29,134 +alternative, I am trying to +give one regular expression, + +69 +00:03:29,134 --> 00:03:31,340 +which can not just match +a single character, + +70 +00:03:31,340 --> 00:03:34,230 +but a set of characters. + +71 +00:03:34,630 --> 00:03:36,980 +How can you be sure that + +72 +00:03:36,980 --> 00:03:39,215 +definition you come +up with are correct? + +73 +00:03:39,215 --> 00:03:41,450 +Well, I showed you which are + +74 +00:03:41,450 --> 00:03:45,170 +the properties these two +functions need to satisfy. + +75 +00:03:45,170 --> 00:03:47,060 +I've given you here what + +76 +00:03:47,060 --> 00:03:49,325 +the meaning of these +expressions are. + +77 +00:03:49,325 --> 00:03:52,700 +So you will always know what's +on the right-hand side. + +78 +00:03:52,700 --> 00:03:54,515 +This is completely determined. + +79 +00:03:54,515 --> 00:03:56,720 +You essentially +have to fill something + +80 +00:03:56,720 --> 00:03:58,910 +equivalent on the left-hand side. + +81 +00:03:58,910 --> 00:04:02,105 +That's the main task +of the coursework. + +82 +00:04:02,105 --> 00:04:08,090 +And you can start from the +files I provided on KEATS. + +83 +00:04:08,090 --> 00:04:12,125 +So you would just have to +add these additional cases. + +84 +00:04:12,125 --> 00:04:15,020 +When you add these +additional cases + +85 +00:04:15,020 --> 00:04:17,330 +and you're using the Scala language, + +86 +00:04:17,330 --> 00:04:18,980 +you can do me a favour. + +87 +00:04:18,980 --> 00:04:22,550 +You can call these +constructors for + +88 +00:04:22,550 --> 00:04:25,400 +these different regular +expressions as RANGE, + +89 +00:04:25,400 --> 00:04:28,985 +PLUS, OPTIONAL and NTIMES, +UPTO, FROM and BETWEEN. + +90 +00:04:28,985 --> 00:04:31,370 +Remember I have this +convention to always + +91 +00:04:31,370 --> 00:04:34,025 +use capital letters +for regular expressions. + +92 +00:04:34,025 --> 00:04:36,680 +It would be nice if you could use + +93 +00:04:36,680 --> 00:04:39,260 +these names because +then it will be + +94 +00:04:39,260 --> 00:04:42,335 +very easy for me +to test your code. + +95 +00:04:42,335 --> 00:04:44,690 +If you're using a different +programming language + +96 +00:04:44,690 --> 00:04:46,370 +like let's say Rust, + +97 +00:04:46,370 --> 00:04:48,860 +I expect some people will use that, where I + +98 +00:04:48,860 --> 00:04:51,380 +have no idea what are the +conventions in your language, + +99 +00:04:51,380 --> 00:04:53,420 +how you have to call +these constructors, + +100 +00:04:53,420 --> 00:04:56,420 +we will see whatever you +submit. I'll have a look. + +101 +00:04:56,420 --> 00:04:59,360 +There's one more +constraint I have to + +102 +00:04:59,360 --> 00:05:02,194 +impose to make this +coursework interesting. + +103 +00:05:02,194 --> 00:05:04,730 +I do not want you +that you just take + +104 +00:05:04,730 --> 00:05:07,370 +these extended regular +expressions and that you + +105 +00:05:07,370 --> 00:05:10,550 +expand them using +their definition. + +106 +00:05:10,550 --> 00:05:12,320 +Because that would make the regular + +107 +00:05:12,320 --> 00:05:13,955 +expression matcher very slow. + +108 +00:05:13,955 --> 00:05:16,160 +So for example, if +you want to define + +109 +00:05:16,160 --> 00:05:18,785 +what's the derivative for r-plus, + +110 +00:05:18,785 --> 00:05:21,560 +then what does not +count as a solution: + +111 +00:05:21,560 --> 00:05:24,770 +if you just expand that +to the definition that r + +112 +00:05:24,770 --> 00:05:28,935 +plus is equivalent to +r followed by r-star. + +113 +00:05:28,935 --> 00:05:32,845 +So in code, what I +would like to not see, + +114 +00:05:32,845 --> 00:05:35,680 +I would not give any +marks for that is, + +115 +00:05:35,680 --> 00:05:37,780 +if you add the plus as follows, + +116 +00:05:37,780 --> 00:05:39,910 +so that is still perfectly fine. + +117 +00:05:39,910 --> 00:05:42,160 +There's nothing you +can do differently. + +118 +00:05:42,160 --> 00:05:44,065 +That would be the constructor. + +119 +00:05:44,065 --> 00:05:46,480 +But when you define nullable, + +120 +00:05:46,480 --> 00:05:49,180 +I do not want to see +that you defined + +121 +00:05:49,180 --> 00:05:51,790 +this plus r as nullable + +122 +00:05:51,790 --> 00:05:54,385 +of the sequence of r and star-r, + +123 +00:05:54,385 --> 00:05:58,075 +just unfolding +the definition. + +124 +00:05:58,075 --> 00:06:00,415 +As you can see, when you come + +125 +00:06:00,415 --> 00:06:02,815 +up with a much better +solution for that. + +126 +00:06:02,815 --> 00:06:05,110 +This is a very inefficient way + +127 +00:06:05,110 --> 00:06:07,090 +for how to calculate nullable. + +128 +00:06:07,090 --> 00:06:10,825 +And so the same for derivative +would not be allowed. + +129 +00:06:10,825 --> 00:06:13,895 +If you, I have the plus r here, + +130 +00:06:13,895 --> 00:06:16,685 +you can't just unfold +the definition, + +131 +00:06:16,685 --> 00:06:19,790 +with r-plus +being defined as r + +132 +00:06:19,790 --> 00:06:21,350 +followed by r star and + +133 +00:06:21,350 --> 00:06:23,375 +then just build the +derivative of that. + +134 +00:06:23,375 --> 00:06:25,280 +You have to find something more + +135 +00:06:25,280 --> 00:06:28,730 +primitive that involves +only the derivative of r, + +136 +00:06:28,730 --> 00:06:31,790 +essentially, nothing +more involved. The same + +137 +00:06:31,790 --> 00:06:33,815 +if you have n-times, for example, + +138 +00:06:33,815 --> 00:06:39,935 +you can't just unfold this +n-times in this sequence + +139 +00:06:39,935 --> 00:06:43,310 +of .... n-copies + +140 +00:06:43,310 --> 00:06:47,285 +of this r. You have to get +something more primitive. + +141 +00:06:47,285 --> 00:06:49,760 +Otherwise, as you've +seen earlier, + +142 +00:06:49,760 --> 00:06:53,135 +your regular expression matcher +would be totally slow. + +143 +00:06:53,135 --> 00:06:55,475 +When you submit your solution, + +144 +00:06:55,475 --> 00:06:58,445 +please submit this +solution in the PDF. + +145 +00:06:58,445 --> 00:07:01,580 +So give the cases of your definition + +146 +00:07:01,580 --> 00:07:06,004 +in a form like that for +nullable and derivative. + +147 +00:07:06,004 --> 00:07:08,405 +And also implement it in code. + +148 +00:07:08,405 --> 00:07:10,910 +That is just helping me to + +149 +00:07:10,910 --> 00:07:13,400 +find out whether you have +the correct solution or not. + +150 +00:07:13,400 --> 00:07:15,710 +So you needed a kind of +mathematical notation of + +151 +00:07:15,710 --> 00:07:18,695 +your solution, and +an actual code. + +152 +00:07:18,695 --> 00:07:22,414 +And then once you +have your definition, + +153 +00:07:22,414 --> 00:07:25,699 +also make sure you try +it out on some examples. + +154 +00:07:25,699 --> 00:07:28,970 +These will be the examples +I will definitely try out, + +155 +00:07:28,970 --> 00:07:30,560 +but probably also more + +156 +00:07:30,560 --> 00:07:33,215 +depending on what +definitions you give me. + +157 +00:07:33,215 --> 00:07:38,300 +There's one more question I +ask you to do. You've seen + +158 +00:07:38,300 --> 00:07:40,130 +there's some regular +expressions that + +159 +00:07:40,130 --> 00:07:42,215 +are involved for characters, + +160 +00:07:42,215 --> 00:07:46,925 +for character ranges or +character sets essentially. + +161 +00:07:46,925 --> 00:07:48,665 +And sometimes I also want to have + +162 +00:07:48,665 --> 00:07:51,665 +just a reg expression which +can match any character!! + +163 +00:07:51,665 --> 00:07:56,195 +And I could have for all of +them separate definitions. + +164 +00:07:56,195 --> 00:07:57,710 +And that would mean I also need + +165 +00:07:57,710 --> 00:07:59,645 +separate definitions for nullable, + +166 +00:07:59,645 --> 00:08:02,405 +separate definitions +for derivative. + +167 +00:08:02,405 --> 00:08:05,825 +But actually they can be +generalized to just one constructor. + +168 +00:08:05,825 --> 00:08:08,210 +And that is if we introduce + +169 +00:08:08,210 --> 00:08:11,630 +a constructor for regular expressions, + +170 +00:08:11,630 --> 00:08:13,760 +which not just takes +a single character + +171 +00:08:13,760 --> 00:08:15,469 +or set of characters. + +172 +00:08:15,469 --> 00:08:18,245 +But, which I call here CFUN. + +173 +00:08:18,245 --> 00:08:23,165 +And it takes a function from +characters to booleans. + +174 +00:08:23,165 --> 00:08:24,470 +So if I want to match + +175 +00:08:24,470 --> 00:08:26,900 +just a single character +then this function f + +176 +00:08:26,900 --> 00:08:29,060 +would only say true + +177 +00:08:29,060 --> 00:08:32,225 +if it gets as argument +the single character. + +178 +00:08:32,225 --> 00:08:34,850 +If I have a character set, + +179 +00:08:34,850 --> 00:08:36,515 +then I would say, well, + +180 +00:08:36,515 --> 00:08:38,300 +I need a function +which says true + +181 +00:08:38,300 --> 00:08:40,970 +for all the characters +in this set. + +182 +00:08:40,970 --> 00:08:43,460 +And otherwise it's false. + +183 +00:08:43,460 --> 00:08:46,205 +And if I want to +match any character!!, + +184 +00:08:46,205 --> 00:08:48,500 +then they should get a function + +185 +00:08:48,500 --> 00:08:53,450 +which says true for +all characters. + +186 +00:08:53,450 --> 00:08:56,630 +Okay? If I have +such a constructor + +187 +00:08:56,630 --> 00:09:00,140 +that actually I can save +myself a bit of work. + +188 +00:09:00,140 --> 00:09:03,200 +And I can just have one case + +189 +00:09:03,200 --> 00:09:06,920 +for nullable and one +case for CFUNS. + +190 +00:09:06,920 --> 00:09:09,800 +So that would then subsume +the character ranges and + +191 +00:09:09,800 --> 00:09:13,385 +the character and also +this ALL regular expression. + +192 +00:09:13,385 --> 00:09:15,380 +Ok, this was not explicitly +included at the beginning, + +193 +00:09:15,380 --> 00:09:17,510 +but that's what I can now define. + +194 +00:09:17,510 --> 00:09:21,740 +And then I can define +this for characters, + +195 +00:09:21,740 --> 00:09:23,885 +just as special cases +for these functions. + +196 +00:09:23,885 --> 00:09:25,610 +And I told you already +what this function + +197 +00:09:25,610 --> 00:09:28,025 +should look like in +these three cases. + +198 +00:09:28,025 --> 00:09:30,350 +So I let ... + +199 +00:09:30,350 --> 00:09:33,515 +you decide how you're +going to implement that. + +200 +00:09:33,515 --> 00:09:35,450 +If you first define + +201 +00:09:35,450 --> 00:09:38,495 +your implementation is +all these explicit cases. + +202 +00:09:38,495 --> 00:09:41,900 +Because in these cases it's +actually more intuitive, + +203 +00:09:41,900 --> 00:09:45,980 +what nullable and +derivative should be. + +204 +00:09:45,980 --> 00:09:48,035 +And then in a second step, + +205 +00:09:48,035 --> 00:09:51,140 +you implement these +more general cases. + +206 +00:09:51,140 --> 00:09:53,360 +And just keep the original ones + +207 +00:09:53,360 --> 00:09:54,500 +in your implementation if you + +208 +00:09:54,500 --> 00:09:57,710 +want, or if you feel +adventurous enough, + +209 +00:09:57,710 --> 00:09:59,225 +and want to be lazy, + +210 +00:09:59,225 --> 00:10:01,115 +that you just implement that. + +211 +00:10:01,115 --> 00:10:02,660 +And then you have already done + +212 +00:10:02,660 --> 00:10:06,530 +at least two constructors +by just implementing one. + +213 +00:10:06,530 --> 00:10:08,915 +But as said that +requires a bit skill + +214 +00:10:08,915 --> 00:10:11,450 +of generalizing how +that should be. + +215 +00:10:11,450 --> 00:10:14,180 +And the other questions +are just then + +216 +00:10:14,180 --> 00:10:16,745 +trying out your +expression matcher on + +217 +00:10:16,745 --> 00:10:19,580 +some examples, more +power examples, + +218 +00:10:19,580 --> 00:10:22,400 +and they should be +very easy to solve. + +219 +00:10:22,400 --> 00:10:24,605 +So I hope it's not +too much asked for + +220 +00:10:24,605 --> 00:10:26,615 +and I hope you have fun. + +221 +00:10:26,615 --> 00:10:29,675 +It is not too much ask +because you can, + +222 +00:10:29,675 --> 00:10:32,420 +I hope it's not too much +because you can start from + +223 +00:10:32,420 --> 00:10:35,615 +my definitions or +my implementation. + +224 +00:10:35,615 --> 00:10:39,425 +It's really essentially +mostly is brain work, + +225 +00:10:39,425 --> 00:10:42,170 +how these nullable and +derivative should work. + +226 +00:10:42,170 --> 00:10:44,510 +If you're in a +language like Scala, + +227 +00:10:44,510 --> 00:10:48,875 +the implementation should +then be easy-peasy. + +228 +00:10:48,875 --> 00:10:51,365 +If you are in a different language + +229 +00:10:51,365 --> 00:10:52,610 +I assume you also + +230 +00:10:52,610 --> 00:10:54,890 +dedicated to that +language to start with, + +231 +00:10:54,890 --> 00:10:58,475 +so it should be also very +easy for you to translate + +232 +00:10:58,475 --> 00:11:01,100 +my Scala code into whatever +language you want to + +233 +00:11:01,100 --> 00:11:04,280 +do, first and then +start from there. + +234 +00:11:04,280 --> 00:11:06,230 +If there are any questions, + +235 +00:11:06,230 --> 00:11:08,360 +please ask me or the TAs. + +236 +00:11:08,360 --> 00:11:10,040 +We are trying to be as helpful + +237 +00:11:10,040 --> 00:11:12,900 +as possible with the coursework. + +238 +00:11:13,240 --> 00:11:17,885 +Bear in mind, this is the +first step in our compiler. + +239 +00:11:17,885 --> 00:11:21,005 +The coursework 2 will then +build on top of that. + +240 +00:11:21,005 --> 00:11:25,770 +So it's better to get this +correct first. Thanks.