# HG changeset patch # User Christian Urban # Date 1632486942 -3600 # Node ID 499405058cfd00866eb8fb0eb8b4deb79de814b7 # Parent a3418ee8c404ec415a335829546f77e605c0d5c8 updated diff -r a3418ee8c404 -r 499405058cfd slides/slides01.pdf Binary file slides/slides01.pdf has changed diff -r a3418ee8c404 -r 499405058cfd slides/slides01.tex --- a/slides/slides01.tex Sun Sep 05 23:51:37 2021 +0100 +++ b/slides/slides01.tex Fri Sep 24 13:35:42 2021 +0100 @@ -56,7 +56,7 @@ \begin{tabular}{@ {}c@ {}} \\[-3mm] \LARGE Compilers and \\[-1mm] - \LARGE FFFormal Languages\\[-3mm] + \LARGE Formal Languages\\[-3mm] \end{tabular}} \begin{center} @@ -423,20 +423,18 @@ \begin{frame}[c] \frametitle{Some Housekeeping} -\textbf{Exams will be online:}\bigskip +\textbf{Exam will be online:}\bigskip \begin{itemize} -\item final exam in January (30\%) -\item mid-term shortly after Reading Week (10\%)\bigskip - -\item weekly engagement (10\%) +\item final exam in January (35\%) +\item five CWs (65\%) \end{itemize}\bigskip\bigskip\pause \textbf{Weekly Homework (optional):} \begin{itemize} -\item uploaded on KEATS, send answers via email, responded individually -\item \alert{\bf all} questions in the exam and mid-term will be from the HW!! +\item uploaded on KEATS, send answers via email, (try to!) respond individually +\item \alert{\bf all} questions in the exam will be from the HWs!! \end{itemize} \end{frame} @@ -446,18 +444,18 @@ \begin{frame}[c] \frametitle{Some Housekeeping} -\textbf{Coursework (5 accounting for 45\%):}\bigskip +\textbf{Coursework (5 accounting for 65\%):}\bigskip \begin{itemize} \item matcher (5\%) -\item lexer (8\%) +\item lexer (10\%) \item parser / interpreter (10\%) -\item JVM compiler (10\%) -\item LLVM compiler (12\%) +\item JVM compiler (15\%) +\item LLVM compiler (25\%) \end{itemize}\bigskip\pause -you can use any programming language you like (Haskell, Rust)\\\pause -you can use any code I showed you and uploaded to KEATS\ldots\textbf{BUT NOTHING ELSE!}\pause +you can use \alert{any} programming language you like (Haskell, Rust)\\\pause +you can use any code I show you and is uploaded to KEATS\ldots\textbf{BUT NOTHING ELSE!} \end{frame} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% diff -r a3418ee8c404 -r 499405058cfd videos/01-evilregexes.srt --- a/videos/01-evilregexes.srt Sun Sep 05 23:51:37 2021 +0100 +++ b/videos/01-evilregexes.srt Fri Sep 24 13:35:42 2021 +0100 @@ -169,7 +169,7 @@ 38 00:01:55,790 --> 00:01:59,105 match either it is there -or it ss not there. +or it is not there. 39 00:01:59,105 --> 00:02:01,340 @@ -229,7 +229,7 @@ 51 00:02:33,635 --> 00:02:36,545 Here you also have regular -expression which can +expressions which can 52 00:02:36,545 --> 00:02:40,070 @@ -304,11 +304,11 @@ 67 00:03:19,925 --> 00:03:23,075 -And here's a particular one. +And here's a particular one 68 00:03:23,075 --> 00:03:25,820 -Trying to match something +trying to match something 69 00:03:25,820 --> 00:03:28,770 @@ -521,8 +521,8 @@ 114 00:05:42,935 --> 00:05:46,805 -And we measure time with longer -and longer strings of a. +And we measure the time with longer +and longer strings of a's. 115 00:05:46,805 --> 00:05:48,770 @@ -564,7 +564,7 @@ 123 00:06:09,110 --> 00:06:11,255 -That's indicated by this none. +That's indicated by this None. 124 00:06:11,255 --> 00:06:13,925 @@ -593,38 +593,40 @@ 130 00:06:28,040 --> 00:06:31,490 -So 15 is also a let's take +So 15 is also OK. +Let's take 20. 131 00:06:31,490 --> 00:06:36,965 -28th notes already +Hmmm this already takes double the time. 132 00:06:36,965 --> 00:06:42,440 -Twenty-five longer. +Twenty-five. Then even longer. 133 00:06:42,440 --> 00:06:45,680 -Okay, that suddenly -from 02 seconds, +Okay, then suddenly +from 0.2 seconds, 134 00:06:45,680 --> 00:06:48,960 -it takes almost four seconds. +it now takes almost four seconds. 135 00:06:49,600 --> 00:06:54,890 -Six this +Twenty-Six, this 136 00:06:54,890 --> 00:07:01,415 -takes six seconds -already Double, okay? +takes six seconds... +already double. 137 00:07:01,415 --> 00:07:07,229 -Go to 28. That would be now. +Let's go to 28. That would be +...hmmm....hmmm 138 00:07:08,890 --> 00:07:11,840 @@ -657,11 +659,11 @@ 144 00:07:24,710 --> 00:07:26,570 -AES is actually not much +a's is actually not matched 145 00:07:26,570 --> 00:07:28,490 -by that you see it's +by that. You see it's still not finished. 146 @@ -679,13 +681,13 @@ 149 00:07:36,530 --> 00:07:40,805 -30 would be already -more than a minute. +30, we would be already here +for more than a minute. 150 00:07:40,805 --> 00:07:43,940 -And if I could read -something like hundreds, +And if I could use +something like 100, 151 00:07:43,940 --> 00:07:46,220 @@ -756,7 +758,7 @@ 166 00:08:23,930 --> 00:08:26,150 -in the strings this work +in the strings this regular expression matches. 167 @@ -772,7 +774,7 @@ 169 00:08:31,460 --> 00:08:35,285 And we again, we just use -repeated A's for that. +repeated a's for that. 170 00:08:35,285 --> 00:08:38,195 @@ -785,7 +787,7 @@ 172 00:08:41,930 --> 00:08:44,540 -So ten SBA, good. +So ten a's is very good. 173 00:08:44,540 --> 00:08:46,340 @@ -801,15 +803,15 @@ 176 00:08:50,525 --> 00:08:54,725 -Friendly. Although pretty fast. +Twenty...also pretty fast. 177 00:08:54,725 --> 00:08:59,120 -Five, again, +Twenty-five... Again, 178 00:08:59,120 --> 00:09:06,650 -somehow is kind of +somehow is a kind of threshold that is 25, 26. 179 @@ -827,8 +829,8 @@ 182 00:09:17,165 --> 00:09:19,250 -the Times has always -doubling from +the times always +double from 183 00:09:19,250 --> 00:09:21,860 @@ -849,7 +851,7 @@ 187 00:09:30,230 --> 00:09:32,165 -Let's choose twenties or maize. +It is just twenty-or-something a's. 188 00:09:32,165 --> 00:09:35,419 @@ -858,12 +860,12 @@ 189 00:09:35,419 --> 00:09:38,720 -with kilobytes of data. +with Gigabytes of data 190 00:09:38,720 --> 00:09:42,260 -This, these regular -expressions that would years +with these regular +expressions that would 191 00:09:42,260 --> 00:09:48,150 @@ -893,7 +895,7 @@ 197 00:10:03,415 --> 00:10:05,980 -is the reg expression +is the regular expression and we just having 198 @@ -902,7 +904,7 @@ 199 00:10:08,320 --> 00:10:11,905 -strings from five up till 28. +strings from 5 up till 28. 200 00:10:11,905 --> 00:10:14,305 @@ -919,7 +921,7 @@ 203 00:10:19,900 --> 00:10:24,925 but then starting from -23, skidding pretty slow. +23, it is getting pretty slow. 204 00:10:24,925 --> 00:10:27,445 @@ -928,11 +930,11 @@ 205 00:10:27,445 --> 00:10:29,230 -By the way, I'm not quoting here. +By the way, I'm not gloating here. 206 00:10:29,230 --> 00:10:33,755 -Scala, using internally +Scala uses internally the regular expression 207 @@ -969,16 +971,16 @@ 214 00:10:55,490 --> 00:10:57,605 -So I think I can that. +So I think I can 215 00:10:57,605 --> 00:10:59,165 -Now, just finish here. +now, just finish this here. 216 00:10:59,165 --> 00:11:04,025 -You see the problem. Just -for completeness sake. +You see the problem. +Just for completeness sake. 217 00:11:04,025 --> 00:11:07,010 @@ -997,7 +999,7 @@ 220 00:11:12,935 --> 00:11:20,300 And again it tries out -strings between 130 here. +strings between 1 and 30 here. 221 00:11:20,300 --> 00:11:23,450 @@ -1011,7 +1013,7 @@ 223 00:11:25,565 --> 00:11:29,780 of links up till 20 -AES is pretty fast. +a's is pretty fast. 224 00:11:29,780 --> 00:11:32,495 @@ -1060,20 +1062,20 @@ 234 00:11:52,250 --> 00:11:55,620 -Hey, I also just stop that here. +I also just stop that here. 235 00:11:55,710 --> 00:12:00,985 -Okay, this slight collect -this information about times. +Okay, this slide collects +the information about times. 236 00:12:00,985 --> 00:12:03,400 -On the right hand side will +On the right-hand side will 237 00:12:03,400 --> 00:12:05,860 -be our regular expression mantra, +be our regular expression matcher, 238 00:12:05,860 --> 00:12:08,290 @@ -1086,7 +1088,7 @@ 240 00:12:10,795 --> 00:12:14,260 -barriers than regular +various other regular expression matching engines? 241 @@ -1096,12 +1098,12 @@ 242 00:12:19,080 --> 00:12:23,335 -Possible a n times a n times. +Possible a n-times a n-times. 243 00:12:23,335 --> 00:12:26,890 -And on the lowest -is a star, star b. +And on the lower +is (a*)* b. 244 00:12:26,890 --> 00:12:30,370 @@ -1114,16 +1116,16 @@ 246 00:12:35,335 --> 00:12:38,925 -And on the y axis is the time. +And on the y-axis is the time 247 00:12:38,925 --> 00:12:41,660 -They need to decide whether +they need to decide whether 248 00:12:41,660 --> 00:12:44,615 the string is matched by -the rate expression or not. +the regular expression or not. 249 00:12:44,615 --> 00:12:46,415 @@ -1131,16 +1133,16 @@ 250 00:12:46,415 --> 00:12:47,945 -Java eight in JavaScript, +Java 8 and JavaScript, 251 00:12:47,945 --> 00:12:52,250 they max out approximately -at between 2530. +at between 25 and 30. 252 00:12:52,250 --> 00:12:53,900 -The kristin, it takes already +Because then it takes already 253 00:12:53,900 --> 00:12:55,160 @@ -1158,7 +1160,7 @@ 256 00:13:00,815 --> 00:13:03,830 -Python and derived Ruby max out +Python and Ruby max out 257 00:13:03,830 --> 00:13:07,220 @@ -1172,18 +1174,17 @@ 259 00:13:10,400 --> 00:13:13,940 -whether this rec expression +whether this regular expression actually matches the string. 260 00:13:13,940 --> 00:13:16,790 Contrast that with -the reg expression + 261 00:13:16,790 --> 00:13:19,235 -which we are regular -expression mantra, +the regular expression matcher 262 00:13:19,235 --> 00:13:21,470 @@ -1206,7 +1207,7 @@ 266 00:13:32,285 --> 00:13:34,850 -First version may be +The first version will be also relatively slow. 267 @@ -1228,7 +1229,7 @@ 271 00:13:42,380 --> 00:13:45,740 you have to be careful -about the x axis because +about the x-axis because 272 00:13:45,740 --> 00:13:49,385 @@ -1237,12 +1238,12 @@ 273 00:13:49,385 --> 00:13:51,695 -It's actually 4 million A's. +It's actually 4 million a's. 274 00:13:51,695 --> 00:13:55,100 So our regular -expression match or need +expression matcher needs 275 00:13:55,100 --> 00:13:57,635 @@ -1251,11 +1252,11 @@ 276 00:13:57,635 --> 00:14:00,725 match a string of length -of 4 million A's. +of 4 million a's. 277 00:14:00,725 --> 00:14:04,430 -Contrast that Python, Java eight, +Contrast that Python, Java 8, 278 00:14:04,430 --> 00:14:06,770 @@ -1264,16 +1265,15 @@ 279 00:14:06,770 --> 00:14:09,905 already for a string -of length just 30, +of length just 30. 280 00:14:09,905 --> 00:14:12,365 -unless you're very -careful with Java eight. +I was very careful with Java 8. 281 00:14:12,365 --> 00:14:15,725 -Yes, Java nine and above, +Yes, Java 9 and above, 282 00:14:15,725 --> 00:14:17,180 @@ -1291,8 +1291,8 @@ 285 00:14:22,805 --> 00:14:27,050 -It's this data. I -call this slide. +with this data. +I call this slide: 286 00:14:27,050 --> 00:14:29,675 @@ -1306,7 +1306,7 @@ 288 00:14:33,515 --> 00:14:34,910 -at least more times by +abysmal times by 289 00:14:34,910 --> 00:14:38,015 @@ -1326,7 +1326,7 @@ 292 00:14:42,695 --> 00:14:47,495 And if you don't believe -in D times, I gave here, +in the times, I gave here, 293 00:14:47,495 --> 00:14:50,090 @@ -1336,7 +1336,7 @@ 294 00:14:50,090 --> 00:14:52,865 with the examples -I uploaded, Keats. +I uploaded on KEATS. 295 00:14:52,865 --> 00:14:55,235 @@ -1344,7 +1344,7 @@ 296 00:14:55,235 --> 00:14:57,470 -are used here in the examples. +I used here in the examples. 297 00:14:57,470 --> 00:14:59,255 @@ -1357,19 +1357,19 @@ 299 00:15:01,970 --> 00:15:05,449 These are two very -well chosen examples. +well chosen examples, 300 00:15:05,449 --> 00:15:07,145 -And I admit that's true. +and I admit that's true, 301 00:15:07,145 --> 00:15:09,410 -And such problem there never +and such problems never 302 00:15:09,410 --> 00:15:12,540 -causing any problems +cause any problems in real life. 303 @@ -1387,7 +1387,7 @@ 306 00:15:21,410 --> 00:15:23,885 -a company called cloudflare. +a company called Cloudflare. 307 00:15:23,885 --> 00:15:27,560 @@ -1395,7 +1395,7 @@ 308 00:15:27,560 --> 00:15:30,935 -which host very +which hosts very well-known web pages. 309 @@ -1410,21 +1410,21 @@ 311 00:15:37,340 --> 00:15:39,320 -But then a Rekha expression, +But then a regular expression, 312 00:15:39,320 --> 00:15:41,180 -actually this one caused +actually this one, caused a problem and you 313 00:15:41,180 --> 00:15:43,265 can see they're also -like two stars. +two stars 314 00:15:43,265 --> 00:15:44,630 -They are at the end. +at the end. 315 00:15:44,630 --> 00:15:46,955 @@ -1463,8 +1463,8 @@ 323 00:16:04,040 --> 00:16:06,650 -this webpage had -also an outage from, +this webpage, had +also an outage for 324 00:16:06,650 --> 00:16:08,390 @@ -1472,8 +1472,8 @@ 325 00:16:08,390 --> 00:16:13,070 -Because a regular expression -then needed to format posts, +Because a regular expression, +needed to format posts, 326 00:16:13,070 --> 00:16:15,575 @@ -1487,7 +1487,7 @@ 328 00:16:19,010 --> 00:16:23,390 And again, there was a -semi kind of problem. +similar kind of problem. 329 00:16:23,390 --> 00:16:24,950 @@ -1506,20 +1506,20 @@ 332 00:16:31,730 --> 00:16:34,175 what surprised me is -that theoretician +that theoreticians, 333 00:16:34,175 --> 00:16:37,520 who sometimes dedicate their -life to regular expression. +life to regular expressions 334 00:16:37,520 --> 00:16:39,440 -And no really a lot about +and know really a lot about 335 00:16:39,440 --> 00:16:41,690 -them didn't know +them, didn't know anything about this. 336 @@ -1529,12 +1529,12 @@ 337 00:16:43,610 --> 00:16:46,160 -a name for that -regular expression, +a name for that: +Regular Expression 338 00:16:46,160 --> 00:16:47,975 -denial of service attack. +Denial of Service Attack. 339 00:16:47,975 --> 00:16:49,745 @@ -1547,11 +1547,11 @@ 341 00:16:51,230 --> 00:16:54,920 attackers look for -certain strings. +certain strings 342 00:16:54,920 --> 00:16:56,780 -You make your regular expression +that make your regular expression 343 00:16:56,780 --> 00:16:59,105 @@ -1559,12 +1559,12 @@ 344 00:16:59,105 --> 00:17:01,370 -And these kind of expressions, +And these kind of 345 00:17:01,370 --> 00:17:04,160 -regular expressions called -Eve of reg expression. +regular expressions are called +Evil Regular Expressions. 346 00:17:04,160 --> 00:17:06,350 @@ -1591,7 +1591,7 @@ 351 00:17:15,620 --> 00:17:18,560 your program one of -these reg expression. +these regular expressions. 352 00:17:18,560 --> 00:17:21,830 @@ -1609,7 +1609,7 @@ 355 00:17:25,640 --> 00:17:29,945 -which make the records +which make the regular matching engine topple over, 356 @@ -1619,15 +1619,15 @@ 357 00:17:31,820 --> 00:17:34,295 because your webpage is -probably not variable. +probably not available. 358 00:17:34,295 --> 00:17:36,140 -This is also sometimes called +This phenomenon is also sometimes 359 00:17:36,140 --> 00:17:39,350 -this phenomenon, +called catastrophic backtracking. 360 @@ -1643,7 +1643,7 @@ 362 00:17:46,910 --> 00:17:50,795 real life is actually -not to do with Lexus. +not to do with lexers. 363 00:17:50,795 --> 00:17:53,180 @@ -1656,7 +1656,7 @@ 365 00:17:55,040 --> 00:17:57,185 -like source bad reg expressions, +lexers. But regular expressions, 366 00:17:57,185 --> 00:18:00,065 @@ -1670,7 +1670,7 @@ 368 00:18:03,770 --> 00:18:06,590 -Remember, you having to +Remember, say you're having to 369 00:18:06,590 --> 00:18:10,130 @@ -1679,7 +1679,7 @@ 370 00:18:10,130 --> 00:18:13,640 -in packets which you think are K +in packets which you think are OK 371 00:18:13,640 --> 00:18:14,930 @@ -1727,7 +1727,8 @@ 381 00:18:39,080 --> 00:18:43,190 -so fast that the reg expressions +so fast that the regular +expressions 382 00:18:43,190 --> 00:18:45,169 @@ -1739,12 +1740,12 @@ 384 00:18:47,060 --> 00:18:49,880 -suddenly a reg expression -takes too much time +suddenly a regular expression +takes too much time? 385 00:18:49,880 --> 00:18:52,670 -to just stop the matching +Do you just stop the matching 386 00:18:52,670 --> 00:18:55,100 @@ -1778,7 +1779,7 @@ 392 00:19:09,965 --> 00:19:13,820 And it's always say that -Germans don't have any Yammer. +Germans don't have any humor. 393 00:19:13,820 --> 00:19:16,985 @@ -1793,11 +1794,11 @@ 395 00:19:19,145 --> 00:19:21,095 but feel free to -have a look which +have a look. 396 00:19:21,095 --> 00:19:23,705 -explains exactly that problem. +It explains exactly that problem. 397 00:19:23,705 --> 00:19:25,610