diff -r 500dfc51914d -r d1c9294fff65 videos/01-housekeeping.srt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/videos/01-housekeeping.srt Sat Sep 26 23:45:40 2020 +0100 @@ -0,0 +1,1218 @@ +1 +00:00:05,750 --> 00:00:10,395 +They come back! Before we can +start with any serious work, + +2 +00:00:10,395 --> 00:00:12,255 +let's do some housekeeping. + +3 +00:00:12,255 --> 00:00:17,380 +As you know, this year is a tad special. +While the broad direction is clear, + +4 +00:00:17,380 --> 00:00:19,730 +there are some organizational +details that + +5 +00:00:19,730 --> 00:00:22,370 +still need to be worked +out as we go along. + +6 +00:00:22,370 --> 00:00:23,930 +The main hub for + +7 +00:00:23,930 --> 00:00:27,215 +all the information is the +KEATS page of this module. + +8 +00:00:27,215 --> 00:00:29,300 +Let me say a few words on how + +9 +00:00:29,300 --> 00:00:35,269 +it is organized. The module is +organized into weekly topics. + +10 +00:00:35,269 --> 00:00:39,785 +So there are entries for weeks +one up till week ten. + +11 +00:00:39,785 --> 00:00:44,390 +Let's also proceed in +approximately weekly steps. + +12 +00:00:44,390 --> 00:00:46,790 +Because if you +already asked me in + +13 +00:00:46,790 --> 00:00:49,130 +week two something +about week ten, + +14 +00:00:49,130 --> 00:00:51,275 +then you might confuse me + +15 +00:00:51,275 --> 00:00:53,720 +and also your fellow students. + +16 +00:00:53,720 --> 00:00:55,970 +All the communication in + +17 +00:00:55,970 --> 00:00:59,945 +this module can be done in +the course discussion area. + +18 +00:00:59,945 --> 00:01:02,300 +You can ask any questions. + +19 +00:01:02,300 --> 00:01:05,465 +For example, I haven't +understood this point, + +20 +00:01:05,465 --> 00:01:07,910 +this video isn't clear, + +21 +00:01:07,910 --> 00:01:10,160 +I cannot run this code, + +22 +00:01:10,160 --> 00:01:13,910 +I have problems with +accessing the coursework. + +23 +00:01:13,910 --> 00:01:16,295 +This can be all discussed here. + +24 +00:01:16,295 --> 00:01:18,590 +If you have any private matter, + +25 +00:01:18,590 --> 00:01:20,900 +like, I can't understand why + +26 +00:01:20,900 --> 00:01:23,465 +did I get such a good +mark, for example? + +27 +00:01:23,465 --> 00:01:27,245 +Or there are some private +reasons why you can't keep up. + +28 +00:01:27,245 --> 00:01:31,430 +Please feel free to use +my private email address. + +29 +00:01:31,430 --> 00:01:35,630 +But for any other matter +related to the module, + +30 +00:01:35,630 --> 00:01:38,480 +try to use the course +discussion area. + +31 +00:01:38,480 --> 00:01:41,510 +And I will try to stay on top + +32 +00:01:41,510 --> 00:01:45,450 +of all the discussions +as much as I can. + +33 +00:01:45,580 --> 00:01:48,500 +Each week is organized into + +34 +00:01:48,500 --> 00:01:52,550 +a mandatory section and +an optional section. + +35 +00:01:52,550 --> 00:01:55,835 +The mandatory section +contains videos, + +36 +00:01:55,835 --> 00:01:58,190 +a handout, and the slides, + +37 +00:01:58,190 --> 00:02:00,725 +which I used to +produce the videos. + +38 +00:02:00,725 --> 00:02:04,324 +I recommend that your +first read the handouts, + +39 +00:02:04,324 --> 00:02:07,430 +then watch the videos and then + +40 +00:02:07,430 --> 00:02:11,030 +read the handout +again for each week. + +41 +00:02:11,030 --> 00:02:14,795 +Apologies, some of these +handouts are quite lengthy. + +42 +00:02:14,795 --> 00:02:17,615 +I think the longest +one is 20 pages. + +43 +00:02:17,615 --> 00:02:19,235 +On the good side, + +44 +00:02:19,235 --> 00:02:21,650 +these handouts are +written in such a way + +45 +00:02:21,650 --> 00:02:24,935 +that you can read them just +before you fall asleep + +46 +00:02:24,935 --> 00:02:26,720 +and you should still be able + +47 +00:02:26,720 --> 00:02:28,835 +to understand what is going on. + +48 +00:02:28,835 --> 00:02:32,105 +Also, they contain lots +of pictures and code. + +49 +00:02:32,105 --> 00:02:35,330 +So 20 pages is +a bit misleading. + +50 +00:02:35,330 --> 00:02:38,870 +You can see the second week +is organized similarly. + +51 +00:02:38,870 --> 00:02:41,030 +You will have some videos. + +52 +00:02:41,030 --> 00:02:42,260 +You will have a handout, + +53 +00:02:42,260 --> 00:02:43,624 +you have the slides, + +54 +00:02:43,624 --> 00:02:45,770 +and some optional resources. + +55 +00:02:45,770 --> 00:02:51,410 +I have also put up a general +section, with general + +56 +00:02:51,410 --> 00:02:57,965 +material, where I put +a PDF about my notation. + +57 +00:02:57,965 --> 00:03:01,385 +This might be very +useful because + +58 +00:03:01,385 --> 00:03:04,610 +remember this topic +is already many, + +59 +00:03:04,610 --> 00:03:07,445 +many decades old, +at least 50 years, + +60 +00:03:07,445 --> 00:03:09,185 +and over the time, + +61 +00:03:09,185 --> 00:03:11,510 +many authors and many lecturers + +62 +00:03:11,510 --> 00:03:13,745 +have introduced +their own notation. + +63 +00:03:13,745 --> 00:03:16,565 +So if you read anything +on the web or in books, + +64 +00:03:16,565 --> 00:03:19,640 +you might be confused about that + +65 +00:03:19,640 --> 00:03:24,035 +they call something, which +I call X, they call Y. + +66 +00:03:24,035 --> 00:03:26,150 +So in this PDF, + +67 +00:03:26,150 --> 00:03:30,320 +I have collected all the +information on what kind of + +68 +00:03:30,320 --> 00:03:34,940 +notation I am using and where +I introduce some shortcuts. + +69 +00:03:34,940 --> 00:03:38,240 +The problem is, you always +want to be precise, + +70 +00:03:38,240 --> 00:03:40,070 +but we also lazy. + +71 +00:03:40,070 --> 00:03:43,745 +We don't want to write +everything into the last detail. + +72 +00:03:43,745 --> 00:03:47,375 +So here I say something +about my notation. + +73 +00:03:47,375 --> 00:03:50,180 +I have also put up +a document about + +74 +00:03:50,180 --> 00:03:53,510 +the Scala I'm going to +use in this module. + +75 +00:03:53,510 --> 00:03:56,090 +This is important +for the coursework. + +76 +00:03:56,090 --> 00:03:58,175 +In the coursework which +will come in a moment, + +77 +00:03:58,175 --> 00:04:00,440 +you can use any +programming language you like + +78 +00:04:00,440 --> 00:04:03,665 +to solve the questions and +implement your code. + +79 +00:04:03,665 --> 00:04:05,615 +But, I'm sorry, + +80 +00:04:05,615 --> 00:04:09,830 +all the code I'm going to +show you will be in Scala. + +81 +00:04:09,830 --> 00:04:12,155 +And the main difference is that + +82 +00:04:12,155 --> 00:04:15,095 +between PEP course from last year + +83 +00:04:15,095 --> 00:04:17,675 +is that I'm going to +use the Ammonite + +84 +00:04:17,675 --> 00:04:21,170 +REPL instead of +the Scala REPL. + +85 +00:04:21,170 --> 00:04:23,615 +They work very similarly. + +86 +00:04:23,615 --> 00:04:26,070 +For example, + +87 +00:04:28,120 --> 00:04:33,710 +I can now evaluate code +and get a feedback of what + +88 +00:04:33,710 --> 00:04:37,100 +it calculates, just like the original one. + +89 +00:04:37,100 --> 00:04:40,220 +The difference is that +Ammonite is + +90 +00:04:40,220 --> 00:04:42,440 +an extension and +provides some features + +91 +00:04:42,440 --> 00:04:44,975 +which will be very useful for us. + +92 +00:04:44,975 --> 00:04:47,600 +And I recorded some +of the features + +93 +00:04:47,600 --> 00:04:50,970 +we are going to use +in this document. + +94 +00:04:52,330 --> 00:04:55,970 +The last point to note +on the KEATS webpage + +95 +00:04:55,970 --> 00:04:59,405 +is that in the optional section, + +96 +00:04:59,405 --> 00:05:02,270 +I will always give +the source code of + +97 +00:05:02,270 --> 00:05:05,845 +the programs I discussed +in the videos and in the handouts. + +98 +00:05:05,845 --> 00:05:08,420 +And I really, really +encourage you + +99 +00:05:08,420 --> 00:05:11,060 +that you play around with this code + +100 +00:05:11,060 --> 00:05:14,420 +to make sure you understand +what was discussed. + +101 +00:05:14,420 --> 00:05:17,540 +At the also have sometimes +some additional pointers + +102 +00:05:17,540 --> 00:05:19,490 +to interesting topics, which + +103 +00:05:19,490 --> 00:05:22,680 +you can read up in your own time. + +104 +00:05:22,690 --> 00:05:26,240 +Exams, I'm sorry, +there will be exams. + +105 +00:05:26,240 --> 00:05:31,760 +There will be a final exam in +January, counting for 30%. + +106 +00:05:31,760 --> 00:05:34,190 +There will be a +midterm shortly after + +107 +00:05:34,190 --> 00:05:37,565 +Reading Week, accounting for 10%. + +108 +00:05:37,565 --> 00:05:40,625 +This will be a normal exams, + +109 +00:05:40,625 --> 00:05:44,780 +just that they will be online +in some form or shape. + +110 +00:05:44,780 --> 00:05:49,939 +There will be also a weekly +engagement component. + +111 +00:05:49,939 --> 00:05:52,370 +So 1% in each week, + +112 +00:05:52,370 --> 00:05:56,360 +where I make sure that +you are engaged with + +113 +00:05:56,360 --> 00:06:01,625 +the material and you will +get 1% each week for that. + +114 +00:06:01,625 --> 00:06:06,020 +How that is going to be +working out in practice, + +115 +00:06:06,020 --> 00:06:08,000 +unfortunately, I do not know + +116 +00:06:08,000 --> 00:06:11,285 +yet what tools we use +or what it means, + +117 +00:06:11,285 --> 00:06:13,685 +but I will let you know about it. + +118 +00:06:13,685 --> 00:06:19,295 +There's also an optional +weekly homework. + +119 +00:06:19,295 --> 00:06:22,999 +The homework is +uploaded on KEATS. + +120 +00:06:22,999 --> 00:06:27,230 +You should send your answers +via email to me. + +121 +00:06:27,230 --> 00:06:31,955 +I will respond individually +and give some feedback. + +122 +00:06:31,955 --> 00:06:33,920 +Normally, if everything is okay + +123 +00:06:33,920 --> 00:06:36,605 +with a question, I will +just respond "OK"? + +124 +00:06:36,605 --> 00:06:38,630 +If there is +something wrong, + +125 +00:06:38,630 --> 00:06:41,060 +I will sometimes answer with a longer + +126 +00:06:41,060 --> 00:06:44,480 +email. All the questions in + +127 +00:06:44,480 --> 00:06:49,415 +the exam and in the midterm +will be from this homework. + +128 +00:06:49,415 --> 00:06:53,330 +So I do not give out the +solutions to the homework. + +129 +00:06:53,330 --> 00:06:55,415 +You have to email me first + +130 +00:06:55,415 --> 00:06:57,290 +what do you think +the solution is... + +131 +00:06:57,290 --> 00:07:01,280 +and I will give answers +individually back. + +132 +00:07:01,280 --> 00:07:05,270 +And I will then ensure +that all the questions in + +133 +00:07:05,270 --> 00:07:09,560 +the exam and the midterm coming +only from this homework. + +134 +00:07:09,560 --> 00:07:11,825 +The simple reason for that is + +135 +00:07:11,825 --> 00:07:14,645 +that I hate if students ask me + +136 +00:07:14,645 --> 00:07:17,705 +if something is relevant +for the exam or not. + +137 +00:07:17,705 --> 00:07:21,620 +I really like this material +and whatever I tell you, + +138 +00:07:21,620 --> 00:07:23,840 +I feel like you +should know about it. + +139 +00:07:23,840 --> 00:07:26,645 +So, to prevent that question, + +140 +00:07:26,645 --> 00:07:28,415 +if it's relevant to the exams, + +141 +00:07:28,415 --> 00:07:32,165 +you can just look at +what's in the homework. + +142 +00:07:32,165 --> 00:07:34,370 +And if this question +is in the homework, + +143 +00:07:34,370 --> 00:07:36,335 +then yes, it will be relevant. + +144 +00:07:36,335 --> 00:07:40,410 +And if not, it will +not. Thank you. + +145 +00:07:42,280 --> 00:07:46,610 +Can you please make one more +point about the homework? + +146 +00:07:46,610 --> 00:07:50,959 +Please submit your +answers only as PDF. + +147 +00:07:50,959 --> 00:07:53,780 +That just makes it easy +for me to look at. + +148 +00:07:53,780 --> 00:07:55,160 +Remember, I might get + +149 +00:07:55,160 --> 00:07:58,820 +60 or more emails about + +150 +00:07:58,820 --> 00:08:03,110 +that and it's just much +easier for me to read PDFs. + +151 +00:08:03,110 --> 00:08:07,880 +Also, please in your answer +copy the question. + +152 +00:08:07,880 --> 00:08:11,480 +I have an extremely limited memory + +153 +00:08:11,480 --> 00:08:15,290 +and have no idea what I actually +asked in this homework until + +154 +00:08:15,290 --> 00:08:16,910 +I actually look it up. + +155 +00:08:16,910 --> 00:08:18,530 +Just to prevent that, + +156 +00:08:18,530 --> 00:08:20,255 +that I have to look it up myself, + +157 +00:08:20,255 --> 00:08:23,765 +please send in +always the question + +158 +00:08:23,765 --> 00:08:26,330 +and then give your answer. + +159 +00:08:26,330 --> 00:08:29,915 +And finally, there are some + +160 +00:08:29,915 --> 00:08:32,540 +optional homework. They are just to + +161 +00:08:32,540 --> 00:08:34,130 +reminders actually that + +162 +00:08:34,130 --> 00:08:36,140 +you really should +have a look at that. + +163 +00:08:36,140 --> 00:08:39,830 +But they're optional. The ones + +164 +00:08:39,830 --> 00:08:43,520 +that you are supposed to +solve, they look like this. + +165 +00:08:43,520 --> 00:08:45,890 +And as I said, + +166 +00:08:45,890 --> 00:08:50,645 +please copy the question +and then give your answer. + +167 +00:08:50,645 --> 00:08:53,780 +And at the end, +package it into a PDF. + +168 +00:08:53,780 --> 00:08:58,505 +Thank you. A very +important component + +169 +00:08:58,505 --> 00:09:02,240 +for assessment in this +module is the coursework. + +170 +00:09:02,240 --> 00:09:04,160 +There will be five of them. + +171 +00:09:04,160 --> 00:09:06,590 +We will first implement a matcher. + +172 +00:09:06,590 --> 00:09:09,770 +Then a lexer building +on top of the matcher. + +173 +00:09:09,770 --> 00:09:13,655 +Then build a parser +and an interpreter. + +174 +00:09:13,655 --> 00:09:16,730 +And then a +compiler for the JVM + +175 +00:09:16,730 --> 00:09:20,420 +and a compiler +targetting the LLVM. + +176 +00:09:20,420 --> 00:09:24,155 +So you can see this +accounts for 45%. + +177 +00:09:24,155 --> 00:09:27,200 +You will submit your +code and you also have + +178 +00:09:27,200 --> 00:09:30,260 +to give some explanation +about your code. + +179 +00:09:30,260 --> 00:09:31,520 +This will be always + +180 +00:09:31,520 --> 00:09:34,655 +explained in the +coursework description. + +181 +00:09:34,655 --> 00:09:36,785 +For the coursework, you can use + +182 +00:09:36,785 --> 00:09:38,945 +any programming +language you like. + +183 +00:09:38,945 --> 00:09:42,860 +In the past, people have +used Haskell and Rust. + +184 +00:09:42,860 --> 00:09:45,905 +Obviously many go for Scala. + +185 +00:09:45,905 --> 00:09:49,730 +You can use any code +I showed you in + +186 +00:09:49,730 --> 00:09:54,230 +the videos or in handouts +or uploaded on KEATS. + +187 +00:09:54,230 --> 00:09:56,390 +But nothing else. + +188 +00:09:56,390 --> 00:09:59,450 +Remember, unlike +in the PEP course, + +189 +00:09:59,450 --> 00:10:02,030 +here, in the Compiler course, + +190 +00:10:02,030 --> 00:10:04,130 +I actually will look with + +191 +00:10:04,130 --> 00:10:07,910 +my own eyes at the submissions +and make judgments. + +192 +00:10:07,910 --> 00:10:13,775 +And I'll see if people have +copied from each other. + +193 +00:10:13,775 --> 00:10:15,545 +Please don't do that! + +194 +00:10:15,545 --> 00:10:20,160 +That's very annoying +for everybody involved. + +195 +00:10:20,440 --> 00:10:23,570 +To conclude this +housekeeping video, + +196 +00:10:23,570 --> 00:10:25,580 +let me tell you a +bit more about how + +197 +00:10:25,580 --> 00:10:28,025 +this module is organized. + +198 +00:10:28,025 --> 00:10:32,600 +The first five weeks we will +spend on lexing and parsing. + +199 +00:10:32,600 --> 00:10:37,370 +This is essentially the +frontend to our compiler. + +200 +00:10:37,370 --> 00:10:39,500 +The first part will +be about lexing. + +201 +00:10:39,500 --> 00:10:42,995 +Actually, we will emphasize quite a +bit on the lexing part, + +202 +00:10:42,995 --> 00:10:45,005 +because that's about my research. + +203 +00:10:45,005 --> 00:10:47,855 +And sorry, I can talk +about this for hours. + +204 +00:10:47,855 --> 00:10:49,400 +But I also like to show you that + +205 +00:10:49,400 --> 00:10:50,750 +there's actually something really + +206 +00:10:50,750 --> 00:10:52,340 +interesting going on and +I want to + +207 +00:10:52,340 --> 00:10:54,529 +show you a number of techniques. + +208 +00:10:54,529 --> 00:10:57,435 +Then obviously we have +to do parsing because + +209 +00:10:57,435 --> 00:11:01,669 +lexing essentially only +finds the words in our programs. + +210 +00:11:01,669 --> 00:11:03,860 +We then have to put +them into sentences to + +211 +00:11:03,860 --> 00:11:07,350 +understand what our +programs actually do. + +212 +00:11:09,820 --> 00:11:12,200 +The next five weeks, + +213 +00:11:12,200 --> 00:11:15,330 +actually that overlaps +with the parsing part, + +214 +00:11:15,330 --> 00:11:18,850 +we will first start with +writing an interpreter. + +215 +00:11:18,850 --> 00:11:20,470 +This is essentially to get + +216 +00:11:20,470 --> 00:11:22,285 +the baseline for our compiler. + +217 +00:11:22,285 --> 00:11:24,325 +Also, you will see compilers are + +218 +00:11:24,325 --> 00:11:27,370 +sometimes fiendishly +difficult to get correct. + +219 +00:11:27,370 --> 00:11:30,880 +And to actually know what +the results should be, + +220 +00:11:30,880 --> 00:11:32,320 +by having it calculated with + +221 +00:11:32,320 --> 00:11:35,515 +an interpreter, is +really important. + +222 +00:11:35,515 --> 00:11:40,030 +And then we really +spent time on a compiler. + +223 +00:11:40,030 --> 00:11:45,310 +We will first write JVM code +for a little While language. + +224 +00:11:45,310 --> 00:11:46,780 +That is an extremely, + +225 +00:11:46,780 --> 00:11:50,335 +extremely simple C-like +language, if you want. + +226 +00:11:50,335 --> 00:11:53,770 +And also JVM Code for +a functional language. + +227 +00:11:53,770 --> 00:11:55,615 +Again, something very small. + +228 +00:11:55,615 --> 00:11:57,815 +And then for that +functional language + +229 +00:11:57,815 --> 00:12:03,080 +also generate LLVM-IR code +that can be + +230 +00:12:03,080 --> 00:12:09,270 +then used to generate directly +CPU code for X86 or ARM. + +231 +00:12:09,460 --> 00:12:12,095 +Just to whet your appetite, + +232 +00:12:12,095 --> 00:12:14,780 +here's the first compiler for + +233 +00:12:14,780 --> 00:12:20,240 +the While-language calculating +the Mandelbrot program. + +234 +00:12:20,240 --> 00:12:22,160 +It generates code for + +235 +00:12:22,160 --> 00:12:25,460 +the JVM, for the Java +virtual machine. + +236 +00:12:25,460 --> 00:12:27,350 +So what it first does, like in + +237 +00:12:27,350 --> 00:12:29,855 +the example I showed you earlier, + +238 +00:12:29,855 --> 00:12:33,215 +it takes the Mandelbrot +program from the BF language, + +239 +00:12:33,215 --> 00:12:35,480 +generates a While program + +240 +00:12:35,480 --> 00:12:38,520 +in our language we are +going to implement... + +241 +00:12:38,530 --> 00:12:40,940 +this has produced a program which is + +242 +00:12:40,940 --> 00:12:44,795 +approximately 70k of source code. + +243 +00:12:44,795 --> 00:12:47,585 +Then our compiler produced + +244 +00:12:47,585 --> 00:12:53,015 +a Java Virtual Machine +byte code file. + +245 +00:12:53,015 --> 00:12:54,890 +This is the readable version of + +246 +00:12:54,890 --> 00:12:57,500 +the Java Virtual Machine bytecode. + +247 +00:12:57,500 --> 00:12:59,480 +And we use then an assembler + +248 +00:12:59,480 --> 00:13:01,535 +to actually produce +the class file. + +249 +00:13:01,535 --> 00:13:04,460 +And then this compiler +just runs this class file. + +250 +00:13:04,460 --> 00:13:06,830 +It takes a little while and +unfortunately this compiler + +251 +00:13:06,830 --> 00:13:09,365 +doesn't show it incrementally. +Only at the end + +252 +00:13:09,365 --> 00:13:13,205 +it shows us it needed +approximately 40 seconds. + +253 +00:13:13,205 --> 00:13:16,745 +And you have to +remember the GCC with -O0 + +254 +00:13:16,745 --> 00:13:21,485 +took a little +bit more than 20 seconds. + +255 +00:13:21,485 --> 00:13:24,440 +And so this is quite +remarkable result + +256 +00:13:24,440 --> 00:13:27,575 +considering we are +running it on the JVM. + +257 +00:13:27,575 --> 00:13:29,345 +Just to assure you, + +258 +00:13:29,345 --> 00:13:33,245 +the Mandelbrot program will not be the +only program we can compile. + +259 +00:13:33,245 --> 00:13:36,380 +But we will focus on integers + +260 +00:13:36,380 --> 00:13:39,605 +and strings in our +programming language. + +261 +00:13:39,605 --> 00:13:43,280 +Because for more complicated +data structures + +262 +00:13:43,280 --> 00:13:46,250 +we don't have enough time +to fit into this module. + +263 +00:13:46,250 --> 00:13:49,040 +So let's start now +with the serious work. + +264 +00:13:49,040 --> 00:13:52,020 +I hope I see you in a bit.