1
00:00:05,750 --> 00:00:10,395
They come back! Before we can
start with any serious work,
2
00:00:10,395 --> 00:00:12,255
let's do some housekeeping.
3
00:00:12,255 --> 00:00:17,380
As you know, this year is a tad special.
While the broad direction is clear,
4
00:00:17,380 --> 00:00:19,730
there are some organizational
details that
5
00:00:19,730 --> 00:00:22,370
still need to be worked
out as we go along.
6
00:00:22,370 --> 00:00:23,930
The main hub for
7
00:00:23,930 --> 00:00:27,215
all the information is the
KEATS page of this module.
8
00:00:27,215 --> 00:00:29,300
Let me say a few words on how
9
00:00:29,300 --> 00:00:35,269
it is organized. The module is
organized into weekly topics.
10
00:00:35,269 --> 00:00:39,785
So there are entries for weeks
one up till week ten.
11
00:00:39,785 --> 00:00:44,390
Let's also proceed in
approximately weekly steps.
12
00:00:44,390 --> 00:00:46,790
Because if you
already asked me in
13
00:00:46,790 --> 00:00:49,130
week two something
about week ten,
14
00:00:49,130 --> 00:00:51,275
then you might confuse me
15
00:00:51,275 --> 00:00:53,720
and also your fellow students.
16
00:00:53,720 --> 00:00:55,970
All the communication in
17
00:00:55,970 --> 00:00:59,945
this module can be done in
the course discussion area.
18
00:00:59,945 --> 00:01:02,300
You can ask any questions.
19
00:01:02,300 --> 00:01:05,465
For example, I haven't
understood this point,
20
00:01:05,465 --> 00:01:07,910
this video isn't clear,
21
00:01:07,910 --> 00:01:10,160
I cannot run this code,
22
00:01:10,160 --> 00:01:13,910
I have problems with
accessing the coursework.
23
00:01:13,910 --> 00:01:16,295
This can be all discussed here.
24
00:01:16,295 --> 00:01:18,590
If you have any private matter,
25
00:01:18,590 --> 00:01:20,900
like, I can't understand why
26
00:01:20,900 --> 00:01:23,465
did I get such a good
mark, for example?
27
00:01:23,465 --> 00:01:27,245
Or there are some private
reasons why you can't keep up.
28
00:01:27,245 --> 00:01:31,430
Please feel free to use
my private email address.
29
00:01:31,430 --> 00:01:35,630
But for any other matter
related to the module,
30
00:01:35,630 --> 00:01:38,480
try to use the course
discussion area.
31
00:01:38,480 --> 00:01:41,510
And I will try to stay on top
32
00:01:41,510 --> 00:01:45,450
of all the discussions
as much as I can.
33
00:01:45,580 --> 00:01:48,500
Each week is organized into
34
00:01:48,500 --> 00:01:52,550
a mandatory section and
an optional section.
35
00:01:52,550 --> 00:01:55,835
The mandatory section
contains videos,
36
00:01:55,835 --> 00:01:58,190
a handout, and the slides,
37
00:01:58,190 --> 00:02:00,725
which I used to
produce the videos.
38
00:02:00,725 --> 00:02:04,324
I recommend that your
first read the handouts,
39
00:02:04,324 --> 00:02:07,430
then watch the videos and then
40
00:02:07,430 --> 00:02:11,030
read the handout
again for each week.
41
00:02:11,030 --> 00:02:14,795
Apologies, some of these
handouts are quite lengthy.
42
00:02:14,795 --> 00:02:17,615
I think the longest
one is 20 pages.
43
00:02:17,615 --> 00:02:19,235
On the good side,
44
00:02:19,235 --> 00:02:21,650
these handouts are
written in such a way
45
00:02:21,650 --> 00:02:24,935
that you can read them just
before you fall asleep
46
00:02:24,935 --> 00:02:26,720
and you should still be able
47
00:02:26,720 --> 00:02:28,835
to understand what is going on.
48
00:02:28,835 --> 00:02:32,105
Also, they contain lots
of pictures and code.
49
00:02:32,105 --> 00:02:35,330
So 20 pages is
a bit misleading.
50
00:02:35,330 --> 00:02:38,870
You can see the second week
is organized similarly.
51
00:02:38,870 --> 00:02:41,030
You will have some videos.
52
00:02:41,030 --> 00:02:42,260
You will have a handout,
53
00:02:42,260 --> 00:02:43,624
you have the slides,
54
00:02:43,624 --> 00:02:45,770
and some optional resources.
55
00:02:45,770 --> 00:02:51,410
I have also put up a general
section, with general
56
00:02:51,410 --> 00:02:57,965
material, where I put
a PDF about my notation.
57
00:02:57,965 --> 00:03:01,385
This might be very
useful because
58
00:03:01,385 --> 00:03:04,610
remember this topic
is already many,
59
00:03:04,610 --> 00:03:07,445
many decades old,
at least 50 years,
60
00:03:07,445 --> 00:03:09,185
and over the time,
61
00:03:09,185 --> 00:03:11,510
many authors and many lecturers
62
00:03:11,510 --> 00:03:13,745
have introduced
their own notation.
63
00:03:13,745 --> 00:03:16,565
So if you read anything
on the web or in books,
64
00:03:16,565 --> 00:03:19,640
you might be confused about that
65
00:03:19,640 --> 00:03:24,035
they call something, which
I call X, they call Y.
66
00:03:24,035 --> 00:03:26,150
So in this PDF,
67
00:03:26,150 --> 00:03:30,320
I have collected all the
information on what kind of
68
00:03:30,320 --> 00:03:34,940
notation I am using and where
I introduce some shortcuts.
69
00:03:34,940 --> 00:03:38,240
The problem is, you always
want to be precise,
70
00:03:38,240 --> 00:03:40,070
but we also lazy.
71
00:03:40,070 --> 00:03:43,745
We don't want to write
everything into the last detail.
72
00:03:43,745 --> 00:03:47,375
So here I say something
about my notation.
73
00:03:47,375 --> 00:03:50,180
I have also put up
a document about
74
00:03:50,180 --> 00:03:53,510
the Scala I'm going to
use in this module.
75
00:03:53,510 --> 00:03:56,090
This is important
for the coursework.
76
00:03:56,090 --> 00:03:58,175
In the coursework which
will come in a moment,
77
00:03:58,175 --> 00:04:00,440
you can use any
programming language you like
78
00:04:00,440 --> 00:04:03,665
to solve the questions and
implement your code.
79
00:04:03,665 --> 00:04:05,615
But, I'm sorry,
80
00:04:05,615 --> 00:04:09,830
all the code I'm going to
show you will be in Scala.
81
00:04:09,830 --> 00:04:12,155
And the main difference is that
82
00:04:12,155 --> 00:04:15,095
between PEP course from last year
83
00:04:15,095 --> 00:04:17,675
is that I'm going to
use the Ammonite
84
00:04:17,675 --> 00:04:21,170
REPL instead of
the Scala REPL.
85
00:04:21,170 --> 00:04:23,615
They work very similarly.
86
00:04:23,615 --> 00:04:26,070
For example,
87
00:04:28,120 --> 00:04:33,710
I can now evaluate code
and get a feedback of what
88
00:04:33,710 --> 00:04:37,100
it calculates, just like the original one.
89
00:04:37,100 --> 00:04:40,220
The difference is that
Ammonite is
90
00:04:40,220 --> 00:04:42,440
an extension and
provides some features
91
00:04:42,440 --> 00:04:44,975
which will be very useful for us.
92
00:04:44,975 --> 00:04:47,600
And I recorded some
of the features
93
00:04:47,600 --> 00:04:50,970
we are going to use
in this document.
94
00:04:52,330 --> 00:04:55,970
The last point to note
on the KEATS webpage
95
00:04:55,970 --> 00:04:59,405
is that in the optional section,
96
00:04:59,405 --> 00:05:02,270
I will always give
the source code of
97
00:05:02,270 --> 00:05:05,845
the programs I discussed
in the videos and in the handouts.
98
00:05:05,845 --> 00:05:08,420
And I really, really
encourage you
99
00:05:08,420 --> 00:05:11,060
that you play around with this code
100
00:05:11,060 --> 00:05:14,420
to make sure you understand
what was discussed.
101
00:05:14,420 --> 00:05:17,540
At the also have sometimes
some additional pointers
102
00:05:17,540 --> 00:05:19,490
to interesting topics, which
103
00:05:19,490 --> 00:05:22,680
you can read up in your own time.
104
00:05:22,690 --> 00:05:26,240
Exams, I'm sorry,
there will be exams.
105
00:05:26,240 --> 00:05:31,760
There will be a final exam in
January, counting for 30%.
106
00:05:31,760 --> 00:05:34,190
There will be a
midterm shortly after
107
00:05:34,190 --> 00:05:37,565
Reading Week, accounting for 10%.
108
00:05:37,565 --> 00:05:40,625
This will be a normal exams,
109
00:05:40,625 --> 00:05:44,780
just that they will be online
in some form or shape.
110
00:05:44,780 --> 00:05:49,939
There will be also a weekly
engagement component.
111
00:05:49,939 --> 00:05:52,370
So 1% in each week,
112
00:05:52,370 --> 00:05:56,360
where I make sure that
you are engaged with
113
00:05:56,360 --> 00:06:01,625
the material and you will
get 1% each week for that.
114
00:06:01,625 --> 00:06:06,020
How that is going to be
working out in practice,
115
00:06:06,020 --> 00:06:08,000
unfortunately, I do not know
116
00:06:08,000 --> 00:06:11,285
yet what tools we use
or what it means,
117
00:06:11,285 --> 00:06:13,685
but I will let you know about it.
118
00:06:13,685 --> 00:06:19,295
There's also an optional
weekly homework.
119
00:06:19,295 --> 00:06:22,999
The homework is
uploaded on KEATS.
120
00:06:22,999 --> 00:06:27,230
You should send your answers
via email to me.
121
00:06:27,230 --> 00:06:31,955
I will respond individually
and give some feedback.
122
00:06:31,955 --> 00:06:33,920
Normally, if everything is okay
123
00:06:33,920 --> 00:06:36,605
with a question, I will
just respond "OK"?
124
00:06:36,605 --> 00:06:38,630
If there is
something wrong,
125
00:06:38,630 --> 00:06:41,060
I will sometimes answer with a longer
126
00:06:41,060 --> 00:06:44,480
email. All the questions in
127
00:06:44,480 --> 00:06:49,415
the exam and in the midterm
will be from this homework.
128
00:06:49,415 --> 00:06:53,330
So I do not give out the
solutions to the homework.
129
00:06:53,330 --> 00:06:55,415
You have to email me first
130
00:06:55,415 --> 00:06:57,290
what do you think
the solution is...
131
00:06:57,290 --> 00:07:01,280
and I will give answers
individually back.
132
00:07:01,280 --> 00:07:05,270
And I will then ensure
that all the questions in
133
00:07:05,270 --> 00:07:09,560
the exam and the midterm coming
only from this homework.
134
00:07:09,560 --> 00:07:11,825
The simple reason for that is
135
00:07:11,825 --> 00:07:14,645
that I hate if students ask me
136
00:07:14,645 --> 00:07:17,705
if something is relevant
for the exam or not.
137
00:07:17,705 --> 00:07:21,620
I really like this material
and whatever I tell you,
138
00:07:21,620 --> 00:07:23,840
I feel like you
should know about it.
139
00:07:23,840 --> 00:07:26,645
So, to prevent that question,
140
00:07:26,645 --> 00:07:28,415
if it's relevant to the exams,
141
00:07:28,415 --> 00:07:32,165
you can just look at
what's in the homework.
142
00:07:32,165 --> 00:07:34,370
And if this question
is in the homework,
143
00:07:34,370 --> 00:07:36,335
then yes, it will be relevant.
144
00:07:36,335 --> 00:07:40,410
And if not, it will
not. Thank you.
145
00:07:42,280 --> 00:07:46,610
Can you please make one more
point about the homework?
146
00:07:46,610 --> 00:07:50,959
Please submit your
answers only as PDF.
147
00:07:50,959 --> 00:07:53,780
That just makes it easy
for me to look at.
148
00:07:53,780 --> 00:07:55,160
Remember, I might get
149
00:07:55,160 --> 00:07:58,820
60 or more emails about
150
00:07:58,820 --> 00:08:03,110
that and it's just much
easier for me to read PDFs.
151
00:08:03,110 --> 00:08:07,880
Also, please in your answer
copy the question.
152
00:08:07,880 --> 00:08:11,480
I have an extremely limited memory
153
00:08:11,480 --> 00:08:15,290
and have no idea what I actually
asked in this homework until
154
00:08:15,290 --> 00:08:16,910
I actually look it up.
155
00:08:16,910 --> 00:08:18,530
Just to prevent that,
156
00:08:18,530 --> 00:08:20,255
that I have to look it up myself,
157
00:08:20,255 --> 00:08:23,765
please send in
always the question
158
00:08:23,765 --> 00:08:26,330
and then give your answer.
159
00:08:26,330 --> 00:08:29,915
And finally, there are some
160
00:08:29,915 --> 00:08:32,540
optional homework. They are just to
161
00:08:32,540 --> 00:08:34,130
reminders actually that
162
00:08:34,130 --> 00:08:36,140
you really should
have a look at that.
163
00:08:36,140 --> 00:08:39,830
But they're optional. The ones
164
00:08:39,830 --> 00:08:43,520
that you are supposed to
solve, they look like this.
165
00:08:43,520 --> 00:08:45,890
And as I said,
166
00:08:45,890 --> 00:08:50,645
please copy the question
and then give your answer.
167
00:08:50,645 --> 00:08:53,780
And at the end,
package it into a PDF.
168
00:08:53,780 --> 00:08:58,505
Thank you. A very
important component
169
00:08:58,505 --> 00:09:02,240
for assessment in this
module is the coursework.
170
00:09:02,240 --> 00:09:04,160
There will be five of them.
171
00:09:04,160 --> 00:09:06,590
We will first implement a matcher.
172
00:09:06,590 --> 00:09:09,770
Then a lexer building
on top of the matcher.
173
00:09:09,770 --> 00:09:13,655
Then build a parser
and an interpreter.
174
00:09:13,655 --> 00:09:16,730
And then a
compiler for the JVM
175
00:09:16,730 --> 00:09:20,420
and a compiler
targetting the LLVM.
176
00:09:20,420 --> 00:09:24,155
So you can see this
accounts for 45%.
177
00:09:24,155 --> 00:09:27,200
You will submit your
code and you also have
178
00:09:27,200 --> 00:09:30,260
to give some explanation
about your code.
179
00:09:30,260 --> 00:09:31,520
This will be always
180
00:09:31,520 --> 00:09:34,655
explained in the
coursework description.
181
00:09:34,655 --> 00:09:36,785
For the coursework, you can use
182
00:09:36,785 --> 00:09:38,945
any programming
language you like.
183
00:09:38,945 --> 00:09:42,860
In the past, people have
used Haskell and Rust.
184
00:09:42,860 --> 00:09:45,905
Obviously many go for Scala.
185
00:09:45,905 --> 00:09:49,730
You can use any code
I showed you in
186
00:09:49,730 --> 00:09:54,230
the videos or in handouts
or uploaded on KEATS.
187
00:09:54,230 --> 00:09:56,390
But nothing else.
188
00:09:56,390 --> 00:09:59,450
Remember, unlike
in the PEP course,
189
00:09:59,450 --> 00:10:02,030
here, in the Compiler course,
190
00:10:02,030 --> 00:10:04,130
I actually will look with
191
00:10:04,130 --> 00:10:07,910
my own eyes at the submissions
and make judgments.
192
00:10:07,910 --> 00:10:13,775
And I'll see if people have
copied from each other.
193
00:10:13,775 --> 00:10:15,545
Please don't do that!
194
00:10:15,545 --> 00:10:20,160
That's very annoying
for everybody involved.
195
00:10:20,440 --> 00:10:23,570
To conclude this
housekeeping video,
196
00:10:23,570 --> 00:10:25,580
let me tell you a
bit more about how
197
00:10:25,580 --> 00:10:28,025
this module is organized.
198
00:10:28,025 --> 00:10:32,600
The first five weeks we will
spend on lexing and parsing.
199
00:10:32,600 --> 00:10:37,370
This is essentially the
frontend to our compiler.
200
00:10:37,370 --> 00:10:39,500
The first part will
be about lexing.
201
00:10:39,500 --> 00:10:42,995
Actually, we will emphasize quite a
bit on the lexing part,
202
00:10:42,995 --> 00:10:45,005
because that's about my research.
203
00:10:45,005 --> 00:10:47,855
And sorry, I can talk
about this for hours.
204
00:10:47,855 --> 00:10:49,400
But I also like to show you that
205
00:10:49,400 --> 00:10:50,750
there's actually something really
206
00:10:50,750 --> 00:10:52,340
interesting going on and
I want to
207
00:10:52,340 --> 00:10:54,529
show you a number of techniques.
208
00:10:54,529 --> 00:10:57,435
Then obviously we have
to do parsing because
209
00:10:57,435 --> 00:11:01,669
lexing essentially only
finds the words in our programs.
210
00:11:01,669 --> 00:11:03,860
We then have to put
them into sentences to
211
00:11:03,860 --> 00:11:07,350
understand what our
programs actually do.
212
00:11:09,820 --> 00:11:12,200
The next five weeks,
213
00:11:12,200 --> 00:11:15,330
actually that overlaps
with the parsing part,
214
00:11:15,330 --> 00:11:18,850
we will first start with
writing an interpreter.
215
00:11:18,850 --> 00:11:20,470
This is essentially to get
216
00:11:20,470 --> 00:11:22,285
the baseline for our compiler.
217
00:11:22,285 --> 00:11:24,325
Also, you will see compilers are
218
00:11:24,325 --> 00:11:27,370
sometimes fiendishly
difficult to get correct.
219
00:11:27,370 --> 00:11:30,880
And to actually know what
the results should be,
220
00:11:30,880 --> 00:11:32,320
by having it calculated with
221
00:11:32,320 --> 00:11:35,515
an interpreter, is
really important.
222
00:11:35,515 --> 00:11:40,030
And then we really
spent time on a compiler.
223
00:11:40,030 --> 00:11:45,310
We will first write JVM code
for a little While language.
224
00:11:45,310 --> 00:11:46,780
That is an extremely,
225
00:11:46,780 --> 00:11:50,335
extremely simple C-like
language, if you want.
226
00:11:50,335 --> 00:11:53,770
And also JVM Code for
a functional language.
227
00:11:53,770 --> 00:11:55,615
Again, something very small.
228
00:11:55,615 --> 00:11:57,815
And then for that
functional language
229
00:11:57,815 --> 00:12:03,080
also generate LLVM-IR code
that can be
230
00:12:03,080 --> 00:12:09,270
then used to generate directly
CPU code for X86 or ARM.
231
00:12:09,460 --> 00:12:12,095
Just to whet your appetite,
232
00:12:12,095 --> 00:12:14,780
here's the first compiler for
233
00:12:14,780 --> 00:12:20,240
the While-language calculating
the Mandelbrot program.
234
00:12:20,240 --> 00:12:22,160
It generates code for
235
00:12:22,160 --> 00:12:25,460
the JVM, for the Java
virtual machine.
236
00:12:25,460 --> 00:12:27,350
So what it first does, like in
237
00:12:27,350 --> 00:12:29,855
the example I showed you earlier,
238
00:12:29,855 --> 00:12:33,215
it takes the Mandelbrot
program from the BF language,
239
00:12:33,215 --> 00:12:35,480
generates a While program
240
00:12:35,480 --> 00:12:38,520
in our language we are
going to implement...
241
00:12:38,530 --> 00:12:40,940
this has produced a program which is
242
00:12:40,940 --> 00:12:44,795
approximately 70k of source code.
243
00:12:44,795 --> 00:12:47,585
Then our compiler produced
244
00:12:47,585 --> 00:12:53,015
a Java Virtual Machine
byte code file.
245
00:12:53,015 --> 00:12:54,890
This is the readable version of
246
00:12:54,890 --> 00:12:57,500
the Java Virtual Machine bytecode.
247
00:12:57,500 --> 00:12:59,480
And we use then an assembler
248
00:12:59,480 --> 00:13:01,535
to actually produce
the class file.
249
00:13:01,535 --> 00:13:04,460
And then this compiler
just runs this class file.
250
00:13:04,460 --> 00:13:06,830
It takes a little while and
unfortunately this compiler
251
00:13:06,830 --> 00:13:09,365
doesn't show it incrementally.
Only at the end
252
00:13:09,365 --> 00:13:13,205
it shows us it needed
approximately 40 seconds.
253
00:13:13,205 --> 00:13:16,745
And you have to
remember the GCC with -O0
254
00:13:16,745 --> 00:13:21,485
took a little
bit more than 20 seconds.
255
00:13:21,485 --> 00:13:24,440
And so this is quite
remarkable result
256
00:13:24,440 --> 00:13:27,575
considering we are
running it on the JVM.
257
00:13:27,575 --> 00:13:29,345
Just to assure you,
258
00:13:29,345 --> 00:13:33,245
the Mandelbrot program will not be the
only program we can compile.
259
00:13:33,245 --> 00:13:36,380
But we will focus on integers
260
00:13:36,380 --> 00:13:39,605
and strings in our
programming language.
261
00:13:39,605 --> 00:13:43,280
Because for more complicated
data structures
262
00:13:43,280 --> 00:13:46,250
we don't have enough time
to fit into this module.
263
00:13:46,250 --> 00:13:49,040
So let's start now
with the serious work.
264
00:13:49,040 --> 00:13:52,020
I hope I see you in a bit.