videos/01-housekeeping.srt
author Christian Urban <christian.urban@kcl.ac.uk>
Fri, 04 Nov 2022 12:07:40 +0000
changeset 894 02ef5c3abc51
parent 763 4e628958c01a
permissions -rw-r--r--
updatedHG: added solutions/cw5/fun_tokens.sc

1
00:00:05,750 --> 00:00:10,395
They come back! Before we can
start with any serious work,

2
00:00:10,395 --> 00:00:12,255
let's do some housekeeping.

3
00:00:12,255 --> 00:00:17,380
As you know, this year is a tad special.
While the broad direction is clear,

4
00:00:17,380 --> 00:00:19,730
there are some organizational
details that

5
00:00:19,730 --> 00:00:22,370
still need to be worked
out as we go along.

6
00:00:22,370 --> 00:00:23,930
The main hub for

7
00:00:23,930 --> 00:00:27,215
all the information is the
KEATS page of this module.

8
00:00:27,215 --> 00:00:29,300
Let me say a few words on how

9
00:00:29,300 --> 00:00:35,269
it is organized. The module is
organized into weekly topics.

10
00:00:35,269 --> 00:00:39,785
So there are entries for weeks
one up till week ten.

11
00:00:39,785 --> 00:00:44,390
Let's also proceed in
approximately weekly steps.

12
00:00:44,390 --> 00:00:46,790
Because if you
already asked me in

13
00:00:46,790 --> 00:00:49,130
week two something
about week ten,

14
00:00:49,130 --> 00:00:51,275
then you might confuse me

15
00:00:51,275 --> 00:00:53,720
and also your fellow students.

16
00:00:53,720 --> 00:00:55,970
All the communication in

17
00:00:55,970 --> 00:00:59,945
this module can be done in
the course discussion area.

18
00:00:59,945 --> 00:01:02,300
You can ask any questions.

19
00:01:02,300 --> 00:01:05,465
For example, I haven't
understood this point,

20
00:01:05,465 --> 00:01:07,910
this video isn't clear,

21
00:01:07,910 --> 00:01:10,160
I cannot run this code,

22
00:01:10,160 --> 00:01:13,910
I have problems with
accessing the coursework.

23
00:01:13,910 --> 00:01:16,295
This can be all discussed here.

24
00:01:16,295 --> 00:01:18,590
If you have any private matter,

25
00:01:18,590 --> 00:01:20,900
like, I can't understand why

26
00:01:20,900 --> 00:01:23,465
did I get such a good
mark, for example?

27
00:01:23,465 --> 00:01:27,245
Or there are some private
reasons why you can't keep up.

28
00:01:27,245 --> 00:01:31,430
Please feel free to use
my private email address.

29
00:01:31,430 --> 00:01:35,630
But for any other matter
related to the module,

30
00:01:35,630 --> 00:01:38,480
try to use the course
discussion area.

31
00:01:38,480 --> 00:01:41,510
And I will try to stay on top

32
00:01:41,510 --> 00:01:45,450
of all the discussions
as much as I can.

33
00:01:45,580 --> 00:01:48,500
Each week is organized into

34
00:01:48,500 --> 00:01:52,550
a mandatory section and
an optional section.

35
00:01:52,550 --> 00:01:55,835
The mandatory section
contains videos,

36
00:01:55,835 --> 00:01:58,190
a handout, and the slides,

37
00:01:58,190 --> 00:02:00,725
which I used to
produce the videos.

38
00:02:00,725 --> 00:02:04,324
I recommend that your
first read the handouts,

39
00:02:04,324 --> 00:02:07,430
then watch the videos and then

40
00:02:07,430 --> 00:02:11,030
read the handout
again for each week.

41
00:02:11,030 --> 00:02:14,795
Apologies, some of these
handouts are quite lengthy.

42
00:02:14,795 --> 00:02:17,615
I think the longest
one is 20 pages.

43
00:02:17,615 --> 00:02:19,235
On the good side,

44
00:02:19,235 --> 00:02:21,650
these handouts are
written in such a way

45
00:02:21,650 --> 00:02:24,935
that you can read them just
before you fall asleep

46
00:02:24,935 --> 00:02:26,720
and you should still be able

47
00:02:26,720 --> 00:02:28,835
to understand what is going on.

48
00:02:28,835 --> 00:02:32,105
Also, they contain lots
of pictures and code.

49
00:02:32,105 --> 00:02:35,330
So 20 pages is
a bit misleading.

50
00:02:35,330 --> 00:02:38,870
You can see the second week
is organized similarly.

51
00:02:38,870 --> 00:02:41,030
You will have some videos.

52
00:02:41,030 --> 00:02:42,260
You will have a handout,

53
00:02:42,260 --> 00:02:43,624
you have the slides,

54
00:02:43,624 --> 00:02:45,770
and some optional resources.

55
00:02:45,770 --> 00:02:51,410
I have also put up a general
section, with general

56
00:02:51,410 --> 00:02:57,965
material, where I put
a PDF about my notation.

57
00:02:57,965 --> 00:03:01,385
This might be very
useful because

58
00:03:01,385 --> 00:03:04,610
remember this topic
is already many,

59
00:03:04,610 --> 00:03:07,445
many decades old,
at least 50 years,

60
00:03:07,445 --> 00:03:09,185
and over the time,

61
00:03:09,185 --> 00:03:11,510
many authors and many lecturers

62
00:03:11,510 --> 00:03:13,745
have introduced
their own notation.

63
00:03:13,745 --> 00:03:16,565
So if you read anything
on the web or in books,

64
00:03:16,565 --> 00:03:19,640
you might be confused about that

65
00:03:19,640 --> 00:03:24,035
they call something, which
I call X, they call Y.

66
00:03:24,035 --> 00:03:26,150
So in this PDF,

67
00:03:26,150 --> 00:03:30,320
I have collected all the
information on what kind of

68
00:03:30,320 --> 00:03:34,940
notation I am using and where
I introduce some shortcuts.

69
00:03:34,940 --> 00:03:38,240
The problem is, you always
want to be precise,

70
00:03:38,240 --> 00:03:40,070
but we also lazy.

71
00:03:40,070 --> 00:03:43,745
We don't want to write
everything into the last detail.

72
00:03:43,745 --> 00:03:47,375
So here I say something
about my notation.

73
00:03:47,375 --> 00:03:50,180
I have also put up
a document about

74
00:03:50,180 --> 00:03:53,510
the Scala I'm going to
use in this module.

75
00:03:53,510 --> 00:03:56,090
This is important
for the coursework.

76
00:03:56,090 --> 00:03:58,175
In the coursework which
will come in a moment,

77
00:03:58,175 --> 00:04:00,440
you can use any
programming language you like

78
00:04:00,440 --> 00:04:03,665
to solve the questions and
implement your code.

79
00:04:03,665 --> 00:04:05,615
But, I'm sorry,

80
00:04:05,615 --> 00:04:09,830
all the code I'm going to
show you will be in Scala.

81
00:04:09,830 --> 00:04:12,155
And the main difference is that

82
00:04:12,155 --> 00:04:15,095
between PEP course from last year

83
00:04:15,095 --> 00:04:17,675
is that I'm going to
use the Ammonite

84
00:04:17,675 --> 00:04:21,170
REPL instead of
the Scala REPL.

85
00:04:21,170 --> 00:04:23,615
They work very similarly.

86
00:04:23,615 --> 00:04:26,070
For example,

87
00:04:28,120 --> 00:04:33,710
I can now evaluate code
and get a feedback of what

88
00:04:33,710 --> 00:04:37,100
it calculates, just like the original one.

89
00:04:37,100 --> 00:04:40,220
The difference is that
Ammonite is

90
00:04:40,220 --> 00:04:42,440
an extension and
provides some features

91
00:04:42,440 --> 00:04:44,975
which will be very useful for us.

92
00:04:44,975 --> 00:04:47,600
And I recorded some
of the features

93
00:04:47,600 --> 00:04:50,970
we are going to use
in this document.

94
00:04:52,330 --> 00:04:55,970
The last point to note
on the KEATS webpage

95
00:04:55,970 --> 00:04:59,405
is that in the optional section,

96
00:04:59,405 --> 00:05:02,270
I will always give
the source code of

97
00:05:02,270 --> 00:05:05,845
the programs I discussed
in the videos and in the handouts.

98
00:05:05,845 --> 00:05:08,420
And I really, really
encourage you

99
00:05:08,420 --> 00:05:11,060
that you play around with this code

100
00:05:11,060 --> 00:05:14,420
to make sure you understand
what was discussed.

101
00:05:14,420 --> 00:05:17,540
At the also have sometimes
some additional pointers

102
00:05:17,540 --> 00:05:19,490
to interesting topics, which

103
00:05:19,490 --> 00:05:22,680
you can read up in your own time.

104
00:05:22,690 --> 00:05:26,240
Exams, I'm sorry,
there will be exams.

105
00:05:26,240 --> 00:05:31,760
There will be a final exam in
January, counting for 30%.

106
00:05:31,760 --> 00:05:34,190
There will be a
midterm shortly after

107
00:05:34,190 --> 00:05:37,565
Reading Week, accounting for 10%.

108
00:05:37,565 --> 00:05:40,625
This will be a normal exams,

109
00:05:40,625 --> 00:05:44,780
just that they will be online
in some form or shape.

110
00:05:44,780 --> 00:05:49,939
There will be also a weekly
engagement component.

111
00:05:49,939 --> 00:05:52,370
So 1% in each week,

112
00:05:52,370 --> 00:05:56,360
where I make sure that
you are engaged with

113
00:05:56,360 --> 00:06:01,625
the material and you will
get 1% each week for that.

114
00:06:01,625 --> 00:06:06,020
How that is going to be
working out in practice,

115
00:06:06,020 --> 00:06:08,000
unfortunately, I do not know

116
00:06:08,000 --> 00:06:11,285
yet what tools we use
or what it means,

117
00:06:11,285 --> 00:06:13,685
but I will let you know about it.

118
00:06:13,685 --> 00:06:19,295
There's also an optional
weekly homework.

119
00:06:19,295 --> 00:06:22,999
The homework is
uploaded on KEATS.

120
00:06:22,999 --> 00:06:27,230
You should send your answers
via email to me.

121
00:06:27,230 --> 00:06:31,955
I will respond individually
and give some feedback.

122
00:06:31,955 --> 00:06:33,920
Normally, if everything is okay

123
00:06:33,920 --> 00:06:36,605
with a question, I will
just respond "OK"?

124
00:06:36,605 --> 00:06:38,630
If there is
something wrong,

125
00:06:38,630 --> 00:06:41,060
I will sometimes answer with a longer

126
00:06:41,060 --> 00:06:44,480
email. All the questions in

127
00:06:44,480 --> 00:06:49,415
the exam and in the midterm
will be from this homework.

128
00:06:49,415 --> 00:06:53,330
So I do not give out the
solutions to the homework.

129
00:06:53,330 --> 00:06:55,415
You have to email me first

130
00:06:55,415 --> 00:06:57,290
what do you think
the solution is...

131
00:06:57,290 --> 00:07:01,280
and I will give answers
individually back.

132
00:07:01,280 --> 00:07:05,270
And I will then ensure
that all the questions in

133
00:07:05,270 --> 00:07:09,560
the exam and the midterm coming
only from this homework.

134
00:07:09,560 --> 00:07:11,825
The simple reason for that is

135
00:07:11,825 --> 00:07:14,645
that I hate if students ask me

136
00:07:14,645 --> 00:07:17,705
if something is relevant
for the exam or not.

137
00:07:17,705 --> 00:07:21,620
I really like this material
and whatever I tell you,

138
00:07:21,620 --> 00:07:23,840
I feel like you
should know about it.

139
00:07:23,840 --> 00:07:26,645
So, to prevent that question,

140
00:07:26,645 --> 00:07:28,415
if it's relevant to the exams,

141
00:07:28,415 --> 00:07:32,165
you can just look at
what's in the homework.

142
00:07:32,165 --> 00:07:34,370
And if this question
is in the homework,

143
00:07:34,370 --> 00:07:36,335
then yes, it will be relevant.

144
00:07:36,335 --> 00:07:40,410
And if not, it will
not. Thank you.

145
00:07:42,280 --> 00:07:46,610
Can you please make one more
point about the homework?

146
00:07:46,610 --> 00:07:50,959
Please submit your
answers only as PDF.

147
00:07:50,959 --> 00:07:53,780
That just makes it easy
for me to look at.

148
00:07:53,780 --> 00:07:55,160
Remember, I might get

149
00:07:55,160 --> 00:07:58,820
60 or more emails about

150
00:07:58,820 --> 00:08:03,110
that and it's just much
easier for me to read PDFs.

151
00:08:03,110 --> 00:08:07,880
Also, please in your answer
copy the question.

152
00:08:07,880 --> 00:08:11,480
I have an extremely limited memory

153
00:08:11,480 --> 00:08:15,290
and have no idea what I actually
asked in this homework until

154
00:08:15,290 --> 00:08:16,910
I actually look it up.

155
00:08:16,910 --> 00:08:18,530
Just to prevent that,

156
00:08:18,530 --> 00:08:20,255
that I have to look it up myself,

157
00:08:20,255 --> 00:08:23,765
please send in
always the question

158
00:08:23,765 --> 00:08:26,330
and then give your answer.

159
00:08:26,330 --> 00:08:29,915
And finally, there are some

160
00:08:29,915 --> 00:08:32,540
optional homework. They are just to

161
00:08:32,540 --> 00:08:34,130
reminders actually that

162
00:08:34,130 --> 00:08:36,140
you really should
have a look at that.

163
00:08:36,140 --> 00:08:39,830
But they're optional. The ones

164
00:08:39,830 --> 00:08:43,520
that you are supposed to
solve, they look like this.

165
00:08:43,520 --> 00:08:45,890
And as I said,

166
00:08:45,890 --> 00:08:50,645
please copy the question
and then give your answer.

167
00:08:50,645 --> 00:08:53,780
And at the end,
package it into a PDF.

168
00:08:53,780 --> 00:08:58,505
Thank you. A very
important component

169
00:08:58,505 --> 00:09:02,240
for assessment in this
module is the coursework.

170
00:09:02,240 --> 00:09:04,160
There will be five of them.

171
00:09:04,160 --> 00:09:06,590
We will first implement a matcher.

172
00:09:06,590 --> 00:09:09,770
Then a lexer building
on top of the matcher.

173
00:09:09,770 --> 00:09:13,655
Then build a parser
and an interpreter.

174
00:09:13,655 --> 00:09:16,730
And then a
compiler for the JVM

175
00:09:16,730 --> 00:09:20,420
and a compiler
targetting the LLVM.

176
00:09:20,420 --> 00:09:24,155
So you can see this
accounts for 45%.

177
00:09:24,155 --> 00:09:27,200
You will submit your
code and you also have

178
00:09:27,200 --> 00:09:30,260
to give some explanation
about your code.

179
00:09:30,260 --> 00:09:31,520
This will be always

180
00:09:31,520 --> 00:09:34,655
explained in the
coursework description.

181
00:09:34,655 --> 00:09:36,785
For the coursework, you can use

182
00:09:36,785 --> 00:09:38,945
any programming
language you like.

183
00:09:38,945 --> 00:09:42,860
In the past, people have
used Haskell and Rust.

184
00:09:42,860 --> 00:09:45,905
Obviously many go for Scala.

185
00:09:45,905 --> 00:09:49,730
You can use any code
I showed you in

186
00:09:49,730 --> 00:09:54,230
the videos or in handouts
or uploaded on KEATS.

187
00:09:54,230 --> 00:09:56,390
But nothing else.

188
00:09:56,390 --> 00:09:59,450
Remember, unlike
in the PEP course,

189
00:09:59,450 --> 00:10:02,030
here, in the Compiler course,

190
00:10:02,030 --> 00:10:04,130
I actually will look with

191
00:10:04,130 --> 00:10:07,910
my own eyes at the submissions
and make judgments.

192
00:10:07,910 --> 00:10:13,775
And I'll see if people have
copied from each other.

193
00:10:13,775 --> 00:10:15,545
Please don't do that!

194
00:10:15,545 --> 00:10:20,160
That's very annoying
for everybody involved.

195
00:10:20,440 --> 00:10:23,570
To conclude this
housekeeping video,

196
00:10:23,570 --> 00:10:25,580
let me tell you a
bit more about how

197
00:10:25,580 --> 00:10:28,025
this module is organized.

198
00:10:28,025 --> 00:10:32,600
The first five weeks we will
spend on lexing and parsing.

199
00:10:32,600 --> 00:10:37,370
This is essentially the
frontend to our compiler.

200
00:10:37,370 --> 00:10:39,500
The first part will
be about lexing.

201
00:10:39,500 --> 00:10:42,995
Actually, we will emphasize quite a
bit on the lexing part,

202
00:10:42,995 --> 00:10:45,005
because that's about my research.

203
00:10:45,005 --> 00:10:47,855
And sorry, I can talk
about this for hours.

204
00:10:47,855 --> 00:10:49,400
But I also like to show you that

205
00:10:49,400 --> 00:10:50,750
there's actually something really

206
00:10:50,750 --> 00:10:52,340
interesting going on and
I want to

207
00:10:52,340 --> 00:10:54,529
show you a number of techniques.

208
00:10:54,529 --> 00:10:57,435
Then obviously we have
to do parsing because

209
00:10:57,435 --> 00:11:01,669
lexing essentially only
finds the words in our programs.

210
00:11:01,669 --> 00:11:03,860
We then have to put
them into sentences to

211
00:11:03,860 --> 00:11:07,350
understand what our
programs actually do.

212
00:11:09,820 --> 00:11:12,200
The next five weeks,

213
00:11:12,200 --> 00:11:15,330
actually that overlaps
with the parsing part,

214
00:11:15,330 --> 00:11:18,850
we will first start with
writing an interpreter.

215
00:11:18,850 --> 00:11:20,470
This is essentially to get

216
00:11:20,470 --> 00:11:22,285
the baseline for our compiler.

217
00:11:22,285 --> 00:11:24,325
Also, you will see compilers are

218
00:11:24,325 --> 00:11:27,370
sometimes fiendishly
difficult to get correct.

219
00:11:27,370 --> 00:11:30,880
And to actually know what
the results should be, 

220
00:11:30,880 --> 00:11:32,320
by having it calculated with

221
00:11:32,320 --> 00:11:35,515
an interpreter, is
really important.

222
00:11:35,515 --> 00:11:40,030
And then we really
spent time on a compiler.

223
00:11:40,030 --> 00:11:45,310
We will first write JVM code
for a little While language.

224
00:11:45,310 --> 00:11:46,780
That is an extremely,

225
00:11:46,780 --> 00:11:50,335
extremely simple C-like
language, if you want.

226
00:11:50,335 --> 00:11:53,770
And also JVM Code for
a functional language.

227
00:11:53,770 --> 00:11:55,615
Again, something very small.

228
00:11:55,615 --> 00:11:57,815
And then for that
functional language

229
00:11:57,815 --> 00:12:03,080
also generate LLVM-IR code
that can be

230
00:12:03,080 --> 00:12:09,270
then used to generate directly
CPU code for X86 or ARM.

231
00:12:09,460 --> 00:12:12,095
Just to whet your appetite,

232
00:12:12,095 --> 00:12:14,780
here's the first compiler for

233
00:12:14,780 --> 00:12:20,240
the While-language calculating
the Mandelbrot program.

234
00:12:20,240 --> 00:12:22,160
It generates code for

235
00:12:22,160 --> 00:12:25,460
the JVM, for the Java
virtual machine.

236
00:12:25,460 --> 00:12:27,350
So what it first does, like in

237
00:12:27,350 --> 00:12:29,855
the example I showed you earlier,

238
00:12:29,855 --> 00:12:33,215
it takes the Mandelbrot
program from the BF language,

239
00:12:33,215 --> 00:12:35,480
generates a While program

240
00:12:35,480 --> 00:12:38,520
in our language we are
going to implement...

241
00:12:38,530 --> 00:12:40,940
this has produced a program which is

242
00:12:40,940 --> 00:12:44,795
approximately 70k of source code.

243
00:12:44,795 --> 00:12:47,585
Then our compiler produced

244
00:12:47,585 --> 00:12:53,015
a Java Virtual Machine
byte code file.

245
00:12:53,015 --> 00:12:54,890
This is the readable version of

246
00:12:54,890 --> 00:12:57,500
the Java Virtual Machine bytecode.

247
00:12:57,500 --> 00:12:59,480
And we use then an assembler

248
00:12:59,480 --> 00:13:01,535
to actually produce
the class file.

249
00:13:01,535 --> 00:13:04,460
And then this compiler
just runs this class file.

250
00:13:04,460 --> 00:13:06,830
It takes a little while and
unfortunately this compiler

251
00:13:06,830 --> 00:13:09,365
doesn't show it incrementally.
Only at the end

252
00:13:09,365 --> 00:13:13,205
it shows us it needed
approximately 40 seconds.

253
00:13:13,205 --> 00:13:16,745
And you have to
remember the GCC with -O0

254
00:13:16,745 --> 00:13:21,485
took a little
bit more than 20 seconds.

255
00:13:21,485 --> 00:13:24,440
And so this is quite
remarkable result

256
00:13:24,440 --> 00:13:27,575
considering we are
running it on the JVM.

257
00:13:27,575 --> 00:13:29,345
Just to assure you,

258
00:13:29,345 --> 00:13:33,245
the Mandelbrot program will not be the
only program we can compile.

259
00:13:33,245 --> 00:13:36,380
But we will focus on integers

260
00:13:36,380 --> 00:13:39,605
and strings in our
programming language.

261
00:13:39,605 --> 00:13:43,280
Because for more complicated
data structures 

262
00:13:43,280 --> 00:13:46,250
we don't have enough time
to fit into this module.

263
00:13:46,250 --> 00:13:49,040
So let's start now
with the serious work.

264
00:13:49,040 --> 00:13:52,020
I hope I see you in a bit.