|
1 1 |
|
2 00:00:16,280 --> 00:00:18,330 |
|
3 A warm welcome to |
|
4 |
|
5 2 |
|
6 00:00:18,330 --> 00:00:20,820 |
|
7 the compilers and formal |
|
8 languages module? |
|
9 |
|
10 3 |
|
11 00:00:20,820 --> 00:00:24,390 |
|
12 My name is Christian Urban. |
|
13 Thank you for coming. |
|
14 |
|
15 4 |
|
16 00:00:24,390 --> 00:00:27,644 |
|
17 Compilers. I guess |
|
18 you use compilers |
|
19 |
|
20 5 |
|
21 00:00:27,644 --> 00:00:31,680 |
|
22 in your daily work, be it |
|
23 the C or Java compiler. |
|
24 |
|
25 6 |
|
26 00:00:31,680 --> 00:00:34,245 |
|
27 And you might be curious |
|
28 in what they do. |
|
29 |
|
30 7 |
|
31 00:00:34,245 --> 00:00:35,700 |
|
32 But you might also be |
|
33 |
|
34 8 |
|
35 00:00:35,700 --> 00:00:38,130 |
|
36 intimidated to look |
|
37 what they do underneath |
|
38 |
|
39 9 |
|
40 00:00:38,130 --> 00:00:39,900 |
|
41 the bonnet because they are |
|
42 |
|
43 10 |
|
44 00:00:39,900 --> 00:00:42,520 |
|
45 quite large software systems. |
|
46 |
|
47 11 |
|
48 00:00:42,520 --> 00:00:46,130 |
|
49 What I like to show you in |
|
50 this module is that you |
|
51 |
|
52 12 |
|
53 00:00:46,130 --> 00:00:49,310 |
|
54 do not need to be an |
|
55 Über hacker to implement your own |
|
56 |
|
57 13 |
|
58 00:00:49,310 --> 00:00:51,305 |
|
59 compiler for your |
|
60 own language, say. |
|
61 |
|
62 14 |
|
63 00:00:51,305 --> 00:00:54,155 |
|
64 So that will be the main |
|
65 goal of this module. |
|
66 |
|
67 15 |
|
68 00:00:54,155 --> 00:00:56,210 |
|
69 You will implement |
|
70 your own compiler, |
|
71 |
|
72 16 |
|
73 00:00:56,210 --> 00:00:58,310 |
|
74 of course with my help. |
|
75 |
|
76 17 |
|
77 00:00:58,310 --> 00:01:02,360 |
|
78 What I personally like |
|
79 about compilers is that |
|
80 |
|
81 18 |
|
82 00:01:02,360 --> 00:01:04,580 |
|
83 the subject is a |
|
84 perfect combination |
|
85 |
|
86 19 |
|
87 00:01:04,580 --> 00:01:06,350 |
|
88 of theory and practice. |
|
89 |
|
90 20 |
|
91 00:01:06,350 --> 00:01:07,790 |
|
92 I like to hack things, |
|
93 |
|
94 21 |
|
95 00:01:07,790 --> 00:01:10,595 |
|
96 writing actual code, |
|
97 but I also like theory. |
|
98 |
|
99 22 |
|
100 00:01:10,595 --> 00:01:13,040 |
|
101 I want to understand what |
|
102 my code actually does. |
|
103 |
|
104 23 |
|
105 00:01:13,040 --> 00:01:17,040 |
|
106 So compilers are the |
|
107 perfect subject for me. |
|
108 |
|
109 24 |
|
110 00:01:18,040 --> 00:01:20,809 |
|
111 Let's have a look at the details. |
|
112 |
|
113 25 |
|
114 00:01:20,809 --> 00:01:23,779 |
|
115 Here's an airplane |
|
116 view of a compiler. |
|
117 |
|
118 26 |
|
119 00:01:23,779 --> 00:01:25,850 |
|
120 On the left-hand side you can see |
|
121 |
|
122 27 |
|
123 00:01:25,850 --> 00:01:28,745 |
|
124 the input program a |
|
125 developer would write. |
|
126 |
|
127 28 |
|
128 00:01:28,745 --> 00:01:31,955 |
|
129 And on the right-hand side is |
|
130 the output of the compiler, |
|
131 |
|
132 29 |
|
133 00:01:31,955 --> 00:01:36,360 |
|
134 the binary code the developer |
|
135 wants to actually run. |
|
136 |
|
137 30 |
|
138 00:01:36,370 --> 00:01:40,340 |
|
139 What makes such a project |
|
140 actually feasible is that |
|
141 |
|
142 31 |
|
143 00:01:40,340 --> 00:01:44,165 |
|
144 compilers fall neatly into |
|
145 separate components. |
|
146 |
|
147 32 |
|
148 00:01:44,165 --> 00:01:47,210 |
|
149 So you can see the lexer and the |
|
150 |
|
151 33 |
|
152 00:01:47,210 --> 00:01:48,860 |
|
153 parser, which are often called the |
|
154 |
|
155 34 |
|
156 00:01:48,860 --> 00:01:50,990 |
|
157 front end of the compiler. |
|
158 |
|
159 35 |
|
160 00:01:50,990 --> 00:01:53,000 |
|
161 And the code generation is |
|
162 |
|
163 36 |
|
164 00:01:53,000 --> 00:01:55,700 |
|
165 called the backend |
|
166 of the compiler. |
|
167 |
|
168 37 |
|
169 00:01:55,700 --> 00:01:57,620 |
|
170 And it so happens |
|
171 that we will spend |
|
172 |
|
173 38 |
|
174 00:01:57,620 --> 00:01:59,930 |
|
175 the first five weeks |
|
176 on the lexer and |
|
177 |
|
178 39 |
|
179 00:01:59,930 --> 00:02:04,970 |
|
180 the parser, and the next five |
|
181 weeks on the code generator. |
|
182 |
|
183 40 |
|
184 00:02:04,970 --> 00:02:09,575 |
|
185 The first component of the |
|
186 compiler is the lexer. |
|
187 |
|
188 41 |
|
189 00:02:09,575 --> 00:02:14,480 |
|
190 The purpose of the lexer |
|
191 is to scan a flat |
|
192 |
|
193 42 |
|
194 00:02:14,480 --> 00:02:16,610 |
|
195 string, which the |
|
196 input program is at |
|
197 |
|
198 43 |
|
199 00:02:16,610 --> 00:02:19,145 |
|
200 the beginning and separate out |
|
201 |
|
202 44 |
|
203 00:02:19,145 --> 00:02:21,275 |
|
204 where are the words? |
|
205 |
|
206 45 |
|
207 00:02:21,275 --> 00:02:23,600 |
|
208 You might think |
|
209 in Western languages, |
|
210 |
|
211 46 |
|
212 00:02:23,600 --> 00:02:24,920 |
|
213 that is very easy: |
|
214 |
|
215 47 |
|
216 00:02:24,920 --> 00:02:26,690 |
|
217 you just try to |
|
218 find out where are |
|
219 |
|
220 48 |
|
221 00:02:26,690 --> 00:02:28,580 |
|
222 the whitespaces |
|
223 and then you know, |
|
224 |
|
225 49 |
|
226 00:02:28,580 --> 00:02:31,955 |
|
227 where one word stops and |
|
228 where the next word begins. |
|
229 |
|
230 50 |
|
231 00:02:31,955 --> 00:02:35,300 |
|
232 But, actually, computer |
|
233 languages are more |
|
234 |
|
235 51 |
|
236 00:02:35,300 --> 00:02:38,180 |
|
237 like ancient languages |
|
238 that you see here. |
|
239 |
|
240 52 |
|
241 00:02:38,180 --> 00:02:39,860 |
|
242 For example, ancient Greek. |
|
243 |
|
244 53 |
|
245 00:02:39,860 --> 00:02:42,065 |
|
246 The writer wrote one word |
|
247 |
|
248 54 |
|
249 00:02:42,065 --> 00:02:44,915 |
|
250 after the next and not |
|
251 leaving any space. |
|
252 |
|
253 55 |
|
254 00:02:44,915 --> 00:02:47,930 |
|
255 So for example, in this |
|
256 very simple string here |
|
257 |
|
258 56 |
|
259 00:02:47,930 --> 00:02:50,960 |
|
260 read n, there is no space at all. |
|
261 |
|
262 57 |
|
263 00:02:50,960 --> 00:02:53,540 |
|
264 But the purpose of |
|
265 the lexer is to |
|
266 |
|
267 58 |
|
268 00:02:53,540 --> 00:02:56,570 |
|
269 separate it into five |
|
270 different components. |
|
271 |
|
272 59 |
|
273 00:02:56,570 --> 00:02:59,450 |
|
274 It has to first say, |
|
275 well there is a read, |
|
276 |
|
277 60 |
|
278 00:02:59,450 --> 00:03:01,595 |
|
279 then there is a left parenthesis, |
|
280 |
|
281 61 |
|
282 00:03:01,595 --> 00:03:04,025 |
|
283 then there's an |
|
284 identifier called n, |
|
285 |
|
286 62 |
|
287 00:03:04,025 --> 00:03:07,715 |
|
288 then there's a right parenthesis, |
|
289 and finally a semicolon. |
|
290 |
|
291 63 |
|
292 00:03:07,715 --> 00:03:11,345 |
|
293 And as you can see, there |
|
294 is no space here at all. |
|
295 |
|
296 64 |
|
297 00:03:11,345 --> 00:03:14,150 |
|
298 And so the task of finding where |
|
299 |
|
300 65 |
|
301 00:03:14,150 --> 00:03:17,705 |
|
302 are the words is actually |
|
303 quite involved. |
|
304 |
|
305 66 |
|
306 00:03:17,705 --> 00:03:19,460 |
|
307 Also the classification is |
|
308 |
|
309 67 |
|
310 00:03:19,460 --> 00:03:22,055 |
|
311 sometimes not so straightforward. |
|
312 |
|
313 68 |
|
314 00:03:22,055 --> 00:03:24,350 |
|
315 If, for example, a writer |
|
316 |
|
317 69 |
|
318 00:03:24,350 --> 00:03:26,360 |
|
319 wrote "if" on its own, |
|
320 |
|
321 70 |
|
322 00:03:26,360 --> 00:03:29,000 |
|
323 then this should be a |
|
324 keyword classified as |
|
325 |
|
326 71 |
|
327 00:03:29,000 --> 00:03:32,615 |
|
328 a keyword because it's |
|
329 from the if-then-else. |
|
330 |
|
331 72 |
|
332 00:03:32,615 --> 00:03:36,800 |
|
333 But if the developer wrote |
|
334 something longer like "iffoo", |
|
335 |
|
336 73 |
|
337 00:03:36,800 --> 00:03:38,030 |
|
338 then this might just be |
|
339 |
|
340 74 |
|
341 00:03:38,030 --> 00:03:39,860 |
|
342 a strange variable name |
|
343 and he needs to be |
|
344 |
|
345 75 |
|
346 00:03:39,860 --> 00:03:44,250 |
|
347 classified as a variable name. |
|
348 |
|
349 76 |
|
350 00:03:45,220 --> 00:03:50,720 |
|
351 The output of the lexer |
|
352 is a sequence of tokens. |
|
353 |
|
354 77 |
|
355 00:03:50,720 --> 00:03:53,480 |
|
356 These are essentially the words in |
|
357 |
|
358 78 |
|
359 00:03:53,480 --> 00:03:56,405 |
|
360 a sentence and their |
|
361 classification: |
|
362 |
|
363 79 |
|
364 00:03:56,405 --> 00:03:59,540 |
|
365 they are nouns, |
|
366 verbs or adjectives. |
|
367 |
|
368 80 |
|
369 00:03:59,540 --> 00:04:02,885 |
|
370 For us, of course, the |
|
371 classification would be keywords, |
|
372 |
|
373 81 |
|
374 00:04:02,885 --> 00:04:06,005 |
|
375 identifiers, |
|
376 operators, and so on. |
|
377 |
|
378 82 |
|
379 00:04:06,005 --> 00:04:11,480 |
|
380 And these tokens are the |
|
381 input for the parser, |
|
382 |
|
383 83 |
|
384 00:04:11,480 --> 00:04:13,085 |
|
385 the next component in |
|
386 |
|
387 84 |
|
388 00:04:13,085 --> 00:04:17,240 |
|
389 our compiler. The parser |
|
390 essentially takes this list |
|
391 |
|
392 85 |
|
393 00:04:17,240 --> 00:04:23,300 |
|
394 of tokens and transforms it |
|
395 into a abstract syntax tree. |
|
396 |
|
397 86 |
|
398 00:04:23,300 --> 00:04:27,230 |
|
399 That means we have now the |
|
400 sentence, we have the words. |
|
401 |
|
402 87 |
|
403 00:04:27,230 --> 00:04:30,275 |
|
404 We have to relate |
|
405 the words to each other |
|
406 |
|
407 88 |
|
408 00:04:30,275 --> 00:04:33,680 |
|
409 in order to find out what |
|
410 this sentence actually means. |
|
411 |
|
412 89 |
|
413 00:04:33,680 --> 00:04:35,405 |
|
414 In this case here, |
|
415 |
|
416 90 |
|
417 00:04:35,405 --> 00:04:38,595 |
|
418 we have to do a read...which |
|
419 variable is affected? |
|
420 |
|
421 91 |
|
422 00:04:38,595 --> 00:04:45,040 |
|
423 The variable n. Once we have |
|
424 the abstract syntax tree, |
|
425 |
|
426 92 |
|
427 00:04:45,040 --> 00:04:49,225 |
|
428 it can go to the next component, |
|
429 to the code generator. |
|
430 |
|
431 93 |
|
432 00:04:49,225 --> 00:04:52,480 |
|
433 Whilst it doesn't look |
|
434 like this in this picture, |
|
435 |
|
436 94 |
|
437 00:04:52,480 --> 00:04:54,100 |
|
438 the code generators is usually the |
|
439 |
|
440 95 |
|
441 00:04:54,100 --> 00:04:56,080 |
|
442 biggest part in a compiler. |
|
443 |
|
444 96 |
|
445 00:04:56,080 --> 00:04:58,720 |
|
446 And here we actually |
|
447 have to cut corners. |
|
448 |
|
449 97 |
|
450 00:04:58,720 --> 00:05:02,470 |
|
451 Instead of producing binary |
|
452 code, which can be run |
|
453 |
|
454 98 |
|
455 00:05:02,470 --> 00:05:06,820 |
|
456 directly on a CPU, like X86 or ARM, |
|
457 |
|
458 99 |
|
459 00:05:06,820 --> 00:05:11,035 |
|
460 we actually target the |
|
461 Java Virtual Machine, the JVM. |
|
462 |
|
463 100 |
|
464 00:05:11,035 --> 00:05:13,600 |
|
465 This is very similar |
|
466 to the Scala compiler, |
|
467 |
|
468 101 |
|
469 00:05:13,600 --> 00:05:16,480 |
|
470 for example, which |
|
471 produces JVM code. |
|
472 |
|
473 102 |
|
474 00:05:16,480 --> 00:05:18,940 |
|
475 Or, of course, also |
|
476 the Java compiler, |
|
477 |
|
478 103 |
|
479 00:05:18,940 --> 00:05:20,845 |
|
480 which produces JVM code. |
|
481 |
|
482 104 |
|
483 00:05:20,845 --> 00:05:23,900 |
|
484 So here's a typical |
|
485 example code which we're |
|
486 |
|
487 105 |
|
488 00:05:23,900 --> 00:05:27,305 |
|
489 going to produce: Something |
|
490 like variable 2 |
|
491 |
|
492 106 |
|
493 00:05:27,305 --> 00:05:30,545 |
|
494 gets allocated in the stack. |
|
495 |
|
496 107 |
|
497 00:05:30,545 --> 00:05:32,390 |
|
498 You subtract ten from it. |
|
499 |
|
500 108 |
|
501 00:05:32,390 --> 00:05:36,050 |
|
502 You test whether the |
|
503 variable is 0 and if yes, |
|
504 |
|
505 109 |
|
506 00:05:36,050 --> 00:05:40,170 |
|
507 you jump somewhere and otherwise |
|
508 you reload the variable. |
|
509 |
|
510 110 |
|
511 00:05:41,560 --> 00:05:45,935 |
|
512 Even though we cut corners |
|
513 into could generate apart. |
|
514 |
|
515 111 |
|
516 00:05:45,935 --> 00:05:48,575 |
|
517 By producing code for the JVM, |
|
518 |
|
519 112 |
|
520 00:05:48,575 --> 00:05:51,710 |
|
521 we can still obtain quite |
|
522 impressive results. |
|
523 |
|
524 113 |
|
525 00:05:51,710 --> 00:05:56,225 |
|
526 Here's a program with |
|
527 a cubic runtime behaviour. |
|
528 |
|
529 114 |
|
530 00:05:56,225 --> 00:05:59,525 |
|
531 When it has to |
|
532 calculate with 400 units, |
|
533 |
|
534 115 |
|
535 00:05:59,525 --> 00:06:02,165 |
|
536 it might need five |
|
537 seconds to calculate. |
|
538 |
|
539 116 |
|
540 00:06:02,165 --> 00:06:05,345 |
|
541 But if it has to |
|
542 calculate with 1200 units, |
|
543 |
|
544 117 |
|
545 00:06:05,345 --> 00:06:08,165 |
|
546 it already needs more |
|
547 than two minutes. |
|
548 |
|
549 118 |
|
550 00:06:08,165 --> 00:06:10,940 |
|
551 Now these timings, |
|
552 the blue timings |
|
553 |
|
554 119 |
|
555 00:06:10,940 --> 00:06:13,910 |
|
556 are obtained with an |
|
557 interpreter of that program. |
|
558 |
|
559 120 |
|
560 00:06:13,910 --> 00:06:16,265 |
|
561 And you can see |
|
562 just next to it, |
|
563 |
|
564 121 |
|
565 00:06:16,265 --> 00:06:18,050 |
|
566 the red times. |
|
567 |
|
568 122 |
|
569 00:06:18,050 --> 00:06:20,359 |
|
570 They are obtained |
|
571 with the compiler |
|
572 |
|
573 123 |
|
574 00:06:20,359 --> 00:06:23,135 |
|
575 we're going to write |
|
576 in this module. |
|
577 |
|
578 124 |
|
579 00:06:23,135 --> 00:06:25,100 |
|
580 And there you can see, even |
|
581 |
|
582 125 |
|
583 00:06:25,100 --> 00:06:29,000 |
|
584 with 1200, the times get |
|
585 hardly off the x-axis. |
|
586 |
|
587 126 |
|
588 00:06:29,000 --> 00:06:30,485 |
|
589 So in this instance, |
|
590 |
|
591 127 |
|
592 00:06:30,485 --> 00:06:32,630 |
|
593 our compiler will |
|
594 have a speed up from |
|
595 |
|
596 128 |
|
597 00:06:32,630 --> 00:06:37,589 |
|
598 approximately 1 million in |
|
599 comparison to the interpreter. |
|
600 |
|
601 129 |
|
602 00:06:38,350 --> 00:06:42,020 |
|
603 This might be a fun task |
|
604 for your spare time. |
|
605 |
|
606 130 |
|
607 00:06:42,020 --> 00:06:44,480 |
|
608 This is a compiler explorer, |
|
609 which allows you to |
|
610 |
|
611 131 |
|
612 00:06:44,480 --> 00:06:47,060 |
|
613 write C code on the |
|
614 left-hand side. |
|
615 |
|
616 132 |
|
617 00:06:47,060 --> 00:06:49,115 |
|
618 It shows you on the |
|
619 right-hand side |
|
620 |
|
621 133 |
|
622 00:06:49,115 --> 00:06:53,270 |
|
623 the machine code an |
|
624 X86 CPU can run. |
|
625 |
|
626 134 |
|
627 00:06:53,270 --> 00:06:56,060 |
|
628 For example, it gives |
|
629 you these color scheme and |
|
630 |
|
631 135 |
|
632 00:06:56,060 --> 00:06:59,000 |
|
633 says that this addition |
|
634 in the num + num in |
|
635 |
|
636 136 |
|
637 00:06:59,000 --> 00:07:01,580 |
|
638 the return statement, |
|
639 translates to |
|
640 |
|
641 137 |
|
642 00:07:01,580 --> 00:07:02,960 |
|
643 essentially an addition of |
|
644 |
|
645 138 |
|
646 00:07:02,960 --> 00:07:05,930 |
|
647 an register in the machine code. |
|
648 |
|
649 139 |
|
650 00:07:05,930 --> 00:07:08,480 |
|
651 I think this compiler |
|
652 explorer also works for |
|
653 |
|
654 140 |
|
655 00:07:08,480 --> 00:07:11,495 |
|
656 the Haskell language and |
|
657 also produces ARM code, |
|
658 |
|
659 141 |
|
660 00:07:11,495 --> 00:07:15,245 |
|
661 not just code for the X86 CPUs. |
|
662 |
|
663 142 |
|
664 00:07:15,245 --> 00:07:18,950 |
|
665 As an aside, I also recommend |
|
666 to watch the movie of |
|
667 |
|
668 143 |
|
669 00:07:18,950 --> 00:07:20,870 |
|
670 this guy because that is |
|
671 |
|
672 144 |
|
673 00:07:20,870 --> 00:07:22,940 |
|
674 very much like how I worked |
|
675 at the beginning |
|
676 |
|
677 145 |
|
678 00:07:22,940 --> 00:07:25,355 |
|
679 when implementing our compiler. |
|
680 |
|
681 146 |
|
682 00:07:25,355 --> 00:07:29,300 |
|
683 I wrote some code which I |
|
684 knew how it should behave. |
|
685 |
|
686 147 |
|
687 00:07:29,300 --> 00:07:30,725 |
|
688 And then I just had a look |
|
689 |
|
690 148 |
|
691 00:07:30,725 --> 00:07:32,900 |
|
692 what another compiler produces. |
|
693 |
|
694 149 |
|
695 00:07:32,900 --> 00:07:37,190 |
|
696 And I imitated that code |
|
697 as much as I could. |
|
698 |
|
699 150 |
|
700 00:07:37,190 --> 00:07:39,380 |
|
701 Such a compiler explorer |
|
702 |
|
703 151 |
|
704 00:07:39,380 --> 00:07:41,375 |
|
705 also exists for |
|
706 the Java language. |
|
707 |
|
708 152 |
|
709 00:07:41,375 --> 00:07:42,935 |
|
710 Here's one where you can write |
|
711 |
|
712 153 |
|
713 00:07:42,935 --> 00:07:44,915 |
|
714 Java code on the left-hand side, |
|
715 |
|
716 154 |
|
717 00:07:44,915 --> 00:07:47,930 |
|
718 and on the right-hand |
|
719 side you get JVM code. |
|
720 |
|
721 155 |
|
722 00:07:47,930 --> 00:07:50,255 |
|
723 JVM code is a byte code |
|
724 |
|
725 156 |
|
726 00:07:50,255 --> 00:07:53,405 |
|
727 which cannot be run |
|
728 directly by the CPU, |
|
729 |
|
730 157 |
|
731 00:07:53,405 --> 00:07:55,220 |
|
732 but needs the Java |
|
733 Virtual Machine |
|
734 |
|
735 158 |
|
736 00:07:55,220 --> 00:07:57,170 |
|
737 to essentially interpret that. |
|
738 |
|
739 159 |
|
740 00:07:57,170 --> 00:07:58,760 |
|
741 This means it's not |
|
742 |
|
743 160 |
|
744 00:07:58,760 --> 00:08:01,235 |
|
745 the most efficient way |
|
746 how to run programs - |
|
747 |
|
748 161 |
|
749 00:08:01,235 --> 00:08:02,780 |
|
750 it would be much faster to |
|
751 |
|
752 162 |
|
753 00:08:02,780 --> 00:08:05,285 |
|
754 generate direct |
|
755 code for the CPU. |
|
756 |
|
757 163 |
|
758 00:08:05,285 --> 00:08:07,700 |
|
759 But by producing bytecode, |
|
760 |
|
761 164 |
|
762 00:08:07,700 --> 00:08:11,435 |
|
763 we still run the |
|
764 programs quite fast |
|
765 |
|
766 165 |
|
767 00:08:11,435 --> 00:08:13,520 |
|
768 and it also simplifies |
|
769 |
|
770 166 |
|
771 00:08:13,520 --> 00:08:16,280 |
|
772 a number of issues |
|
773 in our compiler. |
|
774 |
|
775 167 |
|
776 00:08:16,280 --> 00:08:18,980 |
|
777 One issue is about |
|
778 memory management. |
|
779 |
|
780 168 |
|
781 00:08:18,980 --> 00:08:22,055 |
|
782 We don't have to be concerned |
|
783 about register allocation, |
|
784 |
|
785 169 |
|
786 00:08:22,055 --> 00:08:25,505 |
|
787 which we would need |
|
788 to do in a real CPU. |
|
789 |
|
790 170 |
|
791 00:08:25,505 --> 00:08:27,680 |
|
792 This will be done by the JVM. |
|
793 |
|
794 171 |
|
795 00:08:27,680 --> 00:08:29,750 |
|
796 It's also much easier to produce |
|
797 |
|
798 172 |
|
799 00:08:29,750 --> 00:08:33,635 |
|
800 this bytecode rather than |
|
801 actual machine code. |
|
802 |
|
803 173 |
|
804 00:08:33,635 --> 00:08:37,385 |
|
805 I think it's now a good time |
|
806 to come to the question, |
|
807 |
|
808 174 |
|
809 00:08:37,385 --> 00:08:39,950 |
|
810 why on Earth studying compilers. |
|
811 |
|
812 175 |
|
813 00:08:39,950 --> 00:08:42,650 |
|
814 Compilers are such an |
|
815 established subject |
|
816 |
|
817 176 |
|
818 00:08:42,650 --> 00:08:43,985 |
|
819 in computer science. |
|
820 |
|
821 177 |
|
822 00:08:43,985 --> 00:08:46,100 |
|
823 Compilers do what they do. |
|
824 |
|
825 178 |
|
826 00:08:46,100 --> 00:08:48,410 |
|
827 Probably forrests have |
|
828 been killed by |
|
829 |
|
830 179 |
|
831 00:08:48,410 --> 00:08:52,190 |
|
832 all the books that have been |
|
833 published on compilers. |
|
834 |
|
835 180 |
|
836 00:08:52,190 --> 00:08:56,690 |
|
837 Why on Earth studying |
|
838 compilers in 2020? And even worse |
|
839 |
|
840 181 |
|
841 00:08:56,690 --> 00:08:59,659 |
|
842 why implementing |
|
843 your own compiler? |
|
844 |
|
845 182 |
|
846 00:08:59,659 --> 00:09:02,720 |
|
847 Well, a slightly humorous take on |
|
848 |
|
849 183 |
|
850 00:09:02,720 --> 00:09:05,105 |
|
851 that question is |
|
852 given by John Regehr, |
|
853 |
|
854 184 |
|
855 00:09:05,105 --> 00:09:08,375 |
|
856 who is a compiler hacker |
|
857 from the University of Utah. |
|
858 |
|
859 185 |
|
860 00:09:08,375 --> 00:09:09,770 |
|
861 Essentially what he says, |
|
862 |
|
863 186 |
|
864 00:09:09,770 --> 00:09:12,110 |
|
865 if you're a good compiler hacker, |
|
866 |
|
867 187 |
|
868 00:09:12,110 --> 00:09:14,690 |
|
869 you have no problems |
|
870 of finding a job. |
|
871 |
|
872 188 |
|
873 00:09:14,690 --> 00:09:17,210 |
|
874 He puts it as: It's effectively |
|
875 |
|
876 189 |
|
877 00:09:17,210 --> 00:09:22,320 |
|
878 a "Perpetual Employment Act" |
|
879 for solid compiler hackers. |
|
880 |
|
881 190 |
|
882 00:09:22,990 --> 00:09:27,380 |
|
883 Regehr gives two justifications |
|
884 for that statement. |
|
885 |
|
886 191 |
|
887 00:09:27,380 --> 00:09:29,690 |
|
888 First, he says |
|
889 hardware is getting |
|
890 |
|
891 192 |
|
892 00:09:29,690 --> 00:09:32,585 |
|
893 wierder, rather than |
|
894 getting clocked faster. |
|
895 |
|
896 193 |
|
897 00:09:32,585 --> 00:09:34,520 |
|
898 And that's definitely true. |
|
899 |
|
900 194 |
|
901 00:09:34,520 --> 00:09:36,590 |
|
902 My first computer many, many, |
|
903 |
|
904 195 |
|
905 00:09:36,590 --> 00:09:40,040 |
|
906 many years ago contained |
|
907 only a single core CPU. |
|
908 |
|
909 196 |
|
910 00:09:40,040 --> 00:09:44,030 |
|
911 And it was such a simple CPU |
|
912 that we wrote machine code |
|
913 |
|
914 197 |
|
915 00:09:44,030 --> 00:09:46,220 |
|
916 directly for that CPU in order to |
|
917 |
|
918 198 |
|
919 00:09:46,220 --> 00:09:49,740 |
|
920 run our programs as |
|
921 fast as possible. |
|
922 |
|
923 199 |
|
924 00:09:50,260 --> 00:09:53,600 |
|
925 In contrast, today, Regehr writes, |
|
926 |
|
927 200 |
|
928 00:09:53,600 --> 00:09:57,005 |
|
929 almost all processors are |
|
930 multi-core nowadays. |
|
931 |
|
932 201 |
|
933 00:09:57,005 --> 00:09:59,870 |
|
934 And it looks like there's |
|
935 an increasing asymmetry |
|
936 |
|
937 202 |
|
938 00:09:59,870 --> 00:10:02,015 |
|
939 in resources across cores. |
|
940 |
|
941 203 |
|
942 00:10:02,015 --> 00:10:04,445 |
|
943 Processors come |
|
944 with vector units, |
|
945 |
|
946 204 |
|
947 00:10:04,445 --> 00:10:07,189 |
|
948 crypto accelerators, etc. |
|
949 |
|
950 205 |
|
951 00:10:07,189 --> 00:10:11,930 |
|
952 We have TSPs, GPUs, ARM |
|
953 big,little, Xeon Phis, |
|
954 |
|
955 206 |
|
956 00:10:11,930 --> 00:10:14,630 |
|
957 and this only scratches the surface. |
|
958 |
|
959 207 |
|
960 00:10:14,630 --> 00:10:17,255 |
|
961 And that is really a |
|
962 problem for compilers, |
|
963 |
|
964 208 |
|
965 00:10:17,255 --> 00:10:20,495 |
|
966 because if we now |
|
967 have multi-core CPUs, |
|
968 |
|
969 209 |
|
970 00:10:20,495 --> 00:10:23,180 |
|
971 that means our programs need |
|
972 |
|
973 210 |
|
974 00:10:23,180 --> 00:10:26,075 |
|
975 to be scheduled |
|
976 over several CPUs. |
|
977 |
|
978 211 |
|
979 00:10:26,075 --> 00:10:28,220 |
|
980 But the developer, of course, |
|
981 |
|
982 212 |
|
983 00:10:28,220 --> 00:10:30,545 |
|
984 doesn't want to know |
|
985 anything about that. |
|
986 |
|
987 213 |
|
988 00:10:30,545 --> 00:10:34,655 |
|
989 Also, now we have more |
|
990 CPUs in a computer, |
|
991 |
|
992 214 |
|
993 00:10:34,655 --> 00:10:37,400 |
|
994 but they seem to also come |
|
995 with different resources. |
|
996 |
|
997 215 |
|
998 00:10:37,400 --> 00:10:40,310 |
|
999 So certain tasks in |
|
1000 a program need to |
|
1001 |
|
1002 216 |
|
1003 00:10:40,310 --> 00:10:43,460 |
|
1004 be scheduled on some |
|
1005 cores, but not on others. |
|
1006 |
|
1007 217 |
|
1008 00:10:43,460 --> 00:10:46,685 |
|
1009 We also have for a |
|
1010 long time already GPUs, |
|
1011 |
|
1012 218 |
|
1013 00:10:46,685 --> 00:10:49,025 |
|
1014 which are highly specialized for |
|
1015 |
|
1016 219 |
|
1017 00:10:49,025 --> 00:10:53,240 |
|
1018 very parallel computations |
|
1019 to do with graphics. |
|
1020 |
|
1021 220 |
|
1022 00:10:53,240 --> 00:10:56,015 |
|
1023 They at least in |
|
1024 the past few years, |
|
1025 |
|
1026 221 |
|
1027 00:10:56,015 --> 00:10:59,360 |
|
1028 they could only do very |
|
1029 special computations, |
|
1030 |
|
1031 222 |
|
1032 00:10:59,360 --> 00:11:02,255 |
|
1033 but very, very fast |
|
1034 and highly parallel. |
|
1035 |
|
1036 223 |
|
1037 00:11:02,255 --> 00:11:05,539 |
|
1038 You couldn't use them for |
|
1039 general-purpose calculations, |
|
1040 |
|
1041 224 |
|
1042 00:11:05,539 --> 00:11:10,205 |
|
1043 which you would usually |
|
1044 use CPUs. Similarly, DSPs. |
|
1045 |
|
1046 225 |
|
1047 00:11:10,205 --> 00:11:14,075 |
|
1048 They are needed for all |
|
1049 the signal processing |
|
1050 |
|
1051 226 |
|
1052 00:11:14,075 --> 00:11:16,040 |
|
1053 in mobile phones. |
|
1054 |
|
1055 227 |
|
1056 00:11:16,040 --> 00:11:20,780 |
|
1057 Without them, we just |
|
1058 wouldn't have mobile phones. |
|
1059 |
|
1060 228 |
|
1061 00:11:20,780 --> 00:11:23,525 |
|
1062 The second reason Regehr gives is |
|
1063 |
|
1064 229 |
|
1065 00:11:23,525 --> 00:11:25,550 |
|
1066 that we are getting tired of |
|
1067 |
|
1068 230 |
|
1069 00:11:25,550 --> 00:11:27,620 |
|
1070 low-level languages and |
|
1071 |
|
1072 231 |
|
1073 00:11:27,620 --> 00:11:30,335 |
|
1074 their associated |
|
1075 security disasters. |
|
1076 |
|
1077 232 |
|
1078 00:11:30,335 --> 00:11:32,435 |
|
1079 While at the beginning |
|
1080 we were still |
|
1081 |
|
1082 233 |
|
1083 00:11:32,435 --> 00:11:34,760 |
|
1084 happy to write machine |
|
1085 code directly, |
|
1086 |
|
1087 234 |
|
1088 00:11:34,760 --> 00:11:37,175 |
|
1089 nobody wants to do this nowadays. |
|
1090 |
|
1091 235 |
|
1092 00:11:37,175 --> 00:11:39,515 |
|
1093 He writes: :We want |
|
1094 to write new code |
|
1095 |
|
1096 236 |
|
1097 00:11:39,515 --> 00:11:44,120 |
|
1098 to whatever extent possible in |
|
1099 safer high-level languages. |
|
1100 |
|
1101 237 |
|
1102 00:11:44,120 --> 00:11:46,130 |
|
1103 Compilers are caught right |
|
1104 |
|
1105 238 |
|
1106 00:11:46,130 --> 00:11:48,365 |
|
1107 in the middle of these |
|
1108 opposing trends: |
|
1109 |
|
1110 239 |
|
1111 00:11:48,365 --> 00:11:50,765 |
|
1112 one of their main jobs |
|
1113 is to have bridged |
|
1114 |
|
1115 240 |
|
1116 00:11:50,765 --> 00:11:53,240 |
|
1117 the large and growing gap between |
|
1118 |
|
1119 241 |
|
1120 00:11:53,240 --> 00:11:55,460 |
|
1121 increasingly high-level languages |
|
1122 |
|
1123 242 |
|
1124 00:11:55,460 --> 00:11:58,565 |
|
1125 and increasingly |
|
1126 wacky platforms. |
|
1127 |
|
1128 243 |
|
1129 00:11:58,565 --> 00:12:00,320 |
|
1130 So here you have it: |
|
1131 |
|
1132 244 |
|
1133 00:12:00,320 --> 00:12:02,750 |
|
1134 It's still interesting to study |
|
1135 |
|
1136 245 |
|
1137 00:12:02,750 --> 00:12:06,244 |
|
1138 compilers nowadays, |
|
1139 especially nowadays. |
|
1140 |
|
1141 246 |
|
1142 00:12:06,244 --> 00:12:09,875 |
|
1143 Well, if you want to work |
|
1144 on interesting problems, |
|
1145 |
|
1146 247 |
|
1147 00:12:09,875 --> 00:12:14,570 |
|
1148 then very often you have to |
|
1149 know compilers. Here's one: |
|
1150 |
|
1151 248 |
|
1152 00:12:14,570 --> 00:12:16,940 |
|
1153 In the good old |
|
1154 times when we were |
|
1155 |
|
1156 249 |
|
1157 00:12:16,940 --> 00:12:19,310 |
|
1158 still able to fly on holidays, |
|
1159 |
|
1160 250 |
|
1161 00:12:19,310 --> 00:12:23,300 |
|
1162 I'm sure you flew in |
|
1163 a 777 Boeing airplane. |
|
1164 |
|
1165 251 |
|
1166 00:12:23,300 --> 00:12:25,850 |
|
1167 It's actually already |
|
1168 a quite old airplane. |
|
1169 |
|
1170 252 |
|
1171 00:12:25,850 --> 00:12:28,295 |
|
1172 But they had the |
|
1173 following problem. |
|
1174 |
|
1175 253 |
|
1176 00:12:28,295 --> 00:12:33,020 |
|
1177 What happens if a CPU |
|
1178 calculates the wrong result? |
|
1179 |
|
1180 254 |
|
1181 00:12:33,020 --> 00:12:36,065 |
|
1182 And that's actually not |
|
1183 such a hypothetical problem |
|
1184 |
|
1185 255 |
|
1186 00:12:36,065 --> 00:12:40,295 |
|
1187 because you remember |
|
1188 the Pentium CPU |
|
1189 |
|
1190 256 |
|
1191 00:12:40,295 --> 00:12:43,655 |
|
1192 in around 2000 |
|
1193 contained a bug and it cost |
|
1194 |
|
1195 257 |
|
1196 00:12:43,655 --> 00:12:48,140 |
|
1197 a lot of money for Intel |
|
1198 to replace this CPU. |
|
1199 |
|
1200 258 |
|
1201 00:12:48,140 --> 00:12:51,470 |
|
1202 What happens if an CPU calculates |
|
1203 |
|
1204 259 |
|
1205 00:12:51,470 --> 00:12:56,105 |
|
1206 the wrong result in |
|
1207 some navigation data? |
|
1208 |
|
1209 260 |
|
1210 00:12:56,105 --> 00:12:58,520 |
|
1211 Do you just let the |
|
1212 airplane crash? |
|
1213 |
|
1214 261 |
|
1215 00:12:58,520 --> 00:13:00,650 |
|
1216 Well, the engineers at Boeing |
|
1217 |
|
1218 262 |
|
1219 00:13:00,650 --> 00:13:02,675 |
|
1220 came up with the |
|
1221 following solution: |
|
1222 |
|
1223 263 |
|
1224 00:13:02,675 --> 00:13:05,055 |
|
1225 They writing one program that |
|
1226 |
|
1227 264 |
|
1228 00:13:05,055 --> 00:13:08,690 |
|
1229 essentially controls |
|
1230 how their airplane |
|
1231 |
|
1232 265 |
|
1233 00:13:08,690 --> 00:13:10,295 |
|
1234 is supposed to fly. |
|
1235 |
|
1236 266 |
|
1237 00:13:10,295 --> 00:13:13,040 |
|
1238 In a programming language |
|
1239 called Ada that is |
|
1240 |
|
1241 267 |
|
1242 00:13:13,040 --> 00:13:15,770 |
|
1243 slightly obscure |
|
1244 programming language |
|
1245 |
|
1246 268 |
|
1247 00:13:15,770 --> 00:13:18,650 |
|
1248 bat is very well-known in |
|
1249 |
|
1250 269 |
|
1251 00:13:18,650 --> 00:13:22,040 |
|
1252 areas where safety |
|
1253 is really paramount. |
|
1254 |
|
1255 270 |
|
1256 00:13:22,040 --> 00:13:25,010 |
|
1257 And what they did is they |
|
1258 took this one Ada program |
|
1259 |
|
1260 271 |
|
1261 00:13:25,010 --> 00:13:28,010 |
|
1262 and they essentially |
|
1263 took three compilers, |
|
1264 |
|
1265 272 |
|
1266 00:13:28,010 --> 00:13:29,435 |
|
1267 independent compilers, |
|
1268 |
|
1269 273 |
|
1270 00:13:29,435 --> 00:13:32,045 |
|
1271 which compiled the program |
|
1272 to different CPUs. |
|
1273 |
|
1274 274 |
|
1275 00:13:32,045 --> 00:13:33,815 |
|
1276 One CPU from Intel, |
|
1277 |
|
1278 275 |
|
1279 00:13:33,815 --> 00:13:37,235 |
|
1280 one CPU for Motorola, |
|
1281 and one from AMD. |
|
1282 |
|
1283 276 |
|
1284 00:13:37,235 --> 00:13:38,930 |
|
1285 And these are quite different |
|
1286 |
|
1287 277 |
|
1288 00:13:38,930 --> 00:13:40,955 |
|
1289 CPUs. Also some of them |
|
1290 |
|
1291 278 |
|
1292 00:13:40,955 --> 00:13:42,755 |
|
1293 have quite different philosophies |
|
1294 |
|
1295 279 |
|
1296 00:13:42,755 --> 00:13:45,245 |
|
1297 on how they do their calculations. |
|
1298 |
|
1299 280 |
|
1300 00:13:45,245 --> 00:13:47,330 |
|
1301 Now what they could do is, they |
|
1302 |
|
1303 281 |
|
1304 00:13:47,330 --> 00:13:50,690 |
|
1305 could put these three computers |
|
1306 on a single board and |
|
1307 |
|
1308 282 |
|
1309 00:13:50,690 --> 00:13:54,335 |
|
1310 could now run all their |
|
1311 calculations in parallel. |
|
1312 |
|
1313 283 |
|
1314 00:13:54,335 --> 00:13:56,690 |
|
1315 One with an Intel |
|
1316 CPU, one with |
|
1317 |
|
1318 284 |
|
1319 00:13:56,690 --> 00:14:00,245 |
|
1320 Motorola, and one with a |
|
1321 Risc computers say. |
|
1322 |
|
1323 285 |
|
1324 00:14:00,245 --> 00:14:02,795 |
|
1325 And then they could |
|
1326 compare the results. |
|
1327 |
|
1328 286 |
|
1329 00:14:02,795 --> 00:14:05,030 |
|
1330 And if these results differed, |
|
1331 |
|
1332 287 |
|
1333 00:14:05,030 --> 00:14:07,940 |
|
1334 then they knew one CPU must have |
|
1335 |
|
1336 288 |
|
1337 00:14:07,940 --> 00:14:11,600 |
|
1338 calculated the wrong result |
|
1339 and probably told the pilot, |
|
1340 |
|
1341 289 |
|
1342 00:14:11,600 --> 00:14:14,850 |
|
1343 please don't let |
|
1344 that airplane crash. |
|
1345 |
|
1346 290 |
|
1347 00:14:14,950 --> 00:14:17,270 |
|
1348 Not just Boeing is doing |
|
1349 |
|
1350 291 |
|
1351 00:14:17,270 --> 00:14:19,355 |
|
1352 interesting things |
|
1353 with compilers. |
|
1354 |
|
1355 292 |
|
1356 00:14:19,355 --> 00:14:22,640 |
|
1357 Also, Airbus in a completely |
|
1358 different setting |
|
1359 |
|
1360 293 |
|
1361 00:14:22,640 --> 00:14:24,260 |
|
1362 is using compilers in |
|
1363 |
|
1364 294 |
|
1365 00:14:24,260 --> 00:14:26,420 |
|
1366 a interesting way to get |
|
1367 |
|
1368 295 |
|
1369 00:14:26,420 --> 00:14:30,195 |
|
1370 their airplanes up in the |
|
1371 air and let them not crush. |
|
1372 |
|
1373 296 |
|
1374 00:14:30,195 --> 00:14:33,010 |
|
1375 In another example, I |
|
1376 have friends working |
|
1377 |
|
1378 297 |
|
1379 00:14:33,010 --> 00:14:35,350 |
|
1380 at Facebook who work on |
|
1381 |
|
1382 298 |
|
1383 00:14:35,350 --> 00:14:37,900 |
|
1384 compilers to make sense |
|
1385 are they are heaps |
|
1386 |
|
1387 299 |
|
1388 00:14:37,900 --> 00:14:41,470 |
|
1389 and heaps and heaps of |
|
1390 code written in PHP, |
|
1391 |
|
1392 300 |
|
1393 00:14:41,470 --> 00:14:42,880 |
|
1394 which is one of the most |
|
1395 |
|
1396 301 |
|
1397 00:14:42,880 --> 00:14:44,920 |
|
1398 horrible languages on the planet. |
|
1399 |
|
1400 302 |
|
1401 00:14:44,920 --> 00:14:46,630 |
|
1402 The bottom line is, |
|
1403 |
|
1404 303 |
|
1405 00:14:46,630 --> 00:14:50,499 |
|
1406 compilers are still very, |
|
1407 very important nowadays. |
|
1408 |
|
1409 304 |
|
1410 00:14:50,499 --> 00:14:52,810 |
|
1411 And for the foreseeable future, |
|
1412 |
|
1413 305 |
|
1414 00:14:52,810 --> 00:14:56,150 |
|
1415 compilers still need |
|
1416 to be developed. |
|
1417 |
|
1418 306 |
|
1419 00:14:57,270 --> 00:15:00,235 |
|
1420 When one talks about |
|
1421 compilers then |
|
1422 |
|
1423 307 |
|
1424 00:15:00,235 --> 00:15:01,630 |
|
1425 there is, of course, |
|
1426 |
|
1427 308 |
|
1428 00:15:01,630 --> 00:15:05,395 |
|
1429 a magic element involved. They're |
|
1430 large software systems. |
|
1431 |
|
1432 309 |
|
1433 00:15:05,395 --> 00:15:07,750 |
|
1434 And yes, they're supposed to |
|
1435 |
|
1436 310 |
|
1437 00:15:07,750 --> 00:15:10,525 |
|
1438 generate a runnable binary for |
|
1439 |
|
1440 311 |
|
1441 00:15:10,525 --> 00:15:12,105 |
|
1442 a high-level program, |
|
1443 |
|
1444 312 |
|
1445 00:15:12,105 --> 00:15:15,230 |
|
1446 but they are also supposed |
|
1447 to make our programs better. |
|
1448 |
|
1449 313 |
|
1450 00:15:15,230 --> 00:15:18,920 |
|
1451 They optimize our |
|
1452 programs to run faster. |
|
1453 |
|
1454 314 |
|
1455 00:15:18,920 --> 00:15:22,610 |
|
1456 And there's a lot of |
|
1457 "magic" involved in that. |
|
1458 |
|
1459 315 |
|
1460 00:15:22,610 --> 00:15:24,890 |
|
1461 Magic in inverted quotes. |
|
1462 |
|
1463 316 |
|
1464 00:15:24,890 --> 00:15:26,480 |
|
1465 I would like to show you one of |
|
1466 |
|
1467 317 |
|
1468 00:15:26,480 --> 00:15:29,340 |
|
1469 this magic in one example. |
|
1470 |
|
1471 318 |
|
1472 00:15:31,090 --> 00:15:33,110 |
|
1473 I hope you still have |
|
1474 |
|
1475 319 |
|
1476 00:15:33,110 --> 00:15:36,485 |
|
1477 fond memories of the |
|
1478 PEP module last year. |
|
1479 |
|
1480 320 |
|
1481 00:15:36,485 --> 00:15:40,280 |
|
1482 And you might remember BF |
|
1483 language, we had to look at. |
|
1484 |
|
1485 321 |
|
1486 00:15:40,280 --> 00:15:43,670 |
|
1487 This BF language contains |
|
1488 the kind of memory and |
|
1489 |
|
1490 322 |
|
1491 00:15:43,670 --> 00:15:45,680 |
|
1492 the memory pointer which can |
|
1493 |
|
1494 323 |
|
1495 00:15:45,680 --> 00:15:48,140 |
|
1496 be moved to the left |
|
1497 or to the right, |
|
1498 |
|
1499 324 |
|
1500 00:15:48,140 --> 00:15:50,270 |
|
1501 where an integer in |
|
1502 |
|
1503 325 |
|
1504 00:15:50,270 --> 00:15:52,925 |
|
1505 the memory can be either |
|
1506 increased or decreased. |
|
1507 |
|
1508 326 |
|
1509 00:15:52,925 --> 00:15:55,730 |
|
1510 We can print |
|
1511 out the content of |
|
1512 |
|
1513 327 |
|
1514 00:15:55,730 --> 00:15:59,645 |
|
1515 the current cell or input |
|
1516 something into a cell. |
|
1517 |
|
1518 328 |
|
1519 00:15:59,645 --> 00:16:01,850 |
|
1520 And we have loops, and |
|
1521 |
|
1522 329 |
|
1523 00:16:01,850 --> 00:16:04,220 |
|
1524 everything else is |
|
1525 considered a comment. |
|
1526 |
|
1527 330 |
|
1528 00:16:04,220 --> 00:16:06,290 |
|
1529 What's good about |
|
1530 this BF language is that |
|
1531 |
|
1532 331 |
|
1533 00:16:06,290 --> 00:16:08,180 |
|
1534 we don't even need a front end. |
|
1535 |
|
1536 332 |
|
1537 00:16:08,180 --> 00:16:09,920 |
|
1538 We can immediately start writing |
|
1539 |
|
1540 333 |
|
1541 00:16:09,920 --> 00:16:13,529 |
|
1542 an interpreter or a compiler. |
|
1543 |
|
1544 334 |
|
1545 00:16:15,850 --> 00:16:18,155 |
|
1546 Okay, I have here |
|
1547 |
|
1548 335 |
|
1549 00:16:18,155 --> 00:16:20,600 |
|
1550 a very straightforward |
|
1551 interpreter for |
|
1552 |
|
1553 336 |
|
1554 00:16:20,600 --> 00:16:22,865 |
|
1555 the BF language. You |
|
1556 might recognize it. |
|
1557 |
|
1558 337 |
|
1559 00:16:22,865 --> 00:16:27,120 |
|
1560 And I run it with a |
|
1561 benchmark program, which |
|
1562 |
|
1563 338 |
|
1564 00:16:27,760 --> 00:16:30,960 |
|
1565 might also recognize. |
|
1566 |
|
1567 339 |
|
1568 00:16:31,560 --> 00:16:36,835 |
|
1569 And this will now take |
|
1570 approximately ten minutes. |
|
1571 |
|
1572 340 |
|
1573 00:16:36,835 --> 00:16:39,920 |
|
1574 So see you in a bit. |
|
1575 |
|
1576 341 |
|
1577 00:16:40,710 --> 00:16:43,660 |
|
1578 Okay, this has finished now. |
|
1579 |
|
1580 342 |
|
1581 00:16:43,660 --> 00:16:45,925 |
|
1582 Almost took 11 minutes. |
|
1583 |
|
1584 343 |
|
1585 00:16:45,925 --> 00:16:47,410 |
|
1586 The question now is, |
|
1587 |
|
1588 344 |
|
1589 00:16:47,410 --> 00:16:51,820 |
|
1590 can we to better? |
|
1591 |
|
1592 345 |
|
1593 00:16:51,820 --> 00:16:53,260 |
|
1594 Actually, it is not difficult |
|
1595 to do better |
|
1596 |
|
1597 346 |
|
1598 00:16:53,260 --> 00:16:54,970 |
|
1599 than this simple-minded |
|
1600 interpreter. |
|
1601 |
|
1602 347 |
|
1603 00:16:54,970 --> 00:16:58,690 |
|
1604 It is relatively |
|
1605 straightforward to take |
|
1606 |
|
1607 348 |
|
1608 00:16:58,690 --> 00:17:03,520 |
|
1609 a BF program and generate equivalent C code. |
|
1610 |
|
1611 349 |
|
1612 00:17:03,520 --> 00:17:06,520 |
|
1613 This can be easily |
|
1614 done by somehow |
|
1615 |
|
1616 350 |
|
1617 00:17:06,520 --> 00:17:09,490 |
|
1618 representing the BF memory in |
|
1619 |
|
1620 351 |
|
1621 00:17:09,490 --> 00:17:12,310 |
|
1622 C. We can do this |
|
1623 by just an array of |
|
1624 |
|
1625 352 |
|
1626 00:17:12,310 --> 00:17:15,380 |
|
1627 characters and a memory pointer, |
|
1628 |
|
1629 353 |
|
1630 00:17:15,380 --> 00:17:19,265 |
|
1631 which points, in this case |
|
1632 in the middle of this array. |
|
1633 |
|
1634 354 |
|
1635 00:17:19,265 --> 00:17:21,800 |
|
1636 Then it's very easy to |
|
1637 |
|
1638 355 |
|
1639 00:17:21,800 --> 00:17:24,275 |
|
1640 translate the movement |
|
1641 of the |
|
1642 |
|
1643 356 |
|
1644 00:17:24,275 --> 00:17:28,610 |
|
1645 BF memory pointer into increments |
|
1646 |
|
1647 357 |
|
1648 00:17:28,610 --> 00:17:31,520 |
|
1649 and decrements of |
|
1650 the C pointer. |
|
1651 |
|
1652 358 |
|
1653 00:17:31,520 --> 00:17:33,050 |
|
1654 Similarly, if you want to increment or |
|
1655 |
|
1656 359 |
|
1657 00:17:33,050 --> 00:17:35,915 |
|
1658 decrement an element |
|
1659 in this array, |
|
1660 |
|
1661 360 |
|
1662 00:17:35,915 --> 00:17:38,975 |
|
1663 you just have to look up what |
|
1664 the pointer is and increment |
|
1665 |
|
1666 361 |
|
1667 00:17:38,975 --> 00:17:42,140 |
|
1668 and decrement what's |
|
1669 under the pointer. Similarly |
|
1670 |
|
1671 362 |
|
1672 00:17:42,140 --> 00:17:43,790 |
|
1673 we can print out something from |
|
1674 |
|
1675 363 |
|
1676 00:17:43,790 --> 00:17:47,450 |
|
1677 this array and we can put |
|
1678 something into this array. |
|
1679 |
|
1680 364 |
|
1681 00:17:47,450 --> 00:17:49,610 |
|
1682 What is great is that the loops |
|
1683 |
|
1684 365 |
|
1685 00:17:49,610 --> 00:17:51,530 |
|
1686 from the BF language directly |
|
1687 |
|
1688 366 |
|
1689 00:17:51,530 --> 00:17:55,580 |
|
1690 translate into while |
|
1691 loops of the C language. |
|
1692 |
|
1693 367 |
|
1694 00:17:55,580 --> 00:17:58,100 |
|
1695 We essentially have to check is |
|
1696 |
|
1697 368 |
|
1698 00:17:58,100 --> 00:18:02,450 |
|
1699 the memory pointer pointing |
|
1700 to a field that is 0. |
|
1701 |
|
1702 369 |
|
1703 00:18:02,450 --> 00:18:03,950 |
|
1704 Then we exit the loop, |
|
1705 |
|
1706 370 |
|
1707 00:18:03,950 --> 00:18:05,719 |
|
1708 and otherwise we continue. |
|
1709 |
|
1710 371 |
|
1711 00:18:05,719 --> 00:18:09,575 |
|
1712 And that can be done |
|
1713 nicely in C, like so. |
|
1714 |
|
1715 372 |
|
1716 00:18:09,575 --> 00:18:12,995 |
|
1717 And everything else is |
|
1718 again, just a comment. |
|
1719 |
|
1720 373 |
|
1721 00:18:12,995 --> 00:18:16,445 |
|
1722 So I have implemented |
|
1723 this translation for you. |
|
1724 |
|
1725 374 |
|
1726 00:18:16,445 --> 00:18:19,700 |
|
1727 Remember this was the BF |
|
1728 program for the Mandelbrot Set. |
|
1729 |
|
1730 375 |
|
1731 00:18:19,700 --> 00:18:23,690 |
|
1732 The equivalent C code |
|
1733 would look like this. |
|
1734 |
|
1735 376 |
|
1736 00:18:23,690 --> 00:18:27,110 |
|
1737 So you can see at the beginning |
|
1738 is this generation of |
|
1739 |
|
1740 377 |
|
1741 00:18:27,110 --> 00:18:30,140 |
|
1742 the BF memory represented |
|
1743 |
|
1744 378 |
|
1745 00:18:30,140 --> 00:18:32,345 |
|
1746 as an array and the |
|
1747 memory pointer. |
|
1748 |
|
1749 379 |
|
1750 00:18:32,345 --> 00:18:34,550 |
|
1751 And then inside there are lots |
|
1752 |
|
1753 380 |
|
1754 00:18:34,550 --> 00:18:36,770 |
|
1755 of increments and decrements |
|
1756 |
|
1757 381 |
|
1758 00:18:36,770 --> 00:18:41,915 |
|
1759 of pointers and also |
|
1760 contents of this array. |
|
1761 |
|
1762 382 |
|
1763 00:18:41,915 --> 00:18:45,199 |
|
1764 Now fingers crossed that this |
|
1765 is the correct translation. |
|
1766 |
|
1767 383 |
|
1768 00:18:45,199 --> 00:18:48,125 |
|
1769 And I can also run this |
|
1770 |
|
1771 384 |
|
1772 00:18:48,125 --> 00:18:52,805 |
|
1773 and you should see that it runs |
|
1774 substantially faster. |
|
1775 |
|
1776 385 |
|
1777 00:18:52,805 --> 00:18:55,880 |
|
1778 I'm using now my GCC on |
|
1779 |
|
1780 386 |
|
1781 00:18:55,880 --> 00:19:01,040 |
|
1782 my computer to generate an |
|
1783 executable for the C program. |
|
1784 |
|
1785 387 |
|
1786 00:19:01,040 --> 00:19:04,295 |
|
1787 And it should run for |
|
1788 approximately 20 seconds. |
|
1789 |
|
1790 388 |
|
1791 00:19:04,295 --> 00:19:07,620 |
|
1792 So let's just wait for that. |
|
1793 |
|
1794 389 |
|
1795 00:19:11,430 --> 00:19:14,950 |
|
1796 Okay. What is important to note |
|
1797 |
|
1798 390 |
|
1799 00:19:14,950 --> 00:19:19,885 |
|
1800 here is that I'm running GCC, |
|
1801 |
|
1802 391 |
|
1803 00:19:19,885 --> 00:19:22,450 |
|
1804 this the option -O0. |
|
1805 |
|
1806 392 |
|
1807 00:19:22,450 --> 00:19:25,600 |
|
1808 That means I tell GCC to |
|
1809 |
|
1810 393 |
|
1811 00:19:25,600 --> 00:19:28,840 |
|
1812 generate a binary which I |
|
1813 can run as you can see. |
|
1814 |
|
1815 394 |
|
1816 00:19:28,840 --> 00:19:31,810 |
|
1817 But don't apply any optimization. |
|
1818 |
|
1819 395 |
|
1820 00:19:31,810 --> 00:19:38,395 |
|
1821 Keep this in mind. |
|
1822 As a second try, |
|
1823 |
|
1824 396 |
|
1825 00:19:38,395 --> 00:19:42,595 |
|
1826 of course, I can try to now |
|
1827 generate a better C program. |
|
1828 |
|
1829 397 |
|
1830 00:19:42,595 --> 00:19:46,060 |
|
1831 And as you'll remember from |
|
1832 the PEP course, it can, |
|
1833 |
|
1834 398 |
|
1835 00:19:46,060 --> 00:19:50,095 |
|
1836 for example, combine |
|
1837 several steps |
|
1838 |
|
1839 399 |
|
1840 00:19:50,095 --> 00:19:51,670 |
|
1841 going to the right of the memory |
|
1842 |
|
1843 400 |
|
1844 00:19:51,670 --> 00:19:53,310 |
|
1845 pointer or to the left. |
|
1846 |
|
1847 401 |
|
1848 00:19:53,310 --> 00:19:55,280 |
|
1849 We can combine that into |
|
1850 |
|
1851 402 |
|
1852 00:19:55,280 --> 00:19:58,760 |
|
1853 one single increment or |
|
1854 decrement of not just one, |
|
1855 |
|
1856 403 |
|
1857 00:19:58,760 --> 00:20:00,050 |
|
1858 but of like n, |
|
1859 |
|
1860 404 |
|
1861 00:20:00,050 --> 00:20:02,570 |
|
1862 where n is greater |
|
1863 than 1. Similarly, |
|
1864 |
|
1865 405 |
|
1866 00:20:02,570 --> 00:20:06,980 |
|
1867 if we increment or decrement |
|
1868 the content of this array, |
|
1869 |
|
1870 406 |
|
1871 00:20:06,980 --> 00:20:09,740 |
|
1872 we can do this in one |
|
1873 big step by incrementing |
|
1874 |
|
1875 407 |
|
1876 00:20:09,740 --> 00:20:12,710 |
|
1877 that by not just one |
|
1878 and increment by one, |
|
1879 |
|
1880 408 |
|
1881 00:20:12,710 --> 00:20:15,635 |
|
1882 but increment and decrement |
|
1883 by bigger numbers. |
|
1884 |
|
1885 409 |
|
1886 00:20:15,635 --> 00:20:18,870 |
|
1887 Everything else stays the same. |
|
1888 |
|
1889 410 |
|
1890 00:20:20,830 --> 00:20:23,270 |
|
1891 Again, I have implemented that |
|
1892 |
|
1893 411 |
|
1894 00:20:23,270 --> 00:20:24,950 |
|
1895 for you and you can see now |
|
1896 |
|
1897 412 |
|
1898 00:20:24,950 --> 00:20:26,835 |
|
1899 the C program moves |
|
1900 |
|
1901 413 |
|
1902 00:20:26,835 --> 00:20:30,455 |
|
1903 the memory pointer and bigger |
|
1904 chunks and also increases, |
|
1905 |
|
1906 414 |
|
1907 00:20:30,455 --> 00:20:32,810 |
|
1908 for example, here, memory content |
|
1909 |
|
1910 415 |
|
1911 00:20:32,810 --> 00:20:35,555 |
|
1912 by 15 than just by 1. |
|
1913 |
|
1914 416 |
|
1915 00:20:35,555 --> 00:20:38,520 |
|
1916 Now let's run this program. |
|
1917 |
|
1918 417 |
|
1919 00:20:38,530 --> 00:20:40,760 |
|
1920 Again, I use GCC |
|
1921 |
|
1922 418 |
|
1923 00:20:40,760 --> 00:20:45,350 |
|
1924 to compile the |
|
1925 C program and run it. |
|
1926 |
|
1927 419 |
|
1928 00:20:45,350 --> 00:20:49,025 |
|
1929 And again, I made sure |
|
1930 that it only runs this with |
|
1931 |
|
1932 420 |
|
1933 00:20:49,025 --> 00:20:51,530 |
|
1934 no optimizations switched on. |
|
1935 |
|
1936 421 |
|
1937 00:20:51,530 --> 00:20:54,050 |
|
1938 So there's minus O0. |
|
1939 |
|
1940 422 |
|
1941 00:20:54,050 --> 00:20:56,090 |
|
1942 And you can see |
|
1943 it's now down from |
|
1944 |
|
1945 423 |
|
1946 00:20:56,090 --> 00:20:59,990 |
|
1947 20 seconds to just 6 seconds. |
|
1948 |
|
1949 424 |
|
1950 00:20:59,990 --> 00:21:06,065 |
|
1951 I show you, the GCC is |
|
1952 called with -O0. |
|
1953 |
|
1954 425 |
|
1955 00:21:06,065 --> 00:21:08,990 |
|
1956 So this reduction |
|
1957 in time is purely |
|
1958 |
|
1959 426 |
|
1960 00:21:08,990 --> 00:21:12,755 |
|
1961 because I produced better C code, |
|
1962 |
|
1963 427 |
|
1964 00:21:12,755 --> 00:21:17,220 |
|
1965 which the compiler then just |
|
1966 transformed into a binary. |
|
1967 |
|
1968 428 |
|
1969 00:21:18,910 --> 00:21:22,055 |
|
1970 So far there's |
|
1971 nothing interesting. |
|
1972 |
|
1973 429 |
|
1974 00:21:22,055 --> 00:21:25,385 |
|
1975 We used in the first instance |
|
1976 |
|
1977 430 |
|
1978 00:21:25,385 --> 00:21:29,240 |
|
1979 single increments and use GCC with |
|
1980 |
|
1981 431 |
|
1982 00:21:29,240 --> 00:21:31,700 |
|
1983 O0 to not introduce |
|
1984 |
|
1985 432 |
|
1986 00:21:31,700 --> 00:21:35,255 |
|
1987 any optimizations and run |
|
1988 essentially 20 seconds. |
|
1989 |
|
1990 433 |
|
1991 00:21:35,255 --> 00:21:37,880 |
|
1992 If we then increment |
|
1993 |
|
1994 434 |
|
1995 00:21:37,880 --> 00:21:40,895 |
|
1996 in bigger chunks or |
|
1997 decrement in bigger chunks, |
|
1998 |
|
1999 435 |
|
2000 00:21:40,895 --> 00:21:45,380 |
|
2001 use again GCC with -O0, |
|
2002 |
|
2003 436 |
|
2004 00:21:45,380 --> 00:21:50,030 |
|
2005 then it runs in approximately |
|
2006 five to six seconds. |
|
2007 |
|
2008 437 |
|
2009 00:21:50,030 --> 00:21:51,950 |
|
2010 Now let me do the following: |
|
2011 |
|
2012 438 |
|
2013 00:21:51,950 --> 00:21:55,430 |
|
2014 I take the first program which |
|
2015 |
|
2016 439 |
|
2017 00:21:55,430 --> 00:21:58,070 |
|
2018 increments everything |
|
2019 in a single steps |
|
2020 |
|
2021 440 |
|
2022 00:21:58,070 --> 00:22:00,185 |
|
2023 or decremented in single steps. |
|
2024 |
|
2025 441 |
|
2026 00:22:00,185 --> 00:22:04,835 |
|
2027 But now I use the full |
|
2028 capacity of the GCC compiler |
|
2029 |
|
2030 442 |
|
2031 00:22:04,835 --> 00:22:06,560 |
|
2032 and I tell it, |
|
2033 |
|
2034 443 |
|
2035 00:22:06,560 --> 00:22:11,370 |
|
2036 please do introduce some |
|
2037 optimizations as you want. |
|
2038 |
|
2039 444 |
|
2040 00:22:11,590 --> 00:22:15,799 |
|
2041 And I'm now running |
|
2042 exactly the same program... |
|
2043 |
|
2044 445 |
|
2045 00:22:15,799 --> 00:22:17,870 |
|
2046 just the GCC compiler will |
|
2047 |
|
2048 446 |
|
2049 00:22:17,870 --> 00:22:22,325 |
|
2050 now have the optimizations |
|
2051 switched on. |
|
2052 |
|
2053 447 |
|
2054 00:22:22,325 --> 00:22:24,380 |
|
2055 Let's see what happens. |
|
2056 |
|
2057 448 |
|
2058 00:22:24,380 --> 00:22:27,320 |
|
2059 One first needs to compile it. |
|
2060 |
|
2061 449 |
|
2062 00:22:27,320 --> 00:22:29,795 |
|
2063 And that takes a little while. |
|
2064 |
|
2065 450 |
|
2066 00:22:29,795 --> 00:22:32,000 |
|
2067 Okay, this has now |
|
2068 finished and also |
|
2069 |
|
2070 451 |
|
2071 00:22:32,000 --> 00:22:34,115 |
|
2072 the calculation of the |
|
2073 picture has finished. |
|
2074 |
|
2075 452 |
|
2076 00:22:34,115 --> 00:22:35,960 |
|
2077 And as you can see, it took |
|
2078 |
|
2079 453 |
|
2080 00:22:35,960 --> 00:22:38,510 |
|
2081 approximately eight |
|
2082 seconds to calculate. |
|
2083 |
|
2084 454 |
|
2085 00:22:38,510 --> 00:22:41,645 |
|
2086 That is down from |
|
2087 approximately 20 seconds. |
|
2088 |
|
2089 455 |
|
2090 00:22:41,645 --> 00:22:46,040 |
|
2091 So the difference from |
|
2092 switching from -O0 to |
|
2093 |
|
2094 456 |
|
2095 00:22:46,040 --> 00:22:51,935 |
|
2096 -O3 meant that now |
|
2097 the program runs almost as |
|
2098 |
|
2099 457 |
|
2100 00:22:51,935 --> 00:22:54,800 |
|
2101 fast as where I by hand |
|
2102 |
|
2103 458 |
|
2104 00:22:54,800 --> 00:22:58,610 |
|
2105 combined several steps |
|
2106 into a big chunk. |
|
2107 |
|
2108 459 |
|
2109 00:22:58,610 --> 00:23:00,170 |
|
2110 That is essentially what |
|
2111 |
|
2112 460 |
|
2113 00:23:00,170 --> 00:23:03,485 |
|
2114 the GCC compiler |
|
2115 found out on its own. |
|
2116 |
|
2117 461 |
|
2118 00:23:03,485 --> 00:23:05,840 |
|
2119 That rather than jumping |
|
2120 |
|
2121 462 |
|
2122 00:23:05,840 --> 00:23:08,465 |
|
2123 in just single increments by one, |
|
2124 |
|
2125 463 |
|
2126 00:23:08,465 --> 00:23:11,510 |
|
2127 they can be all combined |
|
2128 into bigger chunks. |
|
2129 |
|
2130 464 |
|
2131 00:23:11,510 --> 00:23:16,595 |
|
2132 It hasn't been as successful |
|
2133 as if I do this explicitly. |
|
2134 |
|
2135 465 |
|
2136 00:23:16,595 --> 00:23:18,620 |
|
2137 But that is the magic that |
|
2138 |
|
2139 466 |
|
2140 00:23:18,620 --> 00:23:22,560 |
|
2141 the compiler essentially found |
|
2142 out, that this can be done. |
|
2143 |
|
2144 467 |
|
2145 00:23:22,960 --> 00:23:25,700 |
|
2146 Just a quick recap of what I did. |
|
2147 |
|
2148 468 |
|
2149 00:23:25,700 --> 00:23:28,160 |
|
2150 I first run the program with |
|
2151 |
|
2152 469 |
|
2153 00:23:28,160 --> 00:23:31,730 |
|
2154 an interpreter and it took |
|
2155 approximately 11 minutes. |
|
2156 |
|
2157 470 |
|
2158 00:23:31,730 --> 00:23:36,559 |
|
2159 Then I had a simple compiler |
|
2160 generating a C program. |
|
2161 |
|
2162 471 |
|
2163 00:23:36,559 --> 00:23:40,880 |
|
2164 And the C compiler then had |
|
2165 no optimization switched on. |
|
2166 |
|
2167 472 |
|
2168 00:23:40,880 --> 00:23:44,645 |
|
2169 In the C program had only |
|
2170 small single increments and |
|
2171 |
|
2172 473 |
|
2173 00:23:44,645 --> 00:23:46,820 |
|
2174 small jumps. Then it took |
|
2175 |
|
2176 474 |
|
2177 00:23:46,820 --> 00:23:49,775 |
|
2178 approximately 20 seconds for |
|
2179 to same program. |
|
2180 |
|
2181 475 |
|
2182 00:23:49,775 --> 00:23:52,460 |
|
2183 Then I had a more advanced |
|
2184 compiler which does |
|
2185 |
|
2186 476 |
|
2187 00:23:52,460 --> 00:23:55,730 |
|
2188 big increments and |
|
2189 also big jumps. |
|
2190 |
|
2191 477 |
|
2192 00:23:55,730 --> 00:23:57,470 |
|
2193 But again, the compiler didn't |
|
2194 |
|
2195 478 |
|
2196 00:23:57,470 --> 00:23:59,210 |
|
2197 introduce any optimization on its |
|
2198 |
|
2199 479 |
|
2200 00:23:59,210 --> 00:24:02,915 |
|
2201 own. Then it took |
|
2202 approximately five seconds. |
|
2203 |
|
2204 480 |
|
2205 00:24:02,915 --> 00:24:05,210 |
|
2206 Then I went back to |
|
2207 |
|
2208 481 |
|
2209 00:24:05,210 --> 00:24:08,465 |
|
2210 the simple compiler with |
|
2211 only small steps. |
|
2212 |
|
2213 482 |
|
2214 00:24:08,465 --> 00:24:10,400 |
|
2215 But then told the C compiler |
|
2216 |
|
2217 483 |
|
2218 00:24:10,400 --> 00:24:12,950 |
|
2219 to fully optimize this program. |
|
2220 |
|
2221 484 |
|
2222 00:24:12,950 --> 00:24:14,690 |
|
2223 And then it took more |
|
2224 or less the same |
|
2225 |
|
2226 485 |
|
2227 00:24:14,690 --> 00:24:17,465 |
|
2228 time as the more advanced compiler. |
|
2229 |
|
2230 486 |
|
2231 00:24:17,465 --> 00:24:20,465 |
|
2232 I encourage you to |
|
2233 look at this yourself. |
|
2234 |
|
2235 487 |
|
2236 00:24:20,465 --> 00:24:24,240 |
|
2237 As usual, all the programs |
|
2238 are uploaded on KEATS. |
|
2239 |
|
2240 488 |
|
2241 00:24:25,090 --> 00:24:27,590 |
|
2242 To finish this |
|
2243 introduction video, |
|
2244 |
|
2245 489 |
|
2246 00:24:27,590 --> 00:24:30,170 |
|
2247 let me give you an |
|
2248 extremely brief overview |
|
2249 |
|
2250 490 |
|
2251 00:24:30,170 --> 00:24:32,855 |
|
2252 over the history of compilers. |
|
2253 |
|
2254 491 |
|
2255 00:24:32,855 --> 00:24:35,450 |
|
2256 While I argued at the beginning |
|
2257 |
|
2258 492 |
|
2259 00:24:35,450 --> 00:24:38,915 |
|
2260 that it's interesting to |
|
2261 study compilers nowadays, |
|
2262 |
|
2263 493 |
|
2264 00:24:38,915 --> 00:24:40,400 |
|
2265 obviously in this field, |
|
2266 |
|
2267 494 |
|
2268 00:24:40,400 --> 00:24:43,295 |
|
2269 we are standing on the |
|
2270 shoulders of giants. |
|
2271 |
|
2272 495 |
|
2273 00:24:43,295 --> 00:24:46,520 |
|
2274 The field started with Turing and |
|
2275 Turing machines, |
|
2276 |
|
2277 496 |
|
2278 00:24:46,520 --> 00:24:49,130 |
|
2279 which were introduced in 1936. |
|
2280 |
|
2281 497 |
|
2282 00:24:49,130 --> 00:24:52,175 |
|
2283 Turing machines already had |
|
2284 this concept of memory |
|
2285 |
|
2286 498 |
|
2287 00:24:52,175 --> 00:24:55,190 |
|
2288 and also programs |
|
2289 of Turing machines |
|
2290 |
|
2291 499 |
|
2292 00:24:55,190 --> 00:24:58,775 |
|
2293 needed to be translated |
|
2294 between different formalisms. |
|
2295 |
|
2296 500 |
|
2297 00:24:58,775 --> 00:25:01,100 |
|
2298 Regular expressions, |
|
2299 which will play |
|
2300 |
|
2301 501 |
|
2302 00:25:01,100 --> 00:25:03,905 |
|
2303 an important role in our |
|
2304 lexer of our compiler, |
|
2305 |
|
2306 502 |
|
2307 00:25:03,905 --> 00:25:06,800 |
|
2308 were introduced in 1956. |
|
2309 |
|
2310 503 |
|
2311 00:25:06,800 --> 00:25:10,370 |
|
2312 Arguably the first compiler |
|
2313 was introduced in |
|
2314 |
|
2315 504 |
|
2316 00:25:10,370 --> 00:25:13,850 |
|
2317 1957 for a language called COBOL. |
|
2318 |
|
2319 505 |
|
2320 00:25:13,850 --> 00:25:16,550 |
|
2321 This was done by Grace Hopper. |
|
2322 |
|
2323 506 |
|
2324 00:25:16,550 --> 00:25:18,770 |
|
2325 And I recommend that |
|
2326 you have and look |
|
2327 |
|
2328 507 |
|
2329 00:25:18,770 --> 00:25:20,900 |
|
2330 at this interview |
|
2331 of Grace Hopper, |
|
2332 |
|
2333 508 |
|
2334 00:25:20,900 --> 00:25:22,130 |
|
2335 which shows she must have been |
|
2336 |
|
2337 509 |
|
2338 00:25:22,130 --> 00:25:24,424 |
|
2339 a very interesting personality. |
|
2340 |
|
2341 510 |
|
2342 00:25:24,424 --> 00:25:27,770 |
|
2343 But despite the |
|
2344 maturity of this field, |
|
2345 |
|
2346 511 |
|
2347 00:25:27,770 --> 00:25:29,465 |
|
2348 it might be surprising that |
|
2349 |
|
2350 512 |
|
2351 00:25:29,465 --> 00:25:31,505 |
|
2352 research papers are |
|
2353 still published. |
|
2354 |
|
2355 513 |
|
2356 00:25:31,505 --> 00:25:34,835 |
|
2357 And we will find that |
|
2358 out in the module. |
|
2359 |
|
2360 514 |
|
2361 00:25:34,835 --> 00:25:37,760 |
|
2362 And a number of |
|
2363 problems which we would |
|
2364 |
|
2365 515 |
|
2366 00:25:37,760 --> 00:25:41,270 |
|
2367 assume are already solved by now, |
|
2368 |
|
2369 516 |
|
2370 00:25:41,270 --> 00:25:43,730 |
|
2371 actually turning out |
|
2372 that they are not solved. |
|
2373 |
|
2374 517 |
|
2375 00:25:43,730 --> 00:25:45,620 |
|
2376 Here in this blog post, |
|
2377 |
|
2378 518 |
|
2379 00:25:45,620 --> 00:25:47,825 |
|
2380 for example, Laurie Tratt |
|
2381 |
|
2382 519 |
|
2383 00:25:47,825 --> 00:25:49,550 |
|
2384 who is also from this department, |
|
2385 |
|
2386 520 |
|
2387 00:25:49,550 --> 00:25:51,740 |
|
2388 argued that parsing, for example, |
|
2389 |
|
2390 521 |
|
2391 00:25:51,740 --> 00:25:54,035 |
|
2392 isn't a solved problem yet. |
|
2393 |
|
2394 522 |
|
2395 00:25:54,035 --> 00:25:56,300 |
|
2396 I hope you will have as much fun |
|
2397 |
|
2398 523 |
|
2399 00:25:56,300 --> 00:25:58,130 |
|
2400 with this module as I have. |
|
2401 |
|
2402 524 |
|
2403 00:25:58,130 --> 00:26:00,750 |
|
2404 Thank you very much |
|
2405 for listening. |