videos/01-intro.srt
changeset 761 82a1315c128d
child 833 aad5957eb7e4
equal deleted inserted replaced
760:d41956ea544e 761:82a1315c128d
       
     1 1
       
     2 00:00:16,280 --> 00:00:18,330
       
     3 A warm welcome to
       
     4 
       
     5 2
       
     6 00:00:18,330 --> 00:00:20,820
       
     7 the compilers and formal
       
     8 languages module?
       
     9 
       
    10 3
       
    11 00:00:20,820 --> 00:00:24,390
       
    12 My name is Christian Urban.
       
    13 Thank you for coming.
       
    14 
       
    15 4
       
    16 00:00:24,390 --> 00:00:27,644
       
    17 Compilers. I guess
       
    18 you use compilers
       
    19 
       
    20 5
       
    21 00:00:27,644 --> 00:00:31,680
       
    22 in your daily work, be it
       
    23 the C or Java compiler.
       
    24 
       
    25 6
       
    26 00:00:31,680 --> 00:00:34,245
       
    27 And you might be curious
       
    28 in what they do.
       
    29 
       
    30 7
       
    31 00:00:34,245 --> 00:00:35,700
       
    32 But you might also be
       
    33 
       
    34 8
       
    35 00:00:35,700 --> 00:00:38,130
       
    36 intimidated to look
       
    37 what they do underneath
       
    38 
       
    39 9
       
    40 00:00:38,130 --> 00:00:39,900
       
    41 the bonnet because they are
       
    42 
       
    43 10
       
    44 00:00:39,900 --> 00:00:42,520
       
    45 quite large software systems.
       
    46 
       
    47 11
       
    48 00:00:42,520 --> 00:00:46,130
       
    49 What I like to show you in
       
    50 this module is that you
       
    51 
       
    52 12
       
    53 00:00:46,130 --> 00:00:49,310
       
    54 do not need to be an
       
    55 Über hacker to implement your own
       
    56 
       
    57 13
       
    58 00:00:49,310 --> 00:00:51,305
       
    59 compiler for your
       
    60 own language, say.
       
    61 
       
    62 14
       
    63 00:00:51,305 --> 00:00:54,155
       
    64 So that will be the main
       
    65 goal of this module.
       
    66 
       
    67 15
       
    68 00:00:54,155 --> 00:00:56,210
       
    69 You will implement
       
    70 your own compiler,
       
    71 
       
    72 16
       
    73 00:00:56,210 --> 00:00:58,310
       
    74 of course with my help.
       
    75 
       
    76 17
       
    77 00:00:58,310 --> 00:01:02,360
       
    78 What I personally like
       
    79 about compilers is that
       
    80 
       
    81 18
       
    82 00:01:02,360 --> 00:01:04,580
       
    83 the subject is a
       
    84 perfect combination
       
    85 
       
    86 19
       
    87 00:01:04,580 --> 00:01:06,350
       
    88 of theory and practice.
       
    89 
       
    90 20
       
    91 00:01:06,350 --> 00:01:07,790
       
    92 I like to hack things,
       
    93 
       
    94 21
       
    95 00:01:07,790 --> 00:01:10,595
       
    96 writing actual code,
       
    97 but I also like theory.
       
    98 
       
    99 22
       
   100 00:01:10,595 --> 00:01:13,040
       
   101 I want to understand what
       
   102 my code actually does.
       
   103 
       
   104 23
       
   105 00:01:13,040 --> 00:01:17,040
       
   106 So compilers are the
       
   107 perfect subject for me.
       
   108 
       
   109 24
       
   110 00:01:18,040 --> 00:01:20,809
       
   111 Let's have a look at the details.
       
   112 
       
   113 25
       
   114 00:01:20,809 --> 00:01:23,779
       
   115 Here's an airplane
       
   116 view of a compiler.
       
   117 
       
   118 26
       
   119 00:01:23,779 --> 00:01:25,850
       
   120 On the left-hand side you can see
       
   121 
       
   122 27
       
   123 00:01:25,850 --> 00:01:28,745
       
   124 the input program a
       
   125 developer would write.
       
   126 
       
   127 28
       
   128 00:01:28,745 --> 00:01:31,955
       
   129 And on the right-hand side is
       
   130 the output of the compiler,
       
   131 
       
   132 29
       
   133 00:01:31,955 --> 00:01:36,360
       
   134 the binary code the developer
       
   135 wants to actually run.
       
   136 
       
   137 30
       
   138 00:01:36,370 --> 00:01:40,340
       
   139 What makes such a project
       
   140 actually feasible is that
       
   141 
       
   142 31
       
   143 00:01:40,340 --> 00:01:44,165
       
   144 compilers fall neatly into
       
   145 separate components.
       
   146 
       
   147 32
       
   148 00:01:44,165 --> 00:01:47,210
       
   149 So you can see the lexer and the
       
   150 
       
   151 33
       
   152 00:01:47,210 --> 00:01:48,860
       
   153 parser, which are often called the
       
   154 
       
   155 34
       
   156 00:01:48,860 --> 00:01:50,990
       
   157 front end of the compiler.
       
   158 
       
   159 35
       
   160 00:01:50,990 --> 00:01:53,000
       
   161 And the code generation is
       
   162 
       
   163 36
       
   164 00:01:53,000 --> 00:01:55,700
       
   165 called the backend
       
   166 of the compiler.
       
   167 
       
   168 37
       
   169 00:01:55,700 --> 00:01:57,620
       
   170 And it so happens
       
   171 that we will spend
       
   172 
       
   173 38
       
   174 00:01:57,620 --> 00:01:59,930
       
   175 the first five weeks
       
   176 on the lexer and
       
   177 
       
   178 39
       
   179 00:01:59,930 --> 00:02:04,970
       
   180 the parser, and the next five
       
   181 weeks on the code generator.
       
   182 
       
   183 40
       
   184 00:02:04,970 --> 00:02:09,575
       
   185 The first component of the
       
   186 compiler is the lexer.
       
   187 
       
   188 41
       
   189 00:02:09,575 --> 00:02:14,480
       
   190 The purpose of the lexer
       
   191 is to scan a flat
       
   192 
       
   193 42
       
   194 00:02:14,480 --> 00:02:16,610
       
   195 string, which the
       
   196 input program is at
       
   197 
       
   198 43
       
   199 00:02:16,610 --> 00:02:19,145
       
   200 the beginning and separate out
       
   201 
       
   202 44
       
   203 00:02:19,145 --> 00:02:21,275
       
   204 where are the words?
       
   205 
       
   206 45
       
   207 00:02:21,275 --> 00:02:23,600
       
   208 You might think
       
   209 in Western languages,
       
   210 
       
   211 46
       
   212 00:02:23,600 --> 00:02:24,920
       
   213 that is very easy:
       
   214 
       
   215 47
       
   216 00:02:24,920 --> 00:02:26,690
       
   217 you just try to
       
   218 find out where are
       
   219 
       
   220 48
       
   221 00:02:26,690 --> 00:02:28,580
       
   222 the whitespaces
       
   223 and then you know,
       
   224 
       
   225 49
       
   226 00:02:28,580 --> 00:02:31,955
       
   227 where one word stops and
       
   228 where the next word begins.
       
   229 
       
   230 50
       
   231 00:02:31,955 --> 00:02:35,300
       
   232 But, actually, computer
       
   233 languages are more
       
   234 
       
   235 51
       
   236 00:02:35,300 --> 00:02:38,180
       
   237 like ancient languages
       
   238 that you see here.
       
   239 
       
   240 52
       
   241 00:02:38,180 --> 00:02:39,860
       
   242 For example, ancient Greek.
       
   243 
       
   244 53
       
   245 00:02:39,860 --> 00:02:42,065
       
   246 The writer wrote one word
       
   247 
       
   248 54
       
   249 00:02:42,065 --> 00:02:44,915
       
   250 after the next and not
       
   251 leaving any space.
       
   252 
       
   253 55
       
   254 00:02:44,915 --> 00:02:47,930
       
   255 So for example, in this
       
   256 very simple string here
       
   257 
       
   258 56
       
   259 00:02:47,930 --> 00:02:50,960
       
   260 read n, there is no space at all.
       
   261 
       
   262 57
       
   263 00:02:50,960 --> 00:02:53,540
       
   264 But the purpose of
       
   265 the lexer is to
       
   266 
       
   267 58
       
   268 00:02:53,540 --> 00:02:56,570
       
   269 separate it into five
       
   270 different components.
       
   271 
       
   272 59
       
   273 00:02:56,570 --> 00:02:59,450
       
   274 It has to first say,
       
   275 well there is a read,
       
   276 
       
   277 60
       
   278 00:02:59,450 --> 00:03:01,595
       
   279 then there is a left parenthesis,
       
   280 
       
   281 61
       
   282 00:03:01,595 --> 00:03:04,025
       
   283 then there's an
       
   284 identifier called n,
       
   285 
       
   286 62
       
   287 00:03:04,025 --> 00:03:07,715
       
   288 then there's a right parenthesis,
       
   289 and finally a semicolon.
       
   290 
       
   291 63
       
   292 00:03:07,715 --> 00:03:11,345
       
   293 And as you can see, there
       
   294 is no space here at all.
       
   295 
       
   296 64
       
   297 00:03:11,345 --> 00:03:14,150
       
   298 And so the task of finding where
       
   299 
       
   300 65
       
   301 00:03:14,150 --> 00:03:17,705
       
   302 are the words is actually
       
   303 quite involved.
       
   304 
       
   305 66
       
   306 00:03:17,705 --> 00:03:19,460
       
   307 Also the classification is
       
   308 
       
   309 67
       
   310 00:03:19,460 --> 00:03:22,055
       
   311 sometimes not so straightforward.
       
   312 
       
   313 68
       
   314 00:03:22,055 --> 00:03:24,350
       
   315 If, for example, a writer
       
   316 
       
   317 69
       
   318 00:03:24,350 --> 00:03:26,360
       
   319 wrote "if" on its own,
       
   320 
       
   321 70
       
   322 00:03:26,360 --> 00:03:29,000
       
   323 then this should be a
       
   324 keyword classified as
       
   325 
       
   326 71
       
   327 00:03:29,000 --> 00:03:32,615
       
   328 a keyword because it's
       
   329 from the if-then-else.
       
   330 
       
   331 72
       
   332 00:03:32,615 --> 00:03:36,800
       
   333 But if the developer wrote
       
   334 something longer like "iffoo",
       
   335 
       
   336 73
       
   337 00:03:36,800 --> 00:03:38,030
       
   338 then this might just be
       
   339 
       
   340 74
       
   341 00:03:38,030 --> 00:03:39,860
       
   342 a strange variable name
       
   343 and he needs to be
       
   344 
       
   345 75
       
   346 00:03:39,860 --> 00:03:44,250
       
   347 classified as a variable name.
       
   348 
       
   349 76
       
   350 00:03:45,220 --> 00:03:50,720
       
   351 The output of the lexer 
       
   352 is a sequence of tokens.
       
   353 
       
   354 77
       
   355 00:03:50,720 --> 00:03:53,480
       
   356 These are essentially the words in
       
   357 
       
   358 78
       
   359 00:03:53,480 --> 00:03:56,405
       
   360 a sentence and their
       
   361 classification:
       
   362 
       
   363 79
       
   364 00:03:56,405 --> 00:03:59,540
       
   365 they are nouns,
       
   366 verbs or adjectives.
       
   367 
       
   368 80
       
   369 00:03:59,540 --> 00:04:02,885
       
   370 For us, of course, the
       
   371 classification would be keywords,
       
   372 
       
   373 81
       
   374 00:04:02,885 --> 00:04:06,005
       
   375 identifiers,
       
   376 operators, and so on.
       
   377 
       
   378 82
       
   379 00:04:06,005 --> 00:04:11,480
       
   380 And these tokens are the
       
   381 input for the parser,
       
   382 
       
   383 83
       
   384 00:04:11,480 --> 00:04:13,085
       
   385 the next component in
       
   386 
       
   387 84
       
   388 00:04:13,085 --> 00:04:17,240
       
   389 our compiler. The parser
       
   390 essentially takes this list
       
   391 
       
   392 85
       
   393 00:04:17,240 --> 00:04:23,300
       
   394 of tokens and transforms it
       
   395 into a abstract syntax tree.
       
   396 
       
   397 86
       
   398 00:04:23,300 --> 00:04:27,230
       
   399 That means we have now the
       
   400 sentence, we have the words.
       
   401 
       
   402 87
       
   403 00:04:27,230 --> 00:04:30,275
       
   404 We have to relate
       
   405 the words to each other
       
   406 
       
   407 88
       
   408 00:04:30,275 --> 00:04:33,680
       
   409 in order to find out what
       
   410 this sentence actually means.
       
   411 
       
   412 89
       
   413 00:04:33,680 --> 00:04:35,405
       
   414 In this case here,
       
   415 
       
   416 90
       
   417 00:04:35,405 --> 00:04:38,595
       
   418 we have to do a read...which
       
   419 variable is affected?
       
   420 
       
   421 91
       
   422 00:04:38,595 --> 00:04:45,040
       
   423 The variable n. Once we have
       
   424 the abstract syntax tree,
       
   425 
       
   426 92
       
   427 00:04:45,040 --> 00:04:49,225
       
   428 it can go to the next component,
       
   429 to the code generator.
       
   430 
       
   431 93
       
   432 00:04:49,225 --> 00:04:52,480
       
   433 Whilst it doesn't look
       
   434 like this in this picture,
       
   435 
       
   436 94
       
   437 00:04:52,480 --> 00:04:54,100
       
   438 the code generators is usually the
       
   439 
       
   440 95
       
   441 00:04:54,100 --> 00:04:56,080
       
   442 biggest part in a compiler.
       
   443 
       
   444 96
       
   445 00:04:56,080 --> 00:04:58,720
       
   446 And here we actually
       
   447 have to cut corners.
       
   448 
       
   449 97
       
   450 00:04:58,720 --> 00:05:02,470
       
   451 Instead of producing binary
       
   452 code, which can be run
       
   453 
       
   454 98
       
   455 00:05:02,470 --> 00:05:06,820
       
   456 directly on a CPU, like X86 or ARM,
       
   457 
       
   458 99
       
   459 00:05:06,820 --> 00:05:11,035
       
   460 we actually target the
       
   461 Java Virtual Machine, the JVM.
       
   462 
       
   463 100
       
   464 00:05:11,035 --> 00:05:13,600
       
   465 This is very similar
       
   466 to the Scala compiler,
       
   467 
       
   468 101
       
   469 00:05:13,600 --> 00:05:16,480
       
   470 for example, which
       
   471 produces JVM code.
       
   472 
       
   473 102
       
   474 00:05:16,480 --> 00:05:18,940
       
   475 Or, of course, also 
       
   476 the Java compiler,
       
   477 
       
   478 103
       
   479 00:05:18,940 --> 00:05:20,845
       
   480 which produces JVM code.
       
   481 
       
   482 104
       
   483 00:05:20,845 --> 00:05:23,900
       
   484 So here's a typical
       
   485 example code which we're
       
   486 
       
   487 105
       
   488 00:05:23,900 --> 00:05:27,305
       
   489 going to produce: Something
       
   490 like variable 2
       
   491 
       
   492 106
       
   493 00:05:27,305 --> 00:05:30,545
       
   494 gets allocated in the stack.
       
   495 
       
   496 107
       
   497 00:05:30,545 --> 00:05:32,390
       
   498 You subtract ten from it.
       
   499 
       
   500 108
       
   501 00:05:32,390 --> 00:05:36,050
       
   502 You test whether the
       
   503 variable is 0 and if yes,
       
   504 
       
   505 109
       
   506 00:05:36,050 --> 00:05:40,170
       
   507 you jump somewhere and otherwise
       
   508 you reload the variable.
       
   509 
       
   510 110
       
   511 00:05:41,560 --> 00:05:45,935
       
   512 Even though we cut corners
       
   513 into could generate apart.
       
   514 
       
   515 111
       
   516 00:05:45,935 --> 00:05:48,575
       
   517 By producing code for the JVM,
       
   518 
       
   519 112
       
   520 00:05:48,575 --> 00:05:51,710
       
   521 we can still obtain quite
       
   522 impressive results.
       
   523 
       
   524 113
       
   525 00:05:51,710 --> 00:05:56,225
       
   526 Here's a program with
       
   527 a cubic runtime behaviour.
       
   528 
       
   529 114
       
   530 00:05:56,225 --> 00:05:59,525
       
   531 When it has to
       
   532 calculate with 400 units,
       
   533 
       
   534 115
       
   535 00:05:59,525 --> 00:06:02,165
       
   536 it might need five
       
   537 seconds to calculate.
       
   538 
       
   539 116
       
   540 00:06:02,165 --> 00:06:05,345
       
   541 But if it  has to
       
   542 calculate with 1200 units,
       
   543 
       
   544 117
       
   545 00:06:05,345 --> 00:06:08,165
       
   546 it already needs more
       
   547 than two minutes.
       
   548 
       
   549 118
       
   550 00:06:08,165 --> 00:06:10,940
       
   551 Now these timings,
       
   552 the blue timings
       
   553 
       
   554 119
       
   555 00:06:10,940 --> 00:06:13,910
       
   556 are obtained with an
       
   557 interpreter of that program.
       
   558 
       
   559 120
       
   560 00:06:13,910 --> 00:06:16,265
       
   561 And you can see
       
   562 just next to it,
       
   563 
       
   564 121
       
   565 00:06:16,265 --> 00:06:18,050
       
   566 the red times.
       
   567 
       
   568 122
       
   569 00:06:18,050 --> 00:06:20,359
       
   570 They are obtained
       
   571 with the compiler
       
   572 
       
   573 123
       
   574 00:06:20,359 --> 00:06:23,135
       
   575 we're going to write
       
   576 in this module.
       
   577 
       
   578 124
       
   579 00:06:23,135 --> 00:06:25,100
       
   580 And there you can see, even
       
   581 
       
   582 125
       
   583 00:06:25,100 --> 00:06:29,000
       
   584 with 1200,  the times get
       
   585 hardly off the x-axis.
       
   586 
       
   587 126
       
   588 00:06:29,000 --> 00:06:30,485
       
   589 So in this instance,
       
   590 
       
   591 127
       
   592 00:06:30,485 --> 00:06:32,630
       
   593 our compiler will
       
   594 have a speed up from
       
   595 
       
   596 128
       
   597 00:06:32,630 --> 00:06:37,589
       
   598 approximately 1 million in
       
   599 comparison to the interpreter.
       
   600 
       
   601 129
       
   602 00:06:38,350 --> 00:06:42,020
       
   603 This might be a fun task
       
   604 for your spare time.
       
   605 
       
   606 130
       
   607 00:06:42,020 --> 00:06:44,480
       
   608 This is a compiler explorer,
       
   609 which allows you to
       
   610 
       
   611 131
       
   612 00:06:44,480 --> 00:06:47,060
       
   613 write C code on the
       
   614 left-hand side.
       
   615 
       
   616 132
       
   617 00:06:47,060 --> 00:06:49,115
       
   618 It shows you on the
       
   619 right-hand side
       
   620 
       
   621 133
       
   622 00:06:49,115 --> 00:06:53,270
       
   623 the machine code an
       
   624 X86 CPU can run.
       
   625 
       
   626 134
       
   627 00:06:53,270 --> 00:06:56,060
       
   628 For example, it gives
       
   629 you these color scheme and
       
   630 
       
   631 135
       
   632 00:06:56,060 --> 00:06:59,000
       
   633 says that this addition
       
   634 in the num + num in
       
   635 
       
   636 136
       
   637 00:06:59,000 --> 00:07:01,580
       
   638 the return statement,
       
   639 translates to
       
   640 
       
   641 137
       
   642 00:07:01,580 --> 00:07:02,960
       
   643 essentially an addition of
       
   644 
       
   645 138
       
   646 00:07:02,960 --> 00:07:05,930
       
   647 an register in the machine code.
       
   648 
       
   649 139
       
   650 00:07:05,930 --> 00:07:08,480
       
   651 I think this compiler
       
   652 explorer also works for
       
   653 
       
   654 140
       
   655 00:07:08,480 --> 00:07:11,495
       
   656 the Haskell language and
       
   657 also produces ARM code,
       
   658 
       
   659 141
       
   660 00:07:11,495 --> 00:07:15,245
       
   661 not just code for the X86 CPUs.
       
   662 
       
   663 142
       
   664 00:07:15,245 --> 00:07:18,950
       
   665 As an aside, I also recommend
       
   666 to watch the movie of
       
   667 
       
   668 143
       
   669 00:07:18,950 --> 00:07:20,870
       
   670 this guy because that is
       
   671 
       
   672 144
       
   673 00:07:20,870 --> 00:07:22,940
       
   674 very much like how I worked
       
   675 at the beginning
       
   676 
       
   677 145
       
   678 00:07:22,940 --> 00:07:25,355
       
   679 when implementing our compiler.
       
   680 
       
   681 146
       
   682 00:07:25,355 --> 00:07:29,300
       
   683 I wrote some code which I
       
   684 knew how it should behave.
       
   685 
       
   686 147
       
   687 00:07:29,300 --> 00:07:30,725
       
   688 And then I just had a look
       
   689 
       
   690 148
       
   691 00:07:30,725 --> 00:07:32,900
       
   692 what another compiler produces.
       
   693 
       
   694 149
       
   695 00:07:32,900 --> 00:07:37,190
       
   696 And I imitated that code
       
   697 as much as I could.
       
   698 
       
   699 150
       
   700 00:07:37,190 --> 00:07:39,380
       
   701 Such a compiler explorer
       
   702 
       
   703 151
       
   704 00:07:39,380 --> 00:07:41,375
       
   705 also exists for
       
   706 the Java language.
       
   707 
       
   708 152
       
   709 00:07:41,375 --> 00:07:42,935
       
   710 Here's one where you can write
       
   711 
       
   712 153
       
   713 00:07:42,935 --> 00:07:44,915
       
   714 Java code on the left-hand side,
       
   715 
       
   716 154
       
   717 00:07:44,915 --> 00:07:47,930
       
   718 and on the right-hand
       
   719 side you get JVM code.
       
   720 
       
   721 155
       
   722 00:07:47,930 --> 00:07:50,255
       
   723 JVM code is a byte code
       
   724 
       
   725 156
       
   726 00:07:50,255 --> 00:07:53,405
       
   727 which cannot be run
       
   728 directly by the CPU,
       
   729 
       
   730 157
       
   731 00:07:53,405 --> 00:07:55,220
       
   732 but needs the Java
       
   733 Virtual Machine
       
   734 
       
   735 158
       
   736 00:07:55,220 --> 00:07:57,170
       
   737 to essentially interpret that.
       
   738 
       
   739 159
       
   740 00:07:57,170 --> 00:07:58,760
       
   741 This means it's not
       
   742 
       
   743 160
       
   744 00:07:58,760 --> 00:08:01,235
       
   745 the most efficient way
       
   746 how to run programs - 
       
   747 
       
   748 161
       
   749 00:08:01,235 --> 00:08:02,780
       
   750 it would be much faster to
       
   751 
       
   752 162
       
   753 00:08:02,780 --> 00:08:05,285
       
   754 generate direct
       
   755 code for the CPU.
       
   756 
       
   757 163
       
   758 00:08:05,285 --> 00:08:07,700
       
   759 But by producing bytecode,
       
   760 
       
   761 164
       
   762 00:08:07,700 --> 00:08:11,435
       
   763 we still run the
       
   764 programs quite fast
       
   765 
       
   766 165
       
   767 00:08:11,435 --> 00:08:13,520
       
   768 and it also simplifies
       
   769 
       
   770 166
       
   771 00:08:13,520 --> 00:08:16,280
       
   772 a number of issues
       
   773 in our compiler.
       
   774 
       
   775 167
       
   776 00:08:16,280 --> 00:08:18,980
       
   777 One issue is about
       
   778 memory management.
       
   779 
       
   780 168
       
   781 00:08:18,980 --> 00:08:22,055
       
   782 We don't have to be concerned
       
   783 about register allocation,
       
   784 
       
   785 169
       
   786 00:08:22,055 --> 00:08:25,505
       
   787 which we would need
       
   788 to do in a real CPU.
       
   789 
       
   790 170
       
   791 00:08:25,505 --> 00:08:27,680
       
   792 This will be done by the JVM.
       
   793 
       
   794 171
       
   795 00:08:27,680 --> 00:08:29,750
       
   796 It's also much easier to produce
       
   797 
       
   798 172
       
   799 00:08:29,750 --> 00:08:33,635
       
   800 this bytecode rather than
       
   801 actual machine code.
       
   802 
       
   803 173
       
   804 00:08:33,635 --> 00:08:37,385
       
   805 I think it's now a good time
       
   806 to come to the question,
       
   807 
       
   808 174
       
   809 00:08:37,385 --> 00:08:39,950
       
   810 why on Earth studying compilers.
       
   811 
       
   812 175
       
   813 00:08:39,950 --> 00:08:42,650
       
   814 Compilers are such an
       
   815 established subject
       
   816 
       
   817 176
       
   818 00:08:42,650 --> 00:08:43,985
       
   819 in computer science.
       
   820 
       
   821 177
       
   822 00:08:43,985 --> 00:08:46,100
       
   823 Compilers do what they do.
       
   824 
       
   825 178
       
   826 00:08:46,100 --> 00:08:48,410
       
   827 Probably forrests have
       
   828 been killed by
       
   829 
       
   830 179
       
   831 00:08:48,410 --> 00:08:52,190
       
   832 all the books that have been
       
   833 published on compilers.
       
   834 
       
   835 180
       
   836 00:08:52,190 --> 00:08:56,690
       
   837 Why on Earth studying
       
   838 compilers in 2020? And even worse
       
   839 
       
   840 181
       
   841 00:08:56,690 --> 00:08:59,659
       
   842 why implementing
       
   843 your own compiler?
       
   844 
       
   845 182
       
   846 00:08:59,659 --> 00:09:02,720
       
   847 Well, a slightly humorous take on
       
   848 
       
   849 183
       
   850 00:09:02,720 --> 00:09:05,105
       
   851 that question is
       
   852 given by John Regehr,
       
   853 
       
   854 184
       
   855 00:09:05,105 --> 00:09:08,375
       
   856 who is a compiler hacker
       
   857 from the University of Utah.
       
   858 
       
   859 185
       
   860 00:09:08,375 --> 00:09:09,770
       
   861 Essentially what he says,
       
   862 
       
   863 186
       
   864 00:09:09,770 --> 00:09:12,110
       
   865 if you're a good compiler hacker,
       
   866 
       
   867 187
       
   868 00:09:12,110 --> 00:09:14,690
       
   869 you have no problems
       
   870 of finding a job.
       
   871 
       
   872 188
       
   873 00:09:14,690 --> 00:09:17,210
       
   874 He puts it as: It's effectively
       
   875 
       
   876 189
       
   877 00:09:17,210 --> 00:09:22,320
       
   878 a "Perpetual Employment Act"
       
   879 for solid compiler hackers.
       
   880 
       
   881 190
       
   882 00:09:22,990 --> 00:09:27,380
       
   883 Regehr gives two justifications
       
   884 for that statement.
       
   885 
       
   886 191
       
   887 00:09:27,380 --> 00:09:29,690
       
   888 First, he says
       
   889 hardware is getting
       
   890 
       
   891 192
       
   892 00:09:29,690 --> 00:09:32,585
       
   893 wierder, rather than
       
   894 getting clocked faster.
       
   895 
       
   896 193
       
   897 00:09:32,585 --> 00:09:34,520
       
   898 And that's definitely true.
       
   899 
       
   900 194
       
   901 00:09:34,520 --> 00:09:36,590
       
   902 My first computer many, many,
       
   903 
       
   904 195
       
   905 00:09:36,590 --> 00:09:40,040
       
   906 many years ago contained
       
   907 only a single core CPU.
       
   908 
       
   909 196
       
   910 00:09:40,040 --> 00:09:44,030
       
   911 And it was such a simple CPU
       
   912 that we wrote machine code
       
   913 
       
   914 197
       
   915 00:09:44,030 --> 00:09:46,220
       
   916 directly for that CPU in order to
       
   917 
       
   918 198
       
   919 00:09:46,220 --> 00:09:49,740
       
   920 run our programs as
       
   921 fast as possible.
       
   922 
       
   923 199
       
   924 00:09:50,260 --> 00:09:53,600
       
   925 In contrast, today, Regehr writes,
       
   926 
       
   927 200
       
   928 00:09:53,600 --> 00:09:57,005
       
   929 almost all processors are
       
   930 multi-core nowadays.
       
   931 
       
   932 201
       
   933 00:09:57,005 --> 00:09:59,870
       
   934 And it looks like there's
       
   935 an increasing asymmetry
       
   936 
       
   937 202
       
   938 00:09:59,870 --> 00:10:02,015
       
   939 in resources across cores.
       
   940 
       
   941 203
       
   942 00:10:02,015 --> 00:10:04,445
       
   943 Processors come
       
   944 with vector units,
       
   945 
       
   946 204
       
   947 00:10:04,445 --> 00:10:07,189
       
   948 crypto accelerators, etc.
       
   949 
       
   950 205
       
   951 00:10:07,189 --> 00:10:11,930
       
   952 We have TSPs, GPUs, ARM
       
   953 big,little, Xeon Phis,
       
   954 
       
   955 206
       
   956 00:10:11,930 --> 00:10:14,630
       
   957 and this only scratches the surface.
       
   958 
       
   959 207
       
   960 00:10:14,630 --> 00:10:17,255
       
   961 And that is really a
       
   962 problem for compilers,
       
   963 
       
   964 208
       
   965 00:10:17,255 --> 00:10:20,495
       
   966 because if we now
       
   967 have multi-core CPUs,
       
   968 
       
   969 209
       
   970 00:10:20,495 --> 00:10:23,180
       
   971 that means our programs need
       
   972 
       
   973 210
       
   974 00:10:23,180 --> 00:10:26,075
       
   975 to be scheduled
       
   976 over several CPUs.
       
   977 
       
   978 211
       
   979 00:10:26,075 --> 00:10:28,220
       
   980 But the developer, of course,
       
   981 
       
   982 212
       
   983 00:10:28,220 --> 00:10:30,545
       
   984 doesn't want to know
       
   985 anything about that.
       
   986 
       
   987 213
       
   988 00:10:30,545 --> 00:10:34,655
       
   989 Also, now we have more
       
   990 CPUs in a computer,
       
   991 
       
   992 214
       
   993 00:10:34,655 --> 00:10:37,400
       
   994 but they seem to also come
       
   995 with different resources.
       
   996 
       
   997 215
       
   998 00:10:37,400 --> 00:10:40,310
       
   999 So certain tasks in
       
  1000 a program need to
       
  1001 
       
  1002 216
       
  1003 00:10:40,310 --> 00:10:43,460
       
  1004 be scheduled on some
       
  1005 cores, but not on others.
       
  1006 
       
  1007 217
       
  1008 00:10:43,460 --> 00:10:46,685
       
  1009 We also have for a
       
  1010 long time already GPUs,
       
  1011 
       
  1012 218
       
  1013 00:10:46,685 --> 00:10:49,025
       
  1014 which are highly specialized for
       
  1015 
       
  1016 219
       
  1017 00:10:49,025 --> 00:10:53,240
       
  1018 very parallel computations
       
  1019 to do with graphics.
       
  1020 
       
  1021 220
       
  1022 00:10:53,240 --> 00:10:56,015
       
  1023 They at least in
       
  1024 the past few years,
       
  1025 
       
  1026 221
       
  1027 00:10:56,015 --> 00:10:59,360
       
  1028 they could only do very
       
  1029 special computations,
       
  1030 
       
  1031 222
       
  1032 00:10:59,360 --> 00:11:02,255
       
  1033 but very, very fast
       
  1034 and highly parallel.
       
  1035 
       
  1036 223
       
  1037 00:11:02,255 --> 00:11:05,539
       
  1038 You couldn't use them for
       
  1039 general-purpose calculations,
       
  1040 
       
  1041 224
       
  1042 00:11:05,539 --> 00:11:10,205
       
  1043 which you would usually
       
  1044 use CPUs. Similarly, DSPs.
       
  1045 
       
  1046 225
       
  1047 00:11:10,205 --> 00:11:14,075
       
  1048 They are needed for all
       
  1049 the signal processing
       
  1050 
       
  1051 226
       
  1052 00:11:14,075 --> 00:11:16,040
       
  1053 in mobile phones.
       
  1054 
       
  1055 227
       
  1056 00:11:16,040 --> 00:11:20,780
       
  1057 Without them, we just
       
  1058 wouldn't have mobile phones.
       
  1059 
       
  1060 228
       
  1061 00:11:20,780 --> 00:11:23,525
       
  1062 The second reason Regehr gives is
       
  1063 
       
  1064 229
       
  1065 00:11:23,525 --> 00:11:25,550
       
  1066 that we are getting tired of
       
  1067 
       
  1068 230
       
  1069 00:11:25,550 --> 00:11:27,620
       
  1070 low-level languages and
       
  1071 
       
  1072 231
       
  1073 00:11:27,620 --> 00:11:30,335
       
  1074 their associated
       
  1075 security disasters.
       
  1076 
       
  1077 232
       
  1078 00:11:30,335 --> 00:11:32,435
       
  1079 While at the beginning
       
  1080 we were still
       
  1081 
       
  1082 233
       
  1083 00:11:32,435 --> 00:11:34,760
       
  1084 happy to write machine
       
  1085 code directly,
       
  1086 
       
  1087 234
       
  1088 00:11:34,760 --> 00:11:37,175
       
  1089 nobody wants to do this nowadays.
       
  1090 
       
  1091 235
       
  1092 00:11:37,175 --> 00:11:39,515
       
  1093 He writes: :We want
       
  1094 to write new code
       
  1095 
       
  1096 236
       
  1097 00:11:39,515 --> 00:11:44,120
       
  1098 to whatever extent possible in
       
  1099 safer high-level languages.
       
  1100 
       
  1101 237
       
  1102 00:11:44,120 --> 00:11:46,130
       
  1103 Compilers are caught right
       
  1104 
       
  1105 238
       
  1106 00:11:46,130 --> 00:11:48,365
       
  1107 in the middle of these
       
  1108 opposing trends:
       
  1109 
       
  1110 239
       
  1111 00:11:48,365 --> 00:11:50,765
       
  1112 one of their main jobs
       
  1113 is to have bridged
       
  1114 
       
  1115 240
       
  1116 00:11:50,765 --> 00:11:53,240
       
  1117 the large and growing gap between
       
  1118 
       
  1119 241
       
  1120 00:11:53,240 --> 00:11:55,460
       
  1121 increasingly high-level languages
       
  1122 
       
  1123 242
       
  1124 00:11:55,460 --> 00:11:58,565
       
  1125 and increasingly
       
  1126 wacky platforms.
       
  1127 
       
  1128 243
       
  1129 00:11:58,565 --> 00:12:00,320
       
  1130 So here you have it:
       
  1131 
       
  1132 244
       
  1133 00:12:00,320 --> 00:12:02,750
       
  1134 It's still interesting to study
       
  1135 
       
  1136 245
       
  1137 00:12:02,750 --> 00:12:06,244
       
  1138 compilers nowadays,
       
  1139 especially nowadays.
       
  1140 
       
  1141 246
       
  1142 00:12:06,244 --> 00:12:09,875
       
  1143 Well, if you want to work
       
  1144 on interesting problems,
       
  1145 
       
  1146 247
       
  1147 00:12:09,875 --> 00:12:14,570
       
  1148 then very often you have to
       
  1149 know compilers. Here's one:
       
  1150 
       
  1151 248
       
  1152 00:12:14,570 --> 00:12:16,940
       
  1153 In the good old
       
  1154 times when we were
       
  1155 
       
  1156 249
       
  1157 00:12:16,940 --> 00:12:19,310
       
  1158 still able to fly on holidays,
       
  1159 
       
  1160 250
       
  1161 00:12:19,310 --> 00:12:23,300
       
  1162 I'm sure you flew in
       
  1163 a 777 Boeing airplane.
       
  1164 
       
  1165 251
       
  1166 00:12:23,300 --> 00:12:25,850
       
  1167 It's actually already
       
  1168 a quite old airplane.
       
  1169 
       
  1170 252
       
  1171 00:12:25,850 --> 00:12:28,295
       
  1172 But they had the
       
  1173 following problem.
       
  1174 
       
  1175 253
       
  1176 00:12:28,295 --> 00:12:33,020
       
  1177 What happens if a CPU
       
  1178 calculates the wrong result?
       
  1179 
       
  1180 254
       
  1181 00:12:33,020 --> 00:12:36,065
       
  1182 And that's actually not
       
  1183 such a hypothetical problem
       
  1184 
       
  1185 255
       
  1186 00:12:36,065 --> 00:12:40,295
       
  1187 because you remember
       
  1188 the Pentium CPU
       
  1189 
       
  1190 256
       
  1191 00:12:40,295 --> 00:12:43,655
       
  1192 in around 2000
       
  1193 contained a bug and it cost
       
  1194 
       
  1195 257
       
  1196 00:12:43,655 --> 00:12:48,140
       
  1197 a lot of money for Intel
       
  1198 to replace this CPU.
       
  1199 
       
  1200 258
       
  1201 00:12:48,140 --> 00:12:51,470
       
  1202 What happens if an CPU calculates
       
  1203 
       
  1204 259
       
  1205 00:12:51,470 --> 00:12:56,105
       
  1206 the wrong result in
       
  1207 some navigation data?
       
  1208 
       
  1209 260
       
  1210 00:12:56,105 --> 00:12:58,520
       
  1211 Do you just let the
       
  1212 airplane crash?
       
  1213 
       
  1214 261
       
  1215 00:12:58,520 --> 00:13:00,650
       
  1216 Well, the engineers at Boeing
       
  1217 
       
  1218 262
       
  1219 00:13:00,650 --> 00:13:02,675
       
  1220 came up with the
       
  1221 following solution:
       
  1222 
       
  1223 263
       
  1224 00:13:02,675 --> 00:13:05,055
       
  1225 They writing one program that
       
  1226 
       
  1227 264
       
  1228 00:13:05,055 --> 00:13:08,690
       
  1229 essentially controls
       
  1230 how their airplane
       
  1231 
       
  1232 265
       
  1233 00:13:08,690 --> 00:13:10,295
       
  1234 is supposed to fly.
       
  1235 
       
  1236 266
       
  1237 00:13:10,295 --> 00:13:13,040
       
  1238 In a programming language
       
  1239 called Ada that is
       
  1240 
       
  1241 267
       
  1242 00:13:13,040 --> 00:13:15,770
       
  1243 slightly obscure
       
  1244 programming language
       
  1245 
       
  1246 268
       
  1247 00:13:15,770 --> 00:13:18,650
       
  1248 bat is very well-known in
       
  1249 
       
  1250 269
       
  1251 00:13:18,650 --> 00:13:22,040
       
  1252 areas where safety
       
  1253 is really paramount.
       
  1254 
       
  1255 270
       
  1256 00:13:22,040 --> 00:13:25,010
       
  1257 And what they did is they
       
  1258 took this one Ada program
       
  1259 
       
  1260 271
       
  1261 00:13:25,010 --> 00:13:28,010
       
  1262 and they essentially
       
  1263 took three compilers,
       
  1264 
       
  1265 272
       
  1266 00:13:28,010 --> 00:13:29,435
       
  1267 independent compilers,
       
  1268 
       
  1269 273
       
  1270 00:13:29,435 --> 00:13:32,045
       
  1271 which compiled the program
       
  1272 to different CPUs.
       
  1273 
       
  1274 274
       
  1275 00:13:32,045 --> 00:13:33,815
       
  1276 One CPU from Intel,
       
  1277 
       
  1278 275
       
  1279 00:13:33,815 --> 00:13:37,235
       
  1280 one CPU for Motorola,
       
  1281 and one from AMD.
       
  1282 
       
  1283 276
       
  1284 00:13:37,235 --> 00:13:38,930
       
  1285 And these are quite different
       
  1286 
       
  1287 277
       
  1288 00:13:38,930 --> 00:13:40,955
       
  1289 CPUs. Also some of them
       
  1290 
       
  1291 278
       
  1292 00:13:40,955 --> 00:13:42,755
       
  1293 have quite different philosophies
       
  1294 
       
  1295 279
       
  1296 00:13:42,755 --> 00:13:45,245
       
  1297 on how they do their calculations.
       
  1298 
       
  1299 280
       
  1300 00:13:45,245 --> 00:13:47,330
       
  1301 Now what they could do is, they
       
  1302 
       
  1303 281
       
  1304 00:13:47,330 --> 00:13:50,690
       
  1305 could put these three computers
       
  1306 on a single board and
       
  1307 
       
  1308 282
       
  1309 00:13:50,690 --> 00:13:54,335
       
  1310 could now run all their
       
  1311 calculations in parallel.
       
  1312 
       
  1313 283
       
  1314 00:13:54,335 --> 00:13:56,690
       
  1315 One with an Intel
       
  1316 CPU, one with
       
  1317 
       
  1318 284
       
  1319 00:13:56,690 --> 00:14:00,245
       
  1320 Motorola, and one with a 
       
  1321 Risc computers say.
       
  1322 
       
  1323 285
       
  1324 00:14:00,245 --> 00:14:02,795
       
  1325 And then they could
       
  1326 compare the results.
       
  1327 
       
  1328 286
       
  1329 00:14:02,795 --> 00:14:05,030
       
  1330 And if these results differed,
       
  1331 
       
  1332 287
       
  1333 00:14:05,030 --> 00:14:07,940
       
  1334 then they knew one CPU must have
       
  1335 
       
  1336 288
       
  1337 00:14:07,940 --> 00:14:11,600
       
  1338 calculated the wrong result
       
  1339 and probably told the pilot,
       
  1340 
       
  1341 289
       
  1342 00:14:11,600 --> 00:14:14,850
       
  1343 please don't let
       
  1344 that airplane crash.
       
  1345 
       
  1346 290
       
  1347 00:14:14,950 --> 00:14:17,270
       
  1348 Not just Boeing is doing
       
  1349 
       
  1350 291
       
  1351 00:14:17,270 --> 00:14:19,355
       
  1352 interesting things
       
  1353 with compilers.
       
  1354 
       
  1355 292
       
  1356 00:14:19,355 --> 00:14:22,640
       
  1357 Also, Airbus in a completely
       
  1358 different setting
       
  1359 
       
  1360 293
       
  1361 00:14:22,640 --> 00:14:24,260
       
  1362 is using compilers in
       
  1363 
       
  1364 294
       
  1365 00:14:24,260 --> 00:14:26,420
       
  1366 a interesting way to get
       
  1367 
       
  1368 295
       
  1369 00:14:26,420 --> 00:14:30,195
       
  1370 their airplanes up in the
       
  1371 air and let them not crush.
       
  1372 
       
  1373 296
       
  1374 00:14:30,195 --> 00:14:33,010
       
  1375 In another example, I
       
  1376 have friends working
       
  1377 
       
  1378 297
       
  1379 00:14:33,010 --> 00:14:35,350
       
  1380 at Facebook who work on
       
  1381 
       
  1382 298
       
  1383 00:14:35,350 --> 00:14:37,900
       
  1384 compilers to make sense
       
  1385 are they are heaps
       
  1386 
       
  1387 299
       
  1388 00:14:37,900 --> 00:14:41,470
       
  1389 and heaps and heaps of
       
  1390 code written in PHP,
       
  1391 
       
  1392 300
       
  1393 00:14:41,470 --> 00:14:42,880
       
  1394 which is one of the most
       
  1395 
       
  1396 301
       
  1397 00:14:42,880 --> 00:14:44,920
       
  1398 horrible languages on the planet.
       
  1399 
       
  1400 302
       
  1401 00:14:44,920 --> 00:14:46,630
       
  1402 The bottom line is,
       
  1403 
       
  1404 303
       
  1405 00:14:46,630 --> 00:14:50,499
       
  1406 compilers are still very,
       
  1407 very important nowadays.
       
  1408 
       
  1409 304
       
  1410 00:14:50,499 --> 00:14:52,810
       
  1411 And for the foreseeable future,
       
  1412 
       
  1413 305
       
  1414 00:14:52,810 --> 00:14:56,150
       
  1415 compilers still need
       
  1416 to be developed.
       
  1417 
       
  1418 306
       
  1419 00:14:57,270 --> 00:15:00,235
       
  1420 When one talks about
       
  1421 compilers then
       
  1422 
       
  1423 307
       
  1424 00:15:00,235 --> 00:15:01,630
       
  1425 there is, of course,
       
  1426 
       
  1427 308
       
  1428 00:15:01,630 --> 00:15:05,395
       
  1429 a magic element involved. They're
       
  1430 large software systems.
       
  1431 
       
  1432 309
       
  1433 00:15:05,395 --> 00:15:07,750
       
  1434 And yes, they're supposed to
       
  1435 
       
  1436 310
       
  1437 00:15:07,750 --> 00:15:10,525
       
  1438 generate a runnable binary for
       
  1439 
       
  1440 311
       
  1441 00:15:10,525 --> 00:15:12,105
       
  1442 a high-level program,
       
  1443 
       
  1444 312
       
  1445 00:15:12,105 --> 00:15:15,230
       
  1446 but they are also supposed
       
  1447 to make our programs better.
       
  1448 
       
  1449 313
       
  1450 00:15:15,230 --> 00:15:18,920
       
  1451 They optimize our
       
  1452 programs to run faster.
       
  1453 
       
  1454 314
       
  1455 00:15:18,920 --> 00:15:22,610
       
  1456 And there's a lot of
       
  1457 "magic" involved in that.
       
  1458 
       
  1459 315
       
  1460 00:15:22,610 --> 00:15:24,890
       
  1461 Magic in inverted quotes.
       
  1462 
       
  1463 316
       
  1464 00:15:24,890 --> 00:15:26,480
       
  1465 I would like to show you one of
       
  1466 
       
  1467 317
       
  1468 00:15:26,480 --> 00:15:29,340
       
  1469 this magic in one example.
       
  1470 
       
  1471 318
       
  1472 00:15:31,090 --> 00:15:33,110
       
  1473 I hope you still have
       
  1474 
       
  1475 319
       
  1476 00:15:33,110 --> 00:15:36,485
       
  1477 fond memories of the
       
  1478 PEP module last year.
       
  1479 
       
  1480 320
       
  1481 00:15:36,485 --> 00:15:40,280
       
  1482 And you might remember BF
       
  1483 language, we had to look at.
       
  1484 
       
  1485 321
       
  1486 00:15:40,280 --> 00:15:43,670
       
  1487 This BF language contains
       
  1488 the kind of memory and
       
  1489 
       
  1490 322
       
  1491 00:15:43,670 --> 00:15:45,680
       
  1492 the memory pointer which can
       
  1493 
       
  1494 323
       
  1495 00:15:45,680 --> 00:15:48,140
       
  1496 be moved to the left
       
  1497 or to the right,
       
  1498 
       
  1499 324
       
  1500 00:15:48,140 --> 00:15:50,270
       
  1501 where an integer in
       
  1502 
       
  1503 325
       
  1504 00:15:50,270 --> 00:15:52,925
       
  1505 the memory can be either
       
  1506 increased or decreased.
       
  1507 
       
  1508 326
       
  1509 00:15:52,925 --> 00:15:55,730
       
  1510 We can print
       
  1511 out the content of
       
  1512 
       
  1513 327
       
  1514 00:15:55,730 --> 00:15:59,645
       
  1515 the current cell or input
       
  1516 something into a cell.
       
  1517 
       
  1518 328
       
  1519 00:15:59,645 --> 00:16:01,850
       
  1520 And we have loops, and
       
  1521 
       
  1522 329
       
  1523 00:16:01,850 --> 00:16:04,220
       
  1524 everything else is
       
  1525 considered a comment.
       
  1526 
       
  1527 330
       
  1528 00:16:04,220 --> 00:16:06,290
       
  1529 What's good about
       
  1530 this BF language is that 
       
  1531 
       
  1532 331
       
  1533 00:16:06,290 --> 00:16:08,180
       
  1534 we don't even need a front end.
       
  1535 
       
  1536 332
       
  1537 00:16:08,180 --> 00:16:09,920
       
  1538 We can immediately start writing
       
  1539 
       
  1540 333
       
  1541 00:16:09,920 --> 00:16:13,529
       
  1542 an interpreter or a compiler.
       
  1543 
       
  1544 334
       
  1545 00:16:15,850 --> 00:16:18,155
       
  1546 Okay, I have here
       
  1547 
       
  1548 335
       
  1549 00:16:18,155 --> 00:16:20,600
       
  1550 a very straightforward
       
  1551 interpreter for
       
  1552 
       
  1553 336
       
  1554 00:16:20,600 --> 00:16:22,865
       
  1555 the BF language. You
       
  1556 might recognize it.
       
  1557 
       
  1558 337
       
  1559 00:16:22,865 --> 00:16:27,120
       
  1560 And I run it with a
       
  1561 benchmark program, which
       
  1562 
       
  1563 338
       
  1564 00:16:27,760 --> 00:16:30,960
       
  1565 might also recognize.
       
  1566 
       
  1567 339
       
  1568 00:16:31,560 --> 00:16:36,835
       
  1569 And this will now take
       
  1570 approximately ten minutes.
       
  1571 
       
  1572 340
       
  1573 00:16:36,835 --> 00:16:39,920
       
  1574 So see you in a bit.
       
  1575 
       
  1576 341
       
  1577 00:16:40,710 --> 00:16:43,660
       
  1578 Okay, this has finished now.
       
  1579 
       
  1580 342
       
  1581 00:16:43,660 --> 00:16:45,925
       
  1582 Almost took 11 minutes.
       
  1583 
       
  1584 343
       
  1585 00:16:45,925 --> 00:16:47,410
       
  1586 The question now is,
       
  1587 
       
  1588 344
       
  1589 00:16:47,410 --> 00:16:51,820
       
  1590 can we to better? 
       
  1591 
       
  1592 345
       
  1593 00:16:51,820 --> 00:16:53,260
       
  1594 Actually, it is not difficult
       
  1595 to do better
       
  1596 
       
  1597 346
       
  1598 00:16:53,260 --> 00:16:54,970
       
  1599 than this simple-minded
       
  1600 interpreter.
       
  1601 
       
  1602 347
       
  1603 00:16:54,970 --> 00:16:58,690
       
  1604 It is relatively
       
  1605 straightforward to take
       
  1606 
       
  1607 348
       
  1608 00:16:58,690 --> 00:17:03,520
       
  1609 a BF program and generate equivalent C code.
       
  1610 
       
  1611 349
       
  1612 00:17:03,520 --> 00:17:06,520
       
  1613 This can be easily
       
  1614 done by somehow
       
  1615 
       
  1616 350
       
  1617 00:17:06,520 --> 00:17:09,490
       
  1618 representing the BF memory in
       
  1619 
       
  1620 351
       
  1621 00:17:09,490 --> 00:17:12,310
       
  1622 C. We can do this
       
  1623 by just an array of
       
  1624 
       
  1625 352
       
  1626 00:17:12,310 --> 00:17:15,380
       
  1627 characters and a memory pointer,
       
  1628 
       
  1629 353
       
  1630 00:17:15,380 --> 00:17:19,265
       
  1631 which points, in this case
       
  1632 in the middle of this array.
       
  1633 
       
  1634 354
       
  1635 00:17:19,265 --> 00:17:21,800
       
  1636 Then it's very easy to
       
  1637 
       
  1638 355
       
  1639 00:17:21,800 --> 00:17:24,275
       
  1640 translate the movement
       
  1641 of the 
       
  1642 
       
  1643 356
       
  1644 00:17:24,275 --> 00:17:28,610
       
  1645 BF memory pointer into increments
       
  1646 
       
  1647 357
       
  1648 00:17:28,610 --> 00:17:31,520
       
  1649 and decrements of
       
  1650 the C pointer.
       
  1651 
       
  1652 358
       
  1653 00:17:31,520 --> 00:17:33,050
       
  1654 Similarly, if you want to increment or
       
  1655 
       
  1656 359
       
  1657 00:17:33,050 --> 00:17:35,915
       
  1658 decrement an element
       
  1659 in this array,
       
  1660 
       
  1661 360
       
  1662 00:17:35,915 --> 00:17:38,975
       
  1663 you just have to look up what
       
  1664 the pointer is and increment
       
  1665 
       
  1666 361
       
  1667 00:17:38,975 --> 00:17:42,140
       
  1668 and decrement what's
       
  1669 under the pointer. Similarly
       
  1670 
       
  1671 362
       
  1672 00:17:42,140 --> 00:17:43,790
       
  1673 we can print out something from
       
  1674 
       
  1675 363
       
  1676 00:17:43,790 --> 00:17:47,450
       
  1677 this array and we can put
       
  1678 something into this array.
       
  1679 
       
  1680 364
       
  1681 00:17:47,450 --> 00:17:49,610
       
  1682 What is great is that the loops
       
  1683 
       
  1684 365
       
  1685 00:17:49,610 --> 00:17:51,530
       
  1686 from the BF language directly
       
  1687 
       
  1688 366
       
  1689 00:17:51,530 --> 00:17:55,580
       
  1690 translate into while
       
  1691 loops of the C language.
       
  1692 
       
  1693 367
       
  1694 00:17:55,580 --> 00:17:58,100
       
  1695 We essentially have to check is
       
  1696 
       
  1697 368
       
  1698 00:17:58,100 --> 00:18:02,450
       
  1699 the memory pointer pointing
       
  1700 to a field that is 0.
       
  1701 
       
  1702 369
       
  1703 00:18:02,450 --> 00:18:03,950
       
  1704 Then we exit the loop,
       
  1705 
       
  1706 370
       
  1707 00:18:03,950 --> 00:18:05,719
       
  1708 and otherwise we continue.
       
  1709 
       
  1710 371
       
  1711 00:18:05,719 --> 00:18:09,575
       
  1712 And that can be done
       
  1713 nicely in C, like so.
       
  1714 
       
  1715 372
       
  1716 00:18:09,575 --> 00:18:12,995
       
  1717 And everything else is
       
  1718 again, just a comment.
       
  1719 
       
  1720 373
       
  1721 00:18:12,995 --> 00:18:16,445
       
  1722 So I have implemented
       
  1723 this translation for you.
       
  1724 
       
  1725 374
       
  1726 00:18:16,445 --> 00:18:19,700
       
  1727 Remember this was the BF
       
  1728 program for the Mandelbrot Set.
       
  1729 
       
  1730 375
       
  1731 00:18:19,700 --> 00:18:23,690
       
  1732 The equivalent C code
       
  1733 would look like this.
       
  1734 
       
  1735 376
       
  1736 00:18:23,690 --> 00:18:27,110
       
  1737 So you can see at the beginning
       
  1738 is this generation of
       
  1739 
       
  1740 377
       
  1741 00:18:27,110 --> 00:18:30,140
       
  1742 the BF memory represented
       
  1743 
       
  1744 378
       
  1745 00:18:30,140 --> 00:18:32,345
       
  1746 as an array and the
       
  1747 memory pointer.
       
  1748 
       
  1749 379
       
  1750 00:18:32,345 --> 00:18:34,550
       
  1751 And then inside there are lots
       
  1752 
       
  1753 380
       
  1754 00:18:34,550 --> 00:18:36,770
       
  1755 of increments and decrements
       
  1756 
       
  1757 381
       
  1758 00:18:36,770 --> 00:18:41,915
       
  1759 of pointers and also
       
  1760 contents of this array.
       
  1761 
       
  1762 382
       
  1763 00:18:41,915 --> 00:18:45,199
       
  1764 Now fingers crossed that this
       
  1765 is the correct translation.
       
  1766 
       
  1767 383
       
  1768 00:18:45,199 --> 00:18:48,125
       
  1769 And I can also run this
       
  1770 
       
  1771 384
       
  1772 00:18:48,125 --> 00:18:52,805
       
  1773 and you should see that it runs
       
  1774 substantially faster.
       
  1775 
       
  1776 385
       
  1777 00:18:52,805 --> 00:18:55,880
       
  1778 I'm using now my GCC on
       
  1779 
       
  1780 386
       
  1781 00:18:55,880 --> 00:19:01,040
       
  1782 my computer to generate an
       
  1783 executable for the C program.
       
  1784 
       
  1785 387
       
  1786 00:19:01,040 --> 00:19:04,295
       
  1787 And it should run for
       
  1788 approximately 20 seconds.
       
  1789 
       
  1790 388
       
  1791 00:19:04,295 --> 00:19:07,620
       
  1792 So let's just wait for that.
       
  1793 
       
  1794 389
       
  1795 00:19:11,430 --> 00:19:14,950
       
  1796 Okay. What is important to note
       
  1797 
       
  1798 390
       
  1799 00:19:14,950 --> 00:19:19,885
       
  1800 here is that I'm running GCC,
       
  1801 
       
  1802 391
       
  1803 00:19:19,885 --> 00:19:22,450
       
  1804 this the option -O0.
       
  1805 
       
  1806 392
       
  1807 00:19:22,450 --> 00:19:25,600
       
  1808 That means I tell GCC to
       
  1809 
       
  1810 393
       
  1811 00:19:25,600 --> 00:19:28,840
       
  1812 generate a binary which I
       
  1813 can run as you can see.
       
  1814 
       
  1815 394
       
  1816 00:19:28,840 --> 00:19:31,810
       
  1817 But don't apply any optimization.
       
  1818 
       
  1819 395
       
  1820 00:19:31,810 --> 00:19:38,395
       
  1821 Keep this in mind.
       
  1822 As a second try,
       
  1823 
       
  1824 396
       
  1825 00:19:38,395 --> 00:19:42,595
       
  1826 of course, I can try to now
       
  1827 generate a better C program.
       
  1828 
       
  1829 397
       
  1830 00:19:42,595 --> 00:19:46,060
       
  1831 And as you'll remember from
       
  1832 the PEP course, it can,
       
  1833 
       
  1834 398
       
  1835 00:19:46,060 --> 00:19:50,095
       
  1836 for example, combine
       
  1837 several steps
       
  1838 
       
  1839 399
       
  1840 00:19:50,095 --> 00:19:51,670
       
  1841 going to the right of the memory
       
  1842 
       
  1843 400
       
  1844 00:19:51,670 --> 00:19:53,310
       
  1845 pointer or to the left.
       
  1846 
       
  1847 401
       
  1848 00:19:53,310 --> 00:19:55,280
       
  1849 We can combine that into
       
  1850 
       
  1851 402
       
  1852 00:19:55,280 --> 00:19:58,760
       
  1853 one single increment or
       
  1854 decrement of not just one,
       
  1855 
       
  1856 403
       
  1857 00:19:58,760 --> 00:20:00,050
       
  1858 but of like n,
       
  1859 
       
  1860 404
       
  1861 00:20:00,050 --> 00:20:02,570
       
  1862 where n is greater
       
  1863 than 1. Similarly, 
       
  1864 
       
  1865 405
       
  1866 00:20:02,570 --> 00:20:06,980
       
  1867 if we increment or decrement
       
  1868 the content of this array,
       
  1869 
       
  1870 406
       
  1871 00:20:06,980 --> 00:20:09,740
       
  1872 we can do this in one
       
  1873 big step by incrementing
       
  1874 
       
  1875 407
       
  1876 00:20:09,740 --> 00:20:12,710
       
  1877 that by not just one
       
  1878 and increment by one,
       
  1879 
       
  1880 408
       
  1881 00:20:12,710 --> 00:20:15,635
       
  1882 but increment and decrement
       
  1883 by bigger numbers.
       
  1884 
       
  1885 409
       
  1886 00:20:15,635 --> 00:20:18,870
       
  1887 Everything else stays the same.
       
  1888 
       
  1889 410
       
  1890 00:20:20,830 --> 00:20:23,270
       
  1891 Again, I have implemented that
       
  1892 
       
  1893 411
       
  1894 00:20:23,270 --> 00:20:24,950
       
  1895 for you and you can see now
       
  1896 
       
  1897 412
       
  1898 00:20:24,950 --> 00:20:26,835
       
  1899 the C program moves
       
  1900 
       
  1901 413
       
  1902 00:20:26,835 --> 00:20:30,455
       
  1903 the memory pointer and bigger
       
  1904 chunks and also increases,
       
  1905 
       
  1906 414
       
  1907 00:20:30,455 --> 00:20:32,810
       
  1908 for example, here, memory content
       
  1909 
       
  1910 415
       
  1911 00:20:32,810 --> 00:20:35,555
       
  1912 by 15 than just by 1.
       
  1913 
       
  1914 416
       
  1915 00:20:35,555 --> 00:20:38,520
       
  1916 Now let's run this program.
       
  1917 
       
  1918 417
       
  1919 00:20:38,530 --> 00:20:40,760
       
  1920 Again, I use GCC
       
  1921 
       
  1922 418
       
  1923 00:20:40,760 --> 00:20:45,350
       
  1924 to compile the
       
  1925 C program and run it.
       
  1926 
       
  1927 419
       
  1928 00:20:45,350 --> 00:20:49,025
       
  1929 And again, I made sure
       
  1930 that it only runs this with
       
  1931 
       
  1932 420
       
  1933 00:20:49,025 --> 00:20:51,530
       
  1934 no optimizations switched on.
       
  1935 
       
  1936 421
       
  1937 00:20:51,530 --> 00:20:54,050
       
  1938 So there's minus O0.
       
  1939 
       
  1940 422
       
  1941 00:20:54,050 --> 00:20:56,090
       
  1942 And you can see
       
  1943 it's now down from
       
  1944 
       
  1945 423
       
  1946 00:20:56,090 --> 00:20:59,990
       
  1947 20 seconds to just 6 seconds.
       
  1948 
       
  1949 424
       
  1950 00:20:59,990 --> 00:21:06,065
       
  1951 I show you, the GCC is
       
  1952 called with -O0.
       
  1953 
       
  1954 425
       
  1955 00:21:06,065 --> 00:21:08,990
       
  1956 So this reduction
       
  1957 in time is purely
       
  1958 
       
  1959 426
       
  1960 00:21:08,990 --> 00:21:12,755
       
  1961 because I produced better C code,
       
  1962 
       
  1963 427
       
  1964 00:21:12,755 --> 00:21:17,220
       
  1965 which the compiler then just
       
  1966 transformed into a binary.
       
  1967 
       
  1968 428
       
  1969 00:21:18,910 --> 00:21:22,055
       
  1970 So far there's
       
  1971 nothing interesting.
       
  1972 
       
  1973 429
       
  1974 00:21:22,055 --> 00:21:25,385
       
  1975 We used in the first instance
       
  1976 
       
  1977 430
       
  1978 00:21:25,385 --> 00:21:29,240
       
  1979 single increments and use GCC with
       
  1980 
       
  1981 431
       
  1982 00:21:29,240 --> 00:21:31,700
       
  1983 O0 to not introduce
       
  1984 
       
  1985 432
       
  1986 00:21:31,700 --> 00:21:35,255
       
  1987 any optimizations and run
       
  1988 essentially 20 seconds.
       
  1989 
       
  1990 433
       
  1991 00:21:35,255 --> 00:21:37,880
       
  1992 If we then increment
       
  1993 
       
  1994 434
       
  1995 00:21:37,880 --> 00:21:40,895
       
  1996 in bigger chunks or
       
  1997 decrement in bigger chunks,
       
  1998 
       
  1999 435
       
  2000 00:21:40,895 --> 00:21:45,380
       
  2001 use again GCC with -O0,
       
  2002 
       
  2003 436
       
  2004 00:21:45,380 --> 00:21:50,030
       
  2005 then it runs in approximately
       
  2006 five to six seconds.
       
  2007 
       
  2008 437
       
  2009 00:21:50,030 --> 00:21:51,950
       
  2010 Now let me do the following:
       
  2011 
       
  2012 438
       
  2013 00:21:51,950 --> 00:21:55,430
       
  2014 I take the first program which
       
  2015 
       
  2016 439
       
  2017 00:21:55,430 --> 00:21:58,070
       
  2018 increments everything
       
  2019 in a single steps
       
  2020 
       
  2021 440
       
  2022 00:21:58,070 --> 00:22:00,185
       
  2023 or decremented in single steps.
       
  2024 
       
  2025 441
       
  2026 00:22:00,185 --> 00:22:04,835
       
  2027 But now I use the full
       
  2028 capacity of the GCC compiler
       
  2029 
       
  2030 442
       
  2031 00:22:04,835 --> 00:22:06,560
       
  2032 and I tell it,
       
  2033 
       
  2034 443
       
  2035 00:22:06,560 --> 00:22:11,370
       
  2036 please do introduce some
       
  2037 optimizations as you want.
       
  2038 
       
  2039 444
       
  2040 00:22:11,590 --> 00:22:15,799
       
  2041 And I'm now running
       
  2042 exactly the same program...
       
  2043 
       
  2044 445
       
  2045 00:22:15,799 --> 00:22:17,870
       
  2046 just the GCC compiler will
       
  2047 
       
  2048 446
       
  2049 00:22:17,870 --> 00:22:22,325
       
  2050 now have the optimizations
       
  2051 switched on.
       
  2052 
       
  2053 447
       
  2054 00:22:22,325 --> 00:22:24,380
       
  2055 Let's see what happens.
       
  2056 
       
  2057 448
       
  2058 00:22:24,380 --> 00:22:27,320
       
  2059 One first needs to compile it.
       
  2060 
       
  2061 449
       
  2062 00:22:27,320 --> 00:22:29,795
       
  2063 And that takes a little while.
       
  2064 
       
  2065 450
       
  2066 00:22:29,795 --> 00:22:32,000
       
  2067 Okay, this has now
       
  2068 finished and also
       
  2069 
       
  2070 451
       
  2071 00:22:32,000 --> 00:22:34,115
       
  2072 the calculation of the
       
  2073 picture has finished.
       
  2074 
       
  2075 452
       
  2076 00:22:34,115 --> 00:22:35,960
       
  2077 And as you can see, it took
       
  2078 
       
  2079 453
       
  2080 00:22:35,960 --> 00:22:38,510
       
  2081 approximately eight
       
  2082 seconds to calculate.
       
  2083 
       
  2084 454
       
  2085 00:22:38,510 --> 00:22:41,645
       
  2086 That is down from
       
  2087 approximately 20 seconds.
       
  2088 
       
  2089 455
       
  2090 00:22:41,645 --> 00:22:46,040
       
  2091 So the difference from
       
  2092 switching from -O0 to
       
  2093 
       
  2094 456
       
  2095 00:22:46,040 --> 00:22:51,935
       
  2096 -O3 meant that now
       
  2097 the program runs almost as
       
  2098 
       
  2099 457
       
  2100 00:22:51,935 --> 00:22:54,800
       
  2101 fast as where I by hand
       
  2102 
       
  2103 458
       
  2104 00:22:54,800 --> 00:22:58,610
       
  2105 combined several steps
       
  2106 into a big chunk.
       
  2107 
       
  2108 459
       
  2109 00:22:58,610 --> 00:23:00,170
       
  2110 That is essentially what
       
  2111 
       
  2112 460
       
  2113 00:23:00,170 --> 00:23:03,485
       
  2114 the GCC compiler
       
  2115 found out on its own.
       
  2116 
       
  2117 461
       
  2118 00:23:03,485 --> 00:23:05,840
       
  2119 That rather than jumping
       
  2120 
       
  2121 462
       
  2122 00:23:05,840 --> 00:23:08,465
       
  2123 in just single increments by one,
       
  2124 
       
  2125 463
       
  2126 00:23:08,465 --> 00:23:11,510
       
  2127 they can be all combined
       
  2128 into bigger chunks.
       
  2129 
       
  2130 464
       
  2131 00:23:11,510 --> 00:23:16,595
       
  2132 It hasn't been as successful
       
  2133 as if I do this explicitly.
       
  2134 
       
  2135 465
       
  2136 00:23:16,595 --> 00:23:18,620
       
  2137 But that is the magic that
       
  2138 
       
  2139 466
       
  2140 00:23:18,620 --> 00:23:22,560
       
  2141 the compiler essentially found
       
  2142 out, that this can be done.
       
  2143 
       
  2144 467
       
  2145 00:23:22,960 --> 00:23:25,700
       
  2146 Just a quick recap of what I did.
       
  2147 
       
  2148 468
       
  2149 00:23:25,700 --> 00:23:28,160
       
  2150 I first run the program with
       
  2151 
       
  2152 469
       
  2153 00:23:28,160 --> 00:23:31,730
       
  2154 an interpreter and it took
       
  2155 approximately 11 minutes.
       
  2156 
       
  2157 470
       
  2158 00:23:31,730 --> 00:23:36,559
       
  2159 Then I had a simple compiler
       
  2160 generating a C program.
       
  2161 
       
  2162 471
       
  2163 00:23:36,559 --> 00:23:40,880
       
  2164 And the C compiler then had
       
  2165 no optimization switched on.
       
  2166 
       
  2167 472
       
  2168 00:23:40,880 --> 00:23:44,645
       
  2169 In the C program had only
       
  2170 small single increments and
       
  2171 
       
  2172 473
       
  2173 00:23:44,645 --> 00:23:46,820
       
  2174 small jumps. Then it took
       
  2175 
       
  2176 474
       
  2177 00:23:46,820 --> 00:23:49,775
       
  2178 approximately 20 seconds for
       
  2179 to same program.
       
  2180 
       
  2181 475
       
  2182 00:23:49,775 --> 00:23:52,460
       
  2183 Then I had a more advanced
       
  2184 compiler which does
       
  2185 
       
  2186 476
       
  2187 00:23:52,460 --> 00:23:55,730
       
  2188 big increments and
       
  2189 also big jumps.
       
  2190 
       
  2191 477
       
  2192 00:23:55,730 --> 00:23:57,470
       
  2193 But again, the compiler didn't
       
  2194 
       
  2195 478
       
  2196 00:23:57,470 --> 00:23:59,210
       
  2197 introduce any optimization on its
       
  2198 
       
  2199 479
       
  2200 00:23:59,210 --> 00:24:02,915
       
  2201 own. Then it took
       
  2202 approximately five seconds.
       
  2203 
       
  2204 480
       
  2205 00:24:02,915 --> 00:24:05,210
       
  2206 Then I went back to
       
  2207 
       
  2208 481
       
  2209 00:24:05,210 --> 00:24:08,465
       
  2210 the simple compiler with
       
  2211 only small steps.
       
  2212 
       
  2213 482
       
  2214 00:24:08,465 --> 00:24:10,400
       
  2215 But then told the C compiler
       
  2216 
       
  2217 483
       
  2218 00:24:10,400 --> 00:24:12,950
       
  2219 to fully optimize this program.
       
  2220 
       
  2221 484
       
  2222 00:24:12,950 --> 00:24:14,690
       
  2223 And then it took more
       
  2224 or less the same
       
  2225 
       
  2226 485
       
  2227 00:24:14,690 --> 00:24:17,465
       
  2228 time as the more advanced compiler.
       
  2229 
       
  2230 486
       
  2231 00:24:17,465 --> 00:24:20,465
       
  2232 I encourage you to
       
  2233 look at this yourself.
       
  2234 
       
  2235 487
       
  2236 00:24:20,465 --> 00:24:24,240
       
  2237 As usual, all the programs
       
  2238 are uploaded on KEATS.
       
  2239 
       
  2240 488
       
  2241 00:24:25,090 --> 00:24:27,590
       
  2242 To finish this
       
  2243 introduction video,
       
  2244 
       
  2245 489
       
  2246 00:24:27,590 --> 00:24:30,170
       
  2247 let me give you an
       
  2248 extremely brief overview
       
  2249 
       
  2250 490
       
  2251 00:24:30,170 --> 00:24:32,855
       
  2252 over the history of compilers.
       
  2253 
       
  2254 491
       
  2255 00:24:32,855 --> 00:24:35,450
       
  2256 While I  argued at the beginning
       
  2257 
       
  2258 492
       
  2259 00:24:35,450 --> 00:24:38,915
       
  2260 that it's interesting to
       
  2261 study compilers nowadays,
       
  2262 
       
  2263 493
       
  2264 00:24:38,915 --> 00:24:40,400
       
  2265 obviously in this field,
       
  2266 
       
  2267 494
       
  2268 00:24:40,400 --> 00:24:43,295
       
  2269 we are standing on the
       
  2270 shoulders of giants.
       
  2271 
       
  2272 495
       
  2273 00:24:43,295 --> 00:24:46,520
       
  2274 The field started with Turing and 
       
  2275 Turing machines,
       
  2276 
       
  2277 496
       
  2278 00:24:46,520 --> 00:24:49,130
       
  2279 which were introduced in 1936.
       
  2280 
       
  2281 497
       
  2282 00:24:49,130 --> 00:24:52,175
       
  2283 Turing machines already had
       
  2284 this concept of memory
       
  2285 
       
  2286 498
       
  2287 00:24:52,175 --> 00:24:55,190
       
  2288 and also programs
       
  2289 of Turing machines
       
  2290 
       
  2291 499
       
  2292 00:24:55,190 --> 00:24:58,775
       
  2293 needed to be translated
       
  2294 between different formalisms.
       
  2295 
       
  2296 500
       
  2297 00:24:58,775 --> 00:25:01,100
       
  2298 Regular expressions,
       
  2299 which will play
       
  2300 
       
  2301 501
       
  2302 00:25:01,100 --> 00:25:03,905
       
  2303 an important role in our
       
  2304 lexer of our compiler,
       
  2305 
       
  2306 502
       
  2307 00:25:03,905 --> 00:25:06,800
       
  2308 were introduced in 1956.
       
  2309 
       
  2310 503
       
  2311 00:25:06,800 --> 00:25:10,370
       
  2312 Arguably the first compiler
       
  2313 was introduced in
       
  2314 
       
  2315 504
       
  2316 00:25:10,370 --> 00:25:13,850
       
  2317 1957 for a language called COBOL.
       
  2318 
       
  2319 505
       
  2320 00:25:13,850 --> 00:25:16,550
       
  2321 This was done by Grace Hopper.
       
  2322 
       
  2323 506
       
  2324 00:25:16,550 --> 00:25:18,770
       
  2325 And I recommend that
       
  2326 you have and look
       
  2327 
       
  2328 507
       
  2329 00:25:18,770 --> 00:25:20,900
       
  2330 at this interview
       
  2331 of Grace Hopper,
       
  2332 
       
  2333 508
       
  2334 00:25:20,900 --> 00:25:22,130
       
  2335 which shows she must have been
       
  2336 
       
  2337 509
       
  2338 00:25:22,130 --> 00:25:24,424
       
  2339 a very interesting personality.
       
  2340 
       
  2341 510
       
  2342 00:25:24,424 --> 00:25:27,770
       
  2343 But despite the
       
  2344 maturity of this field,
       
  2345 
       
  2346 511
       
  2347 00:25:27,770 --> 00:25:29,465
       
  2348 it might be surprising that
       
  2349 
       
  2350 512
       
  2351 00:25:29,465 --> 00:25:31,505
       
  2352 research papers are
       
  2353 still published.
       
  2354 
       
  2355 513
       
  2356 00:25:31,505 --> 00:25:34,835
       
  2357 And we will find that
       
  2358 out in the module.
       
  2359 
       
  2360 514
       
  2361 00:25:34,835 --> 00:25:37,760
       
  2362 And a number of
       
  2363 problems which we would
       
  2364 
       
  2365 515
       
  2366 00:25:37,760 --> 00:25:41,270
       
  2367 assume are already solved by now,
       
  2368 
       
  2369 516
       
  2370 00:25:41,270 --> 00:25:43,730
       
  2371 actually turning out
       
  2372 that they are not solved.
       
  2373 
       
  2374 517
       
  2375 00:25:43,730 --> 00:25:45,620
       
  2376 Here in this blog post,
       
  2377 
       
  2378 518
       
  2379 00:25:45,620 --> 00:25:47,825
       
  2380 for example, Laurie Tratt
       
  2381 
       
  2382 519
       
  2383 00:25:47,825 --> 00:25:49,550
       
  2384 who is also from this department,
       
  2385 
       
  2386 520
       
  2387 00:25:49,550 --> 00:25:51,740
       
  2388 argued that parsing, for example,
       
  2389 
       
  2390 521
       
  2391 00:25:51,740 --> 00:25:54,035
       
  2392 isn't a solved problem yet.
       
  2393 
       
  2394 522
       
  2395 00:25:54,035 --> 00:25:56,300
       
  2396 I hope you will have as much fun
       
  2397 
       
  2398 523
       
  2399 00:25:56,300 --> 00:25:58,130
       
  2400 with this module as I have.
       
  2401 
       
  2402 524
       
  2403 00:25:58,130 --> 00:26:00,750
       
  2404 Thank you very much
       
  2405 for listening.