100:00:16,280 --> 00:00:18,330A warm welcome to200:00:18,330 --> 00:00:20,820the compilers and formallanguages module?300:00:20,820 --> 00:00:24,390My name is Christian Urban.Thank you for coming.400:00:24,390 --> 00:00:27,644Compilers. I guessyou use compilers500:00:27,644 --> 00:00:31,680in your daily work, be itthe C or Java compiler.600:00:31,680 --> 00:00:34,245And you might be curiousin what they do.700:00:34,245 --> 00:00:35,700But you might also be800:00:35,700 --> 00:00:38,130intimidated to lookwhat they do underneath900:00:38,130 --> 00:00:39,900the bonnet because they are1000:00:39,900 --> 00:00:42,520quite large software systems.1100:00:42,520 --> 00:00:46,130What I like to show you inthis module is that you1200:00:46,130 --> 00:00:49,310do not need to be anÃœberhacker to implement your own1300:00:49,310 --> 00:00:51,305compiler for yourown language, say.1400:00:51,305 --> 00:00:54,155So that will be the maingoal of this module.1500:00:54,155 --> 00:00:56,210You will implementyour own compiler,1600:00:56,210 --> 00:00:58,310of course with my help.1700:00:58,310 --> 00:01:02,360What I personally likeabout compilers is that1800:01:02,360 --> 00:01:04,580the subject is aperfect combination1900:01:04,580 --> 00:01:06,350of theory and practice.2000:01:06,350 --> 00:01:07,790I like to hack things,2100:01:07,790 --> 00:01:10,595writing actual code,but I also like theory.2200:01:10,595 --> 00:01:13,040I want to understand whatmy code actually does.2300:01:13,040 --> 00:01:17,040So compilers are theperfect subject for me.2400:01:18,040 --> 00:01:20,809Let's have a look at the details.2500:01:20,809 --> 00:01:23,779Here's an airplaneview of a compiler.2600:01:23,779 --> 00:01:25,850On the left-hand side you can see2700:01:25,850 --> 00:01:28,745the input program adeveloper would write.2800:01:28,745 --> 00:01:31,955And on the right-hand side isthe output of the compiler,2900:01:31,955 --> 00:01:36,360the binary code the developerwants to actually run.3000:01:36,370 --> 00:01:40,340What makes such a projectactually feasible is that3100:01:40,340 --> 00:01:44,165compilers fall neatly intoseparate components.3200:01:44,165 --> 00:01:47,210So you can see the lexer and the3300:01:47,210 --> 00:01:48,860parser, which are often called the3400:01:48,860 --> 00:01:50,990front end of the compiler.3500:01:50,990 --> 00:01:53,000And the code generation is3600:01:53,000 --> 00:01:55,700called the backendof the compiler.3700:01:55,700 --> 00:01:57,620And it so happensthat we will spend3800:01:57,620 --> 00:01:59,930the first five weekson the lexer and3900:01:59,930 --> 00:02:04,970the parser, and the next fiveweeks on the code generator.4000:02:04,970 --> 00:02:09,575The first component of thecompiler is the lexer.4100:02:09,575 --> 00:02:14,480The purpose of the lexeris to scan a flat4200:02:14,480 --> 00:02:16,610string, which theinput program is at4300:02:16,610 --> 00:02:19,145the beginning and separate out4400:02:19,145 --> 00:02:21,275where are the words?4500:02:21,275 --> 00:02:23,600You might thinkin Western languages,4600:02:23,600 --> 00:02:24,920that is very easy:4700:02:24,920 --> 00:02:26,690you just try tofind out where are4800:02:26,690 --> 00:02:28,580the whitespacesand then you know,4900:02:28,580 --> 00:02:31,955where one word stops andwhere the next word begins.5000:02:31,955 --> 00:02:35,300But, actually, computerlanguages are more5100:02:35,300 --> 00:02:38,180like ancient languagesthat you see here.5200:02:38,180 --> 00:02:39,860For example, ancient Greek.5300:02:39,860 --> 00:02:42,065The writer wrote one word5400:02:42,065 --> 00:02:44,915after the next and notleaving any space.5500:02:44,915 --> 00:02:47,930So for example, in thisvery simple string here5600:02:47,930 --> 00:02:50,960read n, there is no space at all.5700:02:50,960 --> 00:02:53,540But the purpose ofthe lexer is to5800:02:53,540 --> 00:02:56,570separate it into fivedifferent components.5900:02:56,570 --> 00:02:59,450It has to first say,well there is a read,6000:02:59,450 --> 00:03:01,595then there is a left parenthesis,6100:03:01,595 --> 00:03:04,025then there's anidentifier called n,6200:03:04,025 --> 00:03:07,715then there's a right parenthesis,and finally a semicolon.6300:03:07,715 --> 00:03:11,345And as you can see, thereis no space here at all.6400:03:11,345 --> 00:03:14,150And so the task of finding where6500:03:14,150 --> 00:03:17,705are the words is actuallyquite involved.6600:03:17,705 --> 00:03:19,460Also the classification is6700:03:19,460 --> 00:03:22,055sometimes not so straightforward.6800:03:22,055 --> 00:03:24,350If, for example, a writer6900:03:24,350 --> 00:03:26,360wrote "if" on its own,7000:03:26,360 --> 00:03:29,000then this should be akeyword classified as7100:03:29,000 --> 00:03:32,615a keyword because it'sfrom the if-then-else.7200:03:32,615 --> 00:03:36,800But if the developer wrotesomething longer like "iffoo",7300:03:36,800 --> 00:03:38,030then this might just be7400:03:38,030 --> 00:03:39,860a strange variable nameand he needs to be7500:03:39,860 --> 00:03:44,250classified as a variable name.7600:03:45,220 --> 00:03:50,720The output of the lexer is a sequence of tokens.7700:03:50,720 --> 00:03:53,480These are essentially the words in7800:03:53,480 --> 00:03:56,405a sentence and theirclassification:7900:03:56,405 --> 00:03:59,540they are nouns,verbs or adjectives.8000:03:59,540 --> 00:04:02,885For us, of course, theclassification would be keywords,8100:04:02,885 --> 00:04:06,005identifiers,operators, and so on.8200:04:06,005 --> 00:04:11,480And these tokens are theinput for the parser,8300:04:11,480 --> 00:04:13,085the next component in8400:04:13,085 --> 00:04:17,240our compiler. The parseressentially takes this list8500:04:17,240 --> 00:04:23,300of tokens and transforms itinto a abstract syntax tree.8600:04:23,300 --> 00:04:27,230That means we have now thesentence, we have the words.8700:04:27,230 --> 00:04:30,275We have to relatethe words to each other8800:04:30,275 --> 00:04:33,680in order to find out whatthis sentence actually means.8900:04:33,680 --> 00:04:35,405In this case here,9000:04:35,405 --> 00:04:38,595we have to do a read...whichvariable is affected?9100:04:38,595 --> 00:04:45,040The variable n. Once we havethe abstract syntax tree,9200:04:45,040 --> 00:04:49,225it can go to the next component,to the code generator.9300:04:49,225 --> 00:04:52,480Whilst it doesn't looklike this in this picture,9400:04:52,480 --> 00:04:54,100the code generators is usually the9500:04:54,100 --> 00:04:56,080biggest part in a compiler.9600:04:56,080 --> 00:04:58,720And here we actuallyhave to cut corners.9700:04:58,720 --> 00:05:02,470Instead of producing binarycode, which can be run9800:05:02,470 --> 00:05:06,820directly on a CPU, like X86 or ARM,9900:05:06,820 --> 00:05:11,035we actually target theJava Virtual Machine, the JVM.10000:05:11,035 --> 00:05:13,600This is very similarto the Scala compiler,10100:05:13,600 --> 00:05:16,480for example, whichproduces JVM code.10200:05:16,480 --> 00:05:18,940Or, of course, also the Java compiler,10300:05:18,940 --> 00:05:20,845which produces JVM code.10400:05:20,845 --> 00:05:23,900So here's a typicalexample code which we're10500:05:23,900 --> 00:05:27,305going to produce: Somethinglike variable 210600:05:27,305 --> 00:05:30,545gets allocated in the stack.10700:05:30,545 --> 00:05:32,390You subtract ten from it.10800:05:32,390 --> 00:05:36,050You test whether thevariable is 0 and if yes,10900:05:36,050 --> 00:05:40,170you jump somewhere and otherwiseyou reload the variable.11000:05:41,560 --> 00:05:45,935Even though we cut cornersin the code generater part11100:05:45,935 --> 00:05:48,575by producing code for the JVM,11200:05:48,575 --> 00:05:51,710we can still obtain quiteimpressive results.11300:05:51,710 --> 00:05:56,225Here's a program witha cubic runtime behaviour.11400:05:56,225 --> 00:05:59,525When it has tocalculate with 400 units,11500:05:59,525 --> 00:06:02,165it might need fiveseconds to calculate.11600:06:02,165 --> 00:06:05,345But if it has tocalculate with 1200 units,11700:06:05,345 --> 00:06:08,165it already needs morethan two minutes.11800:06:08,165 --> 00:06:10,940Now these timings,the blue timings11900:06:10,940 --> 00:06:13,910are obtained with aninterpreter of that program.12000:06:13,910 --> 00:06:16,265And you can seejust next to it,12100:06:16,265 --> 00:06:18,050the red times.12200:06:18,050 --> 00:06:20,359They are obtainedwith the compiler12300:06:20,359 --> 00:06:23,135we're going to writein this module.12400:06:23,135 --> 00:06:25,100And there you can see, even12500:06:25,100 --> 00:06:29,000with 1200, the times gethardly off the x-axis.12600:06:29,000 --> 00:06:30,485So in this instance,12700:06:30,485 --> 00:06:32,630our compiler willhave a speed up from12800:06:32,630 --> 00:06:37,589approximately 1 million incomparison to the interpreter.12900:06:38,350 --> 00:06:42,020This might be a fun taskfor your spare time.13000:06:42,020 --> 00:06:44,480This is a compiler explorer,which allows you to13100:06:44,480 --> 00:06:47,060write C code on theleft-hand side.13200:06:47,060 --> 00:06:49,115It shows you on theright-hand side13300:06:49,115 --> 00:06:53,270the machine code anX86 CPU can run.13400:06:53,270 --> 00:06:56,060For example, it givesyou these color scheme and13500:06:56,060 --> 00:06:59,000says that this additionin the num + num in13600:06:59,000 --> 00:07:01,580the return statement,translates to13700:07:01,580 --> 00:07:02,960essentially an addition of13800:07:02,960 --> 00:07:05,930an register in the machine code.13900:07:05,930 --> 00:07:08,480I think this compilerexplorer also works for14000:07:08,480 --> 00:07:11,495the Haskell language andalso produces ARM code,14100:07:11,495 --> 00:07:15,245not just code for the X86 CPUs.14200:07:15,245 --> 00:07:18,950As an aside, I also recommendto watch the movie of14300:07:18,950 --> 00:07:20,870this guy because that is14400:07:20,870 --> 00:07:22,940very much like how I workedat the beginning14500:07:22,940 --> 00:07:25,355when implementing our compiler.14600:07:25,355 --> 00:07:29,300I wrote some code which Iknew how it should behave.14700:07:29,300 --> 00:07:30,725And then I just had a look14800:07:30,725 --> 00:07:32,900what another compiler produces.14900:07:32,900 --> 00:07:37,190And I imitated that codeas much as I could.15000:07:37,190 --> 00:07:39,380Such a compiler explorer15100:07:39,380 --> 00:07:41,375also exists forthe Java language.15200:07:41,375 --> 00:07:42,935Here's one where you can write15300:07:42,935 --> 00:07:44,915Java code on the left-hand side,15400:07:44,915 --> 00:07:47,930and on the right-handside you get JVM code.15500:07:47,930 --> 00:07:50,255JVM code is a byte code15600:07:50,255 --> 00:07:53,405which cannot be rundirectly by the CPU,15700:07:53,405 --> 00:07:55,220but needs the JavaVirtual Machine15800:07:55,220 --> 00:07:57,170to essentially interpret that.15900:07:57,170 --> 00:07:58,760This means it's not16000:07:58,760 --> 00:08:01,235the most efficient wayhow to run programs - 16100:08:01,235 --> 00:08:02,780it would be much faster to16200:08:02,780 --> 00:08:05,285generate directcode for the CPU.16300:08:05,285 --> 00:08:07,700But by producing bytecode,16400:08:07,700 --> 00:08:11,435we still run theprograms quite fast16500:08:11,435 --> 00:08:13,520and it also simplifies16600:08:13,520 --> 00:08:16,280a number of issuesin our compiler.16700:08:16,280 --> 00:08:18,980One issue is aboutmemory management.16800:08:18,980 --> 00:08:22,055We don't have to be concernedabout register allocation,16900:08:22,055 --> 00:08:25,505which we would needto do in a real CPU.17000:08:25,505 --> 00:08:27,680This will be done by the JVM.17100:08:27,680 --> 00:08:29,750It's also much easier to produce17200:08:29,750 --> 00:08:33,635this bytecode rather thanactual machine code.17300:08:33,635 --> 00:08:37,385I think it's now a good timeto come to the question,17400:08:37,385 --> 00:08:39,950why on Earth studying compilers.17500:08:39,950 --> 00:08:42,650Compilers are such anestablished subject17600:08:42,650 --> 00:08:43,985in computer science.17700:08:43,985 --> 00:08:46,100Compilers do what they do.17800:08:46,100 --> 00:08:48,410Probably forrests havebeen killed by17900:08:48,410 --> 00:08:52,190all the books that have beenpublished on compilers.18000:08:52,190 --> 00:08:56,690Why on Earth studyingcompilers in 2020 (and of course in 2021)? 18100:08:56,690 --> 00:08:59,659And even worse: Why implementingyour own compiler?18200:08:59,659 --> 00:09:02,720Well, a slightly humorous take on18300:09:02,720 --> 00:09:05,105that question isgiven by John Regehr,18400:09:05,105 --> 00:09:08,375who is a compiler hackerfrom the University of Utah.18500:09:08,375 --> 00:09:09,770Essentially what he says,18600:09:09,770 --> 00:09:12,110if you're a good compiler hacker,18700:09:12,110 --> 00:09:14,690you have no problemsof finding a job.18800:09:14,690 --> 00:09:17,210He puts it as: It's effectively18900:09:17,210 --> 00:09:22,320a "Perpetual Employment Act"for solid compiler hackers.19000:09:22,990 --> 00:09:27,380Regehr gives two justificationsfor that statement.19100:09:27,380 --> 00:09:29,690First, he sayshardware is getting19200:09:29,690 --> 00:09:32,585weirder, rather thangetting clocked faster.19300:09:32,585 --> 00:09:34,520And that's definitely true.19400:09:34,520 --> 00:09:36,590My first computer many, many,19500:09:36,590 --> 00:09:40,040many years ago containedonly a single core CPU.19600:09:40,040 --> 00:09:44,030And it was such a simple CPUthat we wrote machine code19700:09:44,030 --> 00:09:46,220directly for that CPU in order to19800:09:46,220 --> 00:09:49,740run our programs asfast as possible.19900:09:50,260 --> 00:09:53,600In contrast, today, Regehr writes,20000:09:53,600 --> 00:09:57,005almost all processors aremulti-core nowadays.20100:09:57,005 --> 00:09:59,870And it looks like there'san increasing asymmetry20200:09:59,870 --> 00:10:02,015in resources across cores.20300:10:02,015 --> 00:10:04,445Processors comewith vector units,20400:10:04,445 --> 00:10:07,189crypto accelerators, etc.20500:10:07,189 --> 00:10:11,930We have TSPs, GPUs, ARMbig,little, Xeon Phi,20600:10:11,930 --> 00:10:14,630and this only scratches the surface.20700:10:14,630 --> 00:10:17,255And that is really aproblem for compilers,20800:10:17,255 --> 00:10:20,495because if we nowhave multi-core CPUs,20900:10:20,495 --> 00:10:23,180that means our programs need21000:10:23,180 --> 00:10:26,075to be scheduledover several CPUs.21100:10:26,075 --> 00:10:28,220But the developer, of course,21200:10:28,220 --> 00:10:30,545doesn't want to knowanything about that.21300:10:30,545 --> 00:10:34,655Also, now we have moreCPUs in a computer,21400:10:34,655 --> 00:10:37,400but they seem to also comewith different resources.21500:10:37,400 --> 00:10:40,310So certain tasks ina program need to21600:10:40,310 --> 00:10:43,460be scheduled on somecores, but not on others.21700:10:43,460 --> 00:10:46,685We also have for along time already GPUs,21800:10:46,685 --> 00:10:49,025which are highly specialized for21900:10:49,025 --> 00:10:53,240very parallel computationsto do with graphics.22000:10:53,240 --> 00:10:56,015They at least inthe past few years,22100:10:56,015 --> 00:10:59,360they could only do veryspecial computations,22200:10:59,360 --> 00:11:02,255but very, very fastand highly parallel.22300:11:02,255 --> 00:11:05,539You couldn't use them forgeneral-purpose calculations,22400:11:05,539 --> 00:11:10,205which you would usuallyuse CPUs. Similarly, DSPs.22500:11:10,205 --> 00:11:14,075They are needed for allthe signal processing22600:11:14,075 --> 00:11:16,040in mobile phones.22700:11:16,040 --> 00:11:20,780Without them, we justwouldn't have mobile phones.22800:11:20,780 --> 00:11:23,525The second reason Regehr gives is22900:11:23,525 --> 00:11:25,550that we are getting tired of23000:11:25,550 --> 00:11:27,620low-level languages and23100:11:27,620 --> 00:11:30,335their associatedsecurity disasters.23200:11:30,335 --> 00:11:32,435While at the beginningwe were still23300:11:32,435 --> 00:11:34,760happy to write machinecode directly,23400:11:34,760 --> 00:11:37,175nobody wants to do this nowadays.23500:11:37,175 --> 00:11:39,515He writes: :We wantto write new code23600:11:39,515 --> 00:11:44,120to whatever extent possible insafer high-level languages.23700:11:44,120 --> 00:11:46,130Compilers are caught right23800:11:46,130 --> 00:11:48,365in the middle of theseopposing trends:23900:11:48,365 --> 00:11:50,765one of their main jobsis to have bridged24000:11:50,765 --> 00:11:53,240the large and growing gap between24100:11:53,240 --> 00:11:55,460increasingly high-level languages24200:11:55,460 --> 00:11:58,565and increasinglywacky platforms.24300:11:58,565 --> 00:12:00,320So here you have it:24400:12:00,320 --> 00:12:02,750It's still interesting to study24500:12:02,750 --> 00:12:06,244compilers nowadays,especially nowadays.24600:12:06,244 --> 00:12:09,875Well, if you want to workon interesting problems,24700:12:09,875 --> 00:12:14,570then very often you have toknow compilers. Here's one:24800:12:14,570 --> 00:12:16,940In the good oldtimes when we were24900:12:16,940 --> 00:12:19,310still able to fly on holidays,25000:12:19,310 --> 00:12:23,300I'm sure you flew ina 777 Boeing airplane.25100:12:23,300 --> 00:12:25,850It's actually alreadya quite old airplane.25200:12:25,850 --> 00:12:28,295But they had thefollowing problem.25300:12:28,295 --> 00:12:33,020What happens if a CPUcalculates the wrong result?25400:12:33,020 --> 00:12:36,065And that's actually notsuch a hypothetical problem25500:12:36,065 --> 00:12:40,295because you rememberthe Pentium CPU25600:12:40,295 --> 00:12:43,655in around 2000contained a bug and it cost25700:12:43,655 --> 00:12:48,140a lot of money for Intelto replace this CPU.25800:12:48,140 --> 00:12:51,470What happens if an CPU calculates25900:12:51,470 --> 00:12:56,105the wrong result insome navigation data?26000:12:56,105 --> 00:12:58,520Do you just let theairplane crash?26100:12:58,520 --> 00:13:00,650Well, the engineers at Boeing26200:13:00,650 --> 00:13:02,675came up with thefollowing solution:26300:13:02,675 --> 00:13:05,055They writing one program that26400:13:05,055 --> 00:13:08,690essentially controlshow their airplane26500:13:08,690 --> 00:13:10,295is supposed to fly.26600:13:10,295 --> 00:13:13,040In a programming languagecalled Ada that is26700:13:13,040 --> 00:13:15,770slightly obscureprogramming language26800:13:15,770 --> 00:13:18,650but is very well-known in26900:13:18,650 --> 00:13:22,040areas where safetyis really paramount.27000:13:22,040 --> 00:13:25,010And what they did is theytook this one Ada program27100:13:25,010 --> 00:13:28,010and they essentiallytook three compilers,27200:13:28,010 --> 00:13:29,435independent compilers,27300:13:29,435 --> 00:13:32,045which compiled the programto different CPUs.27400:13:32,045 --> 00:13:33,815One CPU from Intel,27500:13:33,815 --> 00:13:37,235one CPU for Motorola,and one from AMD.27600:13:37,235 --> 00:13:38,930And these are quite different27700:13:38,930 --> 00:13:40,955CPUs. Also some of them27800:13:40,955 --> 00:13:42,755have quite different philosophies27900:13:42,755 --> 00:13:45,245on how they do their calculations.28000:13:45,245 --> 00:13:47,330Now what they could do is, they28100:13:47,330 --> 00:13:50,690could put these three computerson a single board and28200:13:50,690 --> 00:13:54,335could now run all theircalculations in parallel.28300:13:54,335 --> 00:13:56,690One with an IntelCPU, one with28400:13:56,690 --> 00:14:00,245Motorola, and one with a Risc computers say.28500:14:00,245 --> 00:14:02,795And then they couldcompare the results.28600:14:02,795 --> 00:14:05,030And if these results differed,28700:14:05,030 --> 00:14:07,940then they knew one CPU must have28800:14:07,940 --> 00:14:11,600calculated the wrong resultand probably told the pilot,28900:14:11,600 --> 00:14:14,850please don't letthat airplane crash.29000:14:14,950 --> 00:14:17,270Not just Boeing is doing29100:14:17,270 --> 00:14:19,355interesting thingswith compilers.29200:14:19,355 --> 00:14:22,640Also, Airbus in a completelydifferent setting29300:14:22,640 --> 00:14:24,260is using compilers in29400:14:24,260 --> 00:14:26,420a interesting way to get29500:14:26,420 --> 00:14:30,195their airplanes up in theair and let them not crash.29600:14:30,195 --> 00:14:33,010In another example, Ihave friends working29700:14:33,010 --> 00:14:35,350at Facebook who work on29800:14:35,350 --> 00:14:37,900compilers to make senseare they are heaps29900:14:37,900 --> 00:14:41,470and heaps and heaps ofcode written in PHP,30000:14:41,470 --> 00:14:42,880which is one of the most30100:14:42,880 --> 00:14:44,920horrible languages on the planet.30200:14:44,920 --> 00:14:46,630The bottom line is,30300:14:46,630 --> 00:14:50,499compilers are still very,very important nowadays.30400:14:50,499 --> 00:14:52,810And for the foreseeable future,30500:14:52,810 --> 00:14:56,150compilers still needto be developed.30600:14:57,270 --> 00:15:00,235When one talks aboutcompilers then30700:15:00,235 --> 00:15:01,630there is, of course,30800:15:01,630 --> 00:15:05,395a magic element involved. They'relarge software systems.30900:15:05,395 --> 00:15:07,750And yes, they're supposed to31000:15:07,750 --> 00:15:10,525generate a runnable binary for31100:15:10,525 --> 00:15:12,105a high-level program,31200:15:12,105 --> 00:15:15,230but they are also supposedto make our programs better.31300:15:15,230 --> 00:15:18,920They optimize ourprograms to run faster.31400:15:18,920 --> 00:15:22,610And there's a lot of"magic" involved in that.31500:15:22,610 --> 00:15:24,890Magic in inverted quotes.31600:15:24,890 --> 00:15:26,480I would like to show you one of31700:15:26,480 --> 00:15:29,340this magic in one example.31800:15:31,090 --> 00:15:33,110I hope you still have31900:15:33,110 --> 00:15:36,485fond memories of thePEP module last year.32000:15:36,485 --> 00:15:40,280And you might remember BFlanguage, we had to look at.32100:15:40,280 --> 00:15:43,670This BF language containsa kind of memory and32200:15:43,670 --> 00:15:45,680a memory pointer which can32300:15:45,680 --> 00:15:48,140be moved to the leftor to the right,32400:15:48,140 --> 00:15:50,270where an integer in32500:15:50,270 --> 00:15:52,925the memory can be eitherincreased or decreased.32600:15:52,925 --> 00:15:55,730We can printout the content of32700:15:55,730 --> 00:15:59,645the current cell or inputsomething into a cell.32800:15:59,645 --> 00:16:01,850And we have loops, and32900:16:01,850 --> 00:16:04,220everything else isconsidered a comment.33000:16:04,220 --> 00:16:06,290What's good aboutthis BF language is that 33100:16:06,290 --> 00:16:08,180we don't even need a front end.33200:16:08,180 --> 00:16:09,920We can immediately start writing33300:16:09,920 --> 00:16:13,529an interpreter or a compiler.33400:16:15,850 --> 00:16:18,155Okay, I have here33500:16:18,155 --> 00:16:20,600a very straightforwardinterpreter for33600:16:20,600 --> 00:16:22,865the BF language. Youmight recognize it.33700:16:22,865 --> 00:16:27,120And I run it with abenchmark program, which you 33800:16:27,760 --> 00:16:30,960might also recognize.33900:16:31,560 --> 00:16:36,835And this will now takeapproximately ten minutes.34000:16:36,835 --> 00:16:39,920So see you in a bit.34100:16:40,710 --> 00:16:43,660Okay, this has finished now.34200:16:43,660 --> 00:16:45,925Almost took 11 minutes.34300:16:45,925 --> 00:16:47,410The question now is,34400:16:47,410 --> 00:16:51,820can we to better? 34500:16:51,820 --> 00:16:53,260Actually, it is not difficultto do better34600:16:53,260 --> 00:16:54,970than this simple-mindedinterpreter.34700:16:54,970 --> 00:16:58,690It is relativelystraightforward to take34800:16:58,690 --> 00:17:03,520a BF program and generate equivalent C code.34900:17:03,520 --> 00:17:06,520This can be easilydone by somehow35000:17:06,520 --> 00:17:09,490representing the BF memory in C.35100:17:09,490 --> 00:17:12,310We can do thisby just an array of35200:17:12,310 --> 00:17:15,380characters and a memory pointer,35300:17:15,380 --> 00:17:19,265which points, in this casein the middle of this array.35400:17:19,265 --> 00:17:21,800Then it's very easy to35500:17:21,800 --> 00:17:24,275translate the movementof the 35600:17:24,275 --> 00:17:28,610BF memory pointer into increments35700:17:28,610 --> 00:17:31,520and decrements ofthe C pointer.35800:17:31,520 --> 00:17:33,050Similarly, if you want to increment or35900:17:33,050 --> 00:17:35,915decrement an elementin this array,36000:17:35,915 --> 00:17:38,975you just have to look up whatthe pointer is and increment36100:17:38,975 --> 00:17:42,140and decrement what'sunder the pointer. Similarly36200:17:42,140 --> 00:17:43,790we can print out something from36300:17:43,790 --> 00:17:47,450this array and we can putsomething into this array.36400:17:47,450 --> 00:17:49,610What is great is that the loops36500:17:49,610 --> 00:17:51,530from the BF language directly36600:17:51,530 --> 00:17:55,580translate into whileloops of the C language.36700:17:55,580 --> 00:17:58,100We essentially have to check is36800:17:58,100 --> 00:18:02,450the memory pointer pointingto a field that is 0.36900:18:02,450 --> 00:18:03,950Then we exit the loop,37000:18:03,950 --> 00:18:05,719and otherwise we continue.37100:18:05,719 --> 00:18:09,575And that can be donenicely in C, like so.37200:18:09,575 --> 00:18:12,995And everything else isagain, just a comment.37300:18:12,995 --> 00:18:16,445So I have implementedthis translation for you.37400:18:16,445 --> 00:18:19,700Remember this was the BFprogram for the Mandelbrot Set.37500:18:19,700 --> 00:18:23,690The equivalent C codewould look like this.37600:18:23,690 --> 00:18:27,110So you can see at the beginningis this generation of37700:18:27,110 --> 00:18:30,140the BF memory represented37800:18:30,140 --> 00:18:32,345as an array and thememory pointer.37900:18:32,345 --> 00:18:34,550And then inside there are lots38000:18:34,550 --> 00:18:36,770of increments and decrements38100:18:36,770 --> 00:18:41,915of pointers and alsocontents of this array.38200:18:41,915 --> 00:18:45,199Now fingers crossed that thisis the correct translation.38300:18:45,199 --> 00:18:48,125And I can also run this38400:18:48,125 --> 00:18:52,805and you should see that it runssubstantially faster.38500:18:52,805 --> 00:18:55,880I'm using now my GCC on38600:18:55,880 --> 00:19:01,040my computer to generate anexecutable for the C program.38700:19:01,040 --> 00:19:04,295And it should run forapproximately 20 seconds.38800:19:04,295 --> 00:19:07,620So let's just wait for that.38900:19:11,430 --> 00:19:14,950Okay. What is important to note39000:19:14,950 --> 00:19:19,885here is that I'm running GCC,39100:19:19,885 --> 00:19:22,450this the option -O0.39200:19:22,450 --> 00:19:25,600That means I tell GCC to39300:19:25,600 --> 00:19:28,840generate a binary which Ican run as you can see.39400:19:28,840 --> 00:19:31,810But don't apply any optimization.39500:19:31,810 --> 00:19:38,395Keep this in mind.39600:19:38,395 --> 00:19:42,595As a second try, of course, I can try to generate a better C program.39700:19:42,595 --> 00:19:46,060And as you'll remember fromthe PEP course, it can,39800:19:46,060 --> 00:19:50,095for example, combineseveral steps39900:19:50,095 --> 00:19:51,670going to the right of the memory40000:19:51,670 --> 00:19:53,310pointer or to the left.40100:19:53,310 --> 00:19:55,280We can combine that into40200:19:55,280 --> 00:19:58,760one single increment ordecrement of not just one,40300:19:58,760 --> 00:20:00,050but of like n,40400:20:00,050 --> 00:20:02,570where n is greaterthan 1. Similarly, 40500:20:02,570 --> 00:20:06,980if we increment or decrementthe content of this array,40600:20:06,980 --> 00:20:09,740we can do this in onebig step by incrementing40700:20:09,740 --> 00:20:12,710that by not just oneand increment by one,40800:20:12,710 --> 00:20:15,635but increment and decrementby bigger numbers.40900:20:15,635 --> 00:20:18,870Everything else stays the same.41000:20:20,830 --> 00:20:23,270Again, I have implemented that41100:20:23,270 --> 00:20:24,950for you and you can see now41200:20:24,950 --> 00:20:26,835the C program moves41300:20:26,835 --> 00:20:30,455the memory pointer and biggerchunks and also increases,41400:20:30,455 --> 00:20:32,810for example, here, memory content41500:20:32,810 --> 00:20:35,555by 15 than just by 1.41600:20:35,555 --> 00:20:38,520Now let's run this program.41700:20:38,530 --> 00:20:40,760Again, I use GCC41800:20:40,760 --> 00:20:45,350to compile theC program and run it.41900:20:45,350 --> 00:20:49,025And again, I made surethat it only runs this with42000:20:49,025 --> 00:20:51,530no optimizations switched on.42100:20:51,530 --> 00:20:54,050So it runs with minus O0.42200:20:54,050 --> 00:20:56,090And you can seeit's now down from42300:20:56,090 --> 00:20:59,99020 seconds to just 6 seconds.42400:20:59,990 --> 00:21:06,065I show you, the GCC iscalled with -O0.42500:21:06,065 --> 00:21:08,990So this reductionin time is purely42600:21:08,990 --> 00:21:12,755because I produced better C code,42700:21:12,755 --> 00:21:17,220which the compiler then justtransformed into a binary.42800:21:18,910 --> 00:21:22,055So far there'snothing interesting.42900:21:22,055 --> 00:21:25,385We used in the first instance43000:21:25,385 --> 00:21:29,240single increments and use GCC with43100:21:29,240 --> 00:21:31,700O0 to not introduce43200:21:31,700 --> 00:21:35,255any optimizations and it runsessentially for 20 seconds.43300:21:35,255 --> 00:21:37,880If we then increment43400:21:37,880 --> 00:21:40,895in bigger chunks ordecrement in bigger chunks,43500:21:40,895 --> 00:21:45,380use again GCC with -O0,43600:21:45,380 --> 00:21:50,030then it runs in approximatelyfive to six seconds.43700:21:50,030 --> 00:21:51,950Now let me do the following:43800:21:51,950 --> 00:21:55,430I take the first program which43900:21:55,430 --> 00:21:58,070increments everythingin single steps44000:21:58,070 --> 00:22:00,185or decrements in single steps.44100:22:00,185 --> 00:22:04,835But now I use the fullcapacity of the GCC compiler44200:22:04,835 --> 00:22:06,560and I tell it,44300:22:06,560 --> 00:22:11,370please do introduce someoptimizations as you want.44400:22:11,590 --> 00:22:15,799And I'm now runningexactly the same program...44500:22:15,799 --> 00:22:17,870just the GCC compiler will44600:22:17,870 --> 00:22:22,325now have the optimizationsswitched on.44700:22:22,325 --> 00:22:24,380Let's see what happens.44800:22:24,380 --> 00:22:27,320One first needs to compile it.44900:22:27,320 --> 00:22:29,795And that takes a little while.45000:22:29,795 --> 00:22:32,000Okay, this has nowfinished and also45100:22:32,000 --> 00:22:34,115the calculation of thepicture has finished.45200:22:34,115 --> 00:22:35,960And as you can see, it took45300:22:35,960 --> 00:22:38,510approximately eightseconds to calculate.45400:22:38,510 --> 00:22:41,645That is down fromapproximately 20 seconds.45500:22:41,645 --> 00:22:46,040So the difference fromswitching from -O0 to45600:22:46,040 --> 00:22:51,935-O3 meant that nowthe program runs almost as45700:22:51,935 --> 00:22:54,800fast as where I by hand45800:22:54,800 --> 00:22:58,610combined several stepsinto a big chunk.45900:22:58,610 --> 00:23:00,170That is essentially what46000:23:00,170 --> 00:23:03,485the GCC compilerfound out on its own.46100:23:03,485 --> 00:23:05,840That rather than jumping46200:23:05,840 --> 00:23:08,465in just single increments by one,46300:23:08,465 --> 00:23:11,510they can be all combinedinto bigger chunks.46400:23:11,510 --> 00:23:16,595It hasn't been as successfulas if I do this explicitly.46500:23:16,595 --> 00:23:18,620But that is the magic that46600:23:18,620 --> 00:23:22,560the compiler essentially foundout, that this can be done.46700:23:22,960 --> 00:23:25,700Just a quick recap of what I did.46800:23:25,700 --> 00:23:28,160I first run the Mandelbrot program with46900:23:28,160 --> 00:23:31,730an interpreter and it tookapproximately 11 minutes.47000:23:31,730 --> 00:23:36,559Then I had a simple compilergenerating a C program.47100:23:36,559 --> 00:23:40,880And the C compiler then hadno optimization switched on.47200:23:40,880 --> 00:23:44,645In the C program had onlysmall single increments and47300:23:44,645 --> 00:23:46,820small jumps. Then it took47400:23:46,820 --> 00:23:49,775approximately 20 seconds forto same program.47500:23:49,775 --> 00:23:52,460Then I had a more advancedcompiler which does47600:23:52,460 --> 00:23:55,730big increments andalso big jumps.47700:23:55,730 --> 00:23:57,470But again, the compiler didn't47800:23:57,470 --> 00:23:59,210introduce any optimization on its47900:23:59,210 --> 00:24:02,915own. Then it tookapproximately five seconds.48000:24:02,915 --> 00:24:05,210Then I went back to48100:24:05,210 --> 00:24:08,465the simple compiler withonly small steps.48200:24:08,465 --> 00:24:10,400But then told the C compiler48300:24:10,400 --> 00:24:12,950to fully optimize this program.48400:24:12,950 --> 00:24:14,690And then it took moreor less the same48500:24:14,690 --> 00:24:17,465time as the more advanced compiler.48600:24:17,465 --> 00:24:20,465I encourage you tolook at this yourself.48700:24:20,465 --> 00:24:24,240As usual, all the programsare uploaded on KEATS.48800:24:25,090 --> 00:24:27,590To finish thisintroduction video,48900:24:27,590 --> 00:24:30,170let me give you anextremely brief overview49000:24:30,170 --> 00:24:32,855over the history of compilers.49100:24:32,855 --> 00:24:35,450While I argued at the beginning49200:24:35,450 --> 00:24:38,915that it's interesting tostudy compilers nowadays,49300:24:38,915 --> 00:24:40,400obviously in this field,49400:24:40,400 --> 00:24:43,295we are standing on theshoulders of giants.49500:24:43,295 --> 00:24:46,520The field started with Turing and Turing machines,49600:24:46,520 --> 00:24:49,130which were introduced in 1936.49700:24:49,130 --> 00:24:52,175Turing machines already hadthis concept of memory49800:24:52,175 --> 00:24:55,190and also programsof Turing machines49900:24:55,190 --> 00:24:58,775needed to be translatedbetween different formalisms.50000:24:58,775 --> 00:25:01,100Regular expressions,which will play50100:25:01,100 --> 00:25:03,905an important role in ourlexer of our compiler,50200:25:03,905 --> 00:25:06,800were introduced in 1956.50300:25:06,800 --> 00:25:10,370Arguably the first compilerwas introduced in50400:25:10,370 --> 00:25:13,8501957 for a language called COBOL.50500:25:13,850 --> 00:25:16,550This was done by Grace Hopper.50600:25:16,550 --> 00:25:18,770And I recommend thatyou have and look50700:25:18,770 --> 00:25:20,900at this interviewof Grace Hopper,50800:25:20,900 --> 00:25:22,130which shows she must have been50900:25:22,130 --> 00:25:24,424a very interesting personality.51000:25:24,424 --> 00:25:27,770But despite thematurity of this field,51100:25:27,770 --> 00:25:29,465it might be surprising that51200:25:29,465 --> 00:25:31,505research papers arestill published.51300:25:31,505 --> 00:25:34,835And we will find thatout in the module.51400:25:34,835 --> 00:25:37,760And a number ofproblems which we would51500:25:37,760 --> 00:25:41,270assume are already solved by now,51600:25:41,270 --> 00:25:43,730actually turning outthat they are not solved.51700:25:43,730 --> 00:25:45,620Here in this blog post,51800:25:45,620 --> 00:25:47,825for example, Laurie Tratt51900:25:47,825 --> 00:25:49,550who is also from this department,52000:25:49,550 --> 00:25:51,740argued that parsing, for example,52100:25:51,740 --> 00:25:54,035isn't a solved problem yet.52200:25:54,035 --> 00:25:56,300I hope you will have as much fun52300:25:56,300 --> 00:25:58,130with this module as I have.52400:25:58,130 --> 00:26:00,750Thank you very muchfor listening.