3 \usepackage{../style} |
3 \usepackage{../style} |
4 \usepackage{../langs} |
4 \usepackage{../langs} |
5 \usepackage{../grammar} |
5 \usepackage{../grammar} |
6 \usepackage{../graphics} |
6 \usepackage{../graphics} |
7 \usepackage{amssymb} |
7 \usepackage{amssymb} |
|
8 \usepackage{xcolor} |
|
9 |
|
10 \usepackage[most]{tcolorbox} |
|
11 |
|
12 \newtcbox{\hm}[1][]{% |
|
13 size=fbox, |
|
14 tcbox raise base, nobeforeafter, |
|
15 enhanced, colframe=gray, |
|
16 colback=gray!30, boxrule=1pt, |
|
17 fontupper=\ttfamily, |
|
18 #1} |
8 |
19 |
9 % modern compilers are different |
20 % modern compilers are different |
10 % https://channel9.msdn.com/Blogs/Seth-Juarez/Anders-Hejlsberg-on-Modern-Compiler-Construction |
21 % https://channel9.msdn.com/Blogs/Seth-Juarez/Anders-Hejlsberg-on-Modern-Compiler-Construction |
11 |
22 |
12 %halting problem |
23 %halting problem |
13 %https://dfilaretti.github.io/2017-04-30/halting-problem |
24 %https://dfilaretti.github.io/2017-04-30/halting-problem |
|
25 |
|
26 |
14 |
27 |
15 |
28 |
16 \begin{document} |
29 \begin{document} |
17 \lstset{morekeywords={def,if,then,else,write,read},keywordstyle=\color{codepurple}\bfseries} |
30 \lstset{morekeywords={def,if,then,else,write,read},keywordstyle=\color{codepurple}\bfseries} |
18 |
31 |
24 primitive and the compiler rather crude---everything was |
37 primitive and the compiler rather crude---everything was |
25 essentially compiled into a big monolithic chunk of code |
38 essentially compiled into a big monolithic chunk of code |
26 inside the main function. In this handout we like to have a |
39 inside the main function. In this handout we like to have a |
27 look at a slightly more comfortable language, which I call |
40 look at a slightly more comfortable language, which I call |
28 Fun-language, and a tiny-teeny bit more realistic compiler. |
41 Fun-language, and a tiny-teeny bit more realistic compiler. |
29 The Fun-language is a functional programming language. A small |
42 The Fun-language is a small functional programming language. A small |
30 collection of programs we want to be able to write and compile |
43 collection of programs we want to be able to write and compile |
31 is as follows: |
44 is as follows: |
32 |
45 |
33 \begin{lstlisting}[numbers=none] |
46 \begin{lstlisting}[numbers=none] |
34 def fib(n) = if n == 0 then 0 |
47 def fib(n) = if n == 0 then 0 |
42 else ack(m - 1, ack(m, n - 1)); |
55 else ack(m - 1, ack(m, n - 1)); |
43 |
56 |
44 def gcd(a, b) = if b == 0 then a else gcd(b, a % b); |
57 def gcd(a, b) = if b == 0 then a else gcd(b, a % b); |
45 \end{lstlisting} |
58 \end{lstlisting} |
46 |
59 |
47 \noindent Compare the code of the fib-program with the same |
60 \noindent Compare the code of the fib-program with the same program |
48 program written in the While-language\ldots Fun is definitely |
61 written in the While-language\ldots Fun is definitely more comfortable. |
49 more comfortable. We will still focus on programs involving |
62 We will still focus on programs involving integers only, that means for |
50 integers only, that means for example that every function is |
63 example that every function in Fun is expected to return an integer. The |
51 expected to return an integer. The point of the Fun language |
64 point of the Fun language is to compile each function to a separate |
52 is to compile each function to a separate method in JVM |
65 method in JVM bytecode (not just a big monolithic code chunk). The means |
53 bytecode (not just a big monolithic code chunk). The means we |
66 we need to adapt to some of the conventions of the JVM about methods. |
54 need to adapt to some of the conventions of the JVM about |
|
55 methods. |
|
56 |
67 |
57 The grammar of the Fun-language is slightly simpler than the |
68 The grammar of the Fun-language is slightly simpler than the |
58 While-language, because the main syntactic category are |
69 While-language, because the main syntactic category are |
59 expressions (we do not have statements). The grammar rules are |
70 expressions (we do not have statements). The grammar rules are |
60 as follows: |
71 as follows: |
61 |
72 |
62 |
73 |
63 \begin{plstx}[rhs style=,margin=1.5cm] |
74 \begin{plstx}[rhs style=,margin=1.5cm] |
64 : \meta{Exp} ::= \meta{Id} | \meta{Num} {\hspace{3.7cm}} |
75 : \meta{Exp} ::= \meta{Id} | \meta{Num} {\hspace{3.7cm}} |
65 | \meta{Exp} + \meta{Exp} | ... | (\meta{Exp}) {\hspace{3.7cm}} |
76 | \meta{Exp} + \meta{Exp} | ... | (\meta{Exp}) {\hspace{3.7cm}} |
66 | \code{if} \meta{BExp} \code{then} \meta{Exp} \code{else} \meta{Exp} |
77 | \code{if} \; \meta{BExp} \; \code{then} \; \meta{Exp} \; \code{else} \; \meta{Exp} |
67 | \code{write} \meta{Exp} {\hspace{5cm}} |
78 | \code{write} \meta{Exp} {\hspace{5cm}} |
68 | \meta{Exp} ; \meta{Exp} {\hspace{5cm}} |
79 | \meta{Exp} ; \meta{Exp} {\hspace{5cm}} |
69 | \textit{FunName} (\meta{Exp}, ..., \meta{Exp})\\ |
80 | \textit{FunName} (\meta{Exp}, ..., \meta{Exp})\\ |
70 : \meta{BExp} ::= ...\\ |
81 : \meta{BExp} ::= ...\\ |
71 : \meta{Decl} ::= \meta{Def} ; \meta{Decl} |
82 : \meta{Decl} ::= \meta{Def} ; \meta{Decl} |
118 args: List[String], |
129 args: List[String], |
119 body: Exp) extends Decl |
130 body: Exp) extends Decl |
120 case class Main(e: Exp) extends Decl |
131 case class Main(e: Exp) extends Decl |
121 \end{lstlisting}} |
132 \end{lstlisting}} |
122 |
133 |
123 Let us first look at some clauses for compiling expressions. |
134 Let us first look at some clauses for compiling expressions. The |
124 The compilation of arithmetic and boolean expressions is just |
135 compilation of arithmetic and boolean expressions is just like for the |
125 like for the While-language and do not need any modification |
136 While-language and does not need any modification (recall that the |
126 (recall that the \textit{compile}-function for boolean |
137 \textit{compile}-function for boolean expressions takes a third argument |
127 expression takes a third argument for the label where the |
138 for the label where the control-flow should jump when the boolean |
128 control-flow should jump when the boolean expression is |
139 expression is \emph{not} true---this is needed for compiling |
129 \emph{not} true---this is needed for compiling \pcode{if}s). |
140 \pcode{if}s). One additional feature in the Fun-language are sequences. |
130 One additional feature in the Fun-language are sequences. |
141 Their purpose is to do one calculation after another or printing out an |
131 Their purpose is to do one calculation after another. The |
142 intermediate result. The reason why we need to be careful however is the |
132 reason why we need to be careful however is the convention |
143 convention that every expression can only produce s single result |
133 that every expression can only produce s single result |
144 (including sequences). Since this result will be on the top of the |
134 (including sequences). Since this result will be on the |
145 stack, we need to generate a \pcode{pop}-instruction for sequences in |
135 top of the stack, we need to generate a |
146 order to clean-up the stack. For example, for an expression of the form |
136 \pcode{pop}-instruction in order to clean-up the stack. Given |
147 \pcode{exp1 ; exp2} we need to generate code where after the first code |
137 the expression of the form \pcode{exp1 ; exp2} we need to |
148 chunk a \pcode{pop}-instruction is needed. |
138 generate code where after the first code chunk |
|
139 a \pcode{pop}-instruction is needed. |
|
140 |
149 |
141 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape] |
150 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape] |
142 $\textrm{\textit{compile}}($exp1$)$ |
151 $\textrm{\textit{compile}}($exp1$)$ |
143 pop |
152 pop |
144 $\textrm{\textit{compile}}($exp2$)$ |
153 $\textrm{\textit{compile}}($exp2$)$ |
330 call returns we have the result of the addition on the top of |
339 call returns we have the result of the addition on the top of |
331 the stack and just need to call suc. Finally, we can return |
340 the stack and just need to call suc. Finally, we can return |
332 the result on top of the stack as the result of the |
341 the result on top of the stack as the result of the |
333 add-function. |
342 add-function. |
334 |
343 |
|
344 \subsection*{Tail-Call Optimisations} |
|
345 |
|
346 Let us now briefly touch upon the vast topic of compiler optimisations. |
|
347 As an example, let's perform tail-call optimisations for our |
|
348 Fun-language. Consider the following version of the factorial function: |
|
349 |
|
350 \begin{lstlisting} |
|
351 def facT(n, acc) = |
|
352 if n == 0 then acc |
|
353 else facT(n - 1, n * acc); |
|
354 \end{lstlisting} |
|
355 |
|
356 \noindent |
|
357 The corrsponding JVM code for this function is below: |
|
358 |
|
359 \begin{lstlisting}[language=JVMIS, numbers=left] |
|
360 .method public static facT(II)I |
|
361 .limit locals 2 |
|
362 .limit stack 6 |
|
363 iload 0 |
|
364 ldc 0 |
|
365 if_icmpne If_else_2 |
|
366 iload 1 |
|
367 goto If_end_3 |
|
368 If_else_2: |
|
369 iload 0 |
|
370 ldc 1 |
|
371 isub |
|
372 iload 0 |
|
373 iload 1 |
|
374 imul |
|
375 invokestatic fact/fact/facT(II)I |
|
376 If_end_3: |
|
377 ireturn |
|
378 .end method |
|
379 \end{lstlisting} |
|
380 |
|
381 \noindent |
|
382 The interesting part is in Lines 10 to 16. Since the \code{facT} |
|
383 function is recursive, we have also a recursive call in Line 16 in the |
|
384 JVM code. The problem is that before we can make the recursive call, we |
|
385 need to put the two arguments, namely \code{n - 1} and \code{n * acc}, |
|
386 onto the stack. That is how we communicate arguments to a function. To |
|
387 see the the difficulty, imagine you call this function 1000 times |
|
388 recursively. Each call results in some hefty overhead on the |
|
389 stack---ultimately leading to a stack overflow. Well, it is possible to |
|
390 avoid this overhead completely in many circumstances. This is what |
|
391 tail-call optimisations are about. |
|
392 |
|
393 Note that the call to \code{facT} in the program is the last instruction |
|
394 before the \code{ireturn} (the label in Line 17 does not count since it |
|
395 is not an instruction). Also remember, before we make the recursive call |
|
396 the arguments of \code{facT} need to put on the stack. Once we are |
|
397 ``inside'' the arguments on the stack turn into local variables. Therefore |
|
398 \code{n} and \code{acc} are referenced inside the function with \pcode{iload 0} |
|
399 and \pcode{iload 1} respectively. |
|
400 |
|
401 The idea of tail-call optimisation is to eliminate the expensive recursive |
|
402 functions call and replace it by a simple jump back to the beginning of the |
|
403 function. To make this work we have to change how we communicate the arguments |
|
404 to the next level of ``recursion'': we cannot use the stack, but have to load |
|
405 the argument into the corresponding local variables. This gives the following |
|
406 code |
|
407 |
|
408 \begin{lstlisting}[language=JVMIS, numbers=left, escapeinside={(*@}{@*)}] |
|
409 .method public static facT(II)I |
|
410 .limit locals 2 |
|
411 .limit stack 6 |
|
412 (*@\hm{facT\_Start:} @*) |
|
413 iload 0 |
|
414 ldc 0 |
|
415 if_icmpne If_else_2 |
|
416 iload 1 |
|
417 goto If_end_3 |
|
418 If_else_2: |
|
419 iload 0 |
|
420 ldc 1 |
|
421 isub |
|
422 iload 0 |
|
423 iload 1 |
|
424 imul |
|
425 (*@\hm{istore 1} @*) |
|
426 (*@\hm{istore 0} @*) |
|
427 (*@\hm{goto facT\_Start} @*) |
|
428 If_end_3: |
|
429 ireturn |
|
430 .end method |
|
431 \end{lstlisting} |
|
432 |
|
433 \noindent |
|
434 In Line 4 we introduce a label for indicating where the start of the |
|
435 function is. Important are Lines 17 and 18 where we store the values |
|
436 from the stack into local variables. When we then jump back to the |
|
437 beginning of the function (in Line 19) it will look like to the |
|
438 function as if the function had been called the normal way via values |
|
439 on the stack. But because of the jump, clearly, no memory on the |
|
440 stack is needed. In effect we replaced a recursive call with a simple |
|
441 loop. |
|
442 |
|
443 Can this optimisation always be applied? Unfortunately not. The |
|
444 recursive call needs to be in tail-call position, that is the last |
|
445 operation needs to be the recursive call. This is for example |
|
446 not the case with the usual formulation of the factorial function. |
|
447 Consider again the Fun-program |
|
448 |
|
449 \begin{lstlisting}[numbers=none] |
|
450 def fact(n) = if n == 0 then 1 |
|
451 else n * fact(n - 1) |
|
452 \end{lstlisting} |
|
453 |
|
454 \noindent |
|
455 In this version of the factorial function the recursive call is |
|
456 \emph{not} the last operation (which can also been seen very clearly |
|
457 in the generated JVM code). Because of this, the plumbing of local |
|
458 variables would not work and in effect the optimisation is not applicable. |
|
459 Very roughly speaking the tail-position of a function is in the two |
|
460 highlighted places |
|
461 |
|
462 \begin{itemize} |
|
463 \item \texttt{if Bexp then \hm{Exp} else \hm{Exp}} |
|
464 \item \texttt{Exp ; \hm{Exp}} |
|
465 \end{itemize} |
|
466 |
|
467 To sum up, the compiler needs to recognise when a recursive call |
|
468 is in tail-position. It then can apply the tail-call optimisations |
|
469 technique, which is well known and widely implemented in compilers |
|
470 for functional programming languages. |
|
471 |
|
472 |
335 \end{document} |
473 \end{document} |
336 |
474 |
337 %%% Local Variables: |
475 %%% Local Variables: |
338 %%% mode: latex |
476 %%% mode: latex |
339 %%% TeX-master: t |
477 %%% TeX-master: t |