handouts/ho08.tex
changeset 693 605d971e98fd
parent 604 9e75249e96f2
child 699 7dac350b20a1
equal deleted inserted replaced
692:8c7ccdebcb89 693:605d971e98fd
     3 \usepackage{../style}
     3 \usepackage{../style}
     4 \usepackage{../langs}
     4 \usepackage{../langs}
     5 \usepackage{../grammar}
     5 \usepackage{../grammar}
     6 \usepackage{../graphics}
     6 \usepackage{../graphics}
     7 \usepackage{amssymb}
     7 \usepackage{amssymb}
       
     8 \usepackage{xcolor}
       
     9 
       
    10 \usepackage[most]{tcolorbox}
       
    11 
       
    12 \newtcbox{\hm}[1][]{%
       
    13      size=fbox,
       
    14     tcbox raise base, nobeforeafter, 
       
    15     enhanced, colframe=gray,
       
    16     colback=gray!30, boxrule=1pt,
       
    17     fontupper=\ttfamily,
       
    18     #1}
     8 
    19 
     9 % modern compilers are different
    20 % modern compilers are different
    10 % https://channel9.msdn.com/Blogs/Seth-Juarez/Anders-Hejlsberg-on-Modern-Compiler-Construction
    21 % https://channel9.msdn.com/Blogs/Seth-Juarez/Anders-Hejlsberg-on-Modern-Compiler-Construction
    11 
    22 
    12 %halting problem
    23 %halting problem
    13 %https://dfilaretti.github.io/2017-04-30/halting-problem
    24 %https://dfilaretti.github.io/2017-04-30/halting-problem
       
    25 
       
    26 
    14 
    27 
    15 
    28 
    16 \begin{document}
    29 \begin{document}
    17 \lstset{morekeywords={def,if,then,else,write,read},keywordstyle=\color{codepurple}\bfseries}
    30 \lstset{morekeywords={def,if,then,else,write,read},keywordstyle=\color{codepurple}\bfseries}
    18 
    31 
    24 primitive and the compiler rather crude---everything was
    37 primitive and the compiler rather crude---everything was
    25 essentially compiled into a big monolithic chunk of code
    38 essentially compiled into a big monolithic chunk of code
    26 inside the main function. In this handout we like to have a
    39 inside the main function. In this handout we like to have a
    27 look at a slightly more comfortable language, which I call
    40 look at a slightly more comfortable language, which I call
    28 Fun-language, and a tiny-teeny bit more realistic compiler.
    41 Fun-language, and a tiny-teeny bit more realistic compiler.
    29 The Fun-language is a functional programming language. A small
    42 The Fun-language is a small functional programming language. A small
    30 collection of programs we want to be able to write and compile
    43 collection of programs we want to be able to write and compile
    31 is as follows:
    44 is as follows:
    32 
    45 
    33 \begin{lstlisting}[numbers=none]
    46 \begin{lstlisting}[numbers=none]
    34 def fib(n) = if n == 0 then 0 
    47 def fib(n) = if n == 0 then 0 
    42                 else ack(m - 1, ack(m, n - 1));
    55                 else ack(m - 1, ack(m, n - 1));
    43                 
    56                 
    44 def gcd(a, b) = if b == 0 then a else gcd(b, a % b);                
    57 def gcd(a, b) = if b == 0 then a else gcd(b, a % b);                
    45 \end{lstlisting}
    58 \end{lstlisting}
    46 
    59 
    47 \noindent Compare the code of the fib-program with the same
    60 \noindent Compare the code of the fib-program with the same program
    48 program written in the While-language\ldots Fun is definitely
    61 written in the While-language\ldots Fun is definitely more comfortable.
    49 more comfortable. We will still focus on programs involving
    62 We will still focus on programs involving integers only, that means for
    50 integers only, that means for example that every function is
    63 example that every function in Fun is expected to return an integer. The
    51 expected to return an integer. The point of the Fun language
    64 point of the Fun language is to compile each function to a separate
    52 is to compile each function to a separate method in JVM
    65 method in JVM bytecode (not just a big monolithic code chunk). The means
    53 bytecode (not just a big monolithic code chunk). The means we
    66 we need to adapt to some of the conventions of the JVM about methods.
    54 need to adapt to some of the conventions of the JVM about
       
    55 methods.
       
    56 
    67 
    57 The grammar of the Fun-language is slightly simpler than the
    68 The grammar of the Fun-language is slightly simpler than the
    58 While-language, because the main syntactic category are
    69 While-language, because the main syntactic category are
    59 expressions (we do not have statements). The grammar rules are
    70 expressions (we do not have statements). The grammar rules are
    60 as follows:
    71 as follows:
    61 
    72 
    62 
    73 
    63 \begin{plstx}[rhs style=,margin=1.5cm]
    74 \begin{plstx}[rhs style=,margin=1.5cm]
    64 : \meta{Exp} ::= \meta{Id} | \meta{Num} {\hspace{3.7cm}}
    75 : \meta{Exp} ::= \meta{Id} | \meta{Num} {\hspace{3.7cm}}
    65              |   \meta{Exp} + \meta{Exp} | ... | (\meta{Exp}) {\hspace{3.7cm}}
    76              |   \meta{Exp} + \meta{Exp} | ... | (\meta{Exp}) {\hspace{3.7cm}}
    66              |   \code{if} \meta{BExp} \code{then} \meta{Exp} \code{else} \meta{Exp}
    77              |   \code{if} \; \meta{BExp} \; \code{then} \; \meta{Exp} \; \code{else} \; \meta{Exp}
    67              |   \code{write} \meta{Exp} {\hspace{5cm}}
    78              |   \code{write} \meta{Exp} {\hspace{5cm}}
    68              |   \meta{Exp} ; \meta{Exp}  {\hspace{5cm}}
    79              |   \meta{Exp} ; \meta{Exp}  {\hspace{5cm}}
    69              |   \textit{FunName} (\meta{Exp}, ..., \meta{Exp})\\
    80              |   \textit{FunName} (\meta{Exp}, ..., \meta{Exp})\\
    70 : \meta{BExp} ::= ...\\
    81 : \meta{BExp} ::= ...\\
    71 : \meta{Decl} ::= \meta{Def} ; \meta{Decl}
    82 : \meta{Decl} ::= \meta{Def} ; \meta{Decl}
   118                args: List[String], 
   129                args: List[String], 
   119                body: Exp) extends Decl
   130                body: Exp) extends Decl
   120 case class Main(e: Exp) extends Decl
   131 case class Main(e: Exp) extends Decl
   121 \end{lstlisting}}
   132 \end{lstlisting}}
   122 
   133 
   123 Let us first look at some clauses for compiling expressions.
   134 Let us first look at some clauses for compiling expressions. The
   124 The compilation of arithmetic and boolean expressions is just
   135 compilation of arithmetic and boolean expressions is just like for the
   125 like for the While-language and do not need any modification
   136 While-language and does not need any modification (recall that the
   126 (recall that the \textit{compile}-function for boolean
   137 \textit{compile}-function for boolean expressions takes a third argument
   127 expression takes a third argument for the label where the
   138 for the label where the control-flow should jump when the boolean
   128 control-flow should jump when the boolean expression is
   139 expression is \emph{not} true---this is needed for compiling
   129 \emph{not} true---this is needed for compiling \pcode{if}s).
   140 \pcode{if}s). One additional feature in the Fun-language are sequences.
   130 One additional feature in the Fun-language are sequences.
   141 Their purpose is to do one calculation after another or printing out an
   131 Their purpose is to do one calculation after another. The
   142 intermediate result. The reason why we need to be careful however is the
   132 reason why we need to be careful however is the convention
   143 convention that every expression can only produce s single result
   133 that every expression can only produce s single result
   144 (including sequences). Since this result will be on the top of the
   134 (including sequences). Since this result will be on the
   145 stack, we need to generate a \pcode{pop}-instruction for sequences in
   135 top of the stack, we need to generate a 
   146 order to clean-up the stack. For example, for an expression of the form
   136 \pcode{pop}-instruction in order to clean-up the stack. Given 
   147 \pcode{exp1 ; exp2} we need to generate code where after the first code
   137 the expression of the form \pcode{exp1 ; exp2} we need to 
   148 chunk a \pcode{pop}-instruction is needed. 
   138 generate code where after the first code chunk
       
   139 a \pcode{pop}-instruction is needed. 
       
   140 
   149 
   141 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
   150 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
   142 $\textrm{\textit{compile}}($exp1$)$
   151 $\textrm{\textit{compile}}($exp1$)$
   143 pop
   152 pop
   144 $\textrm{\textit{compile}}($exp2$)$
   153 $\textrm{\textit{compile}}($exp2$)$
   330 call returns we have the result of the addition on the top of 
   339 call returns we have the result of the addition on the top of 
   331 the stack and just need to call suc. Finally, we can return 
   340 the stack and just need to call suc. Finally, we can return 
   332 the result on top of the stack as the result of the 
   341 the result on top of the stack as the result of the 
   333 add-function.
   342 add-function.
   334  
   343  
       
   344 \subsection*{Tail-Call Optimisations}
       
   345 
       
   346 Let us now briefly touch upon the vast topic of compiler optimisations.
       
   347 As an example, let's perform tail-call optimisations for our
       
   348 Fun-language. Consider the following version of the factorial function:
       
   349 
       
   350 \begin{lstlisting}
       
   351 def facT(n, acc) = 
       
   352   if n == 0 then acc 
       
   353   else facT(n - 1, n * acc);
       
   354 \end{lstlisting}
       
   355 
       
   356 \noindent 
       
   357 The corrsponding JVM code for this function is below:
       
   358 
       
   359 \begin{lstlisting}[language=JVMIS, numbers=left]
       
   360 .method public static facT(II)I 
       
   361 .limit locals 2
       
   362 .limit stack 6
       
   363   iload 0
       
   364   ldc 0	
       
   365   if_icmpne If_else_2
       
   366   iload 1
       
   367   goto If_end_3
       
   368 If_else_2:
       
   369   iload 0
       
   370   ldc 1
       
   371   isub
       
   372   iload 0
       
   373   iload 1
       
   374   imul
       
   375   invokestatic fact/fact/facT(II)I
       
   376 If_end_3:
       
   377   ireturn
       
   378 .end method 
       
   379 \end{lstlisting}
       
   380 
       
   381 \noindent
       
   382 The interesting part is in Lines 10 to 16. Since the \code{facT}
       
   383 function is recursive, we have also a recursive call in Line 16 in the
       
   384 JVM code. The problem is that before we can make the recursive call, we
       
   385 need to put the two arguments, namely \code{n - 1} and \code{n * acc},
       
   386 onto the stack. That is how we communicate arguments to a function. To
       
   387 see the the difficulty, imagine you call this function 1000 times
       
   388 recursively. Each call results in some hefty overhead on the
       
   389 stack---ultimately leading to a stack overflow. Well, it is possible to
       
   390 avoid this overhead completely in many circumstances. This is what
       
   391 tail-call optimisations are about.   
       
   392 
       
   393 Note that the call to \code{facT} in the program is the last instruction
       
   394 before the \code{ireturn} (the label in Line 17 does not count since it
       
   395 is not an instruction). Also remember, before we make the recursive call
       
   396 the arguments of \code{facT} need to put on the stack. Once we are
       
   397 ``inside'' the arguments on the stack turn into local variables. Therefore 
       
   398 \code{n} and \code{acc} are referenced inside the function with \pcode{iload 0}
       
   399 and \pcode{iload 1} respectively.
       
   400 
       
   401 The idea of tail-call optimisation is to eliminate the expensive recursive 
       
   402 functions call and replace it by a simple jump back to the beginning of the
       
   403 function. To make this work we have to change how we communicate the arguments
       
   404 to the next level of ``recursion'': we cannot use the stack, but have to load 
       
   405 the argument into the corresponding local variables. This gives the following
       
   406 code
       
   407 
       
   408 \begin{lstlisting}[language=JVMIS, numbers=left, escapeinside={(*@}{@*)}]
       
   409 .method public static facT(II)I 
       
   410 .limit locals 2
       
   411 .limit stack 6
       
   412 (*@\hm{facT\_Start:} @*)
       
   413   iload 0
       
   414   ldc 0
       
   415   if_icmpne If_else_2
       
   416   iload 1
       
   417   goto If_end_3
       
   418 If_else_2:
       
   419   iload 0
       
   420   ldc 1
       
   421   isub
       
   422   iload 0
       
   423   iload 1
       
   424   imul
       
   425   (*@\hm{istore 1} @*)
       
   426   (*@\hm{istore 0} @*)
       
   427   (*@\hm{goto facT\_Start} @*)
       
   428 If_end_3:
       
   429   ireturn
       
   430 .end method 
       
   431 \end{lstlisting}
       
   432 
       
   433 \noindent
       
   434 In Line 4 we introduce a label for indicating where the start of the
       
   435 function is. Important are Lines 17 and 18 where we store the values
       
   436 from the stack into local variables. When we then jump back to the 
       
   437 beginning of the function (in Line 19) it will look like to the 
       
   438 function as if the function had been called the normal way via values
       
   439 on the stack. But because of the jump, clearly, no memory on the
       
   440 stack is needed. In effect we replaced a recursive call with a simple 
       
   441 loop.
       
   442 
       
   443 Can this optimisation always be applied? Unfortunately not. The 
       
   444 recursive call needs to be in tail-call position, that is the last
       
   445 operation needs to be the recursive call. This is for example
       
   446 not the case with the usual formulation of the factorial function.
       
   447 Consider again the Fun-program
       
   448 
       
   449 \begin{lstlisting}[numbers=none]
       
   450 def fact(n) = if n == 0 then 1 
       
   451               else n * fact(n - 1)
       
   452 \end{lstlisting}            
       
   453 
       
   454 \noindent
       
   455 In this version of the factorial function the recursive call is
       
   456 \emph{not} the last operation (which can also been seen very clearly
       
   457 in the generated JVM code). Because of this, the plumbing of local 
       
   458 variables would not work and in effect the optimisation is not applicable. 
       
   459 Very roughly speaking the tail-position of a function is in the two 
       
   460 highlighted places
       
   461 
       
   462 \begin{itemize}
       
   463 \item \texttt{if Bexp then \hm{Exp} else \hm{Exp}}
       
   464 \item \texttt{Exp ; \hm{Exp}}
       
   465 \end{itemize}
       
   466 
       
   467 To sum up, the compiler needs to recognise when a recursive call
       
   468 is in tail-position. It then can apply the tail-call optimisations
       
   469 technique, which is well known and widely implemented in compilers
       
   470 for functional programming languages.
       
   471 
       
   472 
   335 \end{document}
   473 \end{document}
   336 
   474 
   337 %%% Local Variables: 
   475 %%% Local Variables: 
   338 %%% mode: latex  
   476 %%% mode: latex  
   339 %%% TeX-master: t
   477 %%% TeX-master: t