--- a/handouts/ho08.tex Sun Nov 17 16:12:16 2019 +0000
+++ b/handouts/ho08.tex Sun Nov 17 23:53:54 2019 +0000
@@ -5,6 +5,17 @@
\usepackage{../grammar}
\usepackage{../graphics}
\usepackage{amssymb}
+\usepackage{xcolor}
+
+\usepackage[most]{tcolorbox}
+
+\newtcbox{\hm}[1][]{%
+ size=fbox,
+ tcbox raise base, nobeforeafter,
+ enhanced, colframe=gray,
+ colback=gray!30, boxrule=1pt,
+ fontupper=\ttfamily,
+ #1}
% modern compilers are different
% https://channel9.msdn.com/Blogs/Seth-Juarez/Anders-Hejlsberg-on-Modern-Compiler-Construction
@@ -13,6 +24,8 @@
%https://dfilaretti.github.io/2017-04-30/halting-problem
+
+
\begin{document}
\lstset{morekeywords={def,if,then,else,write,read},keywordstyle=\color{codepurple}\bfseries}
@@ -26,7 +39,7 @@
inside the main function. In this handout we like to have a
look at a slightly more comfortable language, which I call
Fun-language, and a tiny-teeny bit more realistic compiler.
-The Fun-language is a functional programming language. A small
+The Fun-language is a small functional programming language. A small
collection of programs we want to be able to write and compile
is as follows:
@@ -44,15 +57,13 @@
def gcd(a, b) = if b == 0 then a else gcd(b, a % b);
\end{lstlisting}
-\noindent Compare the code of the fib-program with the same
-program written in the While-language\ldots Fun is definitely
-more comfortable. We will still focus on programs involving
-integers only, that means for example that every function is
-expected to return an integer. The point of the Fun language
-is to compile each function to a separate method in JVM
-bytecode (not just a big monolithic code chunk). The means we
-need to adapt to some of the conventions of the JVM about
-methods.
+\noindent Compare the code of the fib-program with the same program
+written in the While-language\ldots Fun is definitely more comfortable.
+We will still focus on programs involving integers only, that means for
+example that every function in Fun is expected to return an integer. The
+point of the Fun language is to compile each function to a separate
+method in JVM bytecode (not just a big monolithic code chunk). The means
+we need to adapt to some of the conventions of the JVM about methods.
The grammar of the Fun-language is slightly simpler than the
While-language, because the main syntactic category are
@@ -63,7 +74,7 @@
\begin{plstx}[rhs style=,margin=1.5cm]
: \meta{Exp} ::= \meta{Id} | \meta{Num} {\hspace{3.7cm}}
| \meta{Exp} + \meta{Exp} | ... | (\meta{Exp}) {\hspace{3.7cm}}
- | \code{if} \meta{BExp} \code{then} \meta{Exp} \code{else} \meta{Exp}
+ | \code{if} \; \meta{BExp} \; \code{then} \; \meta{Exp} \; \code{else} \; \meta{Exp}
| \code{write} \meta{Exp} {\hspace{5cm}}
| \meta{Exp} ; \meta{Exp} {\hspace{5cm}}
| \textit{FunName} (\meta{Exp}, ..., \meta{Exp})\\
@@ -120,23 +131,21 @@
case class Main(e: Exp) extends Decl
\end{lstlisting}}
-Let us first look at some clauses for compiling expressions.
-The compilation of arithmetic and boolean expressions is just
-like for the While-language and do not need any modification
-(recall that the \textit{compile}-function for boolean
-expression takes a third argument for the label where the
-control-flow should jump when the boolean expression is
-\emph{not} true---this is needed for compiling \pcode{if}s).
-One additional feature in the Fun-language are sequences.
-Their purpose is to do one calculation after another. The
-reason why we need to be careful however is the convention
-that every expression can only produce s single result
-(including sequences). Since this result will be on the
-top of the stack, we need to generate a
-\pcode{pop}-instruction in order to clean-up the stack. Given
-the expression of the form \pcode{exp1 ; exp2} we need to
-generate code where after the first code chunk
-a \pcode{pop}-instruction is needed.
+Let us first look at some clauses for compiling expressions. The
+compilation of arithmetic and boolean expressions is just like for the
+While-language and does not need any modification (recall that the
+\textit{compile}-function for boolean expressions takes a third argument
+for the label where the control-flow should jump when the boolean
+expression is \emph{not} true---this is needed for compiling
+\pcode{if}s). One additional feature in the Fun-language are sequences.
+Their purpose is to do one calculation after another or printing out an
+intermediate result. The reason why we need to be careful however is the
+convention that every expression can only produce s single result
+(including sequences). Since this result will be on the top of the
+stack, we need to generate a \pcode{pop}-instruction for sequences in
+order to clean-up the stack. For example, for an expression of the form
+\pcode{exp1 ; exp2} we need to generate code where after the first code
+chunk a \pcode{pop}-instruction is needed.
\begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
$\textrm{\textit{compile}}($exp1$)$
@@ -332,6 +341,135 @@
the result on top of the stack as the result of the
add-function.
+\subsection*{Tail-Call Optimisations}
+
+Let us now briefly touch upon the vast topic of compiler optimisations.
+As an example, let's perform tail-call optimisations for our
+Fun-language. Consider the following version of the factorial function:
+
+\begin{lstlisting}
+def facT(n, acc) =
+ if n == 0 then acc
+ else facT(n - 1, n * acc);
+\end{lstlisting}
+
+\noindent
+The corrsponding JVM code for this function is below:
+
+\begin{lstlisting}[language=JVMIS, numbers=left]
+.method public static facT(II)I
+.limit locals 2
+.limit stack 6
+ iload 0
+ ldc 0
+ if_icmpne If_else_2
+ iload 1
+ goto If_end_3
+If_else_2:
+ iload 0
+ ldc 1
+ isub
+ iload 0
+ iload 1
+ imul
+ invokestatic fact/fact/facT(II)I
+If_end_3:
+ ireturn
+.end method
+\end{lstlisting}
+
+\noindent
+The interesting part is in Lines 10 to 16. Since the \code{facT}
+function is recursive, we have also a recursive call in Line 16 in the
+JVM code. The problem is that before we can make the recursive call, we
+need to put the two arguments, namely \code{n - 1} and \code{n * acc},
+onto the stack. That is how we communicate arguments to a function. To
+see the the difficulty, imagine you call this function 1000 times
+recursively. Each call results in some hefty overhead on the
+stack---ultimately leading to a stack overflow. Well, it is possible to
+avoid this overhead completely in many circumstances. This is what
+tail-call optimisations are about.
+
+Note that the call to \code{facT} in the program is the last instruction
+before the \code{ireturn} (the label in Line 17 does not count since it
+is not an instruction). Also remember, before we make the recursive call
+the arguments of \code{facT} need to put on the stack. Once we are
+``inside'' the arguments on the stack turn into local variables. Therefore
+\code{n} and \code{acc} are referenced inside the function with \pcode{iload 0}
+and \pcode{iload 1} respectively.
+
+The idea of tail-call optimisation is to eliminate the expensive recursive
+functions call and replace it by a simple jump back to the beginning of the
+function. To make this work we have to change how we communicate the arguments
+to the next level of ``recursion'': we cannot use the stack, but have to load
+the argument into the corresponding local variables. This gives the following
+code
+
+\begin{lstlisting}[language=JVMIS, numbers=left, escapeinside={(*@}{@*)}]
+.method public static facT(II)I
+.limit locals 2
+.limit stack 6
+(*@\hm{facT\_Start:} @*)
+ iload 0
+ ldc 0
+ if_icmpne If_else_2
+ iload 1
+ goto If_end_3
+If_else_2:
+ iload 0
+ ldc 1
+ isub
+ iload 0
+ iload 1
+ imul
+ (*@\hm{istore 1} @*)
+ (*@\hm{istore 0} @*)
+ (*@\hm{goto facT\_Start} @*)
+If_end_3:
+ ireturn
+.end method
+\end{lstlisting}
+
+\noindent
+In Line 4 we introduce a label for indicating where the start of the
+function is. Important are Lines 17 and 18 where we store the values
+from the stack into local variables. When we then jump back to the
+beginning of the function (in Line 19) it will look like to the
+function as if the function had been called the normal way via values
+on the stack. But because of the jump, clearly, no memory on the
+stack is needed. In effect we replaced a recursive call with a simple
+loop.
+
+Can this optimisation always be applied? Unfortunately not. The
+recursive call needs to be in tail-call position, that is the last
+operation needs to be the recursive call. This is for example
+not the case with the usual formulation of the factorial function.
+Consider again the Fun-program
+
+\begin{lstlisting}[numbers=none]
+def fact(n) = if n == 0 then 1
+ else n * fact(n - 1)
+\end{lstlisting}
+
+\noindent
+In this version of the factorial function the recursive call is
+\emph{not} the last operation (which can also been seen very clearly
+in the generated JVM code). Because of this, the plumbing of local
+variables would not work and in effect the optimisation is not applicable.
+Very roughly speaking the tail-position of a function is in the two
+highlighted places
+
+\begin{itemize}
+\item \texttt{if Bexp then \hm{Exp} else \hm{Exp}}
+\item \texttt{Exp ; \hm{Exp}}
+\end{itemize}
+
+To sum up, the compiler needs to recognise when a recursive call
+is in tail-position. It then can apply the tail-call optimisations
+technique, which is well known and widely implemented in compilers
+for functional programming languages.
+
+
\end{document}
%%% Local Variables: