handouts/ho09.tex
author Christian Urban <urbanc@in.tum.de>
Sun, 27 Oct 2019 15:16:22 +0000
changeset 677 decfd8cf8180
parent 539 ed8f014217be
child 678 ff3b48da282c
permissions -rw-r--r--
updated
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
     1
% !TEX program = xelatex
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     2
\documentclass{article}
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     3
\usepackage{../style}
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     4
\usepackage{../langs}
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     5
\usepackage{../graphics}
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     6
\usepackage{../grammar}
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
     7
%%\usepackage{multicol}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     8
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
     9
%%\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    10
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    11
\begin{document}
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    12
\fnote{\copyright{} Christian Urban, King's College London, 2019}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    13
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    14
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    15
\section*{Handout 9 (LLVM, SSA and CPS)}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    16
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    17
Reflecting on our tiny compiler targetting the JVM, code generation was
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    18
actually not so hard, no? One of the main reason for this ease is that
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    19
the JVM is a stack-based virtual machine and it is, for example, not
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    20
hard to translate arithmetic expressions into instructions manipulating
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    21
the stack. The problem is that ``real'' CPUs, although supporting stack
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    22
operations, are not really \emph{stack machines}. They are just not
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    23
optimised for this way of calculating things. The design of CPUs is more
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    24
like, here is a piece of memory---compiler, or compiler writer, do
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    25
something with it. Otherwise get lost. So in the name of raw speed,
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    26
modern compilers go the extra mile and generate code that is much easier
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    27
and faster to process by CPUs. 
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    28
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    29
Another reason why it makes sense to go the extra mile is that stack
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    30
instructions are very difficult to optimise---you cannot just re-arrange
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    31
instructions without messing about with what is calculated on the stack.
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    32
Also it is hard to find out if all the calculations on the stack are
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    33
actually necessary and not by chance dead code. The JVM has for all this
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    34
sophisticated machinery to make such ``high-level'' code still run fast,
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    35
but let's say that for the sake of argument we want to not rely on it.
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    36
We want to generate fast code ourselves. This means we have to work around
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    37
the intricacies of what instructions CPUs can actually process. To make
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    38
this all tractable, we target the LLVM Intermediate Language. In this way
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    39
we can take advantage of the tools coming with LLVM and for example do not
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    40
have to worry about things like that CPUs have only a limited amount of
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    41
registers. 
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    42
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    43
LLVM\footnote{\url{http://llvm.org}} is a beautiful example that
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    44
projects from Academia can make a difference in the world. LLVM was
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    45
started in 2000 by two researchers at the  University of Illinois at
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    46
Urbana–Champaign. At the time the behemoth of compilers was gcc with its
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    47
myriad of front-ends for other languages (e.g.~gfortran, Ada, Go,
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    48
Objective-C, Pascal etc). The problem was that gcc morphed over time
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    49
into a monolithic gigantic piece of m\ldots ehm software, which you could
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    50
not mess about in an afternoon. In contrast LLVM was a modular suite of
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    51
tools with which you could play around easily and try out something new.
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    52
LLVM became a big player once Apple hired one of the original developers
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    53
(I cannot remember the reason why Apple did not want to use gcc, but
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    54
maybe they were also just disgusted by big monolithic codebase). Anyway,
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    55
LLVM is now the big player and gcc is more or less legacy. This does not
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    56
mean that programming languages like C and C++ are dying out any time
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    57
soon---they are nicely supported by LLVM.
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    58
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    59
Targetting the LLVM-IR also means we can profit from the very modular
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    60
structure of the LLVM compiler and let for example the compiler generate
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    61
code for X86, or ARM etc. We can be agnostic about where our code
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    62
actually runs. However, what we have to do is to generate code in
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    63
\emph{Static Single-Assignment} format (short SSA), because that is what
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    64
the LLVM-IR expects from us and which is the intermediate format that
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    65
LLVM can use to do cool things (like targetting strange architectures)
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    66
and allocating memory efficiently. 
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    67
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    68
The idea behind SSA is to use very simple variable assignments where
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    69
every variable is assigned only once. The assignments also need to be
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    70
extremely primitive in the sense that they can be just simple operations
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    71
like addition, multiplication, jumps and so on. A typical program in SSA
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    72
is 
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    73
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    74
\begin{lstlisting}[language=LLVM,numbers=none]
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    75
    x := 1
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    76
    y := 2     
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    77
    z := x + y
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    78
\end{lstlisting}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    79
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    80
\noindent where every variable is used only once. You cannot for example have
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    81
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    82
\begin{lstlisting}[language=LLVM,numbers=none]
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    83
    x := 1
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    84
    y := 2     
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    85
    x := x + y
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    86
\end{lstlisting}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    87
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    88
\noindent because in this snippet \texttt{x} is assigned twice. There
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    89
are sophisticated algorithm for imperative languages, like C, for
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    90
efficiently transforming a program into SSA format, but we do not have
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    91
to be concerned about those. We want to compile a functional language
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    92
and there things get much more interesting than just sophisticated. We
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    93
will need to have a look at CPS translations, which stands for
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    94
Continuation-Passing-Style---basically black art. So sit tight.
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    95
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    96
\subsection*{CPS-Translations}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    97
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    98
What is good about our simple fun language is that it basically only 
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    99
contains expressions (be they arithmetic expressions or boolean expressions).
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   100
The only exceptions are function definitions, for which we can easily
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   101
use the mechanism of defining functions in LLVM-IR. For example the simple
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   102
fun program 
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   103
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   104
\begin{lstlisting}[language=Scala,numbers=none]
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   105
def dble(x) = x * x
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   106
\end{lstlisting}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   107
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   108
\noindent
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   109
can be compiled into the following LLVM-IR function:
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   110
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   111
\begin{lstlisting}[language=LLVM]
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   112
define i32 dble(i32 %x) {
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   113
   %tmp =  mul i32 %x, % x
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   114
   ret i32 %tmp
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   115
}    
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   116
\end{lstlisting}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   117
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   118
 
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   119
\end{document}
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   120
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   121
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   122
%%% Local Variables: 
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   123
%%% mode: latex
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   124
%%% TeX-master: t
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   125
%%% End: