handouts/ho09.tex
author Christian Urban <christian.urban@kcl.ac.uk>
Wed, 07 Oct 2020 09:08:55 +0100
changeset 777 a10430cb797c
parent 722 14914b57e207
child 898 45a48c47dcca
permissions -rw-r--r--
updated
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
     1
% !TEX program = xelatex
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     2
\documentclass{article}
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     3
\usepackage{../style}
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     4
\usepackage{../langs}
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     5
\usepackage{../graphics}
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     6
\usepackage{../grammar}
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
     7
%%\usepackage{multicol}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     8
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
     9
%%\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    10
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    11
\begin{document}
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    12
\fnote{\copyright{} Christian Urban, King's College London, 2019}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    13
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    14
722
14914b57e207 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 704
diff changeset
    15
% CPS translations
14914b57e207 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 704
diff changeset
    16
% https://felleisen.org/matthias/4400-s20/lecture17.html
14914b57e207 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 704
diff changeset
    17
14914b57e207 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 704
diff changeset
    18
%% pattern matching resources
14914b57e207 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 704
diff changeset
    19
%% https://www.reddit.com/r/ProgrammingLanguages/comments/g1vno3/beginner_resources_for_compiling_pattern_matching/
14914b57e207 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 704
diff changeset
    20
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    21
\section*{Handout 9 (LLVM, SSA and CPS)}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    22
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    23
Reflecting on our two tiny compilers targetting the JVM, the code
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    24
generation part was actually not so hard, no? Pretty much just some
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    25
post-traversal of the abstract syntax tree, yes? One of the reasons for
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    26
this ease is that the JVM is a stack-based virtual machine and it is
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    27
therefore not hard to translate deeply-nested arithmetic expressions
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    28
into a sequence of instructions manipulating the stack. The problem is
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    29
that ``real'' CPUs, although supporting stack operations, are not really
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    30
designed to be \emph{stack machines}.  The design of CPUs is more like,
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    31
here is a chunk of memory---compiler, or better compiler writers, do
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    32
something with it. Consequently, modern compilers need to go the extra
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    33
mile in order to generate code that is much easier and faster to process
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    34
by CPUs. To make this all tractable for this module, we target the LLVM
680
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    35
Intermediate Language. In this way we can take advantage of the tools
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    36
coming with LLVM. For example we do not have to worry about things like
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    37
register allocations.\bigskip 
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    38
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    39
\noindent LLVM\footnote{\url{http://llvm.org}} is a beautiful example
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    40
that projects from Academia can make a difference in the World. LLVM
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    41
started in 2000 as a project by two researchers at the  University of
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    42
Illinois at Urbana-Champaign. At the time the behemoth of compilers was
680
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    43
gcc with its myriad of front-ends for other languages (C++, Fortran,
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    44
Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    45
time into a monolithic gigantic piece of m\ldots ehm software, which you
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    46
could not mess about in an afternoon. In contrast, LLVM is designed to
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    47
be a modular suite of tools with which you can play around easily and
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    48
try out something new. LLVM became a big player once Apple hired one of
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    49
the original developers (I cannot remember the reason why Apple did not
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    50
want to use gcc, but maybe they were also just disgusted by its big
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    51
monolithic codebase). Anyway, LLVM is now the big player and gcc is more
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    52
or less legacy. This does not mean that programming languages like C and
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    53
C++ are dying out any time soon---they are nicely supported by LLVM.
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    54
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    55
We will target the LLVM Intermediate Language, or LLVM Intermediate
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    56
Representation (short LLVM-IR). The LLVM-IR looks very similar to the
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    57
assembly language of Jasmin and Krakatau. It will also allow us to
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    58
benefit from the modular structure of the LLVM compiler and let for
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    59
example the compiler generate code for different CPUs, like X86 or ARM.
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    60
That means we can be agnostic about where our code actually runs. We can
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    61
also be ignorant about optimising code and allocating memory
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    62
efficiently. 
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    63
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    64
However, what we have to do for LLVM is to generate code in \emph{Static
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    65
Single-Assignment} format (short SSA), because that is what the LLVM-IR
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    66
expects from us. A reason why LLVM uses the SSA format, rather than
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    67
JVM-like stack instructions, is that stack instructions are difficult to
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    68
optimise---you cannot just re-arrange instructions without messing about
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    69
with what is calculated on the stack. Also it is hard to find out if all
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    70
the calculations on the stack are actually necessary and not by chance
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    71
dead code. The JVM has for all these obstacles sophisticated machinery
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    72
to make such ``high-level'' code still run fast, but let's say that for
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    73
the sake of argument we do not want to rely on it. We want to generate
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    74
fast code ourselves. This means we have to work around the intricacies
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    75
of what instructions CPUs can actually process fast. This is what the
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    76
SSA format is designed for.
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    77
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    78
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    79
The main idea behind the SSA format is to use very simple variable
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    80
assignments where every variable is assigned only once. The assignments
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    81
also need to be primitive in the sense that they can be just simple
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    82
operations like addition, multiplication, jumps, comparisons and so on.
680
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    83
Say, we have an expression $((1 + a) + (3 + (b * 5)))$, then the
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    84
corresponding SSA format is 
680
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    85
 
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    86
\begin{lstlisting}[language=LLVMIR,numbers=left]
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    87
let tmp0 = add 1 a in   
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    88
let tmp1 = mul b 5 in 
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    89
let tmp2 = add 3 tmp1 in 
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
    90
let tmp3 = add tmp0 tmp2 in tmp3 
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    91
\end{lstlisting}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    92
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    93
\noindent where every variable is used only once (we could not write
680
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    94
\texttt{tmp1 = add 3 tmp1} in Line 3 for example).  There are
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    95
sophisticated algorithms for imperative languages, like C, that
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    96
efficiently transform a high-level program into SSA format. But we can
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    97
ignore them here. We want to compile a functional language and there
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    98
things get much more interesting than just sophisticated. We will need
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    99
to have a look at CPS translations, where the CPS stands for
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   100
Continuation-Passing-Style---basically black programming art or
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   101
abracadabra programming. So sit tight.
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   102
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   103
\subsection*{LLVM-IR}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   104
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   105
Before we start, let's first have a look at the \emph{LLVM Intermediate
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   106
Representation} in more detail. The LLVM-IR is in between the frontends
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   107
and backends of the LLVM framework. It allows compilation of multiple
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   108
source languages to multiple targets. It is also the place where most of
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   109
the target independent optimisations are performed. 
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   110
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   111
What is good about our toy Fun language is that it basically only
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   112
contains expressions (be they arithmetic expressions, boolean
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   113
expressions or if-expressions). The exception are function definitions.
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   114
Luckily, for them we can use the mechanism of defining functions in the
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   115
LLVM-IR (this is similar to using JVM methods for functions in our
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   116
earlier compiler). For example the simple Fun program 
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   117
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   118
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   119
\begin{lstlisting}[language=Scala,numbers=none]
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   120
def sqr(x) = x * x
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   121
\end{lstlisting}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   122
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   123
\noindent
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   124
can be compiled to the following LLVM-IR function:
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   125
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   126
\begin{lstlisting}[language=LLVM]
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   127
define i32 @sqr(i32 %x) {
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   128
   %tmp = mul i32 %x, %x
677
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   129
   ret i32 %tmp
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   130
}    
decfd8cf8180 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   131
\end{lstlisting}
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   132
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   133
\noindent First notice that all variable names, in this case \texttt{x}
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   134
and \texttt{tmp}, are prefixed with \texttt{\%} in the LLVM-IR.
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   135
Temporary variables can be named with an identifier, such as
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   136
\texttt{tmp}, or numbers. Function names, since they are ``global'',
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   137
need to be prefixed with @-symbol. Also, the LLVM-IR is a fully typed
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   138
language. The \texttt{i32} type stands for 32-bit integers. There are
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   139
also types for 64-bit integers (\texttt{i64}), chars (\texttt{i8}),
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   140
floats, arrays and even pointer types. In the code above, \texttt{sqr}
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   141
takes an argument of type \texttt{i32} and produces a result of type
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   142
\texttt{i32} (the result type is in front of the function name, like in
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   143
C). Each arithmetic operation, for example addition and multiplication,
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   144
are also prefixed with the type they operate on. Obviously these types
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   145
need to match up\ldots{} but since we have in our programs only
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   146
integers, \texttt{i32} everywhere will do. We do not have to generate
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   147
any other types, but obviously this is a limitation in our Fun language.
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   148
 
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   149
There are a few interesting instructions in the LLVM-IR which are quite
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   150
different than what we have seen in the JVM. Can you remember the
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   151
kerfuffle we had to go through with boolean expressions and negating the
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   152
condition? In the LLVM-IR, branching  if-conditions is implemented
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   153
differently: there is a separate \texttt{br}-instruction as follows:
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   154
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   155
\begin{lstlisting}[language=LLVM]
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   156
br i1 %var, label %if_br, label %else_br
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   157
\end{lstlisting}
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   158
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   159
\noindent
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   160
The type \texttt{i1} stands for booleans. If the variable is true, then
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   161
this instruction jumps to the if-branch, which needs an explicit label;
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   162
otherwise to the else-branch, again with its own label. This allows us
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   163
to keep the meaning of the boolean expression as is when compiling if's.
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   164
A value of type boolean is generated in the LLVM-IR by the
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   165
\texttt{icmp}-instruction. This instruction is for integers (hence the
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   166
\texttt{i}) and takes the comparison operation as argument. For example
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   167
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   168
\begin{lstlisting}[language=LLVM]
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   169
icmp eq i32  %x, %y     ; for equal
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   170
icmp sle i32 %x, %y     ;   signed less or equal
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   171
icmp slt i32 %x, %y     ;   signed less than
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   172
icmp ult i32 %x, %y     ;   unsigned less than 
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   173
\end{lstlisting}
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   174
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   175
\noindent
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   176
In some operations, the LLVM-IR distinguishes between signed and 
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   177
unsigned representations of integers.
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   178
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   179
It is also easy to call another function in LLVM-IR: as can be 
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   180
seen from Figure~\ref{lli} we can just call a function with the 
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   181
instruction \texttt{call} and can also assign the result to 
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   182
a variable. The syntax is as follows
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   183
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   184
\begin{lstlisting}[language=LLVM]
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   185
%var = call i32 @foo(...args...)
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   186
\end{lstlisting}
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   187
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   188
\noindent
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   189
where the arguments can only be simple variables, not compound
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   190
expressions.
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   191
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   192
Conveniently, you can use the program \texttt{lli}, which comes with
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   193
LLVM, to interpret programs written in the LLVM-IR. So you can easily
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   194
check whether the code you produced actually works. To get a running
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   195
program that does something interesting you need to add some boilerplate
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   196
about printing out numbers and a main-function that is the entry point
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   197
for the program (see Figure~\ref{lli} for a complete listing). Again
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   198
this is very similar to the boilerplate we needed to add in our JVM
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   199
compiler. 
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   200
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   201
You can generate a binary for the program in Figure~\ref{lli} by using
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   202
the \texttt{llc}-compiler and then \texttt{gcc}, whereby \texttt{llc} generates
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   203
an object file and \texttt{gcc} (that is clang) generates the
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   204
executable binary:
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   205
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   206
\begin{lstlisting}[language=bash,numbers=none]
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   207
llc -filetype=obj sqr.ll
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   208
gcc sqr.o -o a.out
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   209
./a.out
680
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   210
> 25
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   211
\end{lstlisting}
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   212
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   213
\begin{figure}[t]\small 
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   214
\lstinputlisting[language=LLVM,numbers=left]{../progs/sqr.ll}
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   215
\caption{An LLVM-IR program for calculating the square function. It
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   216
calls this function in \texttt{@main} with the argument \texttt{5}. The
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   217
code for the \texttt{sqr} function is in Lines 13 -- 16. The main
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   218
function calls \texttt{sqr} and then prints out the result. The other
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   219
code is boilerplate for printing out integers.\label{lli}}
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   220
\end{figure}   
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   221
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   222
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   223
    
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   224
\subsection*{Our Own Intermediate Language}
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   225
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   226
Remember compilers have to solve the problem of bridging the gap between
680
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   227
``high-level'' programs and ``low-level'' hardware. If the gap is too
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   228
wide for one step, then a good strategy is to lay a stepping stone
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   229
somewhere in between. The LLVM-IR itself is such a stepping stone to
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   230
make the task of generating and optimising code easier. Like a real
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   231
compiler we will use our own stepping stone which I call the
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   232
\emph{K-language}. For what follows recall the various kinds of
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   233
expressions in the Fun language. For convenience the Scala code of the
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   234
corresponding abstract syntax trees is shown on top of
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   235
Figure~\ref{absfun}. Below is the code for the abstract syntax trees in
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   236
the K-language. In K, here are two kinds of syntactic entities, namely
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   237
\emph{K-values} and \emph{K-expressions}. The central constructor of the
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   238
K-language is \texttt{KLet}. For this recall in SSA that arithmetic
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   239
expressions such as $((1 + a) + (3 + (b * 5)))$ need to be broken up
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   240
into smaller ``atomic'' steps, like so
680
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   241
 
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   242
\begin{lstlisting}[language=LLVMIR,numbers=none]
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   243
let tmp0 = add 1 a in   
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   244
let tmp1 = mul b 5 in 
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   245
let tmp2 = add 3 tmp1 in 
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   246
let tmp3 = add tmp0 tmp2 in
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   247
  tmp3 
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   248
\end{lstlisting}
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   249
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   250
\noindent
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   251
Here \texttt{tmp3} will contain the result of what the whole expression
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   252
stands for. In each individual step we can only perform an ``atomic''
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   253
operation, like addition or multiplication of a number and a variable.
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   254
We are not allowed to have for example an if-condition on the right-hand
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   255
side of an equals. Such constraints are enforced upon us because of how
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   256
the SSA format works in the LLVM-IR. By having in \texttt{KLet} taking
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   257
first a string (standing for an intermediate result) and second a value,
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   258
we can fulfil this constraint ``by construction''---there is no way we
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   259
could write anything else than a value. 
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   260
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   261
To sum up, K-values are the atomic operations that can be on the
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   262
right-hand side of equal-signs. The K-language is restricted such that
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   263
it is easy to generate the SSA format for the LLVM-IR. 
680
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   264
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   265
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   266
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   267
\begin{figure}[p]\small
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   268
\begin{lstlisting}[language=Scala,numbers=none]
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   269
// Fun language (expressions)
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   270
abstract class Exp 
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   271
abstract class BExp 
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   272
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   273
case class Call(name: String, args: List[Exp]) extends Exp
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   274
case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   275
case class Write(e: Exp) extends Exp
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   276
case class Var(s: String) extends Exp
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   277
case class Num(i: Int) extends Exp
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   278
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   279
case class Sequence(e1: Exp, e2: Exp) extends Exp
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   280
case class Bop(o: String, a1: Exp, a2: Exp) extends BExp  
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   281
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   282
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   283
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   284
// K-language (K-expressions, K-values)
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   285
abstract class KExp
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   286
abstract class KVal
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   287
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   288
case class KVar(s: String) extends KVal
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   289
case class KNum(i: Int) extends KVal
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   290
case class Kop(o: String, v1: KVal, v2: KVal) extends KVal
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   291
case class KCall(o: String, vrs: List[KVal]) extends KVal
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   292
case class KWrite(v: KVal) extends KVal
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   293
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   294
case class KIf(x1: String, e1: KExp, e2: KExp) extends KExp
680
eecc4d5a2172 updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   295
case class KLet(x: String, v: KVal, e: KExp) extends KExp
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   296
case class KReturn(v: KVal) extends KExp
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   297
\end{lstlisting}
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   298
\caption{Abstract syntax trees for the Fun language.\label{absfun}}
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   299
\end{figure}
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   300
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   301
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   302
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   303
\subsection*{CPS-Translations}
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   304
704
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   305
CPS stands for Continuation-Passing-Style. It is a kind of programming
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   306
technique often used in advanced functional programming. Before we delve
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   307
into the CPS-translation for our Fun language, let us look at 
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   308
CPS-versions of some well-known functions. Consider
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   309
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   310
\begin{lstlisting}[language=Scala, numbers=none]
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   311
def fact(n: Int) : Int = 
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   312
  if (n == 0) 1 else n * fact(n - 1) 
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   313
\end{lstlisting}
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   314
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   315
\noindent 
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   316
This is clearly the usual factorial function. But now consider the
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   317
following version of the factorial function:
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   318
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   319
\begin{lstlisting}[language=Scala, numbers=none]
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   320
def factC(n: Int, ret: Int => Int) : Int = 
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   321
  if (n == 0) ret(1) 
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   322
  else factC(n - 1, x => ret(n * x)) 
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   323
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   324
factC(3, identity)
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   325
\end{lstlisting}
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   326
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   327
\noindent
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   328
This function is called with the number, in this case 3, and the
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   329
identity-function (which returns just its input). The recursive
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   330
calls are:
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   331
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   332
\begin{lstlisting}[language=Scala, numbers=none]
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   333
factC(2, x => identity(3 * x))
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   334
factC(1, x => identity(3 * (2 * x)))
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   335
factC(0, x => identity(3 * (2 * (1 * x))))
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   336
\end{lstlisting}
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   337
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   338
\noindent
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   339
Having reached 0, we get out of the recursion and apply 1 to the
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   340
continuation (see if-branch above). This gives
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   341
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   342
\begin{lstlisting}[language=Scala, numbers=none]
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   343
identity(3 * (2 * (1 * 1)))
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   344
= 3 * (2 * (1 * 1))
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   345
= 6
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   346
\end{lstlisting}
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   347
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   348
\noindent
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   349
which is the expected result. If this looks somewhat familiar, then this
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   350
is not a 1000 miles off, because functions with continuations can be
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   351
seen as a kind of generalisation of tail-recursive functions. Anyway
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   352
notice how the continuations is ``stacked up'' during the recursion and
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   353
then ``unrolled'' when we apply 1 to the continuation. Interestingly, we
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   354
can do something similar to the Fibonacci function where in the traditional
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   355
version we have two recursive calls. Consider the following function
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   356
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   357
\begin{lstlisting}[language=Scala, numbers=none]
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   358
def fibC(n: Int, ret: Int => Int) : Int = 
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   359
  if (n == 0 || n == 1) ret(1) 
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   360
  else fibC(n - 1,
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   361
             r1 => fibC(n - 2,
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   362
               r2 => ret(r1 + r2)))
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   363
\end{lstlisting}
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   364
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   365
\noindent
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   366
Here the continuation is a nested function essentially wrapping up 
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   367
the second recursive call. Let us check how the recursion unfolds
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   368
when called with 3 and the identity function:
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   369
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   370
\begin{lstlisting}[language=Scala, numbers=none]
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   371
fibC(3, id)
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   372
fibC(2, r1 => fibC(1, r2 => id(r1 + r2)))
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   373
fibC(1, r1 => 
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   374
   fibC(0, r2 => fibC(1, r2a => id((r1 + r2) + r2a))))
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   375
fibC(0, r2 => fibC(1, r2a => id((1 + r2) + r2a)))
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   376
fibC(1, r2a => id((1 + 1) + r2a))
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   377
id((1 + 1) + 1)
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   378
(1 + 1) + 1
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   379
3
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   380
\end{lstlisting}
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   381
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   382
Let us now come back to the CPS-translations for the Fun language. The
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   383
main difficulty of generating instructions in SSA format is that large
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   384
compound expressions need to be broken up into smaller pieces and
700
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   385
intermediate results need to be chained into later instructions. To do
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   386
this conveniently, CPS-translations have been developed. They use
52263ffd17b9 updated
Christian Urban <urbanc@in.tum.de>
parents: 680
diff changeset
   387
functions (``continuations'') to represent what is coming next in a
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   388
sequence of instructions. Continuations are functions of type
704
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   389
\code{KVal} to \code{KExp}. They can be seen as a sequence of
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   390
\code{KLet}s where there is a ``hole'' that needs to be filled. Consider
6d9c960a2b26 updated
Christian Urban <urbanc@in.tum.de>
parents: 701
diff changeset
   391
for example
678
ff3b48da282c updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   392
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   393
\begin{lstlisting}[language=LLVMIR,numbers=left,escapeinside={(*@}{@*)}]
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   394
let tmp0 = add 1 a in   
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   395
let tmp1 = mul (*@$\Box$@*) 5 in 
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   396
let tmp2 = add 3 tmp1 in 
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   397
let tmp3 = add tmp0 tmp2 in
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   398
  tmp3 
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   399
\end{lstlisting}
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   400
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   401
\noindent
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   402
where in the second line is a $\Box$ which still expects a \code{KVal}
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   403
to be filled in before it becomes a ``proper'' \code{KExp}. When we 
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   404
apply and argument to the continuation (remember they are functions)
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   405
we essentially fill something into the corresponding hole. The code
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   406
of the CPS-translation is 
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   407
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   408
\begin{lstlisting}[language=Scala,numbers=none]
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   409
def CPS(e: Exp)(k: KVal => KExp) : KExp = 
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   410
  e match { ... }
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   411
\end{lstlisting}
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   412
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   413
\noindent 
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   414
where \code{k} is the continuation and \code{e} is the expression 
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   415
to be compiled. In case we have numbers or variables, we can just
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   416
apply the continuation like 
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   417
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   418
\begin{center}
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   419
\code{k(KNum(n))} \qquad \code{k(KVar(x))}
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   420
\end{center}
679
8fc109f36b78 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   421
701
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   422
\noindent This would just fill in the $\Box$ in a \code{KLet}-expression.
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   423
More interesting is the case for an arithmetic expression.
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   424
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   425
\begin{lstlisting}[language=Scala,numbers=none]
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   426
case Aop(o, e1, e2) => {
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   427
    val z = Fresh("tmp")
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   428
    CPS(e1)(y1 => 
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   429
      CPS(e2)(y2 => KLet(z, Kop(o, y1, y2), k(KVar(z)))))
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   430
}
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   431
\end{lstlisting}
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   432
681c36b2af27 updated
Christian Urban <urbanc@in.tum.de>
parents: 700
diff changeset
   433
\noindent
539
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   434
\end{document}
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   435
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   436
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   437
%%% Local Variables: 
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   438
%%% mode: latex
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   439
%%% TeX-master: t
ed8f014217be updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   440
%%% End: