handouts/ho09.tex
author Christian Urban <urbanc@in.tum.de>
Sun, 24 Nov 2019 01:03:38 +0000
changeset 699 b2dc9198687d
parent 680 242c1f0e60df
child 700 f1d4d582ac29
permissions -rw-r--r--
updated
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
677
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
     1
% !TEX program = xelatex
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     2
\documentclass{article}
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     3
\usepackage{../style}
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     4
\usepackage{../langs}
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     5
\usepackage{../graphics}
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     6
\usepackage{../grammar}
677
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
     7
%%\usepackage{multicol}
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     8
677
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
     9
%%\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    10
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    11
\begin{document}
677
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    12
\fnote{\copyright{} Christian Urban, King's College London, 2019}
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    13
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    14
677
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    15
\section*{Handout 9 (LLVM, SSA and CPS)}
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    16
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    17
Reflecting on our tiny compiler targetting the JVM, the code generation
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    18
part was actually not so hard, no? Pretty much just some post-traversal
680
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    19
of the abstract syntax tree, yes? One of the reasons for this ease is
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    20
that the JVM is a stack-based virtual machine and it is therefore not
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    21
hard to translate deeply-nested arithmetic expressions into a sequence
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    22
of instructions manipulating the stack. The problem is that ``real''
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    23
CPUs, although supporting stack operations, are not really designed to
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    24
be \emph{stack machines}.  The design of CPUs is more like, here is a
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    25
chunk of memory---compiler, or better compiler writers, do something
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    26
with it. Consequently, modern compilers need to go the extra mile in
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    27
order to generate code that is much easier and faster to process by
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    28
CPUs. To make this all tractable for this module, we target the LLVM
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    29
Intermediate Language. In this way we can take advantage of the tools
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    30
coming with LLVM. For example we do not have to worry about things like
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    31
register allocations.\bigskip 
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    32
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    33
\noindent LLVM\footnote{\url{http://llvm.org}} is a beautiful example
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    34
that projects from Academia can make a difference in the world. LLVM
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    35
started in 2000 as a project by two researchers at the  University of
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    36
Illinois at Urbana-Champaign. At the time the behemoth of compilers was
680
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    37
gcc with its myriad of front-ends for other languages (C++, Fortran,
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    38
Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    39
time into a monolithic gigantic piece of m\ldots ehm software, which you
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    40
could not mess about in an afternoon. In contrast, LLVM is designed to
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    41
be a modular suite of tools with which you could play around easily and
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    42
try out something new. LLVM became a big player once Apple hired one of
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    43
the original developers (I cannot remember the reason why Apple did not
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    44
want to use gcc, but maybe they were also just disgusted by its big
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    45
monolithic codebase). Anyway, LLVM is now the big player and gcc is more
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    46
or less legacy. This does not mean that programming languages like C and
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    47
C++ are dying out any time soon---they are nicely supported by LLVM.
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    48
680
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    49
We will target the LLVM Intermediate Language, or Intermediate
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    50
Representation (short LLVM-IR). As a result we can benefit from the
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    51
modular structure of the LLVM compiler and let for example the compiler
680
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    52
generate code for different CPUs, like X86 or ARM. That means we can be
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    53
agnostic about where our code actually runs. We can also be ignorant
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    54
about optimising code and allocating memory efficiently. However, what
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    55
we have to do is to generate code in \emph{Static Single-Assignment}
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    56
format (short SSA), because that is what the LLVM-IR expects from us. 
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    57
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    58
The idea behind the SSA format is to use very simple variable
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    59
assignments where every variable is assigned only once. The assignments
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    60
also need to be primitive in the sense that they can be just simple
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    61
operations like addition, multiplication, jumps, comparisons and so on.
680
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    62
Say, we have an expression $((1 + a) + (3 + (b * 5)))$, then the
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    63
SSA is 
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    64
 
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    65
\begin{lstlisting}[language=LLVMIR,numbers=left]
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    66
let tmp0 = add 1 a in   
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    67
let tmp1 = mul b 5 in 
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    68
let tmp2 = add 3 tmp1 in 
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    69
let tmp3 = add tmp0 tmp2 in
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    70
  tmp3 
677
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    71
\end{lstlisting}
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    72
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    73
\noindent where every variable is used only once (we could not write
680
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    74
\texttt{tmp1 = add 3 tmp1} in Line 3 for example).  There are
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    75
sophisticated algorithms for imperative languages, like C, that
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    76
efficiently transform a high-level program into SSA format. But we can
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    77
ignore them here. We want to compile a functional language and there
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    78
things get much more interesting than just sophisticated. We will need
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    79
to have a look at CPS translations, where the CPS stands for
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    80
Continuation-Passing-Style---basically black programming art or
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    81
abracadabra programming. So sit tight.
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    82
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    83
\subsection*{LLVM-IR}
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    84
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    85
Before we start, lets first have a look at the \emph{LLVM Intermediate
680
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    86
Representation} in more detail. What is good about our toy Fun language
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    87
is that it basically only contains expressions (be they arithmetic
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    88
expressions or boolean expressions or if-expressions). The exception are
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    89
function definitions. Luckily, for them we can use the mechanism of
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
    90
defining functions in LLVM-IR. For example the simple Fun program 
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    91
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    92
677
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    93
\begin{lstlisting}[language=Scala,numbers=none]
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
    94
def sqr(x) = x * x
677
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    95
\end{lstlisting}
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    96
677
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    97
\noindent
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
    98
can be compiled into the following LLVM-IR function:
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    99
677
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   100
\begin{lstlisting}[language=LLVM]
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   101
define i32 @sqr(i32 %x) {
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   102
   %tmp = mul i32 %x, %x
677
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   103
   ret i32 %tmp
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   104
}    
3787d4fae375 updated
Christian Urban <urbanc@in.tum.de>
parents: 539
diff changeset
   105
\end{lstlisting}
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   106
680
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   107
\noindent First notice that all variable names in the LLVM-IR are
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   108
prefixed by \texttt{\%}; function names need to be prefixed with
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   109
@-symbol. Also, the LLVM-IR is a fully typed language. The \texttt{i32}
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   110
type stands for 32-bit integers. There are also types for 64-bit
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   111
integers (\texttt{i64}), chars (\texttt{i8}), floats, arrays and even
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   112
pointer types. In the code above, \texttt{sqr} takes an argument of type
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   113
\texttt{i32} and produces a result of type \texttt{i32} (the result type
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   114
is before the function name, like in C). Each arithmetic operation, like
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   115
addition or multiplication, are also prefixed with the type they operate
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   116
on. Obviously these types need to match up\ldots{} but since we have in
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   117
our programs only integers, \texttt{i32} everywhere will do. 
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   118
 
679
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   119
Conveniently, you can use the program \texttt{lli}, which comes with
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   120
LLVM, to interpret programs written in the LLVM-IR. So you can easily
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   121
check whether the code you produced actually works. To get a running
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   122
program that does something interesting you need to add some boilerplate
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   123
about printing out numbers and a main-function that is the entrypoint
680
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   124
for the program (see Figure~\ref{lli} for a complete listing). You can
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   125
generate a binary for this program using \texttt{llc}-compiler and
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   126
\texttt{gcc}---\texttt{llc} can generate an object file and then you can 
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   127
use \texttt{gcc} (that is clang) for generating an executable  binary:
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   128
679
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   129
\begin{lstlisting}[language=bash,numbers=none]
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   130
llc -filetype=obj sqr.ll
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   131
gcc sqr.o -o a.out
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   132
./a.out
680
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   133
> 25
679
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   134
\end{lstlisting}
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   135
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   136
\begin{figure}[t]\small 
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   137
\lstinputlisting[language=LLVM,numbers=left]{../progs/sqr.ll}
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   138
\caption{An LLVM-IR program for calculating the square function. The 
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   139
interesting function is \texttt{sqr} in Lines 13 -- 16. The main
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   140
function calls \texttt{sqr} and prints out the result. The other
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   141
code is boilerplate for printing out integers.\label{lli}}
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   142
\end{figure}   
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   143
679
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   144
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   145
    
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   146
\subsection*{Our Own Intermediate Language}
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   147
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   148
Remember compilers have to solve the problem of bridging the gap between
680
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   149
``high-level'' programs and ``low-level'' hardware. If the gap is too
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   150
wide for one step, then a good strategy is to lay a stepping stone
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   151
somewhere in between. The LLVM-IR itself is such a stepping stone to
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   152
make the task of generating and optimising code easier. Like a real
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   153
compiler we will use our own stepping stone which I call the
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   154
\emph{K-language}. For this remember expressions (and boolean
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   155
expressions) in the Fun language. For convenience the Scala code is
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   156
shown on top of Figure~\ref{absfun}. Below is the code for the
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   157
K-language. There are two kinds of syntactic entities, namely
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   158
\emph{K-values} and \emph{K-expressions}. The central point of the
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   159
K-language is the \texttt{KLet}-constructor. For this recall that 
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   160
arithmetic expressions such as $((1 + a) + (3 + (b * 5)))$ need
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   161
to be broken up into smaller ``atomic'' steps, like so
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   162
 
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   163
\begin{lstlisting}[language=LLVMIR,numbers=none]
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   164
let tmp0 = add 1 a in   
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   165
let tmp1 = mul b 5 in 
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   166
let tmp2 = add 3 tmp1 in 
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   167
let tmp3 = add tmp0 tmp2 in
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   168
  tmp3 
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   169
\end{lstlisting}
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   170
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   171
\noindent
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   172
Here \texttt{tmp3} will contain the result of what the expression stands
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   173
for. In each step we can only perform an ``atomic'' operation, like
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   174
addition or multiplication. We could not for example have an
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   175
if-condition on the right-hand side of an equals. These constraints
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   176
enforced upon us because how the SSA format works in the LLVM-IR. By
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   177
having in \texttt{KLet},  first a string (standing for an intermediate
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   178
result) and second a value, we can fulfil this constraint---there is no
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   179
way we could write anything else than a value. To sum up, K-values are
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   180
the atomic operations that can be on the right-hand side of equal-signs.
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   181
The K-language is restricted such that it is easy to generate the SSA format 
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   182
for the LLVM-IR.
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   183
679
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   184
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   185
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   186
\begin{figure}[p]\small
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   187
\begin{lstlisting}[language=Scala,numbers=none]
679
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   188
// Fun-language (expressions)
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   189
abstract class Exp 
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   190
abstract class BExp 
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   191
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   192
case class Call(name: String, args: List[Exp]) extends Exp
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   193
case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   194
case class Write(e: Exp) extends Exp
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   195
case class Var(s: String) extends Exp
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   196
case class Num(i: Int) extends Exp
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   197
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   198
case class Sequence(e1: Exp, e2: Exp) extends Exp
679
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   199
case class Bop(o: String, a1: Exp, a2: Exp) extends BExp  
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   200
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   201
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   202
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   203
// K-language (K-expressions, K-values)
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   204
abstract class KExp
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   205
abstract class KVal
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   206
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   207
case class KVar(s: String) extends KVal
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   208
case class KNum(i: Int) extends KVal
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   209
case class Kop(o: String, v1: KVal, v2: KVal) extends KVal
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   210
case class KCall(o: String, vrs: List[KVal]) extends KVal
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   211
case class KWrite(v: KVal) extends KVal
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   212
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   213
case class KIf(x1: String, e1: KExp, e2: KExp) extends KExp
680
242c1f0e60df updated
Christian Urban <urbanc@in.tum.de>
parents: 679
diff changeset
   214
case class KLet(x: String, v: KVal, e: KExp) extends KExp
679
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   215
case class KReturn(v: KVal) extends KExp
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   216
\end{lstlisting}
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   217
\caption{Abstract syntax trees for the Fun language.\label{absfun}}
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   218
\end{figure}
679
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   219
678
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   220
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   221
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   222
\subsection*{CPS-Translations}
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   223
6601ff1d9e0a updated
Christian Urban <urbanc@in.tum.de>
parents: 677
diff changeset
   224
679
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   225
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   226
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   227
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   228
Another reason why it makes sense to go the extra mile is that stack
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   229
instructions are very difficult to optimise---you cannot just re-arrange
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   230
instructions without messing about with what is calculated on the stack.
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   231
Also it is hard to find out if all the calculations on the stack are
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   232
actually necessary and not by chance dead code. The JVM has for all this
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   233
sophisticated machinery to make such ``high-level'' code still run fast,
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   234
but let's say that for the sake of argument we do not want to rely on
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   235
it. We want to generate fast code ourselves. This means we have to work
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   236
around the intricacies of what instructions CPUs can actually process.
9a4404f65b63 updated
Christian Urban <urbanc@in.tum.de>
parents: 678
diff changeset
   237
539
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   238
\end{document}
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   239
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   240
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   241
%%% Local Variables: 
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   242
%%% mode: latex
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   243
%%% TeX-master: t
8a12889f8c8a updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   244
%%% End: