% !TEX program = xelatex+ −
\documentclass{article}+ −
\usepackage{../style}+ −
\usepackage{../graphicss}+ −
\usepackage{../langs}+ −
\definecolor{navyblue}{rgb}{0.0, 0.0, 0.5}+ −
\definecolor{pansypurple}{rgb}{0.47, 0.09, 0.29}+ −
+ −
\begin{document}+ −
+ −
+ −
%\color{pansypurple}+ −
%\section*{RESIT / REPLACEMENT}+ −
%+ −
%{\bf+ −
%The resit / replacement task is essentially CW5 (listed below) with+ −
%the exception that the lexer and parser is already provided. The+ −
%parser will generate an AST (see file \texttt{fun\_llvm.sc}). Your task+ −
%is to generate an AST for the K-intermediate language and supply+ −
%sufficient type annotations such that you can generate valid code for+ −
%the LLVM-IR. The submission deadline is 9th August at 16:00. At the+ −
%deadline, please send me an email containing a zip-file with your+ −
%files.+ −
%Feel free to reuse the files I have uploaded on KEATS (especially+ −
%the files generating simple LLVM-IR code). Of help might also be the+ −
%videos of Week~10.\bigskip+ −
%+ −
%\noindent+ −
%Good Luck!}+ −
%\color{black}+ −
+ −
+ −
\section*{Coursework 5}+ −
+ −
+ −
+ −
\noindent This coursework is worth 25\% and is due on \cwFIVE{} at+ −
16:00. You are asked to implement a compiler targeting the LLVM-IR.+ −
Be careful that this CW needs some material about the LLVM-IR+ −
that has not been shown in the lectures and your own experiments+ −
might be required. You can find information about the LLVM-IR at+ −
+ −
\begin{itemize}+ −
\item \url{https://bit.ly/3rheZYr}+ −
\item \url{https://llvm.org/docs/LangRef.html} + −
\end{itemize} + −
+ −
\noindent+ −
You can do the implementation of your compiler in any programming+ −
language you like, but you need to submit the source code with which+ −
you generated the LLVM-IR files, otherwise a mark of 0\% will be+ −
awarded. You are asked to submit the code of your compiler, but also+ −
the generated \texttt{.ll} files. No PDF is needed for this+ −
coursework. You should use the lexer and parser from the previous+ −
courseworks, but you need to make some modifications to them for the+ −
`typed' version of the Fun-language. I will award up to 5\% if a lexer+ −
and a parser are correctly implemented. At the end, please package+ −
everything(!) in a zip-file that creates a directory with the name+ −
+ −
\begin{center}+ −
\texttt{YournameYourFamilyname}+ −
\end{center}+ −
+ −
\noindent+ −
on my end. You will be marked according to the input files+ −
+ −
\begin{itemize}+ −
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/sqr.fun}{sqr.fun} + −
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/fact.fun}{fact.fun}+ −
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/mand.fun}{mand.fun}+ −
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/mand2.fun}{mand2.fun}+ −
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/hanoi.fun}{hanoi.fun} + −
\end{itemize} + −
+ −
\noindent+ −
which are uploaded to KEATS.+ −
+ −
\subsection*{Disclaimer\alert}+ −
+ −
It should be understood that the work you submit represents your own+ −
effort. You have not copied from anyone else. An exception is the+ −
Scala code I showed during the lectures or uploaded to KEATS, which+ −
you can both use. You can also use your own code from the CW~1 --+ −
CW~4.+ −
+ −
+ −
\subsection*{Task}+ −
+ −
The goal is to lex and parse 5 Fun-programs, including the+ −
Mandelbrot program shown in Figure~\ref{mand}, and generate+ −
corresponding code for the LLVM-IR. Unfortunately the calculations for+ −
the Mandelbrot Set require floating point arithmetic and therefore we+ −
cannot be as simple-minded about types as we have been so far+ −
(remember the LLVM-IR is a fully-typed language and needs to know the+ −
exact types of each expression). The idea is to deal appropriately+ −
with three types, namely \texttt{Int}, \texttt{Double} and+ −
\texttt{Void} (they are represented in the LLVM-IR as \texttt{i32},+ −
\texttt{double} and \texttt{void}). You need to extend the lexer and+ −
parser accordingly in order to deal with type annotations. The+ −
Fun-language includes global constants, such as+ −
+ −
\begin{lstlisting}[numbers=none]+ −
val Ymin: Double = -1.3;+ −
val Maxiters: Int = 1000;+ −
\end{lstlisting}+ −
+ −
\noindent+ −
where you can assume that they are `normal' identifiers, just+ −
starting with a capital letter---all other identifiers should have+ −
lower-case letters. Function definitions can take arguments of+ −
type \texttt{Int} or \texttt{Double}, and need to specify a return+ −
type, which can be \texttt{Void}, for example+ −
+ −
\begin{lstlisting}[numbers=none]+ −
def foo(n: Int, x: Double) : Double = ...+ −
def id(n: Int) : Int = ...+ −
def bar() : Void = ...+ −
\end{lstlisting}+ −
+ −
\noindent+ −
The idea is to record all typing information that is given+ −
in the Fun-program, but then delay any further typing inference to+ −
after the CPS-translation. That means the parser should+ −
generate ASTs given by the Scala dataypes:+ −
+ −
\begin{lstlisting}[numbers=none,language=Scala]+ −
abstract class Exp + −
abstract class BExp + −
abstract class Decl + −
+ −
case class Def(name: String, args: List[(String, String)],+ −
ty: String, body: Exp) extends Decl+ −
case class Main(e: Exp) extends Decl+ −
case class Const(name: String, v: Int) extends Decl+ −
case class FConst(name: String, x: Double) extends Decl+ −
+ −
case class Call(name: String, args: List[Exp]) extends Exp+ −
case class If(a: BExp, e1: Exp, e2: Exp) extends Exp+ −
case class Var(s: String) extends Exp+ −
case class Num(i: Int) extends Exp // integer numbers+ −
case class FNum(i: Double) extends Exp // floating numbers+ −
case class ChConst(c: Int) extends Exp // char constants+ −
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp+ −
case class Sequence(e1: Exp, e2: Exp) extends Exp+ −
case class Bop(o: String, a1: Exp, a2: Exp) extends BExp+ −
\end{lstlisting}+ −
+ −
\noindent+ −
This datatype distinguishes whether the global constant is an integer+ −
constant or floating constant. Also a function definition needs to+ −
record the return type of the function, namely the argument+ −
\texttt{ty} in \texttt{Def}, and the arguments consist of an pairs of+ −
identifier names and types (\texttt{Int} or \texttt{Double}). The hard+ −
part of the CW is to design the K-intermediate language and infer all+ −
necessary types in order to generate LLVM-IR code. You can check+ −
your LLVM-IR code by running it with the interpreter \texttt{lli}.+ −
+ −
\begin{figure}[t]+ −
\lstinputlisting[language=Scala]{../cwtests/cw05/mand.fun}+ −
\caption{The Mandelbrot program in the `typed' Fun-language.\label{mand}}+ −
\end{figure}+ −
+ −
\begin{figure}[t]+ −
\includegraphics[scale=0.35]{../solution/cw5/out.png}+ −
\caption{Ascii output of the Mandelbrot program.\label{mand2}}+ −
\end{figure}+ −
+ −
Also note that the second version of the Mandelbrot program and also+ −
the Tower of Hanoi program use character constants, like \texttt{'a'},+ −
\texttt{'1'}, \texttt{'$\backslash$n'} and so on. When they are tokenised,+ −
such characters should be interpreted as the corresponding ASCII code (an+ −
integer), such that we can use them in calculations like \texttt{'a' + 10}+ −
where the result should be 107. As usual, the character \texttt{'$\backslash$n'} is the+ −
ASCII code 10.+ −
+ −
+ −
\subsection*{LLVM-IR}+ −
+ −
There are some subtleties in the LLVM-IR you need to be aware of:+ −
+ −
\begin{itemize}+ −
\item \textbf{Global constants}: While global constants such as+ −
+ −
\begin{lstlisting}[numbers=none] + −
val Max : Int = 10;+ −
\end{lstlisting}+ −
+ −
\noindent+ −
can be easily defined in the LLVM-IR as follows+ −
+ −
\begin{lstlisting}[numbers=none] + −
@Max = global i32 10+ −
\end{lstlisting}+ −
+ −
\noindent+ −
they cannot easily be referenced. If you want to use+ −
this constant then you need to generate code such as+ −
+ −
\begin{lstlisting}[numbers=none] + −
%tmp_22 = load i32, i32* @Max+ −
\end{lstlisting}+ −
+ −
\noindent+ −
first, which treats \texttt{@Max} as an Integer-pointer (type+ −
\texttt{i32*}) that needs to be loaded into a local variable,+ −
here \texttt{\%tmp\_22}.+ −
+ −
\item \textbf{Void-Functions}: While integer and double functions+ −
can easily be called and their results can be allocated to a+ −
temporary variable:+ −
+ −
\begin{lstlisting}[numbers=none] + −
%tmp_23 = call i32 @sqr (i32 %n)+ −
\end{lstlisting}+ −
+ −
void-functions cannot be allocated to a variable. They need to be+ −
called just as+ −
+ −
\begin{lstlisting}[numbers=none] + −
call void @print_int (i32 %tmp_23)+ −
\end{lstlisting}+ −
+ −
\item \textbf{Floating-Point Operations}: While integer operations+ −
are specified in the LLVM-IR as+ −
+ −
\begin{lstlisting}[numbers=none,language=Scala]+ −
def compile_op(op: String) = op match {+ −
case "+" => "add i32 "+ −
case "*" => "mul i32 "+ −
case "-" => "sub i32 "+ −
case "==" => "icmp eq i32 "+ −
case "!=" => "icmp ne i32 "+ −
case "<=" => "icmp sle i32 " // signed less or equal+ −
case "<" => "icmp slt i32 " // signed less than+ −
}\end{lstlisting}+ −
+ −
the corresponding operations on doubles are+ −
+ −
\begin{lstlisting}[numbers=none,language=Scala]+ −
def compile_dop(op: String) = op match {+ −
case "+" => "fadd double "+ −
case "*" => "fmul double "+ −
case "-" => "fsub double "+ −
case "==" => "fcmp oeq double "+ −
case "!=" => "fcmp one double "+ −
case "<=" => "fcmp ole double " + −
case "<" => "fcmp olt double " + −
}\end{lstlisting}+ −
+ −
\item \textbf{Typing}: In order to leave the CPS-translations+ −
as is, it makes sense to defer the full type-inference to the+ −
K-intermediate-language. For this it is good to define+ −
the \texttt{KVar} constructor as+ −
+ −
\begin{lstlisting}[numbers=none,language=Scala] + −
case class KVar(s: String, ty: Ty = "UNDEF") extends KVal\end{lstlisting}+ −
+ −
where first a default type, for example \texttt{UNDEF}, is+ −
given. Then you need to define two typing functions+ −
+ −
\begin{lstlisting}[numbers=none,language=Scala] + −
def typ_val(v: KVal, ts: TyEnv) = ???+ −
def typ_exp(a: KExp, ts: TyEnv) = ???+ −
\end{lstlisting}+ −
+ −
Both functions require a typing-environment that updates+ −
the information about what type each variable, operation+ −
and so on receives. Once the types are inferred, the+ −
LLVM-IR code can be generated. Since we are dealing only+ −
with simple first-order functions, nothing on the scale+ −
as the `Hindley-Milner' typing-algorithm is needed. I suggest+ −
to just look at what data is avaliable and generate all+ −
missing information by ``simple means''\ldots rather than+ −
looking at the literature which solves the problem+ −
with much heavier machinery.+ −
+ −
\item \textbf{Build-In Functions}: The `prelude' comes+ −
with several build-in functions: \texttt{new\_line()},+ −
\texttt{skip}, \texttt{print\_int(n)}, \texttt{print\_space()},+ −
\texttt{print\_star()} and \texttt{print\_char(n)}. You can find the `prelude' for+ −
example in the file \texttt{sqr.ll}.+ −
\end{itemize} + −
+ −
\end{document}+ −
+ −
%%% Local Variables: + −
%%% mode: latex+ −
%%% TeX-master: t+ −
%%% End: + −