% !TEX program = xelatex
% Christian Urban <christian dot urban at kcl dot ac dot uk>
\documentclass{article}
\usepackage{../style}
\usepackage{../graphics}
\usepackage{../langs}

\begin{document}

\section*{Coursework 5}
\noindent This coursework is worth 25\% and is due on \cwFIVE{} at
18:00. You are asked to implement a compiler targeting the LLVM-IR.
Be aware that this CW requires some material about the LLVM-IR
that has not been shown in the lectures, so some experimentation of
your own might be required. You can find information about the LLVM-IR at

\begin{itemize}
\item \url{https://bit.ly/3rheZYr}
\item \url{https://llvm.org/docs/LangRef.html}
\end{itemize}
\noindent
You can do the implementation of your compiler in any programming
language you like, but you need to submit the source code with which
you generated the LLVM-IR files, otherwise a mark of 0\% will be
awarded. You are asked to submit not only the code of your compiler, but also
the generated \texttt{.ll} files. You should use the lexer and parser
from the previous courseworks, but you need to make some modifications
to them for the `typed' Fun-language. I will award up to 5\% if a
lexer and a parser are correctly implemented. At the end, please
package everything(!) in a zip-file that creates a directory with the
name

\begin{center}
\texttt{YournameYourFamilyname}
\end{center}
\noindent
on my end. You will be marked according to the input files

\begin{itemize}
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/sqr.fun}{sqr.fun}
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/fact.fun}{fact.fun}
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/mand.fun}{mand.fun}
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/mand2.fun}{mand2.fun}
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/hanoi.fun}{hanoi.fun}
\end{itemize}

\noindent
which are uploaded to KEATS.
\subsection*{Disclaimer\alert}

It should be understood that the work you submit represents your own
effort. You have not copied from anyone else. An exception is the
Scala code I showed during the lectures or uploaded to KEATS, which
you can use. You can also use your own code from CW~1 --
CW~4.
\subsection*{Task}

The main goal is to lex and parse 5 Fun-programs, including the
Mandelbrot program shown in Figure~\ref{mand}, and to generate
corresponding code for the LLVM-IR. Unfortunately the calculations for
the Mandelbrot Set require floating-point arithmetic and therefore we
cannot be as simple-minded about types as we have been so far
(remember the LLVM-IR is a fully-typed language and needs to know the
exact type of each expression). The idea is to deal appropriately
with three types, namely \texttt{Int}, \texttt{Double} and
\texttt{Void} (they are represented in the LLVM-IR as \texttt{i32},
\texttt{double} and \texttt{void}). You need to extend the lexer and
parser accordingly in order to deal with type annotations. The
Fun-language includes global constants, such as

\begin{lstlisting}[numbers=none]
val Ymin: Double = -1.3;
val Maxiters: Int = 1000;
\end{lstlisting}
\noindent
where you can assume that they are `normal' identifiers, just
starting with a capital letter; all other identifiers should start with
a lower-case letter. Function definitions can take arguments of
type \texttt{Int} or \texttt{Double}, and need to specify a return
type, which can be \texttt{Void}, for example

\begin{lstlisting}[numbers=none]
def foo(n: Int, x: Double) : Double = ...
def id(n: Int) : Int = ...
def bar() : Void = ...
\end{lstlisting}
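\noindent To make the correspondence between these annotations and the
LLVM-IR types concrete, here is a minimal sketch in Scala of how a
compiler might recognise global constants by their initial capital
letter, map the three types to \texttt{i32}, \texttt{double} and
\texttt{void}, and render the header of an LLVM-IR function definition.
The helper names \texttt{is\_global}, \texttt{llvm\_ty} and
\texttt{llvm\_header} are hypothetical, and the sketch assumes that
types are passed around as the strings \texttt{Int}, \texttt{Double}
and \texttt{Void}, matching the field \texttt{ty} of \texttt{Def} below.

\begin{lstlisting}[numbers=none,language=Scala]
// Hypothetical helper: global constants are identifiers that
// start with a capital letter.
def is_global(id: String): Boolean = id.nonEmpty && id.head.isUpper

// Hypothetical helper: maps a Fun-type annotation (assumed to be
// a string) to the corresponding LLVM-IR type.
def llvm_ty(ty: String): String = ty match {
  case "Int"    => "i32"
  case "Double" => "double"
  case "Void"   => "void"
  case _        => throw new IllegalArgumentException("unknown type: " + ty)
}

// Hypothetical helper: renders the header (without the body) of an
// LLVM-IR function definition from a typed Fun-function header.
def llvm_header(name: String, args: List[(String, String)], ty: String): String = {
  val params = args.map { case (x, t) => llvm_ty(t) + " %" + x }.mkString(", ")
  "define " + llvm_ty(ty) + " @" + name + " (" + params + ")"
}
\end{lstlisting}

\noindent Applied to the header of \texttt{foo} above, \texttt{llvm\_header}
would produce \texttt{define double @foo (i32 \%n, double \%x)}.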

\noindent
The idea is to record all typing information that is given
in the Fun-program, but then to delay any further type inference until
after the CPS-translation. That means the parser should
generate ASTs given by the Scala datatypes:

\begin{lstlisting}[numbers=none,language=Scala]
abstract class Exp
abstract class BExp
abstract class Decl

case class Def(name: String, args: List[(String, String)],
               ty: String, body: Exp) extends Decl
case class Main(e: Exp) extends Decl
case class Const(name: String, v: Int) extends Decl
case class FConst(name: String, x: Float) extends Decl

case class Call(name: String, args: List[Exp]) extends Exp
case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
case class Var(s: String) extends Exp
case class Num(i: Int) extends Exp          // integer numbers
case class FNum(i: Float) extends Exp       // floating-point numbers
case class ChConst(c: Int) extends Exp      // char constants
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
case class Sequence(e1: Exp, e2: Exp) extends Exp
case class Bop(o: String, a1: Exp, a2: Exp) extends BExp
\end{lstlisting}
\noindent
This datatype distinguishes whether a global constant is an integer
constant or a floating-point constant. Also a function definition needs to
record the return type of the function, namely the field
\texttt{ty} in \texttt{Def}, and the arguments consist of a list of pairs of
identifier names and types (\texttt{Int} or \texttt{Double}). The hard
part of the CW is to design the K-intermediate language and infer all
necessary types in order to generate LLVM-IR code. You can check
your LLVM-IR code by running it with the interpreter \texttt{lli}.
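One simple starting point for this inference, sketched below in Scala
under several assumptions, is to collect the types that the parser has
already recorded: global constants get \texttt{Int} or \texttt{Double}
depending on whether they are a \texttt{Const} or an \texttt{FConst},
and functions get their declared return type. Representing the typing
environment as a \texttt{Map} from names to type strings is an
assumption, and the helpers \texttt{initial\_env} and \texttt{fun\_env}
are hypothetical; the built-in functions of the prelude would
presumably need entries as well.

\begin{lstlisting}[numbers=none,language=Scala]
// Copies of the declaration datatypes from above (Exp kept abstract).
abstract class Exp
abstract class Decl
case class Def(name: String, args: List[(String, String)],
               ty: String, body: Exp) extends Decl
case class Main(e: Exp) extends Decl
case class Const(name: String, v: Int) extends Decl
case class FConst(name: String, x: Float) extends Decl

// Assumed representation: types as strings, environments as maps.
type Ty = String
type TyEnv = Map[String, Ty]

// Records the explicitly given types of global constants and the
// declared return types of functions.
def initial_env(prog: List[Decl]): TyEnv =
  prog.foldLeft(Map.empty[String, Ty]) {
    case (env, Const(n, _))      => env + (n -> "Int")
    case (env, FConst(n, _))     => env + (n -> "Double")
    case (env, Def(n, _, ty, _)) => env + (n -> ty)
    case (env, _)                => env   // Main introduces no name
  }

// Extends an environment with the typed arguments of a function,
// for use while inferring the types inside its body.
def fun_env(env: TyEnv, d: Def): TyEnv = env ++ d.args
\end{lstlisting}

\noindent For the constants \texttt{Ymin} and \texttt{Maxiters} and the
function \texttt{foo} shown earlier, \texttt{initial\_env} would record
the types \texttt{Double}, \texttt{Int} and \texttt{Double},
respectively; \texttt{fun\_env} then adds \texttt{n} and \texttt{x}
while the body of \texttt{foo} is processed.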

\begin{figure}[t]
\lstinputlisting[language=Scala]{../cwtests/cw05/mand.fun}
\caption{The Mandelbrot program in the `typed' Fun-language.\label{mand}}
\end{figure}

\begin{figure}[t]
\includegraphics[scale=0.35]{../solution/cw5/out.png}
\caption{ASCII output of the Mandelbrot program.\label{mandout}}
\end{figure}

Also note that the second version of the Mandelbrot program and
the Tower of Hanoi program use character constants, like \texttt{'a'},
\texttt{'1'}, \texttt{'$\backslash$n'} and so on. When they are tokenised,
such characters should be interpreted as the corresponding ASCII code (an
integer), such that we can use them in calculations like \texttt{'a' + 10},
where the result should be 107. As usual, the character \texttt{'$\backslash$n'} corresponds to
ASCII code 10.
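
\noindent A minimal sketch of this conversion in Scala follows. The
helper \texttt{char\_code} is hypothetical: it assumes that the lexer
passes on the complete lexeme, including the single quotes, and it
handles only the escape \texttt{'$\backslash$n'} explicitly (further
escape sequences could be added in the same way). The resulting integer
is what a \texttt{ChConst} would store.

\begin{lstlisting}[numbers=none,language=Scala]
// Hypothetical lexer helper: converts a character constant, given
// with its enclosing quotes, into its ASCII code.
def char_code(lexeme: String): Int = {
  // strip the enclosing single quotes
  val inner = lexeme.stripPrefix("'").stripSuffix("'")
  inner match {
    case "\\n"              => 10                 // newline escape
    case s if s.length == 1 => s.charAt(0).toInt  // ordinary character
    case _ => throw new IllegalArgumentException("not a character constant: " + lexeme)
  }
}
\end{lstlisting}

\noindent For example, \texttt{char\_code} returns 97 for \texttt{'a'},
which gives 107 for \texttt{'a' + 10}, and it returns 10 for
\texttt{'$\backslash$n'}.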

\subsection*{LLVM-IR}

There are some subtleties in the LLVM-IR you need to be aware of:

\begin{itemize}
\item \textbf{Global constants}: While global constants such as

\begin{lstlisting}[numbers=none]
val Max : Int = 10;
\end{lstlisting}

\noindent
can easily be defined in the LLVM-IR as follows

\begin{lstlisting}[numbers=none]
@Max = global i32 10
\end{lstlisting}

\noindent
they cannot easily be referenced. If you want to use
this constant, then you first need to generate code such as

\begin{lstlisting}[numbers=none]
%tmp_22 = load i32, i32* @Max
\end{lstlisting}

\noindent
which treats \texttt{@Max} as an integer pointer (type
\texttt{i32*}) that needs to be loaded into a local variable,
here \texttt{\%tmp\_22}.
\item \textbf{Void-Functions}: While integer and double functions
can easily be called and their results can be assigned to a
temporary variable:

\begin{lstlisting}[numbers=none]
%tmp_23 = call i32 @sqr (i32 %n)
\end{lstlisting}

calls of void-functions cannot be assigned to a variable, since they
produce no result. Such functions need to be called simply as

\begin{lstlisting}[numbers=none]
call void @print_int (i32 %tmp_23)
\end{lstlisting}
\item \textbf{Floating-Point Operations}: While integer operations
are specified in the LLVM-IR as

\begin{lstlisting}[numbers=none,language=Scala]
def compile_op(op: String) = op match {
  case "+"  => "add i32 "
  case "*"  => "mul i32 "
  case "-"  => "sub i32 "
  case "==" => "icmp eq i32 "
  case "!=" => "icmp ne i32 "
  case "<=" => "icmp sle i32 "   // signed less or equal
  case "<"  => "icmp slt i32 "   // signed less than
}
\end{lstlisting}

the corresponding operations on doubles are
\begin{lstlisting}[numbers=none,language=Scala]
// o = ordered: the comparison yields false if either operand is NaN
def compile_dop(op: String) = op match {
  case "+"  => "fadd double "
  case "*"  => "fmul double "
  case "-"  => "fsub double "
  case "==" => "fcmp oeq double "   // ordered and equal
  case "!=" => "fcmp one double "   // ordered and not equal
  case "<=" => "fcmp ole double "   // ordered and less or equal
  case "<"  => "fcmp olt double "   // ordered and less than
}
\end{lstlisting}
\item \textbf{Typing}: In order to leave the CPS-translation
as is, it makes sense to defer the full type inference to the
K-intermediate language. For this it is good to define
the \texttt{KVar} constructor as

\begin{lstlisting}[numbers=none,language=Scala]
case class KVar(s: String, ty: Ty = "UNDEF") extends KVal
\end{lstlisting}

where initially a default type, for example \texttt{UNDEF}, is
given. Then you need to define two typing functions

\begin{lstlisting}[numbers=none,language=Scala]
def typ_val(v: KVal, ts: TyEnv) = ???
def typ_exp(a: KExp, ts: TyEnv) = ???
\end{lstlisting}

Both functions require a typing environment that records
the type of each variable, operation and so on. Once the types
are inferred, the LLVM-IR code can be generated. Since we are dealing
only with simple first-order functions, nothing on the scale
of the `Hindley-Milner' type-inference algorithm is needed. I suggest
just looking at what data is available and generating all
missing information by ``simple means''\ldots{} rather than
looking at the literature, which solves the problem
with much heavier machinery.

\item \textbf{Built-In Functions}: The `prelude' comes
with several built-in functions: \texttt{new\_line()},
\texttt{skip}, \texttt{print\_int(n)}, \texttt{print\_space()},
\texttt{print\_star()} and \texttt{print\_char(n)}. You can find the
`prelude', for example, in the file \texttt{sqr.ll}.
\end{itemize}
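
\noindent The following Scala sketch illustrates how inferred types
might drive the subtleties listed above. It works on a cut-down,
hypothetical fragment of the K-language: apart from \texttt{KVar}, the
constructors \texttt{KNum}, \texttt{KFNum}, \texttt{Kop} and
\texttt{KCall} are assumptions, as are the definitions of \texttt{Ty}
and \texttt{TyEnv} and the helpers \texttt{compile\_binop},
\texttt{load\_global} and \texttt{compile\_call}. In this sketch
\texttt{typ\_val} merely returns the type of a K-value (instead of an
annotated value), and it assumes that both operands of an operation
have the same type.

\begin{lstlisting}[numbers=none,language=Scala]
// Assumed representation of types and typing environments.
type Ty = String
type TyEnv = Map[String, Ty]

// A cut-down, hypothetical fragment of the K-language.
abstract class KVal
case class KVar(s: String, ty: Ty = "UNDEF") extends KVal
case class KNum(i: Int) extends KVal
case class KFNum(d: Double) extends KVal
case class Kop(o: String, v1: KVal, v2: KVal) extends KVal
case class KCall(f: String, args: List[KVal]) extends KVal

// Annotated variables keep their type; otherwise variables and
// functions are looked up in the environment.
def typ_val(v: KVal, ts: TyEnv): Ty = v match {
  case KNum(_)       => "Int"
  case KFNum(_)      => "Double"
  case KVar(s, ty)   => if (ty != "UNDEF") ty else ts.getOrElse(s, "UNDEF")
  case Kop(_, v1, _) => typ_val(v1, ts)   // assumes both operands agree
  case KCall(f, _)   => ts.getOrElse(f, "UNDEF")
}

def llvm_ty(ty: Ty): String = ty match {
  case "Int"    => "i32"
  case "Double" => "double"
  case "Void"   => "void"
  case _        => throw new IllegalArgumentException("unknown type: " + ty)
}

// Same operations as compile_op and compile_dop above.
def compile_op(op: String): String = op match {
  case "+"  => "add i32 "
  case "*"  => "mul i32 "
  case "-"  => "sub i32 "
  case "==" => "icmp eq i32 "
  case "!=" => "icmp ne i32 "
  case "<=" => "icmp sle i32 "
  case "<"  => "icmp slt i32 "
}
def compile_dop(op: String): String = op match {
  case "+"  => "fadd double "
  case "*"  => "fmul double "
  case "-"  => "fsub double "
  case "==" => "fcmp oeq double "
  case "!=" => "fcmp one double "
  case "<=" => "fcmp ole double "
  case "<"  => "fcmp olt double "
}

// Picks the integer or the double instruction from the operand type.
def compile_binop(op: String, ty: Ty): String = ty match {
  case "Int"    => compile_op(op)
  case "Double" => compile_dop(op)
  case _        => throw new IllegalArgumentException("cannot compile " + op + " at type " + ty)
}

// A global constant first needs to be loaded into a local variable.
def load_global(tmp: String, name: String, ty: Ty): String =
  "%" + tmp + " = load " + llvm_ty(ty) + ", " + llvm_ty(ty) + "* @" + name

// Results of non-void calls are assigned to a temporary variable;
// void calls are emitted without an assignment.
def compile_call(tmp: String, f: String, args: List[(String, Ty)], ret: Ty): String = {
  val params = args.map { case (x, t) => llvm_ty(t) + " %" + x }.mkString(", ")
  if (ret == "Void") "call void @" + f + " (" + params + ")"
  else "%" + tmp + " = call " + llvm_ty(ret) + " @" + f + " (" + params + ")"
}
\end{lstlisting}

\noindent With \texttt{sqr} recorded as returning \texttt{Int} and
\texttt{print\_int} as returning \texttt{Void}, \texttt{compile\_call}
produces the two call instructions from the item on void-functions, and
\texttt{load\_global} produces the load of \texttt{@Max} from the item
on global constants.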

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:
|