% !TEX program = xelatex
% Author: Christian Urban <christian.urban@kcl.ac.uk>
% Changeset 959 (64ec1884d860), parent 943 (5365ef60707e), Wed, 21 Feb 2024 09:14:12 +0000
\documentclass{article}
\usepackage{../style}
\usepackage{../graphicss}
\usepackage{../langs}
\definecolor{navyblue}{rgb}{0.0, 0.0, 0.5}
\definecolor{pansypurple}{rgb}{0.47, 0.09, 0.29}


\begin{document}


%\color{pansypurple}
%\section*{RESIT / REPLACEMENT}

%{\bf
%The resit / replacement task is essentially CW5 (listed below) with
%the exception that the lexer and parser is already provided. The
%parser will generate an AST (see file \texttt{fun\_llvm.sc}). Your task
%is to generate an AST for the K-intermediate language and supply
%sufficient type annotations such that you can generate valid code for
%the LLVM-IR. The submission deadline is 4th August at 16:00. At the
%deadline, please send me an email containing a zip-file with your
%files.
%Feel free to reuse the files I have uploaded on KEATS (especially
%the files generating simple LLVM-IR code). Of help might also be the
%videos of Week~10.\bigskip
%
%\noindent
%Good Luck!}\smallskip\\
%\noindent
%Christian
%\color{black}


\section*{Coursework 5}



\noindent This coursework is worth 25\% and is due on \cwFIVE{} at
16:00. You are asked to implement a compiler targeting the LLVM-IR.
Be aware that this CW needs some material about the LLVM-IR
that has not been shown in the lectures, so your own experiments
and research might be required. You can find information about the LLVM-IR at

\begin{itemize}
\item \url{https://bit.ly/3rheZYr}
\item \url{https://llvm.org/docs/LangRef.html}
\end{itemize}

\noindent
You can implement your compiler in any programming
language you like, but you need to submit the source code with which
you generated the LLVM-IR files, otherwise a mark of 0\% will be
awarded. You are asked to submit the code of your compiler and also
the generated \texttt{.ll} files. No PDF is needed for this
coursework. You should use the lexer and parser from the previous
courseworks, but you need to make some modifications to them for the
`typed' version of the Fun-language. I will award up to 5\% if a lexer
and a parser are correctly implemented.

%At the end, please package
%everything(!) in a zip-file that creates a directory with the name
%
%\begin{center}
%\texttt{YournameYourFamilyname}
%\end{center}
%
%\noindent
%on my end.
You will be marked according to the input files

\begin{itemize}
\item\href{https://nms.kcl.ac.uk/christian.urban/cfl/progs/sqr.fun}{sqr.fun}
\item\href{https://nms.kcl.ac.uk/christian.urban/cfl/progs/fact.fun}{fact.fun}
\item\href{https://nms.kcl.ac.uk/christian.urban/cfl/progs/mand.fun}{mand.fun}
\item\href{https://nms.kcl.ac.uk/christian.urban/cfl/progs/mand2.fun}{mand2.fun}
\item\href{https://nms.kcl.ac.uk/christian.urban/cfl/progs/hanoi.fun}{hanoi.fun}
\end{itemize}

\noindent
which are uploaded to KEATS and GitHub.

\subsection*{Disclaimer\alert}

It should be understood that the work you submit represents your own
effort. You have not copied from anyone else. An exception is the
Scala code I showed during the lectures or uploaded to KEATS, both of
which you can use. You can also use your own code from CW~1 --
CW~4. But do not
be tempted to ask GitHub Copilot for help or do any other
shenanigans like this!


\subsection*{Task}

The goal is to lex and parse five Fun-programs, including the
Mandelbrot program shown in Figure~\ref{mand}, and generate
corresponding code for the LLVM-IR. Unfortunately, the calculations for
the Mandelbrot Set require floating-point arithmetic and therefore we
cannot be as simple-minded about types as we have been so far
(remember the LLVM-IR is a fully-typed language and needs to know the
exact type of each expression). The idea is to deal appropriately
with three types, namely \texttt{Int}, \texttt{Double} and
\texttt{Void} (they are represented in the LLVM-IR as \texttt{i32},
\texttt{double} and \texttt{void}). You need to extend the lexer and
parser accordingly in order to deal with type annotations. The
Fun-language includes global constants, such as

\begin{lstlisting}[numbers=none]
val Ymin: Double = -1.3;
val Maxiters: Int = 1000;
\end{lstlisting}

\noindent
where you can assume that they are `normal' identifiers, just
starting with a capital letter---all other identifiers should have
lower-case letters. Function definitions can take arguments of
type \texttt{Int} or \texttt{Double}, and need to specify a return
type, which can be \texttt{Void}, for example

\begin{lstlisting}[numbers=none]
def foo(n: Int, x: Double) : Double = ...
def id(n: Int) : Int = ...
def bar() : Void = ...
\end{lstlisting}

\noindent
The idea is to record all typing information that is given
in the Fun-program, but then delay any further type inference until
after the CPS-translation. That means the parser should
generate ASTs given by the Scala datatypes:

\begin{lstlisting}[numbers=none,language=Scala]
abstract class Exp
abstract class BExp
abstract class Decl

case class Def(name: String, args: List[(String, String)],
               ty: String, body: Exp) extends Decl
case class Main(e: Exp) extends Decl
case class Const(name: String, v: Int) extends Decl
case class FConst(name: String, x: Double) extends Decl

case class Call(name: String, args: List[Exp]) extends Exp
case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
case class Var(s: String) extends Exp
case class Num(i: Int) extends Exp     // integer numbers
case class FNum(i: Double) extends Exp // floating numbers
case class ChConst(c: Int) extends Exp // char constants
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
case class Sequence(e1: Exp, e2: Exp) extends Exp
case class Bop(o: String, a1: Exp, a2: Exp) extends BExp
\end{lstlisting}

\noindent
This datatype distinguishes whether a global constant is an integer
constant or a floating-point constant. Also a function definition needs to
record the return type of the function, namely the argument
\texttt{ty} in \texttt{Def}, and the arguments consist of pairs of
identifier names and types (\texttt{Int} or \texttt{Double}). The hard
part of the CW is to design the K-intermediate language and infer all
necessary types in order to generate LLVM-IR code. You can check
your LLVM-IR code by running it with the interpreter \texttt{lli}.
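
\noindent
As an illustration, the declarations from the examples above could be
represented by values such as the following. This is only a sketch: the
names \texttt{ymin}, \texttt{maxiters} and \texttt{idDef} are made up for
this example, and since the body of \texttt{id} is elided above, it is
assumed here to be just the variable \texttt{n}.

\begin{lstlisting}[numbers=none,language=Scala]
// the relevant datatypes, repeated to keep the sketch self-contained
abstract class Exp
abstract class Decl

case class Def(name: String, args: List[(String, String)],
               ty: String, body: Exp) extends Decl
case class Const(name: String, v: Int) extends Decl
case class FConst(name: String, x: Double) extends Decl
case class Var(s: String) extends Exp

// val Ymin: Double = -1.3;
val ymin = FConst("Ymin", -1.3)
// val Maxiters: Int = 1000;
val maxiters = Const("Maxiters", 1000)
// def id(n: Int) : Int = n     (the body n is an assumption)
val idDef = Def("id", List(("n", "Int")), "Int", Var("n"))
\end{lstlisting}

\noindent
Each argument is stored as a pair of its name and its type, and the
return type is kept as a plain string in \texttt{ty}.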

\begin{figure}[t]
\lstinputlisting[language=Scala]{../cwtests/cw05/mand.fun}
\caption{The Mandelbrot program in the `typed' Fun-language.\label{mand}}
\end{figure}

\begin{figure}[t]
\includegraphics[scale=0.35]{../solutions/cw5/out.png}
\caption{ASCII output of the Mandelbrot program.\label{mand2}}
\end{figure}

Also note that the second version of the Mandelbrot program and
the Tower of Hanoi program use character constants, like \texttt{'a'},
\texttt{'1'}, \texttt{'$\backslash$n'} and so on. When they are tokenised,
such characters should be interpreted as the corresponding ASCII code (an
integer), such that we can use them in calculations like \texttt{'a' + 10}
where the result should be 107. As usual, the character \texttt{'$\backslash$n'} has the
ASCII code 10.
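
\noindent
The conversion from a character constant to its ASCII code could be
sketched as follows. The helper \texttt{char\_code} is hypothetical, and
it is assumed that the lexer hands over the text of the token including
the two quotes; your lexer might represent such tokens differently.

\begin{lstlisting}[numbers=none,language=Scala]
// hypothetical helper: maps the text of a character token,
// including the quotes, to its ASCII code
def char_code(tok: String): Int = tok match {
  case "'\\n'" => 10                     // the escape sequence \n
  case s if s.length == 3 => s(1).toInt  // 'a', '1' and so on
  case s => sys.error("unexpected character token: " + s)
}

// the calculation 'a' + 10, where 'a' has the ASCII code 97
val res = char_code("'a'") + 10
\end{lstlisting}

\noindent
The resulting integer can then be stored in the constructor
\texttt{ChConst}.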


\subsection*{LLVM-IR}

There are some subtleties in the LLVM-IR you need to be aware of:

\begin{itemize}
\item \textbf{Global constants}: While global constants such as

\begin{lstlisting}[numbers=none]
val Max : Int = 10;
\end{lstlisting}

\noindent
can be easily defined in the LLVM-IR as follows

\begin{lstlisting}[numbers=none]
@Max = global i32 10
\end{lstlisting}

\noindent
they cannot easily be referenced. If you want to use
this constant, then you first need to generate code such as

\begin{lstlisting}[numbers=none]
%tmp_22 = load i32, i32* @Max
\end{lstlisting}

\noindent
which treats \texttt{@Max} as an Integer-pointer (type
\texttt{i32*}) that needs to be loaded into a local variable,
here \texttt{\%tmp\_22}.

\item \textbf{Void-Functions}: While integer and double functions
can easily be called and their results can be assigned to a
temporary variable:

\begin{lstlisting}[numbers=none]
%tmp_23 = call i32 @sqr (i32 %n)
\end{lstlisting}

the result of a void-function cannot be assigned to a variable. Such
functions need to be called just as

\begin{lstlisting}[numbers=none]
call void @print_int (i32 %tmp_23)
\end{lstlisting}

\item \textbf{Floating-Point Operations}: While integer operations
are specified in the LLVM-IR as

\begin{lstlisting}[numbers=none,language=Scala]
def compile_op(op: String) = op match {
  case "+" => "add i32 "
  case "*" => "mul i32 "
  case "-" => "sub i32 "
  case "==" => "icmp eq i32 "
  case "!=" => "icmp ne i32 "
  case "<=" => "icmp sle i32 " // signed less or equal
  case "<" => "icmp slt i32 "  // signed less than
}\end{lstlisting}

the corresponding operations on doubles are

\begin{lstlisting}[numbers=none,language=Scala]
def compile_dop(op: String) = op match {
  case "+" => "fadd double "
  case "*" => "fmul double "
  case "-" => "fsub double "
  case "==" => "fcmp oeq double "
  case "!=" => "fcmp one double "
  case "<=" => "fcmp ole double "
  case "<" => "fcmp olt double "
}\end{lstlisting}

\item \textbf{Typing}: In order to leave the CPS-translations
as is, it makes sense to defer the full type-inference to the
K-intermediate-language. For this it is good to define
the \texttt{KVar} constructor as

\begin{lstlisting}[numbers=none,language=Scala]
case class KVar(s: String, ty: Ty = "UNDEF") extends KVal\end{lstlisting}

where first a default type, for example \texttt{UNDEF}, is
given. Then you need to define two typing functions

\begin{lstlisting}[numbers=none,language=Scala]
def typ_val(v: KVal, ts: TyEnv) = ???
def typ_exp(a: KExp, ts: TyEnv) = ???
\end{lstlisting}

Both functions require a typing-environment that updates
the information about what type each variable, operation
and so on receives. Once the types are inferred, the
LLVM-IR code can be generated. Since we are dealing only
with simple first-order functions, nothing on the scale
of the `Hindley-Milner' typing-algorithm is needed. I suggest
you just look at what data is available and generate all
missing information by ``simple means''\ldots rather than
looking at the literature, which solves the problem
with much heavier machinery.

\item \textbf{Built-In Functions}: The `prelude' comes
with several built-in functions: \texttt{new\_line()},
\texttt{skip}, \texttt{print\_int(n)}, \texttt{print\_space()},
\texttt{print\_star()} and \texttt{print\_char(n)}. You can find the `prelude' for
example in the file \texttt{sqr.ll}.
\end{itemize}
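
\noindent
The first two points above can be made concrete with a small sketch
that produces the instructions as strings. The helpers
\texttt{llvm\_ty}, \texttt{load\_global} and \texttt{compile\_call} are
hypothetical names chosen for this illustration, and types are assumed
to be given as the strings \texttt{Int}, \texttt{Double} and
\texttt{Void}; your own code generator may be organised quite
differently.

\begin{lstlisting}[numbers=none,language=Scala]
// hypothetical helper: maps Fun-types to LLVM-IR types
def llvm_ty(ty: String): String = ty match {
  case "Int" => "i32"
  case "Double" => "double"
  case "Void" => "void"
}

// loads the global constant name into the local variable tmp
def load_global(tmp: String, name: String, ty: String): String = {
  val t = llvm_ty(ty)
  "%" + tmp + " = load " + t + ", " + t + "* @" + name
}

// generates a call; the result is assigned to tmp only if the
// return type is not Void; args are pairs of an LLVM-IR type
// and an operand
def compile_call(tmp: String, fname: String, ret: String,
                 args: List[(String, String)]): String = {
  val params = args.map { case (t, a) => t + " " + a }.mkString(", ")
  val call = "call " + llvm_ty(ret) + " @" + fname + " (" + params + ")"
  if (ret == "Void") call else "%" + tmp + " = " + call
}
\end{lstlisting}

\noindent
With these definitions, \texttt{load\_global("tmp\_22", "Max", "Int")}
produces the load instruction shown above, and
\texttt{compile\_call} leaves out the assignment whenever the return
type is \texttt{Void}.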

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: