% !TEX program = xelatex
% Author: Christian Urban <christian.urban@kcl.ac.uk>
% Revision: changeset 887 (67d6615fa6e3), Mon, 10 Oct 2022 15:07:31 +0100
\documentclass{article}
\usepackage{../style}
\usepackage{../graphicss}
\usepackage{../langs}
\definecolor{navyblue}{rgb}{0.0, 0.0, 0.5}
\definecolor{pansypurple}{rgb}{0.47, 0.09, 0.29}
\begin{document}

%\color{pansypurple}
%\section*{RESIT / REPLACEMENT}
%
%{\bf
%The resit / replacement task is essentially CW5 (listed below) with
%the exception that the lexer and parser are already provided. The
%parser will generate an AST (see file \texttt{fun\_llvm.sc}). Your task
%is to generate an AST for the K-intermediate language and supply
%sufficient type annotations such that you can generate valid code for
%the LLVM-IR. The submission deadline is 9th August at 16:00. At the
%deadline, please send me an email containing a zip-file with your
%files.
%Feel free to reuse the files I have uploaded on KEATS (especially
%the files generating simple LLVM-IR code). Of help might also be the
%videos of Week~10.\bigskip
%
%\noindent
%Good Luck!}
%\color{black}

\section*{Coursework 5}
\noindent This coursework is worth 25\% and is due on \cwFIVE{} at
16:00. You are asked to implement a compiler targeting the LLVM-IR.
Be aware that this CW needs some material about the LLVM-IR
that has not been shown in the lectures, so your own experiments
might be required. You can find information about the LLVM-IR at

\begin{itemize}
\item \url{https://bit.ly/3rheZYr}
\item \url{https://llvm.org/docs/LangRef.html}
\end{itemize}

\noindent
You can do the implementation of your compiler in any programming
language you like, but you need to submit the source code with which
you generated the LLVM-IR files, otherwise a mark of 0\% will be
awarded. You are asked to submit the code of your compiler, but also
the generated \texttt{.ll} files. No PDF is needed for this
coursework. You should use the lexer and parser from the previous
courseworks, but you need to make some modifications to them for the
`typed' version of the Fun-language. I will award up to 5\% if a lexer
and a parser are correctly implemented. At the end, please package
everything(!) in a zip-file that creates a directory with the name

\begin{center}
\texttt{YournameYourFamilyname}
\end{center}

\noindent
on my end. You will be marked according to the input files

\begin{itemize}
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/sqr.fun}{sqr.fun}
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/fact.fun}{fact.fun}
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/mand.fun}{mand.fun}
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/mand2.fun}{mand2.fun}
\item\href{https://talisker.nms.kcl.ac.uk/cgi-bin/repos.cgi/afl-material/raw-file/tip/cwtests/cw05/hanoi.fun}{hanoi.fun}
\end{itemize}

\noindent
which are uploaded to KEATS.

\subsection*{Disclaimer\alert}

It should be understood that the work you submit represents your own
effort. You have not copied from anyone else. An exception is the
Scala code I showed during the lectures or uploaded to KEATS, both of
which you can use. You can also use your own code from CW~1 --
CW~4. But do not be tempted to ask GitHub Copilot for help or do any
other shenanigans like this!

\subsection*{Task}

The goal is to lex and parse 5 Fun-programs, including the
Mandelbrot program shown in Figure~\ref{mand}, and generate
corresponding code for the LLVM-IR. Unfortunately the calculations for
the Mandelbrot Set require floating point arithmetic and therefore we
cannot be as simple-minded about types as we have been so far
(remember the LLVM-IR is a fully-typed language and needs to know the
exact type of each expression). The idea is to deal appropriately
with three types, namely \texttt{Int}, \texttt{Double} and
\texttt{Void} (they are represented in the LLVM-IR as \texttt{i32},
\texttt{double} and \texttt{void}). You need to extend the lexer and
parser accordingly in order to deal with type annotations. The
Fun-language includes global constants, such as

\begin{lstlisting}[numbers=none]
val Ymin: Double = -1.3;
val Maxiters: Int = 1000;
\end{lstlisting}

\noindent
where you can assume that they are `normal' identifiers, just
starting with a capital letter, whereas all other identifiers should have
lower-case letters. Function definitions can take arguments of
type \texttt{Int} or \texttt{Double}, and need to specify a return
type, which can be \texttt{Void}, for example

\begin{lstlisting}[numbers=none]
def foo(n: Int, x: Double) : Double = ...
def id(n: Int) : Int = ...
def bar() : Void = ...
\end{lstlisting}

\noindent
The idea is to record all typing information that is given
in the Fun-program, but then delay any further typing inference to
after the CPS-translation. That means the parser should
generate ASTs given by the Scala datatypes:

\begin{lstlisting}[numbers=none,language=Scala]
abstract class Exp
abstract class BExp
abstract class Decl

case class Def(name: String, args: List[(String, String)],
               ty: String, body: Exp) extends Decl
case class Main(e: Exp) extends Decl
case class Const(name: String, v: Int) extends Decl
case class FConst(name: String, x: Double) extends Decl

case class Call(name: String, args: List[Exp]) extends Exp
case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
case class Var(s: String) extends Exp
case class Num(i: Int) extends Exp // integer numbers
case class FNum(i: Double) extends Exp // floating numbers
case class ChConst(c: Int) extends Exp // char constants
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
case class Sequence(e1: Exp, e2: Exp) extends Exp
case class Bop(o: String, a1: Exp, a2: Exp) extends BExp
\end{lstlisting}

\noindent
This datatype distinguishes whether the global constant is an integer
constant or floating constant. Also a function definition needs to
record the return type of the function, namely the argument
\texttt{ty} in \texttt{Def}, and the arguments consist of pairs of
identifier names and types (\texttt{Int} or \texttt{Double}). The hard
part of the CW is to design the K-intermediate language and infer all
necessary types in order to generate LLVM-IR code. You can check
your LLVM-IR code by running it with the interpreter \texttt{lli}.
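
\noindent
To make the shape of these ASTs concrete, the following sketch shows
what a parser might produce for a tiny typed Fun-program. The input
program and the names \texttt{prog} and \texttt{sigs} are assumptions
made only for illustration; the datatype definitions are repeated in
abbreviated form so that the snippet is self-contained.

\begin{lstlisting}[numbers=none,language=Scala]
// abbreviated copy of the AST datatypes
abstract class Exp
abstract class BExp
abstract class Decl

case class Def(name: String, args: List[(String, String)],
               ty: String, body: Exp) extends Decl
case class Main(e: Exp) extends Decl
case class Const(name: String, v: Int) extends Decl
case class FConst(name: String, x: Double) extends Decl

case class Call(name: String, args: List[Exp]) extends Exp
case class Var(s: String) extends Exp
case class Num(i: Int) extends Exp
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp

// hypothetical input program:
//   val Max : Int = 10;
//   def sqr(n: Int) : Int = n * n;
//   print_int(sqr(Max))
val prog : List[Decl] = List(
  Const("Max", 10),
  Def("sqr", List(("n", "Int")), "Int",
      Aop("*", Var("n"), Var("n"))),
  Main(Call("print_int", List(Call("sqr", List(Var("Max")))))))

// name, argument types and return type of every function
val sigs = prog.collect {
  case Def(name, args, ty, _) => (name, args.map(_._2), ty)
}
\end{lstlisting}

\noindent
The value \texttt{sigs} collects exactly the typing information that
is written down in the program; everything else has to be inferred
later on the K-intermediate language.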

\begin{figure}[t]
\lstinputlisting[language=Scala]{../cwtests/cw05/mand.fun}
\caption{The Mandelbrot program in the `typed' Fun-language.\label{mand}}
\end{figure}

\begin{figure}[t]
\includegraphics[scale=0.35]{../solution/cw5/out.png}
\caption{ASCII output of the Mandelbrot program.\label{mand2}}
\end{figure}

Also note that the second version of the Mandelbrot program and also
the Tower of Hanoi program use character constants, like \texttt{'a'},
\texttt{'1'}, \texttt{'$\backslash$n'} and so on. When they are tokenised,
such characters should be interpreted as the corresponding ASCII code (an
integer), such that we can use them in calculations like \texttt{'a' + 10}
where the result should be 107. As usual, the character \texttt{'$\backslash$n'} has the
ASCII code 10.
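
\noindent
One possible way to realise this conversion is sketched below. The
helper name \texttt{char\_code} is hypothetical, and the sketch assumes
that the token text still contains the surrounding single quotes and
that the only escape sequence is the newline; your lexer may
represent character tokens differently.

\begin{lstlisting}[numbers=none,language=Scala]
// hypothetical helper: maps the text of a character token
// (including its quotes) to the corresponding ASCII code
def char_code(tok: String) : Int = {
  val inner = tok.substring(1, tok.length - 1) // drop the quotes
  inner match {
    case "\\n" => 10                           // escaped newline
    case s if s.length == 1 => s.head.toInt    // plain character
    case _ => sys.error("unsupported character constant: " + tok)
  }
}

val r1 = char_code("'a'") + 10   // 107
val r2 = char_code("'1'")        // 49
val r3 = char_code("'\\n'")      // 10
\end{lstlisting}

\noindent
The resulting integer can then be stored in the \texttt{ChConst}
constructor of the AST.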
\subsection*{LLVM-IR}

There are some subtleties in the LLVM-IR you need to be aware of:

\begin{itemize}
\item \textbf{Global constants}: While global constants such as

\begin{lstlisting}[numbers=none]
val Max : Int = 10;
\end{lstlisting}

\noindent
can be easily defined in the LLVM-IR as follows

\begin{lstlisting}[numbers=none]
@Max = global i32 10
\end{lstlisting}

\noindent
they cannot be referenced directly. If you want to use
this constant then you need to generate code such as

\begin{lstlisting}[numbers=none]
%tmp_22 = load i32, i32* @Max
\end{lstlisting}

\noindent
first, which treats \texttt{@Max} as an Integer-pointer (type
\texttt{i32*}) that needs to be loaded into a local variable,
here \texttt{\%tmp\_22}.

\item \textbf{Void-Functions}: While integer and double functions
can easily be called and their results can be assigned to a
temporary variable:

\begin{lstlisting}[numbers=none]
%tmp_23 = call i32 @sqr (i32 %n)
\end{lstlisting}

void-functions return no value, so a call to them cannot be assigned
to a variable. They need to be called just as

\begin{lstlisting}[numbers=none]
call void @print_int (i32 %tmp_23)
\end{lstlisting}

\item \textbf{Floating-Point Operations}: While integer operations
are specified in the LLVM-IR as

  \begin{lstlisting}[numbers=none,language=Scala]
  def compile_op(op: String) = op match {
    case "+" => "add i32 "
    case "*" => "mul i32 "
    case "-" => "sub i32 "
    case "==" => "icmp eq i32 "
    case "!=" => "icmp ne i32 "
    case "<=" => "icmp sle i32 " // signed less or equal
    case "<" => "icmp slt i32 " // signed less than
  }\end{lstlisting}

the corresponding operations on doubles are

  \begin{lstlisting}[numbers=none,language=Scala]
  def compile_dop(op: String) = op match {
    case "+" => "fadd double "
    case "*" => "fmul double "
    case "-" => "fsub double "
    case "==" => "fcmp oeq double "
    case "!=" => "fcmp one double "
    case "<=" => "fcmp ole double " // ordered less or equal
    case "<" => "fcmp olt double " // ordered less than
  }\end{lstlisting}

\item \textbf{Typing}: In order to leave the CPS-translations
  as is, it makes sense to defer the full type-inference to the
  K-intermediate-language. For this it is good to define
  the \texttt{KVar} constructor as

\begin{lstlisting}[numbers=none,language=Scala]
case class KVar(s: String, ty: Ty = "UNDEF") extends KVal\end{lstlisting}

  where first a default type, for example \texttt{UNDEF}, is
  given. Then you need to define two typing functions

\begin{lstlisting}[numbers=none,language=Scala]
def typ_val(v: KVal, ts: TyEnv) = ???
def typ_exp(a: KExp, ts: TyEnv) = ???
\end{lstlisting}

  Both functions require a typing-environment that records
  what type each variable, operation
  and so on receives. Once the types are inferred, the
  LLVM-IR code can be generated. Since we are dealing only
  with simple first-order functions, nothing on the scale
  of the `Hindley-Milner' typing-algorithm is needed. I suggest
  just looking at what data is available and generating all
  missing information by ``simple means''\ldots rather than
  looking at the literature which solves the problem
  with much heavier machinery.

\item \textbf{Built-In Functions}: The `prelude' comes
  with several built-in functions: \texttt{new\_line()},
  \texttt{skip}, \texttt{print\_int(n)}, \texttt{print\_space()},
  \texttt{print\_star()} and \texttt{print\_char(n)}. You can find the `prelude' for
  example in the file \texttt{sqr.ll}.
\end{itemize}  
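
\noindent
The points above can be combined into a small sketch of type-directed
code generation. Everything in it is an assumption made for
illustration and not the intended solution: the representation of
\texttt{Ty} as \texttt{String} and of \texttt{TyEnv} as a
\texttt{Map}, the constructors \texttt{KNum} and \texttt{KFNum}, and
the helpers \texttt{llvm\_ty}, \texttt{compile\_aop},
\texttt{compile\_load} and \texttt{compile\_call}. It also covers only
values, not K-expressions.

\begin{lstlisting}[numbers=none,language=Scala]
type Ty = String                  // assumed representation
type TyEnv = Map[String, Ty]      // assumed representation

abstract class KVal
case class KVar(s: String, ty: Ty = "UNDEF") extends KVal
case class KNum(i: Int) extends KVal      // hypothetical
case class KFNum(d: Double) extends KVal  // hypothetical

// type of a K-value; undefined variables are looked up
def typ_val(v: KVal, ts: TyEnv) : Ty = v match {
  case KNum(_) => "Int"
  case KFNum(_) => "Double"
  case KVar(s, "UNDEF") => ts.getOrElse(s, "UNDEF")
  case KVar(_, ty) => ty
}

// Fun-types to LLVM-IR types
def llvm_ty(ty: Ty) : String = ty match {
  case "Int" => "i32"
  case "Double" => "double"
  case "Void" => "void"
}

// choose the integer or the floating-point instruction
def compile_aop(op: String, ty: Ty) : String = (op, ty) match {
  case ("+", "Int") => "add i32 "
  case ("*", "Int") => "mul i32 "
  case ("-", "Int") => "sub i32 "
  case ("+", "Double") => "fadd double "
  case ("*", "Double") => "fmul double "
  case ("-", "Double") => "fsub double "
}

// global constants are loaded into a local variable first
def compile_load(tmp: String, name: String, ty: Ty) : String = {
  val t = llvm_ty(ty)
  "%" + tmp + " = load " + t + ", " + t + "* @" + name
}

// only non-void calls are assigned to a variable
def compile_call(tmp: String, f: String, ret: Ty, args: String) =
  if (ret == "Void") "call void @" + f + " (" + args + ")"
  else "%" + tmp + " = call " + llvm_ty(ret) +
       " @" + f + " (" + args + ")"

val ts : TyEnv = Map("n" -> "Int", "Max" -> "Int", "x" -> "Double")
val op = compile_aop("*", typ_val(KVar("x"), ts))
val ld = compile_load("tmp_22", "Max", ts("Max"))
val c1 = compile_call("tmp_23", "sqr", "Int", "i32 %n")
val c2 = compile_call("", "print_int", "Void", "i32 %tmp_23")
\end{lstlisting}

\noindent
With the environment \texttt{ts} above, \texttt{op} is the
floating-point multiplication, while \texttt{ld}, \texttt{c1} and
\texttt{c2} reproduce the three LLVM-IR lines shown in the items on
global constants and void-functions.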
\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: