cws/cw05.tex
author Christian Urban <christian.urban@kcl.ac.uk>
Sat, 21 Oct 2023 09:09:09 +0100
changeset 943 5365ef60707e
parent 917 89e05a230d2d
child 959 64ec1884d860
permissions -rw-r--r--
updated

% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{../graphicss}
\usepackage{../langs}
\definecolor{navyblue}{rgb}{0.0, 0.0, 0.5}
\definecolor{pansypurple}{rgb}{0.47, 0.09, 0.29}

\begin{document}


\color{pansypurple}
\section*{RESIT / REPLACEMENT}

{\bf
The resit / replacement task is essentially CW5 (listed below) with
the exception that the lexer and parser is already provided. The
parser will generate an AST (see file \texttt{fun\_llvm.sc}). Your task
is to generate an AST for the K-intermediate language and supply
sufficient type annotations such that you can generate valid code for
the LLVM-IR. The submission deadline is 4th August at 16:00. At the
deadline, please send me an email containing a zip-file with your
files.
Feel free to reuse the files I have uploaded on KEATS (especially
the files generating simple LLVM-IR code). Of help might also be the
videos of Week~10.\bigskip

\noindent
Good Luck!}\smallskip\\
\noindent
Christian
\color{black}


\section*{Coursework 5}



\noindent This coursework is worth 25\% and is due on \cwFIVE{} at
16:00. You are asked to implement a compiler targeting the LLVM-IR.
Be careful that this CW needs some material about the LLVM-IR
that has not been shown in the lectures and your own experiments
might be required. You can find information about the LLVM-IR at

\begin{itemize}
\item \url{https://bit.ly/3rheZYr}
\item \url{https://llvm.org/docs/LangRef.html}  
\end{itemize}  

\noindent
You can do the implementation of your compiler in any programming
language you like, but you need to submit the source code with which
you generated the LLVM-IR files, otherwise a mark of 0\% will be
awarded. You are asked to submit the code of your compiler, but also
the generated \texttt{.ll} files. No PDF is needed for this
coursework.  You should use the lexer and parser from the previous
courseworks, but you need to make some modifications to them for the
`typed' version of the Fun-language. I will award up to 5\% if a lexer
and a parser are correctly implemented. At the end, please package
everything(!) in a zip-file that creates a directory with the name

\begin{center}
\texttt{YournameYourFamilyname}
\end{center}

\noindent
on my end. You will be marked according to the input files

\begin{itemize}
\item\href{https://nms.kcl.ac.uk/christian.urban/cfl/progs/sqr.fun}{sqr.fun}  
\item\href{https://nms.kcl.ac.uk/christian.urban/cfl/progs/fact.fun}{fact.fun}
\item\href{https://nms.kcl.ac.uk/christian.urban/cfl/progs/mand.fun}{mand.fun}
\item\href{https://nms.kcl.ac.uk/christian.urban/cfl/progs/mand2.fun}{mand2.fun}
\item\href{https://nms.kcl.ac.uk/christian.urban/cfl/progs/hanoi.fun}{hanoi.fun}    
\end{itemize}  

\noindent
which are uploaded to KEATS.

\subsection*{Disclaimer\alert}

It should be understood that the work you submit represents your own
effort. You have not copied from anyone else. An exception is the
Scala code I showed during the lectures or uploaded to KEATS, which
you can both use. You can also use your own code from the CW~1 --
CW~4. But do not
be tempted to ask Github Copilot for help or do any other
shenanigans like this!


\subsection*{Task}

The goal is to lex and parse 5 Fun-programs, including the
Mandelbrot program shown in Figure~\ref{mand}, and generate
corresponding code for the LLVM-IR. Unfortunately the calculations for
the Mandelbrot Set require floating point arithmetic and therefore we
cannot be as simple-minded about types as we have been so far
(remember the LLVM-IR is a fully-typed language and needs to know the
exact types of each expression). The idea is to deal appropriately
with three types, namely \texttt{Int}, \texttt{Double} and
\texttt{Void} (they are represented in the LLVM-IR as \texttt{i32},
\texttt{double} and \texttt{void}). You need to extend the lexer and
parser accordingly in order to deal with type annotations. The
Fun-language includes global constants, such as

\begin{lstlisting}[numbers=none]
  val Ymin: Double = -1.3;
  val Maxiters: Int = 1000;
\end{lstlisting}

\noindent
where you can assume that they are `normal' identifiers, just
starting with a capital letter---all other identifiers should have
lower-case letters. Function definitions can take arguments of
type \texttt{Int} or \texttt{Double}, and need to specify a return
type, which can be \texttt{Void}, for example

\begin{lstlisting}[numbers=none]
  def foo(n: Int, x: Double) : Double = ...
  def id(n: Int) : Int = ...
  def bar() : Void = ...
\end{lstlisting}

\noindent
The idea is to record all typing information that is given
in the Fun-program, but then delay any further typing inference to
after the CPS-translation. That means the parser should
generate ASTs given by the Scala dataypes:

\begin{lstlisting}[numbers=none,language=Scala]
abstract class Exp 
abstract class BExp  
abstract class Decl 

case class Def(name: String, args: List[(String, String)],
               ty: String, body: Exp) extends Decl
case class Main(e: Exp) extends Decl
case class Const(name: String, v: Int) extends Decl
case class FConst(name: String, x: Double) extends Decl

case class Call(name: String, args: List[Exp]) extends Exp
case class If(a: BExp, e1: Exp, e2: Exp) extends Exp
case class Var(s: String) extends Exp
case class Num(i: Int) extends Exp     // integer numbers
case class FNum(i: Double) extends Exp  // floating numbers
case class ChConst(c: Int) extends Exp // char constants
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp
case class Sequence(e1: Exp, e2: Exp) extends Exp
case class Bop(o: String, a1: Exp, a2: Exp) extends BExp
\end{lstlisting}

\noindent
This datatype distinguishes whether the global constant is an integer
constant or floating constant. Also a function definition needs to
record the return type of the function, namely the argument
\texttt{ty} in \texttt{Def}, and the arguments consist of an pairs of
identifier names and types (\texttt{Int} or \texttt{Double}). The hard
part of the CW is to design the K-intermediate language and infer all
necessary types in order to generate LLVM-IR code. You can check
your LLVM-IR code by running it with the interpreter \texttt{lli}.

\begin{figure}[t]
\lstinputlisting[language=Scala]{../cwtests/cw05/mand.fun}
\caption{The Mandelbrot program in the `typed' Fun-language.\label{mand}}
\end{figure}

\begin{figure}[t]
\includegraphics[scale=0.35]{../solutions/cw5/out.png}
\caption{Ascii output of the Mandelbrot program.\label{mand2}}
\end{figure}

Also note that the second version of the Mandelbrot program and also
the Tower of Hanoi program use character constants, like \texttt{'a'},
\texttt{'1'}, \texttt{'$\backslash$n'} and so on. When they are tokenised,
such characters should be interpreted as the corresponding ASCII code (an
integer), such that we can use them in calculations like \texttt{'a' + 10}
where the result should be 107. As usual, the character \texttt{'$\backslash$n'} is the
ASCII code 10.


\subsection*{LLVM-IR}

There are some subtleties in the LLVM-IR you need to be aware of:

\begin{itemize}
\item \textbf{Global constants}: While global constants such as

\begin{lstlisting}[numbers=none]  
val Max : Int = 10;
\end{lstlisting}

\noindent
can be easily defined in the LLVM-IR as follows

\begin{lstlisting}[numbers=none]  
@Max = global i32 10
\end{lstlisting}

\noindent
they cannot easily be referenced. If you want to use
this constant then you need to generate code such as

\begin{lstlisting}[numbers=none]  
%tmp_22 = load i32, i32* @Max
\end{lstlisting}

\noindent
first, which treats \texttt{@Max} as an Integer-pointer (type
\texttt{i32*}) that needs to be loaded into a local variable,
here \texttt{\%tmp\_22}.

\item \textbf{Void-Functions}: While integer and double functions
  can easily be called and their results can be allocated to a
  temporary variable:

  \begin{lstlisting}[numbers=none]  
   %tmp_23 = call i32 @sqr (i32 %n)
  \end{lstlisting}

  void-functions cannot be allocated to a variable. They need to be
  called just as

  \begin{lstlisting}[numbers=none]  
  call void @print_int (i32 %tmp_23)
\end{lstlisting}

\item \textbf{Floating-Point Operations}: While integer operations
  are specified in the LLVM-IR as

  \begin{lstlisting}[numbers=none,language=Scala]
  def compile_op(op: String) = op match {
    case "+" => "add i32 "
    case "*" => "mul i32 "
    case "-" => "sub i32 "
    case "==" => "icmp eq i32 "
    case "!=" => "icmp ne i32 "
    case "<=" => "icmp sle i32 " // signed less or equal
    case "<"  => "icmp slt i32 " // signed less than
  }\end{lstlisting}

  the corresponding operations on doubles are

  \begin{lstlisting}[numbers=none,language=Scala]
  def compile_dop(op: String) = op match {
    case "+" => "fadd double "
    case "*" => "fmul double "
    case "-" => "fsub double "
    case "==" => "fcmp oeq double "
    case "!=" => "fcmp one double "
    case "<=" => "fcmp ole double "   
    case "<"  => "fcmp olt double "   
  }\end{lstlisting}

\item \textbf{Typing}: In order to leave the CPS-translations
  as is, it makes sense to defer the full type-inference to the
  K-intermediate-language. For this it is good to define
  the \texttt{KVar} constructor as

\begin{lstlisting}[numbers=none,language=Scala]  
case class KVar(s: String, ty: Ty = "UNDEF") extends KVal\end{lstlisting}

  where first a default type, for example \texttt{UNDEF}, is
  given. Then you need to define two typing functions

  \begin{lstlisting}[numbers=none,language=Scala]  
    def typ_val(v: KVal, ts: TyEnv) = ???
    def typ_exp(a: KExp, ts: TyEnv) = ???
  \end{lstlisting}

  Both functions require a typing-environment that updates
  the information about what type each variable, operation
  and so on receives. Once the types are inferred, the
  LLVM-IR code can be generated. Since we are dealing only
  with simple first-order functions, nothing on the scale
  as the `Hindley-Milner' typing-algorithm is needed. I suggest
  to just look at what data is avaliable and generate all
  missing information by ``simple means''\ldots rather than
  looking at the literature which solves the problem
  with much heavier machinery.

\item \textbf{Build-In Functions}: The `prelude' comes
  with several build-in functions: \texttt{new\_line()},
  \texttt{skip}, \texttt{print\_int(n)}, \texttt{print\_space()},
  \texttt{print\_star()} and \texttt{print\_char(n)}. You can find the `prelude' for
  example in the file \texttt{sqr.ll}.
\end{itemize}  

\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: