cws/cw04.tex
author Christian Urban <christian.urban@kcl.ac.uk>
Fri, 22 Nov 2024 12:42:07 +0000
changeset 974 0cb4bf2469d1
parent 959 64ec1884d860
permissions -rw-r--r--
updated

% !TEX program = xelatex
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}

\begin{document}

%https://github.com/Storyyeller/Krakatau
%https://docs.oracle.com/javase/specs/jvms/se7/html/

% Jasmin Tutorial
%http://saksagan.ceng.metu.edu.tr/courses/ceng444/link/jvm-cpm.html

\section*{Coursework 4}

\noindent This coursework is worth 15\% and is due on \cwFOUR{}
at 16:00. You are asked to implement a compiler for
the WHILE language that targets the assembler language
provided by Jasmin or Krakatau (both have a very similar
syntax).  You can do the implementation in any programming
language you like, but you need to submit the source code with
which you answered the questions, otherwise a mark of 0\% will
be awarded. You should use the lexer and parser from the
previous courseworks. Implement your compiler in the file \texttt{cw04.sc}.

%Please package \emph{everything}(!) in
%a zip-file that creates a directory with the name
%\texttt{YournameYourFamilyname} on my end.

\subsection*{Disclaimer\alert}

It should be understood that the work you submit represents
your own effort. You have not copied from anyone else. An
exception is the Scala code I showed during the lectures,
which you can use. You can also use your own code from the
CW~1, CW~2 and CW~3. But do not
be tempted to ask Github Copilot for help or do any other
shenanigans like this!


\subsection*{Jasmin Assembler}

For this coursework you will need an assembler. 
The Jasmin assembler is available from

\begin{center}
\url{http://jasmin.sourceforge.net}
\end{center}

\noindent
This is a jar-file you can run on the commandline.
There is a user guide for Jasmin

\begin{center}
\url{http://jasmin.sourceforge.net/guide.html}
\end{center}

\noindent and also a description of some of the instructions
that the JVM understands

\begin{center}
\url{http://jasmin.sourceforge.net/instructions.html}
\end{center}

\noindent If you generated a correct assembler file for
Jasmin, for example \texttt{loops.j}, you can use

\begin{center}
\texttt{java -jar jasmin.jar loops.j}
\end{center}

\noindent in order to translate it into Java Byte Code. If needed, you
need to give the path to the Jasmin jar-file.  The resulting class
file can be run with

\begin{center}
\texttt{java loops}
\end{center}

\noindent where you might need to give the correct path to the
class file. For example:

\begin{center}
\texttt{java -cp . loops/loops}
\end{center}

\noindent There are also other resources about Jasmin on the
Internet, for example

\begin{center}
\small\url{https://saksagan.ceng.metu.edu.tr/courses/ceng444/link/f3jasmintutorial.html}
\end{center}

\noindent and

\begin{center}
  \small\url{http://www.csc.villanova.edu/~tway/courses/csc4181/s2022/labs/finalproject/JVM.pdf}
\end{center}

\noindent
If possible use Jasmin for the coursework. The Krakatau assembler
below has a slightly different syntax.


\subsection*{Krakatau Assembler (Version 1 \& 2)}

The Krakatau assembler is available from

\begin{center}
  \url{https://github.com/Storyyeller/Krakatau/tree/master}
\end{center}

\noindent This assembler requires Python and a package called
\pcode{ply} available from

\begin{center}
\url{https://pypi.python.org/pypi/ply}
\end{center}

\noindent This assembler is largely compatible with the Jasmin
syntax---that means for the files we are concerned with here, it
understands the same input syntax (no changes to your compiler need to
be made; ok maybe some small syntactic adjustments are needed, for
example labels need to start with a capital '\texttt{L}'). You can generate Java
Byte Code by using

\begin{center}
\texttt{python Krakatau-master/assemble.py loops.j}
\end{center}

\noindent where you may have to adapt the directory where
Krakatau is installed (I just downloaded the zip file from
Github and \pcode{Krakatau-master} was the directory where it
was installed). Again the resulting class-file you can run with
\texttt{java}. There is also a newer version of Krakatau available at

\begin{center}
\url{https://github.com/Storyyeller/Krakatau/tree/v2}
\end{center}

\noindent
This is now a Rust program using Cargo as package manager (I have not tried this
version---I assume it should produce the same output, but might be
easier to install because it avoids Python's \emph{dependency hell}).


%\noindent You need to submit a document containing the answers
%for the two questions below. You can do the implementation in
%any programming language you like, but you need to submit the
%source code with which you answered the questions. Otherwise
%the submission will not be counted. However, the coursework
%will \emph{only} be judged according to the answers. You can
%submit your answers in a txt-file or as pdf.\bigskip

%\subsection*{Question 0 (Unmarked)}
%
%Please include  in the PDF at the beginning your email address, your student
%number and whether you are BSc / MSci student and for the latter in which
%year your are in. Thanks! 

\subsection*{Question 1}

You need to lex and parse WHILE programs, and then generate Java Byte
Code instructions for the Jasmin assembler (or Krakatau
assembler). For this you should use the ASTs defined in CW3 (including
logical operators). As part of the solution you need to submit the assembler
instructions for the Fibonacci and Factorial programs. 

\begin{figure}[t]
\lstinputlisting[language=while]{../cwtests/cw04/fib.while}
\caption{The Fibonacci program in the WHILE language.\label{fibs}}
\end{figure}

\subsection*{Question 2}

Extend the syntax of your language so that it contains also
\texttt{for}-loops, like

\begin{center}
\lstset{language=While}
\code{for} \;\textit{Id} \texttt{:=} \textit{AExp}\; \code{upto} 
\;\textit{AExp}\; \code{do} \textit{Block} 
\end{center}

\noindent The intended meaning is to first assign the variable
\textit{Id} the value of the first arithmetic expression, then test
whether this value is less or equal than the value of the
second arithmetic expression. If yes, go through the loop, and
at the end increase the value of the loop variable by 1 and
start again with the test. If no, leave the loop. For example
the following instance of a \code{for}-loop is supposed to
print out the numbers \pcode{2}, \pcode{3}, \pcode{4}.


\begin{center}
\begin{minipage}{12cm}
\begin{lstlisting}[language=While, numbers=none]
for i := 2 upto 4 do {
    write i	
}
\end{lstlisting}
\end{minipage}
\end{center}

\noindent There are two ways how this can be implemented: one
is to adapt the code generation part of the compiler and
generate specific code for \code{for}-loops; the other is to
translate the abstract syntax tree of \code{for}-loops into
an abstract syntax tree using existing language constructs.
For example the loop above could be translated to the
following \code{while}-loop:

\begin{center}
\begin{minipage}{12cm}
\begin{lstlisting}[language=While, numbers=none]
i := 2;
while (i <= 4) do {
    write i;
    i := i + 1
}
\end{lstlisting}
\end{minipage}
\end{center}

\subsection*{Question 3 (OPTIONAL)}

\noindent In this question you are asked to think about the following
program:

\begin{center}
\begin{minipage}{12cm}
\begin{lstlisting}[language=While, numbers=none]
for i := 1 upto 10 do {
  for i := 1 upto 10 do {
    write i
  }
} 
\end{lstlisting}
\end{minipage}
\end{center}

\noindent 
Note that in this program the variable \pcode{i} is used
twice. Therefore you need to make a decision about how it should be compiled?
What should the program print?

\subsection*{Question 4}

Extend the lexer and parser to add a \textcolor{codepurple}{\pcode{break}} keyword.  Modify
the compiler (including lexer and parser) such that when a \textcolor{codepurple}{\texttt{break}}-statement is encountered
the code should jump out of the ``enclosing'' for/while-loop, or in case it
is not inside such a loop to the end of the program. For example the
program

\begin{center}
\begin{minipage}{12cm}
\begin{lstlisting}[language=While, numbers=none]
// should print 0 .. 10    
for i := 0 upto 10 do {
  write i;
  write "\n"
};

// should print 0 .. 4  
for i := 0 upto 10 do {
  if i > 4 then break else skip;
  write i;
  write "\n"
};

write "Should print\n";
break;
write "Should not print\n"
\end{lstlisting}
\end{minipage}
\end{center}

\noindent
should print out 0 to 10 with the first for-loop, but only 0
to 4 in the second. Similarly it should print out \code{"Should print"},
but not \code{"Should not print"}. For this you need to add
a label to the end of every for- and while-loop and also to the end of the
whole program just in case you need to jump to that label via a
\code{break}. The file you need to be able to process for this question
is called \texttt{break.while}.


\subsection*{Further Information}

The Java infrastructure unfortunately does not contain an
assembler out-of-the-box (therefore you need to download the
additional package Jasmin or Krakatau---see above). But it
does contain a disassembler, called \texttt{javap}. A
dissembler does the ``opposite'' of an assembler: it generates
readable assembler code from Java Byte Code. Have a look at
the following example: Compile using the usual Java compiler
the simple Hello World program below:

\begin{center}
\begin{minipage}{12cm}
\begin{lstlisting}[language=Java,numbers=none]
class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
\end{lstlisting}
\end{minipage}
\end{center}

\noindent
You can use the command

\begin{center}
\begin{minipage}{12cm}
\begin{lstlisting}[language={},numbers=none]
javap -c HelloWorld
\end{lstlisting}
\end{minipage}
\end{center}

\noindent to see the assembler instructions of the Java Byte
Code that has been generated for this program. You can compare
this with the code generated for the Scala version of Hello
World.

\begin{center}
\begin{minipage}{12cm}
\begin{lstlisting}[language=Scala,numbers=none]
object HelloWorld {
   def main(args: Array[String]) = {
      println("Hello World!")
  }
}
\end{lstlisting}
\end{minipage}
\end{center}


\subsection*{Library Functions}

You need to generate code for the commands \texttt{write} and
\texttt{read}. This will require the addition of some
``library'' functions to your generated code. The first
command even needs two versions, because you need to write out
an integer and string. The Java byte code will need two
separate functions for this. For writing out an integer, you
can use the assembler code

\begin{center}
\begin{minipage}{12cm}
\begin{lstlisting}[language=JVMIS, numbers=none]
.method public static write(I)V 
  .limit locals 1 
  .limit stack 2  
  getstatic java/lang/System/out Ljava/io/PrintStream; 
  iload 0
  invokevirtual java/io/PrintStream/println(I)V 
  return 
.end method
\end{lstlisting}
\end{minipage}
\end{center}

\noindent This function will invoke Java's \texttt{println}
function for integers. Then if you need to generate code for
\texttt{write x} where \texttt{x} is an integer variable, you
can generate

\begin{center}
\begin{minipage}{12cm}
\begin{lstlisting}[language=JVMIS, numbers=none]
iload n 
invokestatic XXX/XXX/write(I)V
\end{lstlisting}
\end{minipage}
\end{center}

\noindent where \texttt{n} is the index where the value of the
variable \texttt{x} is stored. The \texttt{XXX/XXX} needs to
be replaced with the class name which you use to generate the
code (for example \texttt{fib/fib} in case of the Fibonacci
numbers).

Writing out a string is similar. The corresponding library
function uses strings instead of integers:

\begin{center}
\begin{minipage}{12cm}
\begin{lstlisting}[language=JVMIS, numbers=none]
.method public static writes(Ljava/lang/String;)V
   .limit stack 2
   .limit locals 1
   getstatic java/lang/System/out Ljava/io/PrintStream;
   aload 0
   invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
   return
.end method
\end{lstlisting}
\end{minipage}
\end{center}

\noindent The code that needs to be generated for \code{write
"some_string"} commands is

\begin{center}
\begin{minipage}{12cm}
\begin{lstlisting}[language=JVMIS,numbers=none]
ldc "some_string"
invokestatic XXX/XXX/writes(Ljava/lang/String;)V
\end{lstlisting}
\end{minipage}
\end{center}

\noindent Again you need to adjust the \texttt{XXX/XXX} part
in each call.

The code for \texttt{read} is more complicated. The reason is
that inputting a string will need to be transformed into an
integer. The code in Figure~\ref{read} does this. It can be
called with

\begin{center}
\begin{minipage}{12cm}
\begin{lstlisting}[language=JVMIS,numbers=none]
invokestatic XXX/XXX/read()I 
istore n
\end{lstlisting}
\end{minipage}
\end{center}

\noindent 
where \texttt{n} is the index of the variable that requires an input.
%If you
%use Windows you need to take into account that a ``return'' is not just a newline,
%\code{'\\10'}, but \code{'\\13\\10'}. This means you need to change line~12 in 
%Figure~\ref{read} to \pcode{ldc 13}.  


\begin{figure}[t]\small
\begin{lstlisting}[language=JVMIS,numbers=left]
.method public static read()I 
    .limit locals 10 
    .limit stack 10

    ldc 0 
    istore 1  ; this will hold our final integer 
Label1: 
    getstatic java/lang/System/in Ljava/io/InputStream; 
    invokevirtual java/io/InputStream/read()I 
    istore 2 
    iload 2 
    ldc 10  ; test for the newline delimiter for Unix
    isub 
    ifeq Label2
    iload 2
    ldc 13  ; test for the carriage-return in Windows 
    isub
    ifeq Label2
    iload 2 
    ldc 32   ; the space delimiter 
    isub 
    ifeq Label2
    iload 2 
    ldc 48   ; we have our digit in ASCII, have to subtract it from 48 
    isub 
    ldc 10 
    iload 1 
    imul 
    iadd 
    istore 1 
    goto Label1 
Label2: 
    ; when we come here we have our integer computed
    ; in local variable 1 
    iload 1 
    ireturn 
.end method
\end{lstlisting}\normalsize
\caption{Assembler code for reading an integer from the console.
  This code is portable for Unix and Windows (see Lines 11--18 for 2 separate
  tests for the various end-of-line markers). Thanks to Harry
  Dilnot to make it portable.\label{read}}
\end{figure}

\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: