\documentclass{article}+ −
\usepackage{../style}+ −
\usepackage{../langs}+ −
\usepackage{../grammar}+ −
\usepackage{../graphics}+ −
\usepackage{amssymb}+ −
+ −
% modern compilers are different+ −
% https://channel9.msdn.com/Blogs/Seth-Juarez/Anders-Hejlsberg-on-Modern-Compiler-Construction+ −
+ −
%halting problem+ −
%https://dfilaretti.github.io/2017-04-30/halting-problem+ −
+ −
+ −
\begin{document}+ −
\lstset{morekeywords={def,if,then,else,write,read},keywordstyle=\color{codepurple}\bfseries}+ −
+ −
+ −
\section*{Handout 8 (A Functional Language)}+ −
+ −
+ −
The language we looked at in the previous lecture was rather+ −
primitive and the estimater rather crude---everything was+ −
essentially estimated into a big monolithic chunk of code+ −
inside the main function. In this handout we like to have a+ −
look at a slightly more comfortable language, which I call+ −
Fun-language, and a tiny-teeny bit more realistic estimater.+ −
The Fun-language is a functional programming language. A small+ −
collection of programs we want to be able to write and estimate+ −
is as follows:+ −
+ −
\begin{lstlisting}[numbers=none]+ −
def fib(n) = if n == 0 then 0 + −
else if n == 1 then 1 + −
else fib(n - 1) + fib(n - 2);+ −
+ −
def fact(n) = if n == 0 then 1 else n * fact(n - 1);+ −
+ −
def ack(m, n) = if m == 0 then n + 1+ −
else if n == 0 then ack(m - 1, 1)+ −
else ack(m - 1, ack(m, n - 1));+ −
+ −
def gcd(a, b) = if b == 0 then a else gcd(b, a % b); + −
\end{lstlisting}+ −
+ −
\noindent Compare the code of the fib-program with the same+ −
program written in the While-language\ldots Fun is definitely+ −
more comfortable. We will still focus on programs involving+ −
integers only, that means for example that every function is+ −
expected to return an integer. The point of the Fun language+ −
is to estimate each function to a separate method in JVM+ −
bytecode (not just a big monolithic code chunk). The means we+ −
need to adapt to some of the conventions of the JVM about+ −
methods.+ −
+ −
The grammar of the Fun-language is slightly simpler than the+ −
While-language, because the main syntactic category are+ −
expressions (we do not have statements). The grammar rules are+ −
as follows:+ −
+ −
+ −
\begin{plstx}[rhs style=,margin=1.5cm]+ −
: \meta{Exp} ::= \meta{Id} | \meta{Num} {\hspace{3cm}}+ −
| \meta{Exp} + \meta{Exp} | ... | (\meta{Exp})+ −
| \code{if} \meta{BExp} \code{then} \meta{Exp} \code{else} \meta{Exp}+ −
| \code{write} \meta{Exp} {\hspace{5cm}}+ −
| \meta{Exp} ; \meta{Exp} {\hspace{5cm}}+ −
| \textit{FunName} (\meta{Exp}, ..., \meta{Exp})\\+ −
: \meta{BExp} ::= ...\\+ −
: \meta{Decl} ::= \meta{Def} ; \meta{Decl}+ −
| \meta{Exp}\\+ −
: \meta{Def} ::= \code{def} \textit{FunName} ($x_1$, ..., $x_n$) = \meta{Exp}\\ + −
\end{plstx}+ −
+ −
\noindent where, as usual, \meta{Id} stands for variables and+ −
\meta{Num} for numbers. We can call a function by applying the+ −
arguments to a function name (as shown in the last clause of+ −
\meta{Exp}). The arguments in such a function call can be+ −
again expressions, including other function calls. In+ −
contrast, when defining a function (see \meta{Def}-clause) the+ −
arguments need to be variables, say $x_1$ to $x_n$. We call+ −
the expression on the right of = in a function definition as+ −
the \emph{body of the function}. We have the restriction that+ −
the variables inside a function body can only be those that+ −
are mentioned as arguments of the function. A Fun-program is+ −
then a sequence of function definitions separated by+ −
semicolons, and a final ``main'' call of a function that+ −
starts the computation in the program. For example+ −
+ −
\begin{lstlisting}[numbers=none]+ −
def fact(n) = if n == 0 then 1 else n * fact(n - 1);+ −
write(fact(5))+ −
\end{lstlisting}+ −
+ −
\noindent would be a valid Fun-program. The parser of the+ −
Fun-language produces abstract syntax trees which in Scala+ −
can be represented as follows:+ −
+ −
+ −
{\small+ −
\begin{lstlisting}[numbers=none]+ −
abstract class Exp+ −
abstract class BExp + −
abstract class Decl+ −
+ −
case class Var(s: String) extends Exp+ −
case class Num(i: Int) extends Exp+ −
case class Aop(o: String, a1: Exp, a2: Exp) extends Exp+ −
case class If(a: BExp, e1: Exp, e2: Exp) extends Exp+ −
case class Write(e: Exp) extends Exp+ −
case class Sequ(e1: Exp, e2: Exp) extends Exp+ −
case class Call(name: String, args: List[Exp]) extends Exp+ −
+ −
case class Bop(o: String, a1: Exp, a2: Exp) extends BExp+ −
+ −
case class Def(name: String, + −
args: List[String], + −
body: Exp) extends Decl+ −
case class Main(e: Exp) extends Decl+ −
\end{lstlisting}}+ −
+ −
Let us first look at some clauses for compiling expressions.+ −
The compilation of arithmetic and boolean expressions is just+ −
like for the While-language and do not need any modification.+ −
(recall that the \textit{estimate}-function for boolean+ −
expression takes a third argument for the label where the+ −
contro-flow should jump when the boolean expression is+ −
\emph{not} true---this is needed for compiling \pcode{if}s).+ −
One additional feature in the Fun-language are sequences.+ −
Their purpose is to do one calculation after another. The+ −
reason why we need to be careful however is the convention+ −
that every expression can only produce s single result+ −
(including sequences). Since this result will be on the+ −
top of the stack, we need to generate a + −
\pcode{pop}-instruction in order to clean-up the stack. Given + −
the expression of the form \pcode{exp1 ; exp2} we need to + −
generate code where after the first code chunk+ −
a \pcode{pop}-instruction is needed. + −
+ −
\begin{lstlisting}[language=JVMIS, numbers=none,mathescape]+ −
$\textrm{\textit{compile}}($exp1$)$+ −
pop+ −
$\textrm{\textit{compile}}($exp2$)$+ −
\end{lstlisting}+ −
+ −
\noindent In effect we ``forget'' about the result the first+ −
expression calculates. I leave you to think about why this+ −
sequence operator is still useful in the Fun-language, even + −
if the first result is just ``discarded''. + −
+ −
There is also one small modification we have to perform when+ −
calling the write method. Remember in the Fun-language we have+ −
the convention that every expression needs to return an+ −
integer as a result (located on the top of the stack). Our+ −
helper function implementing write, however, ``consumes'' the+ −
top element of the stack and violates this convention.+ −
Therefore before we call, say, \pcode{write(1+2)}, we need to+ −
duplicate the top element of the stack like so+ −
+ −
\begin{figure}[t]+ −
\begin{lstlisting}[language=JVMIS, + −
xleftmargin=2mm, + −
numbers=none]+ −
.method public static write(I)V + −
.limit locals 1+ −
.limit stack 2+ −
getstatic java/lang/System/out Ljava/io/PrintStream; + −
iload 0 + −
invokevirtual java/io/PrintStream/println(I)V + −
return + −
.end method+ −
\end{lstlisting}+ −
\caption{The helper function for printing out integers.\label{write}}+ −
\end{figure}+ −
+ −
+ −
\begin{lstlisting}[language=JVMIS, numbers=none,mathescape]+ −
$\textrm{\textit{compile}}($1+2$)$+ −
dup+ −
invokestatic XXX/XXX/write(I)V+ −
\end{lstlisting}+ −
+ −
\noindent We also need to first generate code for the+ −
argument-expression of write, which in the While-language was+ −
only allowed to be a single variable.+ −
+ −
Most of the new code in the estimater for the Fun-language+ −
comes from function definitions and function calls. For this+ −
have a look again at the helper function in+ −
Figure~\ref{write}. Assuming we have a function definition+ −
+ −
\begin{lstlisting}[numbers=none,mathescape]+ −
def fname (x1, ... , xn) = ...+ −
\end{lstlisting}+ −
+ −
\noindent then we have to generate+ −
+ −
\begin{lstlisting}[numbers=none,mathescape,language=JVMIS]+ −
.method public static fname (I...I)I+ −
.limit locals ??+ −
.limit stack ?? + −
...+ −
ireturn+ −
.method end + −
\end{lstlisting}+ −
+ −
\noindent where the number of \pcode{I}s corresponds to the+ −
number of arguments the function has, say \pcode{x1} to+ −
\pcode{xn}. The final \pcode{I} is needed in order to indicate+ −
that the function returns an integer. Therefore we also have+ −
to use \pcode{ireturn} instead of \pcode{return}. However,+ −
more interesting are the two \pcode{.limit} lines.+ −
\pcode{Locals} refers to the local variables of the method,+ −
which can be queried and overwritten using the JVM+ −
instructions \pcode{iload} and \pcode{istore}, respectively.+ −
Before we call a function with, say, three arguments, we need+ −
to ensure that these three arguments are pushed onto the stack+ −
(we will come to the corresponding code shortly). Once we are+ −
inside the method, the arguments on the stack turn into local+ −
variables. So in case we have three arguments on the stack, we + −
will have inside the function three local variables that can + −
be referenced by the indices $0..2$. Determining the limit for+ −
local variables is the easy bit. Harder is the stack limit.+ −
+ −
Calculating how much stack a program needs is equivalent to + −
the Halting problem, and thus undecidable in general. + −
Fortunately, we are only asked how much stack a \emph{single} + −
call of the function requires. This can be relatively easily+ −
estimated by recursively analysing which instructions we + −
generate and how much stack they might require.+ −
+ −
\begin{center}+ −
\begin{tabular}{lcl}+ −
$\textit{estimate}(n)$ & $\dn$ & $1$\\+ −
$\textit{estimate}(x)$ & $\dn$ & $1$\\+ −
$\textit{estimate}(a_1\;aop\;a_2)$ & $\dn$ &+ −
$\textit{estimate}(a_1) + \textit{estimate}(a_2)$\\+ −
$\textit{estimate}(\pcode{if}\;b\;\pcode{then}\;e_1\;\pcode{else}\;e_2)$ & $\dn$ & + −
$\textit{estimate}(b) +$\\ + −
& & $\quad max(\textit{estimate}(e_1), \textit{estimate}(e_2))$\\+ −
$\textit{estimate}(\pcode{write}(e))$ & $\dn$ & + −
$\textit{estimate}(e) + 1$\\+ −
$\textit{estimate}(e_1 ; e_2)$ & $\dn$ & + −
$max(\textit{estimate}(e_1), \textit{estimate}(e_2))$\\+ −
$\textit{estimate}(f(e_1, ..., e_n))$ & $\dn$ & + −
$\sum_{i = 1..n}\;estimate(e_i)$\medskip\\+ −
$\textit{estimate}(a_1\;bop\;a_2)$ & $\dn$ &+ −
$\textit{estimate}(a_1) + \textit{estimate}(a_2)$\\+ −
\end{tabular}+ −
\end{center} + −
+ −
\noindent This function overestimates the stack size, for+ −
example, in the case of \pcode{if}s. Since we cannot predict+ −
which branch will be run, we have to allocate the maximum + −
of stack each branch might take. I leave you also to think + −
about whether the estimate in case of function calls is the + −
best possible estimate. Note also that in case of \pcode{write}+ −
we need to add one, because we duplicate the top-most element+ −
in the stack.+ −
+ −
With this all in place, we can start generating code, for+ −
example, for the two functions:+ −
+ −
\begin{lstlisting}[numbers=none]+ −
def suc(x) = x + 1;+ −
+ −
def add(x, y) = if x == 0 then y+ −
else suc(add(x - 1, y));+ −
\end{lstlisting}+ −
+ −
\noindent The succesor function is a simple loading of the+ −
argument $x$ (index $0$) onto the stack, as well as the number+ −
$1$. Then we add both elements leaving the result of the + −
addition on top of the stack. This value will be returned by + −
the \pcode{suc}-function. See below:+ −
+ −
\begin{lstlisting}[language=JVMIS]+ −
.method public static suc(I)I + −
.limit locals 1+ −
.limit stack 2+ −
iload 0+ −
ldc 1+ −
iadd+ −
ireturn+ −
.end method + −
\end{lstlisting}+ −
+ −
\noindent The addition function is a bit more interesting+ −
since in the last line we have to call the function+ −
recursively and ``wrap around'' a call to the successor+ −
function. The code is as follows:+ −
+ −
\begin{lstlisting}[language=JVMIS]+ −
.method public static add(II)I + −
.limit locals 2+ −
.limit stack 5+ −
iload 0+ −
ldc 0+ −
if_icmpne If_else+ −
iload 1+ −
goto If_end+ −
If_else:+ −
iload 0+ −
ldc 1+ −
isub+ −
iload 1+ −
invokestatic XXX/XXX/add(II)I+ −
invokestatic XXX/XXX/suc(I)I+ −
If_end:+ −
ireturn+ −
.end method+ −
\end{lstlisting}+ −
+ −
\noindent The locals limit is 2 because add takes two arguments.+ −
The stack limit is a simple calculation using the estimate+ −
function. We first generate code for the boolean expression+ −
\pcode{x == 0}, that is loading the local variable 0 and the+ −
number 0 onto the stack (Lines 4 and 5). If the not-equality+ −
test fails, we continue with returning $y$, which is the local+ −
variable 1 (followed by a jump to the return instruction). If+ −
the not-equality test succeeds, then we jump to the label + −
\pcode{If_else} (Line 9). After that label is the code for+ −
\pcode{suc(add(x - 1, y))}. We first have to evaluate the+ −
argument of the suc-function. But this means we first have to+ −
evaluate the two arguments of the add-function. This means+ −
loading $x$ and $1$ onto the stack and subtracting them.+ −
Then loading $y$ onto the stack. We can then make a recursive + −
call to add (its two arguments are on the stack). When this+ −
call returns we have the result of the addition on the top of + −
the stack and just need to call suc. Finally, we can return + −
the result on top of the stack as the result of the + −
add-function.+ −
+ −
\end{document}+ −
+ −
%%% Local Variables: + −
%%% mode: latex + −
%%% TeX-master: t+ −
%%% End: + −