handouts/ho08.tex
changeset 381 47eceea734c5
parent 380 1e88390e81aa
child 382 38e368dc9b10
equal deleted inserted replaced
380:1e88390e81aa 381:47eceea734c5
     1 \documentclass{article}
     1 \documentclass{article}
     2 \usepackage{../style}
     2 \usepackage{../style}
     3 \usepackage{../langs}
     3 \usepackage{../langs}
     4 \usepackage{../grammar}
     4 \usepackage{../grammar}
     5 \usepackage{../graphics}
     5 \usepackage{../graphics}
     6 
     6 \usepackage{amssymb}
     7 
     7 
     8 
     8 
     9 \begin{document}
     9 \begin{document}
    10 \lstset{morekeywords={def,if,then,else,write,read},keywordstyle=\color{codepurple}\bfseries}
    10 \lstset{morekeywords={def,if,then,else,write,read},keywordstyle=\color{codepurple}\bfseries}
    11 
    11 
    12 
    12 
    13 \section*{Handout 8 (A Functional Language)}
    13 \section*{Handout 8 (A Functional Language)}
    14 
    14 
    15 
    15 
    16 The language we looked at in the previous lecture was rather
    16 The language we looked at in the previous lecture was rather
    17 primitive and the compiler rather crude---everything was
    17 primitive and the estimater rather crude---everything was
    18 essentially compiled into a big monolithic chunk of code
    18 essentially estimated into a big monolithic chunk of code
    19 inside the main function. In this handout we like to have a
    19 inside the main function. In this handout we like to have a
    20 look at a slightly more comfortable language, which I call
    20 look at a slightly more comfortable language, which I call
    21 Fun-language, and a tiny-teeny bit more realistic compiler.
    21 Fun-language, and a tiny-teeny bit more realistic estimater.
    22 The Fun-language is a functional programming language. A small
    22 The Fun-language is a functional programming language. A small
    23 collection of programs we want to be able to write and compile
    23 collection of programs we want to be able to write and estimate
    24 is as follows:
    24 is as follows:
    25 
    25 
    26 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none]
    26 \begin{lstlisting}[numbers=none]
    27 def fib(n) = if n == 0 then 0 
    27 def fib(n) = if n == 0 then 0 
    28              else if n == 1 then 1 
    28              else if n == 1 then 1 
    29              else fib(n - 1) + fib(n - 2);
    29              else fib(n - 1) + fib(n - 2);
    30 
    30 
    31 def fact(n) = if n == 0 then 1 else n * fact(n - 1);
    31 def fact(n) = if n == 0 then 1 else n * fact(n - 1);
    40 \noindent Compare the code of the fib-program with the same
    40 \noindent Compare the code of the fib-program with the same
    41 program written in the While-language\ldots Fun is definitely
    41 program written in the While-language\ldots Fun is definitely
    42 more comfortable. We will still focus on programs involving
    42 more comfortable. We will still focus on programs involving
    43 integers only, that means for example that every function is
    43 integers only, that means for example that every function is
    44 expected to return an integer. The point of the Fun language
    44 expected to return an integer. The point of the Fun language
    45 is to compile each function to a separate method in JVM
    45 is to estimate each function to a separate method in JVM
    46 bytecode (not just a big monolithic code chunk). The means we
    46 bytecode (not just a big monolithic code chunk). The means we
    47 need to adapt to some of the conventions of the JVM about
    47 need to adapt to some of the conventions of the JVM about
    48 methods.
    48 methods.
    49 
    49 
    50 The grammar of the Fun-language is slightly simpler than the
    50 The grammar of the Fun-language is slightly simpler than the
   114 \end{lstlisting}}
   114 \end{lstlisting}}
   115 
   115 
   116 Let us first look at some clauses for compiling expressions.
   116 Let us first look at some clauses for compiling expressions.
   117 The compilation of arithmetic and boolean expressions is just
   117 The compilation of arithmetic and boolean expressions is just
   118 like for the While-language and do not need any modification.
   118 like for the While-language and do not need any modification.
   119 (recall that the \textit{compile}-function for boolean
   119 (recall that the \textit{estimate}-function for boolean
   120 expression takes a third argument for the label where the
   120 expression takes a third argument for the label where the
   121 contro-flow should jump when the boolean expression is
   121 contro-flow should jump when the boolean expression is
   122 \emph{not} true---this is needed for compiling \pcode{if}s).
   122 \emph{not} true---this is needed for compiling \pcode{if}s).
   123 One additional feature in the Fun-language are sequences.
   123 One additional feature in the Fun-language are sequences.
   124 Their purpose is to do one calculation after another. The
   124 Their purpose is to do one calculation after another. The
   130 the expression of the form \pcode{exp1 ; exp2} we need to 
   130 the expression of the form \pcode{exp1 ; exp2} we need to 
   131 generate code where after the first code chunk
   131 generate code where after the first code chunk
   132 a \pcode{pop}-instruction is needed. 
   132 a \pcode{pop}-instruction is needed. 
   133 
   133 
   134 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
   134 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
   135 $\textrm{\textit{compile}}($exp1$)$
   135 $\textrm{\textit{estimate}}($exp1$)$
   136 pop
   136 pop
   137 $\textrm{\textit{compile}}($exp2$)$
   137 $\textrm{\textit{estimate}}($exp2$)$
   138 \end{lstlisting}
   138 \end{lstlisting}
   139 
   139 
   140 \noindent In effect we ``forget'' about the result the first
   140 \noindent In effect we ``forget'' about the result the first
   141 expression calculates. I leave you to think about why this
   141 expression calculates. I leave you to think about why this
   142 sequence operator is still useful in the Fun-language, even 
   142 sequence operator is still useful in the Fun-language, even 
   167 \caption{The helper function for printing out integers.\label{write}}
   167 \caption{The helper function for printing out integers.\label{write}}
   168 \end{figure}
   168 \end{figure}
   169 
   169 
   170 
   170 
   171 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
   171 \begin{lstlisting}[language=JVMIS, numbers=none,mathescape]
   172 $\textrm{\textit{compile}}($1+2$)$
   172 $\textrm{\textit{estimate}}($1+2$)$
   173 dup
   173 dup
   174 invokestatic XXX/XXX/write(I)V
   174 invokestatic XXX/XXX/write(I)V
   175 \end{lstlisting}
   175 \end{lstlisting}
   176 
   176 
   177 \noindent We also need to first generate code for the
   177 \noindent We also need to first generate code for the
   178 argument-expression of write, which in the While-language was
   178 argument-expression of write, which in the While-language was
   179 only allowed to be a single variable.
   179 only allowed to be a single variable.
   180 
   180 
   181 Most of the new code in the compiler for the Fun-language
   181 Most of the new code in the estimater for the Fun-language
   182 comes from function definitions and function calls. For this
   182 comes from function definitions and function calls. For this
   183 have a look again at the helper function in
   183 have a look again at the helper function in
   184 Figure~\ref{write}. Assuming we have a function definition
   184 Figure~\ref{write}. Assuming we have a function definition
   185 
   185 
   186 \begin{lstlisting}[numbers=none,mathescape]
   186 \begin{lstlisting}[numbers=none,mathescape]
   201 \noindent where the number of \pcode{I}s corresponds to the
   201 \noindent where the number of \pcode{I}s corresponds to the
   202 number of arguments the function has, say \pcode{x1} to
   202 number of arguments the function has, say \pcode{x1} to
   203 \pcode{xn}. The final \pcode{I} is needed in order to indicate
   203 \pcode{xn}. The final \pcode{I} is needed in order to indicate
   204 that the function returns an integer. Therefore we also have
   204 that the function returns an integer. Therefore we also have
   205 to use \pcode{ireturn} instead of \pcode{return}. However,
   205 to use \pcode{ireturn} instead of \pcode{return}. However,
   206 more interesting are the two \pcode{.limit} lines. Locals
   206 more interesting are the two \pcode{.limit} lines.
   207 refers to the local variables of the method, which can be
   207 \pcode{Locals} refers to the local variables of the method,
   208 queried and overwritten using \pcode{iload} and
   208 which can be queried and overwritten using the JVM
   209 \pcode{istore}, respectively.
   209 instructions \pcode{iload} and \pcode{istore}, respectively.
       
   210 Before we call a function with, say, three arguments, we need
       
   211 to ensure that these three arguments are pushed onto the stack
       
   212 (we will come to the corresponding code shortly). Once we are
       
   213 inside the method, the arguments on the stack turn into local
       
   214 variables. So in case we have three arguments on the stack, we 
       
   215 will have inside the function three local variables that can 
       
   216 be referenced by the indices $0..2$. Determining the limit for
       
   217 local variables is the easy bit. Harder is the stack limit.
       
   218 
       
   219 Calculating how much stack a program needs is equivalent to 
       
   220 the Halting problem, and thus undecidable in general. 
       
   221 Fortunately, we are only asked how much stack a \emph{single} 
       
   222 call of the function requires. This can be relatively easily
       
   223 estimated by recursively analysing which instructions we 
       
   224 generate and how much stack they might require.
       
   225  
       
   226 \begin{center}
       
   227 \begin{tabular}{lcl}
       
   228 $\textit{estimate}(n)$ & $\dn$ & $1$\\
       
   229 $\textit{estimate}(x)$ & $\dn$ & $1$\\
       
   230 $\textit{estimate}(a_1\;aop\;a_2)$ & $\dn$ &
       
   231 $\textit{estimate}(a_1) + \textit{estimate}(a_2)$\\
       
   232 $\textit{estimate}(\pcode{if}\;b\;\pcode{then}\;e_1\;\pcode{else}\;e_2)$ & $\dn$ & 
       
   233 $\textit{estimate}(b) +$\\ 
       
   234 & & $\quad max(\textit{estimate}(e_1), \textit{estimate}(e_2))$\\
       
   235 $\textit{estimate}(\pcode{write}(e))$ & $\dn$ & 
       
   236 $\textit{estimate}(e) + 1$\\
       
   237 $\textit{estimate}(e_1 ; e_2)$ & $\dn$ & 
       
   238 $max(\textit{estimate}(e_1), \textit{estimate}(e_2))$\\
       
   239 $\textit{estimate}(f(e_1, ..., e_n))$ & $\dn$ & 
       
   240 $\sum_{i = 1..n}\;estimate(e_i)$\medskip\\
       
   241 $\textit{estimate}(a_1\;bop\;a_2)$ & $\dn$ &
       
   242 $\textit{estimate}(a_1) + \textit{estimate}(a_2)$\\
       
   243 \end{tabular}
       
   244 \end{center} 
       
   245  
       
   246 \noindent This function overestimates the stack size, for
       
   247 example, in the case of \pcode{if}s. Since we cannot predict
       
   248 which branch will be run, we have to allocate the maximum 
       
   249 of stack each branch might take. I leave you also to think 
       
   250 about whether the estimate in case of function calls is the 
       
   251 best possible estimate. Note also that in case of \pcode{write}
       
   252 we need to add one, because we duplicate the top-most element
       
   253 in the stack.
       
   254 
       
   255 With this all in place, we can start generating code, for
       
   256 example, for the two functions:
       
   257 
       
   258 \begin{lstlisting}[numbers=none]
       
   259 def suc(x) = x + 1;
       
   260 
       
   261 def add(x, y) = if x == 0 then y
       
   262                 else suc(add(x - 1, y));
       
   263 \end{lstlisting}
       
   264 
       
   265 \noindent The succesor function is a simple loading of the
       
   266 argument $x$ (index $0$) onto the stack, as well as the number
       
   267 $1$. Then we add both elements leaving the result of the 
       
   268 addition on top of the stack. This value will be returned by 
       
   269 the \pcode{suc}-function. See below:
       
   270 
       
   271 \begin{lstlisting}[language=JVMIS]
       
   272 .method public static suc(I)I 
       
   273 .limit locals 1
       
   274 .limit stack 2
       
   275   iload 0
       
   276   ldc 1
       
   277   iadd
       
   278   ireturn
       
   279 .end method 
       
   280 \end{lstlisting}
       
   281 
       
   282 \noindent The addition function is a bit more interesting
       
   283 since in the last line we have to call the function
       
   284 recursively and ``wrap around'' a call to the successor
       
   285 function. The code is as follows:
       
   286 
       
   287 \begin{lstlisting}[language=JVMIS]
       
   288 .method public static add(II)I 
       
   289 .limit locals 2
       
   290 .limit stack 5
       
   291   iload 0
       
   292   ldc 0
       
   293   if_icmpne If_else
       
   294   iload 1
       
   295   goto If_end
       
   296 If_else:
       
   297   iload 0
       
   298   ldc 1
       
   299   isub
       
   300   iload 1
       
   301   invokestatic XXX/XXX/add(II)I
       
   302   invokestatic XXX/XXX/suc(I)I
       
   303 If_end:
       
   304   ireturn
       
   305 .end method
       
   306 \end{lstlisting}
       
   307 
       
   308 \noindent The local limit is because add takes two arguments.
       
   309 The stack limit is a simple calculation using the estimate
       
   310 function. We first generate code for the boolean expression
       
   311 \pcode{x == 0}, that is loading local variable 0 and the
       
   312 number 0 onto the stack (Lines 4 and 5). If the not-equality
       
   313 test fails we continue with returning $y$, which is the local
       
   314 variable 1 (followed by a jump to the return instruction). If
       
   315 the not-equality test succeeds then we jump to the label 
       
   316 \pcode{If_else} (Line 9). After that label is the code for
       
   317 \pcode{suc(add(x - 1, y))}. We first have to evaluate the
       
   318 argument of the suc-function. But this means we first have to
       
   319 evaluate the two arguments of the add-function. This means
       
   320 loading $x$ and $1$ onto the stack and subtracting them.
       
   321 Then loading $y$ onto the stack. We can then make a recursive 
       
   322 call to add (its two arguments are on the stack). When this
       
   323 call returns we have the result of the addition on the top of 
       
   324 the stack and just need to call suc. Finally, we can return 
       
   325 the result on top of the stack as the result of the 
       
   326 add-function.
   210  
   327  
   211 \end{document}
   328 \end{document}
   212 
   329 
   213 %%% Local Variables: 
   330 %%% Local Variables: 
   214 %%% mode: latex  
   331 %%% mode: latex