handouts/ho07.tex
changeset 668 9ce78065f68d
parent 601 208b0f67a3d0
child 690 8d57433c7b5e
equal deleted inserted replaced
667:412556272333 668:9ce78065f68d
     5 \usepackage{../grammar}
     5 \usepackage{../grammar}
     6 \usepackage{../graphics}
     6 \usepackage{../graphics}
     7 
     7 
     8 
     8 
     9 \begin{document}
     9 \begin{document}
       
    10 \fnote{\copyright{} Christian Urban, King's College London, 2017, 2018, 2019}
    10 
    11 
    11 \section*{Handout 7 (Compilation)}
    12 \section*{Handout 7 (Compilation)}
    12 
    13 
    13 The purpose of a compiler is to transform a program, a human
    14 The purpose of a compiler is to transform a program a human can read and
    14 can read and write, into code the machine can run as fast as
    15 write into code the machine can run as fast as possible. The fastest
    15 possible. The fastest code would be machine code the CPU can
    16 code would be machine code the CPU can run directly, but it is often
    16 run directly, but it is often enough to improve the speed of a
    17 good enough for improving the speed of a program to just target a
    17 program by just targeting a virtual machine. This produces not
    18 virtual machine. This produces not the fastest possible code, but code
    18 the fastest possible code, but code that is fast enough and
    19 that is pretty enough. This way of producing code has the advantage that
    19 has the advantage that the virtual machine takes care of
    20 the virtual machine takes care of things a compiler would normally need
    20 things a compiler would normally need to take care of (like
    21 to take care of (like explicit memory management). An interesting answer
    21 explicit memory management). Why study compilers? John Regher
    22 for why studying compilers is given by John Regher in his compiler
    22 gives this answer in his compiler blog:\footnote{\url{http://blog.regehr.org/archives/1419}}
    23 blog:\footnote{\url{http://blog.regehr.org/archives/1419}}
    23 
    24 
    24 \begin{quote}\it{}``We can start off with a couple of observations
    25 \begin{quote}\it{}``We can start off with a couple of observations
    25   about the role of compilers. First, hardware is getting weirder
    26   about the role of compilers. First, hardware is getting weirder
    26   rather than getting clocked faster: almost all processors are
    27   rather than getting clocked faster: almost all processors are
    27   multicores and it looks like there is increasing asymmetry in
    28   multicores and it looks like there is increasing asymmetry in
    39   employment act for solid compiler hackers.''
    40   employment act for solid compiler hackers.''
    40 \end{quote}  
    41 \end{quote}  
    41 
    42 
    42 As a first example in this module we will implement a compiler for the
    43 As a first example in this module we will implement a compiler for the
    43 very simple While-language. It will generate code for the Java Virtual
    44 very simple While-language. It will generate code for the Java Virtual
    44 Machine (JVM). This is a stack-based virtual machine, a fact which
    45 Machine (JVM). This is a stack-based virtual machine, a fact which will
    45 will make it easy to generate code for arithmetic expressions. For
    46 make it easy to generate code for arithmetic expressions. For example
    46 example for generating code for the expression $1 + 2$ we need to
    47 when compiling $1 + 2$ we need to generate the following three
    47 generate the following three instructions
    48 instructions
    48 
    49 
    49 \begin{lstlisting}[numbers=none]
    50 \begin{lstlisting}[language=JVMIS,numbers=none]
    50 ldc 1
    51 ldc 1
    51 ldc 2
    52 ldc 2
    52 iadd 
    53 iadd 
    53 \end{lstlisting}
    54 \end{lstlisting}
    54 
    55 
    55 \noindent The first instruction loads the constant $1$ onto
    56 \noindent The first instruction loads the constant $1$ onto
    56 the stack, the next one $2$, the third instruction adds both
    57 the stack, the next one loads $2$, the third instruction adds both
    57 numbers together replacing the top two elements of the stack
    58 numbers together replacing the top two elements of the stack
    58 with the result $3$. For simplicity, we will throughout
    59 with the result $3$. For simplicity, we will throughout
    59 consider only integer numbers and results. Therefore we can
    60 consider only integer numbers and results. Therefore we can
    60 use the JVM instructions \code{iadd}, \code{isub},
    61 use the JVM instructions \code{iadd}, \code{isub},
    61 \code{imul}, \code{idiv} and so on. The \code{i} stands for
    62 \code{imul}, \code{idiv} and so on. The \code{i} stands for
    78           | \meta{Num}\\
    79           | \meta{Num}\\
    79 \end{plstx}
    80 \end{plstx}
    80 
    81 
    81 
    82 
    82 \noindent where \meta{Id} stands for variables and \meta{Num}
    83 \noindent where \meta{Id} stands for variables and \meta{Num}
    83 for numbers. For the moment let us omit variables from
    84 for numbers. For the moment let us omit variables from arithmetic
    84 arithmetic expressions. Our parser will take this grammar and
    85 expressions. Our parser will take this grammar and given an input
    85 given an input produce abstract syntax trees. For example for
    86 produce abstract syntax trees. For example we will obtain for the
    86 the expression $1 + ((2 * 3) + (4 - 3))$ it will produce the
    87 expression $1 + ((2 * 3) + (4 - 3))$ the following tree.
    87 following tree.
       
    88 
    88 
    89 \begin{center}
    89 \begin{center}
    90 \begin{tikzpicture}
    90 \begin{tikzpicture}
    91 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
    91 \Tree [.$+$ [.$1$ ] [.$+$ [.$*$ $2$ $3$ ] [.$-$ $4$ $3$ ]]]
    92 \end{tikzpicture}
    92 \end{tikzpicture}
    93 \end{center}
    93 \end{center}
    94 
    94 
    95 \noindent To generate code for this expression, we need to
    95 \noindent To generate JVM code for this expression, we need to
    96 traverse this tree in post-order fashion and emit code for
    96 traverse this tree in post-order fashion and emit code for
    97 each node---this traversal in post-order fashion will produce
    97 each node---this traversal in post-order fashion will produce
    98 code for a stack-machine (what the JVM is). Doing so for the
    98 code for a stack-machine (what the JVM is). Doing so for the
    99 tree above generates the instructions
    99 tree above generates the instructions
   100 
   100 
   101 \begin{lstlisting}[numbers=none]
   101 \begin{lstlisting}[language=JVMIS,numbers=none]
   102 ldc 1 
   102 ldc 1 
   103 ldc 2 
   103 ldc 2 
   104 ldc 3 
   104 ldc 3 
   105 imul 
   105 imul 
   106 ldc 4 
   106 ldc 4 
   108 isub 
   108 isub 
   109 iadd 
   109 iadd 
   110 iadd
   110 iadd
   111 \end{lstlisting}
   111 \end{lstlisting}
   112 
   112 
   113 \noindent If we ``run'' these instructions, the result $8$
   113 \noindent If we ``run'' these instructions, the result $8$ will be on
   114 will be on top of the stack (I leave this to you to verify;
   114 top of the stack (I leave this to you to verify; the meaning of each
   115 the meaning of each instruction should be clear). The result
   115 instruction should be clear). The result being on the top of the stack
   116 being on the top of the stack will be a convention we always
   116 will be a convention we always observe in our compiler, that is the
   117 observe in our compiler, that is the results of arithmetic
   117 results of arithmetic expressions will always be on top of the stack.
   118 expressions will always be on top of the stack. Note, that a
   118 Note, that a different bracketing of the expression, for example $(1 +
   119 different bracketing of the expression, for example $(1 + (2 *
   119 (2 * 3)) + (4 - 3)$, produces a different abstract syntax tree and thus
   120 3)) + (4 - 3)$, produces a different abstract syntax tree and
   120 also a different list of instructions. Generating code in this
   121 thus potentially also a different list of instructions.
   121 post-order-traversal fashion is rather easy to implement: it can be done
   122 Generating code in this fashion is rather easy to implement:
   122 with the following recursive \textit{compile}-function, which takes the
   123 it can be done with the following recursive
   123 abstract syntax tree as argument:
   124 \textit{compile}-function, which takes the abstract syntax
       
   125 tree as argument:
       
   126 
   124 
   127 \begin{center}
   125 \begin{center}
   128 \begin{tabular}{lcl}
   126 \begin{tabular}{lcl}
   129 $\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
   127 $\textit{compile}(n)$ & $\dn$ & $\pcode{ldc}\; n$\\
   130 $\textit{compile}(a_1 + a_2)$ & $\dn$ &
   128 $\textit{compile}(a_1 + a_2)$ & $\dn$ &
   142 variables. We will represent them as \emph{local variables} in
   140 variables. We will represent them as \emph{local variables} in
   143 the JVM. Essentially, local variables are an array or pointers
   141 the JVM. Essentially, local variables are an array or pointers
   144 to memory cells, containing in our case only integers. Looking
   142 to memory cells, containing in our case only integers. Looking
   145 up a variable can be done with the instruction
   143 up a variable can be done with the instruction
   146 
   144 
   147 \begin{lstlisting}[mathescape,numbers=none]
   145 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   148 iload $index$
   146 iload $index$
   149 \end{lstlisting}
   147 \end{lstlisting}
   150 
   148 
   151 \noindent 
   149 \noindent 
   152 which places the content of the local variable $index$ onto 
   150 which places the content of the local variable $index$ onto 
   153 the stack. Storing the top of the stack into a local variable 
   151 the stack. Storing the top of the stack into a local variable 
   154 can be done by the instruction
   152 can be done by the instruction
   155 
   153 
   156 \begin{lstlisting}[mathescape,numbers=none]
   154 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   157 istore $index$
   155 istore $index$
   158 \end{lstlisting}
   156 \end{lstlisting}
   159 
   157 
   160 \noindent Note that this also pops off the top of the stack.
   158 \noindent Note that this also pops off the top of the stack.
   161 One problem we have to overcome, however, is that local
   159 One problem we have to overcome, however, is that local
   199 \begin{tabular}{lcl}
   197 \begin{tabular}{lcl}
   200 $\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\
   198 $\textit{compile}(\pcode{skip}, E)$ & $\dn$ & $([], E)$\\
   201 \end{tabular}
   199 \end{tabular}
   202 \end{center}
   200 \end{center}
   203 
   201 
   204 \noindent $[]$ is the empty list of instructions. Note that
   202 \noindent whereby $[]$ is the empty list of instructions. Note that
   205 the \textit{compile}-function for statements returns a pair, a
   203 the \textit{compile}-function for statements returns a pair, a
   206 list of instructions (in this case the empty list) and an
   204 list of instructions (in this case the empty list) and an
   207 environment for variables. The reason for the environment is
   205 environment for variables. The reason for the environment is
   208 that assignments in the While-language might change the
   206 that assignments in the While-language might change the
   209 environment---clearly if a variable is used for the first
   207 environment---clearly if a variable is used for the first
   231 the environment and assign $x$ with the largest index in $E$
   229 the environment and assign $x$ with the largest index in $E$
   232 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). 
   230 plus one (that is $E' = E(x \mapsto largest\_index + 1)$). 
   233 That means for the assignment $x := x + 1$ we generate the
   231 That means for the assignment $x := x + 1$ we generate the
   234 following code
   232 following code
   235 
   233 
   236 \begin{lstlisting}[mathescape,numbers=none]
   234 \begin{lstlisting}[language=JVMIS,mathescape,numbers=none]
   237 iload $n_x$
   235 iload $n_x$
   238 ldc 1
   236 ldc 1
   239 iadd
   237 iadd
   240 istore $n_x$
   238 istore $n_x$
   241 \end{lstlisting}
   239 \end{lstlisting}
   242 
   240 
   243 \noindent 
   241 \noindent 
   244 where $n_x$ is the index for the variable $x$.
   242 where $n_x$ is the index for the variable $x$. The code for 
   245 
   243 looking-up the index for the variable is as follow:
   246 More complicated is the code for \pcode{if}-statements, say
   244 
       
   245 \begin{center}
       
   246 \begin{tabular}{lcl}
       
   247 $index \;=\; \textit{if}\;(E(x).\textit{isDefind})\;\textit{then}\;E(x)\;\textit{else}\;|E|$
       
   248 \end{tabular}
       
   249 \end{center}
       
   250 
       
   251 \noindent
       
   252 In case the environment $E$ contains an index for $x$, we return it.
       
   253 Otherwise we ``create'' a new index by returning the size $|E|$ of the
       
   254 environment (that will be an index that is guaranteed to be not used
       
   255 yet).
       
   256 
       
   257 More complicated is the generation of code for \pcode{if}-statements, say
   247 
   258 
   248 \begin{lstlisting}[mathescape,language={},numbers=none]
   259 \begin{lstlisting}[mathescape,language={},numbers=none]
   249 if $b$ then $cs_1$ else $cs_2$
   260 if $b$ then $cs_1$ else $cs_2$
   250 \end{lstlisting}
   261 \end{lstlisting}
   251 
   262 
   252 \noindent where $b$ is a boolean expression and the $cs_{1/2}$
   263 \noindent where $b$ is a boolean expression and the $cs_{1/2}$ are the
   253 are the statements for each \pcode{if}-branch. Lets assume
   264 statements for each of the \pcode{if}-branches. Lets assume we already
   254 we already generated code for $b$ and $cs_{1/2}$. Then in the
   265 generated code for $b$ and $cs_{1/2}$. Then in the true-case the
   255 true-case the control-flow of the program needs to be
   266 control-flow of the program needs to be
   256 
   267 
   257 \begin{center}
   268 \begin{center}
   258 \begin{tikzpicture}[node distance=2mm and 4mm,
   269 \begin{tikzpicture}[node distance=2mm and 4mm,
   259  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
   270  block/.style={rectangle, minimum size=1cm, draw=black, line width=1mm},
   260  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
   271  point/.style={rectangle, inner sep=0mm, minimum size=0mm, fill=red},
   417 \end{lstlisting}
   428 \end{lstlisting}
   418 
   429 
   419 \noindent 
   430 \noindent 
   420 The generated code is as follows:
   431 The generated code is as follows:
   421 
   432 
   422 \begin{lstlisting}[mathescape,language={}]
   433 \begin{lstlisting}[language=JVMIS,mathescape,language={}]
   423    ldc 1
   434    ldc 1
   424    ldc 1
   435    ldc 1
   425    if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
   436    if_icmpne L_ifelse $\quad\tikz[remember picture] \node (C) {\mbox{}};$
   426    ldc 2
   437    ldc 2
   427    istore 0
   438    istore 0
   526 while x <= 10 do x := x + 1
   537 while x <= 10 do x := x + 1
   527 \end{lstlisting}
   538 \end{lstlisting}
   528 
   539 
   529 \noindent yielding the following code
   540 \noindent yielding the following code
   530 
   541 
   531 \begin{lstlisting}[mathescape,language={}]
   542 \begin{lstlisting}[language=JVMIS,mathescape,language={}]
   532 L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
   543 L_wbegin: $\quad\tikz[remember picture] \node[] (LB) {\mbox{}};$
   533    iload 0
   544    iload 0
   534    ldc 10
   545    ldc 10
   535    if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
   546    if_icmpgt L_wend $\quad\tikz[remember picture] \node (LC) {\mbox{}};$
   536    iload 0
   547    iload 0