updated
authorChristian Urban <urbanc@in.tum.de>
Wed, 06 Nov 2019 15:04:40 +0000
changeset 680 eecc4d5a2172
parent 679 8fc109f36b78
child 681 7b7736bea3ca
updated
handouts/ho05.pdf
handouts/ho05.tex
handouts/ho09.pdf
handouts/ho09.tex
langs.sty
Binary file handouts/ho05.pdf has changed
--- a/handouts/ho05.tex	Fri Nov 01 13:21:51 2019 +0000
+++ b/handouts/ho05.tex	Wed Nov 06 15:04:40 2019 +0000
@@ -14,15 +14,16 @@
 
 \section*{Handout 5 (Grammars \& Parser)}
 
-While regular expressions are very useful for lexing and for
-recognising many patterns in strings (like email addresses),
-they have their limitations. For example there is no regular
-expression that can recognise the language $a^nb^n$. Another
-example for which there exists no regular expression is the
-language of well-parenthesised expressions. In languages like
-Lisp, which use parentheses rather extensively, it might be of
-interest to know whether the following two expressions are
-well-parenthesised or not (the left one is, the right one is not):
+While regular expressions are very useful for lexing and for recognising
+many patterns in strings (like email addresses), they have their
+limitations. For example there is no regular expression that can
+recognise the language $a^nb^n$ (where you have strings with $n$ $a$'s
+followed by the same amount of $b$'s). Another example for which there
+exists no regular expression is the language of well-parenthesised
+expressions. In languages like Lisp, which use parentheses rather
+extensively, it might be of interest to know whether the following two
+expressions are well-parenthesised or not (the left one is, the right
+one is not):
 
 \begin{center}
 $(((()()))())$  \hspace{10mm} $(((()()))()))$
@@ -52,7 +53,7 @@
 
 \noindent Each ``bubble'' stands for sets of languages (remember
 languages are sets of strings). As indicated the set of regular
-languages are fully included inside the context-free languages,
+languages is fully included inside the context-free languages,
 meaning every regular language is also context-free, but not vice
 versa. Below I will let you think, for example, what the context-free
 grammar is for the language corresponding to the regular expression
@@ -61,15 +62,15 @@
 Because of their convenience, context-free languages play an important
 role in `day-to-day' text processing and in programming
 languages. Context-free in this setting means that ``words'' have one
-meaning only and this meaning is independent from in which context
-the ``words'' appear. For example ambiguity issues like
+meaning only and this meaning is independent from the context
+the ``words'' appear in. For example ambiguity issues like
 
 \begin{center}
 \tt Time flies like an arrow; fruit flies like bananas.
 \end{center}  
 
 \noindent
-from natural languages were the meaning of \emph{flies} depend on the
+from natural languages were the meaning of \emph{flies} depends on the
 surrounding \emph{context} are avoided as much as possible.
 
 Context-free languages are usually specified by grammars. For example
@@ -162,9 +163,10 @@
 \end{center}
 
 \noindent
-A \emph{parse-tree} encodes how a string is derived with the starting symbol on 
-top and each non-terminal containing a subtree for how it is replaced in a derivation.
-The parse tree for the string $(1 + 23)+4$ is as follows:
+A \emph{parse-tree} encodes how a string is derived with the starting
+symbol on top and each non-terminal containing a subtree for how it is
+replaced in a derivation. The parse tree for the string $(1 + 23)+4$ is
+as follows:
 
 \begin{center}
 \begin{tikzpicture}[level distance=8mm, black]
@@ -190,30 +192,30 @@
 
 \noindent We are often interested in these parse-trees since
 they encode the structure of how a string is derived by a
-grammar. Before we come to the problem of constructing such
-parse-trees, we need to consider the following two properties
-of grammars. A grammar is \emph{left-recursive} if there is a
-derivation starting from a non-terminal, say \meta{NT} which leads
-to a string which again starts with \meta{NT}. This means a
-derivation of the form.
+grammar. 
+
+Before we come to the problem of constructing such parse-trees, we need
+to consider the following two properties of grammars. A grammar is
+\emph{left-recursive} if there is a derivation starting from a
+non-terminal, say \meta{NT} which leads to a string which again starts
+with \meta{NT}. This means a derivation of the form.
 
 \begin{center}
 $\meta{NT} \rightarrow \ldots \rightarrow \meta{NT} \cdot \ldots$
 \end{center}
 
-\noindent It can be easily seen that the grammar above for
-arithmetic expressions is left-recursive: for example the
-rules $\meta{E} \rightarrow \meta{E}\cdot + \cdot \meta{E}$ and 
-$\meta{N} \rightarrow \meta{N}\cdot \meta{N}$ show that this 
-grammar is left-recursive. But note
-that left-recursiveness can involve more than one step in the
-derivation. The problem with left-recursive grammars is that
-some algorithms cannot cope with them: they fall into a loop.
-Fortunately every left-recursive grammar can be transformed
-into one that is not left-recursive, although this
-transformation might make the grammar less ``human-readable''.
-For example if we want to give a non-left-recursive grammar
-for numbers we might specify
+\noindent It can be easily seen that the grammar above for arithmetic
+expressions is left-recursive: for example the rules $\meta{E}
+\rightarrow \meta{E}\cdot + \cdot \meta{E}$ and $\meta{N} \rightarrow
+\meta{N}\cdot \meta{N}$ show that this grammar is left-recursive. But
+note that left-recursiveness can involve more than one step in the
+derivation. The problem with left-recursive grammars is that some
+algorithms cannot cope with them: with left-recursive grammars they will
+fall into a loop. Fortunately every left-recursive grammar can be
+transformed into one that is not left-recursive, although this
+transformation might make the grammar less ``human-readable''. For
+example if we want to give a non-left-recursive grammar for numbers we
+might specify
 
 \begin{center}
 $\meta{N} \;\;\rightarrow\;\; 0\;|\;\ldots\;|\;9\;|\;
@@ -260,17 +262,211 @@
 \end{tabular} 
 \end{center}
 
-\noindent In particular in programming languages we will try
-to avoid ambiguous grammars because two different parse-trees
-for a string mean a program can be interpreted in two
-different ways. In such cases we have to somehow make sure the
-two different ways do not matter, or disambiguate the grammar
-in some other way (for example making the $+$
-left-associative). Unfortunately already the problem of
-deciding whether a grammar is ambiguous or not is in general
-undecidable. But in simple instance (the ones we deal in this
-module) one can usually see when a grammar is ambiguous.
+\noindent In particular in programming languages we will try to avoid
+ambiguous grammars because two different parse-trees for a string mean a
+program can be interpreted in two different ways. In such cases we have
+to somehow make sure the two different ways do not matter, or
+disambiguate the grammar in some other way (for example making the $+$
+left-associative). Unfortunately already the problem of deciding whether
+a grammar is ambiguous or not is in general undecidable. But in simple
+instance (the ones we deal with in this module) one can usually see when
+a grammar is ambiguous.
+
+\subsection*{Removing Left-Recursion}
+
+Let us come back to the problem of left-recursion and consider the 
+following grammar for binary numbers:
+
+\begin{plstx}[margin=1cm]
+: \meta{B} ::= \meta{B} \cdot \meta{B} | 0 | 1\\
+\end{plstx}
+
+\noindent
+It is clear that this grammar can create all binary numbers, but
+it is also clear that this grammar is left-recursive. Giving this
+grammar as is to parser combinators will result in an infinite 
+loop. Fortunately, every left-recursive grammar can be translated
+into one that is not left-recursive with the help of some
+transformation rules. Suppose we identified the ``offensive'' 
+rule, then we can separate the grammar into this offensive rule
+and the ``rest'':
+
+\begin{plstx}[margin=1cm]
+  : \meta{B} ::= \underbrace{\meta{B} \cdot \meta{B}}_{\textit{lft-rec}} 
+  | \underbrace{0 \;\;|\;\; 1}_{\textit{rest}}\\
+\end{plstx}
+
+\noindent
+To make the idea of the transformation clearer, suppose the left-recursive
+rule is of the form $\meta{B}\alpha$ (the left-recursive non-terminal 
+followed by something called $\alpha$) and the ``rest'' is called $\beta$.
+That means our grammar looks schematically as follows
+
+\begin{plstx}[margin=1cm]
+  : \meta{B} ::= \meta{B} \cdot \alpha | \beta\\
+\end{plstx}
+
+\noindent
+To get rid of the left-recursion, we are required to introduce 
+a new non-terminal, say $\meta{B'}$ and transform the rule
+as follows:
+
+\begin{plstx}[margin=1cm]
+  : \meta{B} ::= \beta \cdot \meta{B'}\\
+  : \meta{B'} ::= \alpha \cdot \meta{B'} | \epsilon\\
+\end{plstx}
+
+\noindent
+In our example of binary numbers we would after the transformation 
+end up with the rules
+
+\begin{plstx}[margin=1cm]
+  : \meta{B} ::= 0 \cdot \meta{B'} | 1 \cdot \meta{B'}\\
+  : \meta{B'} ::= \meta{B} \cdot \meta{B'} | \epsilon\\
+\end{plstx}
+
+\noindent
+A little thought should convince you that this grammar still derives
+all the binary numbers (for example 0 and 1 are derivable because $\meta{B'}$
+can be $\epsilon$). Less clear might be why this grammar is non-left recursive.
+For $\meta{B'}$ it is relatively clear because we will never be 
+able to derive things like
+
+\begin{center}
+$\meta{B'} \rightarrow\ldots\rightarrow \meta{B'}\cdot\ldots$
+\end{center}  
+
+\noindent
+because there will always be a $\meta{B}$ in front of a $\meta{B'}$, and
+$\meta{B}$ now has always a $0$ or $1$ in front, so a $\meta{B'}$ can
+never be in the first place. The reasoning is similar for $\meta{B}$:
+the $0$ and $1$ in the rule for $\meta{B}$ ``protect'' it from becoming
+left-recursive. This transformation does not mean the grammar is the
+simplest left-recursive grammar for binary numbers. For example the
+following grammar would do as well
+
+\begin{plstx}[margin=1cm]
+  : \meta{B} ::= 0 \cdot \meta{B} | 1 \cdot \meta{B} | 0 | 1\\
+\end{plstx}
+
+\noindent
+The point is that we can in principle transform every left-recursive
+grammar into one that is non-left-recursive one. This explains why often
+the following grammar is used for arithmetic expressions:
+
+\begin{plstx}[margin=1cm]
+  : \meta{E} ::= \meta{T} | \meta{T} \cdot + \cdot \meta{E} |  \meta{T} \cdot - \cdot \meta{E}\\
+  : \meta{T} ::= \meta{F} | \meta{F} \cdot * \cdot \meta{T}\\
+  : \meta{F} ::= num\_token | ( \cdot \meta{E} \cdot )\\
+\end{plstx}
 
+\noindent
+In this grammar all $\meta{E}$xpressions, $\meta{T}$erms and $\meta{F}$actors
+are in some way protected from being left-recusive. For example if you
+start $\meta{E}$ you can derive another one by going through $\meta{T}$, then 
+$\meta{F}$, but then $\meta{E}$ is protected by the open-parenthesis.
+
+\subsection*{Removing $\epsilon$-Rules and CYK-Algorithm}
+
+I showed above that the non-left-recursive grammar for binary numbers is
+
+\begin{plstx}[margin=1cm]
+  : \meta{B} ::= 0 \cdot \meta{B'} | 1 \cdot \meta{B'}\\
+  : \meta{B'} ::= \meta{B} \cdot \meta{B'} | \epsilon\\
+\end{plstx}
+
+\noindent
+The transformation made the original grammar non-left-recursive, but at
+the expense of introducing an $\epsilon$ in the second rule. Having an
+explicit $\epsilon$-rule is annoying to, not in terms of looping, but in
+terms of efficiency. The reason is that the $\epsilon$-rule always
+applies but since it recognises the empty string, it does not make any
+progress with recognising a string. Better are rules like $( \cdot
+\meta{E} \cdot )$ where something of the input is consumed. Getting
+rid of $\epsilon$-rules is also important for the CYK parsing algorithm,
+which can give us an insight into the complexity class of parsing.
+
+It turns out we can also by some generic transformations eliminate
+$\epsilon$-rules from grammars. Consider again the grammar above for
+binary numbers where have a rule $\meta{B'} ::= \epsilon$. In this case
+we look for rules of the (generic) form \mbox{$\meta{A} :=
+\alpha\cdot\meta{B'}\cdot\beta$}. That is there are rules that use
+$\meta{B'}$ and something ($\alpha$) is in front of $\meta{B'}$ and
+something follows ($\beta$). Such rules need to be replaced by
+additional rules of the form \mbox{$\meta{A} := \alpha\cdot\beta$}.
+In our running example there are the two rules for $\meta{B}$ which
+fall into this category
+
+\begin{plstx}[margin=1cm]
+  : \meta{B} ::= 0 \cdot \meta{B'} | 1 \cdot \meta{B'}\\
+\end{plstx} 
+
+\noindent To follow the general scheme of the transfromation,
+the $\alpha$ is either is either $0$ or $1$, and the $\beta$ happens
+to be empty. SO we need to generate new rules for the form 
+\mbox{$\meta{A} := \alpha\cdot\beta$}, which in our particular 
+example means we obtain
+
+\begin{plstx}[margin=1cm]
+  : \meta{B} ::= 0 \cdot \meta{B'} | 1 \cdot \meta{B'} | 0 | 1\\
+\end{plstx} 
+
+\noindent
+Unfortunately $\meta{B'}$ is also used in the rule 
+
+\begin{plstx}[margin=1cm]
+  : \meta{B'} ::= \meta{B} \cdot \meta{B'}\\
+\end{plstx}
+
+\noindent
+For this we repeat the transformation, giving 
+
+\begin{plstx}[margin=1cm]
+  : \meta{B'} ::= \meta{B} \cdot \meta{B'} | \meta{B}\\
+\end{plstx}
+
+\noindent
+In this case $\alpha$ was substituted with $\meta{B}$ and $\beta$
+was again empty. Once no rule is left over, we can simply throw
+away the $\epsilon$ rule.  This gives the grammar
+
+\begin{plstx}[margin=1cm]
+  : \meta{B} ::= 0 \cdot \meta{B'} | 1 \cdot \meta{B'} | 0 | 1\\
+  : \meta{B'} ::= \meta{B} \cdot \meta{B'} | \meta{B}\\
+\end{plstx}
+
+\noindent
+I let you think about whether this grammar can still recognise all
+binary numbers and whether this grammar is non-left-recursive. The
+precise statement for the transformation of removing $\epsilon$-rules is
+that if the original grammar was able to recognise only non-empty
+strings, then the transformed grammar will be equivalent (matching the
+same set of strings); if the original grammar was able to match the
+empty string, then the transformed grammar will be able to match the
+same strings, \emph{except} the empty string. So the  $\epsilon$-removal
+does not preserve equivalence of grammars, but the small defect with the
+empty string is not important for practical purposes.
+
+So why are these transformations all useful? Well apart from making the 
+parser combinators work (remember they cannot deal with left-recursion and
+are inefficient with $\epsilon$-rules), a second reason is that they help
+with getting any insight into the complexity of the parsing problem. 
+The parser combinators are very easy to implement, but are far from the 
+most efficient way of processing input (they can blow up exponentially
+with ambiguous grammars). The question remains what is the best possible
+complexity for parsing? It turns out that this is $O(n^3)$ for context-free
+languages. 
+
+To answer the question about complexity, let me describe next the CYK
+algorithm (named after the authors Cocke–Younger–Kasami). This algorithm
+works with grammars that are in Chomsky normalform. 
+
+TBD
+
+\end{document}
+
+
+%%% Parser combinators are now part of handout 6
 
 \subsection*{Parser Combinators}
 
@@ -533,9 +729,11 @@
 
 
 
-\end{document}
+
+
 
 %%% Local Variables: 
 %%% mode: latex  
 %%% TeX-master: t
 %%% End: 
+
Binary file handouts/ho09.pdf has changed
--- a/handouts/ho09.tex	Fri Nov 01 13:21:51 2019 +0000
+++ b/handouts/ho09.tex	Wed Nov 06 15:04:40 2019 +0000
@@ -16,25 +16,25 @@
 
 Reflecting on our tiny compiler targetting the JVM, the code generation
 part was actually not so hard, no? Pretty much just some post-traversal
-of the abstract syntax tree, yes? One of the main reason for this ease
-is that the JVM is a stack-based virtual machine and it is therefore not
-hard to translate arithmetic expressions into a sequence of instructions
-manipulating the stack. The problem is that ``real'' CPUs, although
-supporting stack operations, are not really designed to be \emph{stack
-machines}.  The design of CPUs is more like, here is a chunk of
-memory---compiler, or better compiler writers, do something with it.
-Consequently, modern compilers need to go the extra mile in order to
-generate code that is much easier and faster to process by CPUs. To make
-this all tractable for this module, we target the LLVM Intermediate
-Language. In this way we can take advantage of the tools coming with
-LLVM. For example we do not have to worry about things like register
-allocations.\bigskip 
+of the abstract syntax tree, yes? One of the reasons for this ease is
+that the JVM is a stack-based virtual machine and it is therefore not
+hard to translate deeply-nested arithmetic expressions into a sequence
+of instructions manipulating the stack. The problem is that ``real''
+CPUs, although supporting stack operations, are not really designed to
+be \emph{stack machines}.  The design of CPUs is more like, here is a
+chunk of memory---compiler, or better compiler writers, do something
+with it. Consequently, modern compilers need to go the extra mile in
+order to generate code that is much easier and faster to process by
+CPUs. To make this all tractable for this module, we target the LLVM
+Intermediate Language. In this way we can take advantage of the tools
+coming with LLVM. For example we do not have to worry about things like
+register allocations.\bigskip 
 
 \noindent LLVM\footnote{\url{http://llvm.org}} is a beautiful example
 that projects from Academia can make a difference in the world. LLVM
 started in 2000 as a project by two researchers at the  University of
 Illinois at Urbana-Champaign. At the time the behemoth of compilers was
-gcc with its myriad of front-ends for other languages (e.g.~Fortran,
+gcc with its myriad of front-ends for other languages (C++, Fortran,
 Ada, Go, Objective-C, Pascal etc). The problem was that gcc morphed over
 time into a monolithic gigantic piece of m\ldots ehm software, which you
 could not mess about in an afternoon. In contrast, LLVM is designed to
@@ -46,30 +46,32 @@
 or less legacy. This does not mean that programming languages like C and
 C++ are dying out any time soon---they are nicely supported by LLVM.
 
-Targetting the LLVM Intermediate Language, or Intermediate
-Representation (short LLVM-IR), also means we can profit from the very
+We will target the LLVM Intermediate Language, or Intermediate
+Representation (short LLVM-IR). As a result we can benefit from the
 modular structure of the LLVM compiler and let for example the compiler
-generate code for X86, or ARM etc. That means we can be agnostic about
-where our code actually runs. However, what we have to do is to generate
-code in \emph{Static Single-Assignment} format (short SSA), because that
-is what the LLVM-IR expects from us. LLVM-IR is the intermediate format
-that LLVM uses for doing cool things, like targetting strange
-architectures, optimising code and allocating memory efficiently. 
+generate code for different CPUs, like X86 or ARM. That means we can be
+agnostic about where our code actually runs. We can also be ignorant
+about optimising code and allocating memory efficiently. However, what
+we have to do is to generate code in \emph{Static Single-Assignment}
+format (short SSA), because that is what the LLVM-IR expects from us. 
 
 The idea behind the SSA format is to use very simple variable
 assignments where every variable is assigned only once. The assignments
 also need to be primitive in the sense that they can be just simple
 operations like addition, multiplication, jumps, comparisons and so on.
-An idealised snippet of a program in SSA is 
-
-\begin{lstlisting}[language=LLVM,numbers=none]
-    x := 1
-    y := 2     
-    z := x + y
+Say, we have an expression $((1 + a) + (3 + (b * 5)))$, then the
+SSA is 
+ 
+\begin{lstlisting}[language=LLVMIR,numbers=left]
+let tmp0 = add 1 a in   
+let tmp1 = mul b 5 in 
+let tmp2 = add 3 tmp1 in 
+let tmp3 = add tmp0 tmp2 in
+  tmp3 
 \end{lstlisting}
 
 \noindent where every variable is used only once (we could not write
-\texttt{x := x + y} in the last line for example).  There are
+\texttt{tmp1 = add 3 tmp1} in Line 3 for example).  There are
 sophisticated algorithms for imperative languages, like C, that
 efficiently transform a high-level program into SSA format. But we can
 ignore them here. We want to compile a functional language and there
@@ -81,11 +83,11 @@
 \subsection*{LLVM-IR}
 
 Before we start, lets first have a look at the \emph{LLVM Intermediate
-Representation}. What is good about our simple Fun language is that it
-basically only contains expressions (be they arithmetic expressions or
-boolean expressions). The exception is function definitions. Luckily,
-for them we can use the mechanism of defining functions in LLVM-IR. For
-example the simple Fun program 
+Representation} in more detail. What is good about our toy Fun language
+is that it basically only contains expressions (be they arithmetic
+expressions or boolean expressions or if-expressions). The exception are
+function definitions. Luckily, for them we can use the mechanism of
+defining functions in LLVM-IR. For example the simple Fun program 
 
 
 \begin{lstlisting}[language=Scala,numbers=none]
@@ -102,30 +104,33 @@
 }    
 \end{lstlisting}
 
-\noindent First to notice is that all variable names in the LLVM-IR are
-prefixed by \texttt{\%}; function names need to be prefixed with @.
-Also, the LLVM-IR is a fully typed language. The \texttt{i32} type stands
-for a 32-bit integer. There are also types for 64-bit integers, chars
-(\texttt{i8}), floats, arrays and even pointer types. In teh code above,
-\texttt{sqr} takes an argument of type \texttt{i32} and produces a
-result of type \texttt{i32}. Each arithmetic operation, like addition or
-multiplication, are also prefixed with the type they operate on.
-Obviously these types need to match up\ldots{} but since we have in our
-programs only integers, \texttt{i32} everywhere will do. 
+\noindent First notice that all variable names in the LLVM-IR are
+prefixed by \texttt{\%}; function names need to be prefixed with
+@-symbol. Also, the LLVM-IR is a fully typed language. The \texttt{i32}
+type stands for 32-bit integers. There are also types for 64-bit
+integers (\texttt{i64}), chars (\texttt{i8}), floats, arrays and even
+pointer types. In the code above, \texttt{sqr} takes an argument of type
+\texttt{i32} and produces a result of type \texttt{i32} (the result type
+is before the function name, like in C). Each arithmetic operation, like
+addition or multiplication, are also prefixed with the type they operate
+on. Obviously these types need to match up\ldots{} but since we have in
+our programs only integers, \texttt{i32} everywhere will do. 
  
 Conveniently, you can use the program \texttt{lli}, which comes with
 LLVM, to interpret programs written in the LLVM-IR. So you can easily
 check whether the code you produced actually works. To get a running
 program that does something interesting you need to add some boilerplate
 about printing out numbers and a main-function that is the entrypoint
-for the program (see Figure~\ref{lli}). You can generate a binary for
-this program using \texttt{llc}-compiler in order to generate an object
-file and then use gcc (clang) for generating a binary:
+for the program (see Figure~\ref{lli} for a complete listing). You can
+generate a binary for this program using \texttt{llc}-compiler and
+\texttt{gcc}---\texttt{llc} can generate an object file and then you can 
+use \texttt{gcc} (that is clang) for generating an executable  binary:
 
 \begin{lstlisting}[language=bash,numbers=none]
 llc -filetype=obj sqr.ll
 gcc sqr.o -o a.out
 ./a.out
+> 25
 \end{lstlisting}
 
 \begin{figure}[t]\small 
@@ -141,13 +146,41 @@
 \subsection*{Our Own Intermediate Language}
 
 Remember compilers have to solve the problem of bridging the gap between
-``high-level'' programs and ``low-level'' hardware. If the gap is tool
-wide then a good strategy is to lay a stepping stone somewhere in
-between. The LLVM-IR itself is such a stepping stone to make the task of
-generating code easier. Like a real compiler we will use another
-stepping stone which I call \emph{K-language}. For this remember
-expressions (and boolean expressions) in the Fun language are given by
-the code on top of Figure~\ref{absfun}
+``high-level'' programs and ``low-level'' hardware. If the gap is too
+wide for one step, then a good strategy is to lay a stepping stone
+somewhere in between. The LLVM-IR itself is such a stepping stone to
+make the task of generating and optimising code easier. Like a real
+compiler we will use our own stepping stone which I call the
+\emph{K-language}. For this remember expressions (and boolean
+expressions) in the Fun language. For convenience the Scala code is
+shown on top of Figure~\ref{absfun}. Below is the code for the
+K-language. There are two kinds of syntactic entities, namely
+\emph{K-values} and \emph{K-expressions}. The central point of the
+K-language is the \texttt{KLet}-constructor. For this recall that 
+arithmetic expressions such as $((1 + a) + (3 + (b * 5)))$ need
+to be broken up into smaller ``atomic'' steps, like so
+ 
+\begin{lstlisting}[language=LLVMIR,numbers=none]
+let tmp0 = add 1 a in   
+let tmp1 = mul b 5 in 
+let tmp2 = add 3 tmp1 in 
+let tmp3 = add tmp0 tmp2 in
+  tmp3 
+\end{lstlisting}
+
+\noindent
+Here \texttt{tmp3} will contain the result of what the expression stands
+for. In each step we can only perform an ``atomic'' operation, like
+addition or multiplication. We could not for example have an
+if-condition on the right-hand side of an equals. These constraints
+enforced upon us because how the SSA format works in the LLVM-IR. By
+having in \texttt{KLet},  first a string (standing for an intermediate
+result) and second a value, we can fulfil this constraint---there is no
+way we could write anything else than a value. To sum up, K-values are
+the atomic operations that can be on the right-hand side of equal-signs.
+The K-language is restricted such that it is easy to generate the SSA format 
+for the LLVM-IR.
+
 
 
 \begin{figure}[p]\small
@@ -178,7 +211,7 @@
 case class KWrite(v: KVal) extends KVal
 
 case class KIf(x1: String, e1: KExp, e2: KExp) extends KExp
-case class KLet(x: String, e1: KVal, e2: KExp) extends KExp
+case class KLet(x: String, v: KVal, e: KExp) extends KExp
 case class KReturn(v: KVal) extends KExp
 \end{lstlisting}
 \caption{Abstract syntax trees for the Fun language.\label{absfun}}
--- a/langs.sty	Fri Nov 01 13:21:51 2019 +0000
+++ b/langs.sty	Wed Nov 06 15:04:40 2019 +0000
@@ -68,6 +68,9 @@
   morekeywords={ldc,iload,istore,ifeq,if_icmpge},
 }[keywords]
 
+\lstdefinelanguage{LLVMIR}{
+  otherkeywords={let,in,add,mul},
+}[strings]
 
 \newcommand{\code}[1]{{\lstinline{#1}}}
 \newcommand{\pcode}[1]{\mbox{\lstset{language={},keywordstyle=\color{black}}\lstinline!#1!}}