# HG changeset patch
# User Christian Urban <christian.urban@kcl.ac.uk>
# Date 1669316734 0
# Node ID 45a48c47dccaec37dfbed75bdc9959135db83cea
# Parent  904de68a27a42095aa248be5a1847fb71d7a4a51
updated

diff -r 904de68a27a4 -r 45a48c47dcca handouts/ho09.pdf
Binary file handouts/ho09.pdf has changed
diff -r 904de68a27a4 -r 45a48c47dcca handouts/ho09.tex
--- a/handouts/ho09.tex	Tue Nov 15 11:34:33 2022 +0000
+++ b/handouts/ho09.tex	Thu Nov 24 19:05:34 2022 +0000
@@ -47,7 +47,7 @@
 be a modular suite of tools with which you can play around easily and
 try out something new. LLVM became a big player once Apple hired one of
 the original developers (I cannot remember the reason why Apple did not
-want to use gcc, but maybe they were also just disgusted by its big
+want to use gcc, but maybe they were also just disgusted by gcc's big
 monolithic codebase). Anyway, LLVM is now the big player and gcc is more
 or less legacy. This does not mean that programming languages like C and
 C++ are dying out any time soon---they are nicely supported by LLVM.
@@ -77,7 +77,7 @@
 
 
 The main idea behind the SSA format is to use very simple variable
-assignments where every variable is assigned only once. The assignments
+assignments where every tmp-variable is assigned only once. The assignments
 also need to be primitive in the sense that they can be just simple
 operations like addition, multiplication, jumps, comparisons and so on.
 Say, we have an expression $((1 + a) + (3 + (b * 5)))$, then the
@@ -91,12 +91,13 @@
 \end{lstlisting}
 
 \noindent where every variable is used only once (we could not write
-\texttt{tmp1 = add 3 tmp1} in Line 3 for example).  There are
-sophisticated algorithms for imperative languages, like C, that
-efficiently transform a high-level program into SSA format. But we can
-ignore them here. We want to compile a functional language and there
-things get much more interesting than just sophisticated. We will need
-to have a look at CPS translations, where the CPS stands for
+\texttt{tmp1 = add 3 tmp1} in Line 3 for example).
+
+There are sophisticated algorithms for imperative languages, like C,
+that efficiently transform a high-level program into SSA format. But
+we can ignore them here. We want to compile a functional language and
+there things get much more interesting than just sophisticated. We
+will need to have a look at CPS translations, where the CPS stands for
 Continuation-Passing-Style---basically black programming art or
 abracadabra programming. So sit tight.
 
@@ -133,8 +134,8 @@
 \noindent First notice that all variable names, in this case \texttt{x}
 and \texttt{tmp}, are prefixed with \texttt{\%} in the LLVM-IR.
 Temporary variables can be named with an identifier, such as
-\texttt{tmp}, or numbers. Function names, since they are ``global'',
-need to be prefixed with @-symbol. Also, the LLVM-IR is a fully typed
+\texttt{tmp}, or numbers. In contrast, function names, since they are ``global'',
+need to be prefixed with an @-symbol. Also, the LLVM-IR is a fully typed
 language. The \texttt{i32} type stands for 32-bit integers. There are
 also types for 64-bit integers (\texttt{i64}), chars (\texttt{i8}),
 floats, arrays and even pointer types. In the code above, \texttt{sqr}
@@ -143,7 +144,7 @@
 C). Each arithmetic operation, for example addition and multiplication,
 are also prefixed with the type they operate on. Obviously these types
 need to match up\ldots{} but since we have in our programs only
-integers, \texttt{i32} everywhere will do. We do not have to generate
+integers, for the moment \texttt{i32} everywhere will do. We do not have to generate
 any other types, but obviously this is a limitation in our Fun language.
  
 There are a few interesting instructions in the LLVM-IR which are quite
@@ -159,21 +160,21 @@
 \noindent
 The type \texttt{i1} stands for booleans. If the variable is true, then
 this instruction jumps to the if-branch, which needs an explicit label;
-otherwise to the else-branch, again with its own label. This allows us
-to keep the meaning of the boolean expression as is when compiling if's.
+otherwise it jumps to the else-branch, again with its own label. This allows us
+to keep the meaning of the boolean expression ``as is'' when compiling if's---thanks god no more negating the boolean.
 A value of type boolean is generated in the LLVM-IR by the
 \texttt{icmp}-instruction. This instruction is for integers (hence the
 \texttt{i}) and takes the comparison operation as argument. For example
 
 \begin{lstlisting}[language=LLVM]
-icmp eq i32  %x, %y     ; for equal
-icmp sle i32 %x, %y     ;   signed less or equal
-icmp slt i32 %x, %y     ;   signed less than
-icmp ult i32 %x, %y     ;   unsigned less than 
+icmp eq  i32 %x, %y     ; for equal
+icmp sle i32 %x, %y     ; signed less or equal
+icmp slt i32 %x, %y     ; signed less than
+icmp ult i32 %x, %y     ; unsigned less than 
 \end{lstlisting}
 
 \noindent
-In some operations, the LLVM-IR distinguishes between signed and 
+Note that in some operations the LLVM-IR distinguishes between signed and 
 unsigned representations of integers.
 
 It is also easy to call another function in LLVM-IR: as can be 
@@ -346,8 +347,8 @@
 \end{lstlisting}
 
 \noindent
-which is the expected result. If this looks somewhat familiar, then this
-is not a 1000 miles off, because functions with continuations can be
+which is the expected result. If this looks somewhat familiar to you,
+than this is because functions with continuations can be
 seen as a kind of generalisation of tail-recursive functions. Anyway
 notice how the continuations is ``stacked up'' during the recursion and
 then ``unrolled'' when we apply 1 to the continuation. Interestingly, we
@@ -385,7 +386,7 @@
 intermediate results need to be chained into later instructions. To do
 this conveniently, CPS-translations have been developed. They use
 functions (``continuations'') to represent what is coming next in a
-sequence of instructions. Continuations are functions of type
+sequence of instructions. In our case, continuations are functions of type
 \code{KVal} to \code{KExp}. They can be seen as a sequence of
 \code{KLet}s where there is a ``hole'' that needs to be filled. Consider
 for example
@@ -401,7 +402,7 @@
 \noindent
 where in the second line is a $\Box$ which still expects a \code{KVal}
 to be filled in before it becomes a ``proper'' \code{KExp}. When we 
-apply and argument to the continuation (remember they are functions)
+apply an argument to the continuation (remember they are functions)
 we essentially fill something into the corresponding hole. The code
 of the CPS-translation is 
 
@@ -431,6 +432,10 @@
 \end{lstlisting}
 
 \noindent
+For more such rules, have a look to the code of the fun-llvm
+compiler.
+
+\noindent
 \end{document}
 
 
diff -r 904de68a27a4 -r 45a48c47dcca hws/hw06.tex
--- a/hws/hw06.tex	Tue Nov 15 11:34:33 2022 +0000
+++ b/hws/hw06.tex	Thu Nov 24 19:05:34 2022 +0000
@@ -68,6 +68,11 @@
       advantages to first lex a string and then feed a
       sequence of tokens as input to the parser?
 
+% Reason 1 you can filter out whitespaces and comments, which makes the grammar rules simpler. If you have to make sure that a whitespace comes after a variable say, then your parser rule for variables  gets more complicated. Same with comments which do not contribute anything to the parser tree.
+% Reason 2) The lexer can already classify tokens, for example as numbers, keywords or identifiers. This again makes the grammar rules more deterministic and as a result faster to parse.
+% The point is that parser combinators can be used to process strings, but in case of compilers where whitespaces and comments need to be filtered out, the lexing phase is actually quite useful.
+
+      
 \item The injection function for sequence regular expressions is defined
       by three clauses:
 
diff -r 904de68a27a4 -r 45a48c47dcca progs/parser-combinators/comb1.sc
--- a/progs/parser-combinators/comb1.sc	Tue Nov 15 11:34:33 2022 +0000
+++ b/progs/parser-combinators/comb1.sc	Thu Nov 24 19:05:34 2022 +0000
@@ -141,7 +141,7 @@
 println(P.parse_all("()"))
 
 // A parser for arithmetic expressions (Terms and Factors)
-{
+
 lazy val E: Parser[String, Int] = {
   (T ~ p"+" ~ E).map{ case ((x, _), z) => x + z } ||
   (T ~ p"-" ~ E).map{ case ((x, _), z) => x - z } || T }
@@ -149,7 +149,7 @@
   (F ~ p"*" ~ T).map{ case ((x, _), z) => x * z } || F }
 lazy val F: Parser[String, Int] = {
   (p"(" ~ E ~ p")").map{ case ((_, y), _) => y } || NumParserInt }
-}
+
 println(E.parse_all("1+3+4"))
 println(E.parse("1+3+4"))
 println(E.parse_all("4*2+3"))
diff -r 904de68a27a4 -r 45a48c47dcca slides/slides06.pdf
Binary file slides/slides06.pdf has changed
diff -r 904de68a27a4 -r 45a48c47dcca slides/slides07.pdf
Binary file slides/slides07.pdf has changed
diff -r 904de68a27a4 -r 45a48c47dcca slides/slides07.tex
--- a/slides/slides07.tex	Tue Nov 15 11:34:33 2022 +0000
+++ b/slides/slides07.tex	Thu Nov 24 19:05:34 2022 +0000
@@ -18,16 +18,18 @@
   \begin{tabular}{@ {}c@ {}}
   \\[-3mm]
   \LARGE Compilers and \\[-2mm] 
-  \LARGE Formal Languages\\[3mm] 
+  \LARGE Formal Languages\\[-5mm] 
   \end{tabular}}
 
+  
   \normalsize
   \begin{center}
   \begin{tabular}{ll}
-    Email:  & christian.urban at kcl.ac.uk\\
-    %Office Hours: & Thursdays 12 -- 14\\
-    %Location: & N7.07 (North Wing, Bush House)\\
-    Slides \& Progs: & KEATS (also homework is there)\\  
+  Email:  & christian.urban at kcl.ac.uk\\
+  Office Hour: & Fridays 11 -- 12\\
+  Location: & N7.07 (North Wing, Bush House)\\
+  Slides \& Progs: & KEATS\\
+  Pollev: & \texttt{\alert{https://pollev.com/cfltutoratki576}}\\  
   \end{tabular}
   \end{center}