# HG changeset patch # User Christian Urban # Date 1669316734 0 # Node ID 45a48c47dccaec37dfbed75bdc9959135db83cea # Parent 904de68a27a42095aa248be5a1847fb71d7a4a51 updated diff -r 904de68a27a4 -r 45a48c47dcca handouts/ho09.pdf Binary file handouts/ho09.pdf has changed diff -r 904de68a27a4 -r 45a48c47dcca handouts/ho09.tex --- a/handouts/ho09.tex Tue Nov 15 11:34:33 2022 +0000 +++ b/handouts/ho09.tex Thu Nov 24 19:05:34 2022 +0000 @@ -47,7 +47,7 @@ be a modular suite of tools with which you can play around easily and try out something new. LLVM became a big player once Apple hired one of the original developers (I cannot remember the reason why Apple did not -want to use gcc, but maybe they were also just disgusted by its big +want to use gcc, but maybe they were also just disgusted by gcc's big monolithic codebase). Anyway, LLVM is now the big player and gcc is more or less legacy. This does not mean that programming languages like C and C++ are dying out any time soon---they are nicely supported by LLVM. @@ -77,7 +77,7 @@ The main idea behind the SSA format is to use very simple variable -assignments where every variable is assigned only once. The assignments +assignments where every tmp-variable is assigned only once. The assignments also need to be primitive in the sense that they can be just simple operations like addition, multiplication, jumps, comparisons and so on. Say, we have an expression $((1 + a) + (3 + (b * 5)))$, then the @@ -91,12 +91,13 @@ \end{lstlisting} \noindent where every variable is used only once (we could not write -\texttt{tmp1 = add 3 tmp1} in Line 3 for example). There are -sophisticated algorithms for imperative languages, like C, that -efficiently transform a high-level program into SSA format. But we can -ignore them here. We want to compile a functional language and there -things get much more interesting than just sophisticated. We will need -to have a look at CPS translations, where the CPS stands for +\texttt{tmp1 = add 3 tmp1} in Line 3 for example). + +There are sophisticated algorithms for imperative languages, like C, +that efficiently transform a high-level program into SSA format. But +we can ignore them here. We want to compile a functional language and +there things get much more interesting than just sophisticated. We +will need to have a look at CPS translations, where the CPS stands for Continuation-Passing-Style---basically black programming art or abracadabra programming. So sit tight. @@ -133,8 +134,8 @@ \noindent First notice that all variable names, in this case \texttt{x} and \texttt{tmp}, are prefixed with \texttt{\%} in the LLVM-IR. Temporary variables can be named with an identifier, such as -\texttt{tmp}, or numbers. Function names, since they are ``global'', -need to be prefixed with @-symbol. Also, the LLVM-IR is a fully typed +\texttt{tmp}, or numbers. In contrast, function names, since they are ``global'', +need to be prefixed with an @-symbol. Also, the LLVM-IR is a fully typed language. The \texttt{i32} type stands for 32-bit integers. There are also types for 64-bit integers (\texttt{i64}), chars (\texttt{i8}), floats, arrays and even pointer types. In the code above, \texttt{sqr} @@ -143,7 +144,7 @@ C). Each arithmetic operation, for example addition and multiplication, are also prefixed with the type they operate on. Obviously these types need to match up\ldots{} but since we have in our programs only -integers, \texttt{i32} everywhere will do. We do not have to generate +integers, for the moment \texttt{i32} everywhere will do. We do not have to generate any other types, but obviously this is a limitation in our Fun language. There are a few interesting instructions in the LLVM-IR which are quite @@ -159,21 +160,21 @@ \noindent The type \texttt{i1} stands for booleans. If the variable is true, then this instruction jumps to the if-branch, which needs an explicit label; -otherwise to the else-branch, again with its own label. This allows us -to keep the meaning of the boolean expression as is when compiling if's. +otherwise it jumps to the else-branch, again with its own label. This allows us +to keep the meaning of the boolean expression ``as is'' when compiling if's---thanks god no more negating the boolean. A value of type boolean is generated in the LLVM-IR by the \texttt{icmp}-instruction. This instruction is for integers (hence the \texttt{i}) and takes the comparison operation as argument. For example \begin{lstlisting}[language=LLVM] -icmp eq i32 %x, %y ; for equal -icmp sle i32 %x, %y ; signed less or equal -icmp slt i32 %x, %y ; signed less than -icmp ult i32 %x, %y ; unsigned less than +icmp eq i32 %x, %y ; for equal +icmp sle i32 %x, %y ; signed less or equal +icmp slt i32 %x, %y ; signed less than +icmp ult i32 %x, %y ; unsigned less than \end{lstlisting} \noindent -In some operations, the LLVM-IR distinguishes between signed and +Note that in some operations the LLVM-IR distinguishes between signed and unsigned representations of integers. It is also easy to call another function in LLVM-IR: as can be @@ -346,8 +347,8 @@ \end{lstlisting} \noindent -which is the expected result. If this looks somewhat familiar, then this -is not a 1000 miles off, because functions with continuations can be +which is the expected result. If this looks somewhat familiar to you, +than this is because functions with continuations can be seen as a kind of generalisation of tail-recursive functions. Anyway notice how the continuations is ``stacked up'' during the recursion and then ``unrolled'' when we apply 1 to the continuation. Interestingly, we @@ -385,7 +386,7 @@ intermediate results need to be chained into later instructions. To do this conveniently, CPS-translations have been developed. They use functions (``continuations'') to represent what is coming next in a -sequence of instructions. Continuations are functions of type +sequence of instructions. In our case, continuations are functions of type \code{KVal} to \code{KExp}. They can be seen as a sequence of \code{KLet}s where there is a ``hole'' that needs to be filled. Consider for example @@ -401,7 +402,7 @@ \noindent where in the second line is a $\Box$ which still expects a \code{KVal} to be filled in before it becomes a ``proper'' \code{KExp}. When we -apply and argument to the continuation (remember they are functions) +apply an argument to the continuation (remember they are functions) we essentially fill something into the corresponding hole. The code of the CPS-translation is @@ -431,6 +432,10 @@ \end{lstlisting} \noindent +For more such rules, have a look to the code of the fun-llvm +compiler. + +\noindent \end{document} diff -r 904de68a27a4 -r 45a48c47dcca hws/hw06.tex --- a/hws/hw06.tex Tue Nov 15 11:34:33 2022 +0000 +++ b/hws/hw06.tex Thu Nov 24 19:05:34 2022 +0000 @@ -68,6 +68,11 @@ advantages to first lex a string and then feed a sequence of tokens as input to the parser? +% Reason 1 you can filter out whitespaces and comments, which makes the grammar rules simpler. If you have to make sure that a whitespace comes after a variable say, then your parser rule for variables gets more complicated. Same with comments which do not contribute anything to the parser tree. +% Reason 2) The lexer can already classify tokens, for example as numbers, keywords or identifiers. This again makes the grammar rules more deterministic and as a result faster to parse. +% The point is that parser combinators can be used to process strings, but in case of compilers where whitespaces and comments need to be filtered out, the lexing phase is actually quite useful. + + \item The injection function for sequence regular expressions is defined by three clauses: diff -r 904de68a27a4 -r 45a48c47dcca progs/parser-combinators/comb1.sc --- a/progs/parser-combinators/comb1.sc Tue Nov 15 11:34:33 2022 +0000 +++ b/progs/parser-combinators/comb1.sc Thu Nov 24 19:05:34 2022 +0000 @@ -141,7 +141,7 @@ println(P.parse_all("()")) // A parser for arithmetic expressions (Terms and Factors) -{ + lazy val E: Parser[String, Int] = { (T ~ p"+" ~ E).map{ case ((x, _), z) => x + z } || (T ~ p"-" ~ E).map{ case ((x, _), z) => x - z } || T } @@ -149,7 +149,7 @@ (F ~ p"*" ~ T).map{ case ((x, _), z) => x * z } || F } lazy val F: Parser[String, Int] = { (p"(" ~ E ~ p")").map{ case ((_, y), _) => y } || NumParserInt } -} + println(E.parse_all("1+3+4")) println(E.parse("1+3+4")) println(E.parse_all("4*2+3")) diff -r 904de68a27a4 -r 45a48c47dcca slides/slides06.pdf Binary file slides/slides06.pdf has changed diff -r 904de68a27a4 -r 45a48c47dcca slides/slides07.pdf Binary file slides/slides07.pdf has changed diff -r 904de68a27a4 -r 45a48c47dcca slides/slides07.tex --- a/slides/slides07.tex Tue Nov 15 11:34:33 2022 +0000 +++ b/slides/slides07.tex Thu Nov 24 19:05:34 2022 +0000 @@ -18,16 +18,18 @@ \begin{tabular}{@ {}c@ {}} \\[-3mm] \LARGE Compilers and \\[-2mm] - \LARGE Formal Languages\\[3mm] + \LARGE Formal Languages\\[-5mm] \end{tabular}} + \normalsize \begin{center} \begin{tabular}{ll} - Email: & christian.urban at kcl.ac.uk\\ - %Office Hours: & Thursdays 12 -- 14\\ - %Location: & N7.07 (North Wing, Bush House)\\ - Slides \& Progs: & KEATS (also homework is there)\\ + Email: & christian.urban at kcl.ac.uk\\ + Office Hour: & Fridays 11 -- 12\\ + Location: & N7.07 (North Wing, Bush House)\\ + Slides \& Progs: & KEATS\\ + Pollev: & \texttt{\alert{https://pollev.com/cfltutoratki576}}\\ \end{tabular} \end{center}