handouts/pep-ho.tex
author Christian Urban <urbanc@in.tum.de>
Wed, 04 Jul 2018 14:48:05 +0100
changeset 188 937c995b047a
parent 187 4d300409f2fe
child 189 ff815ca0bbcf
child 192 a112e0e2325c
permissions -rw-r--r--
updated

\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
\usepackage{marvosym}
\usepackage{boxedminipage}

%cheat sheet
%http://worldline.github.io/scala-cheatsheet/

% case class, apply, unapply
% see https://medium.com/@thejasbabu/scala-pattern-matching-9c9e73ba9a8a

\begin{document}
\fnote{\copyright{} Christian Urban, King's College London, 2017, 2018}

\section*{A Crash-Course in Scala}

\mbox{}\hfill\textit{``Scala --- \underline{S}lowly \underline{c}ompiled 
\underline{a}cademic \underline{la}nguage''}\smallskip\\
\mbox{}\hfill\textit{ --- a joke(?) on Twitter}\bigskip
 
\noindent
Scala is a programming language that combines functional and
object-oriented programming-styles. It has received quite a bit of
attention in the last five or so years. One reason for this attention is
that, like the Java programming language, Scala compiles to the Java
Virtual Machine (JVM) and therefore Scala programs can run under MacOSX,
Linux and Windows.\footnote{There are also experimental backends of
Scala for producing code under Android (\url{http://scala-android.org});
for generating JavaScript code to build browser applications
\url{(https://www.scala-js.org)}; and there is work under way to
have a native Scala compiler generating X86-code
(\url{http://www.scala-native.org}).} Because of this it has also access to
the myriads of Java libraries. Unlike Java, however, Scala often allows
programmers to write very concise and elegant code.  Some therefore say
``Scala is the better Java''.\footnote{from
\url{https://www.slideshare.net/maximnovak/joy-of-scala}} 

A number of companies---the Guardian, Twitter, Coursera, FourSquare, Netflix,
LinkedIn to name a few---either use Scala exclusively in
production code, or at least to some substantial degree. Scala seems
also useful in job-interviews (especially in data science) according to
this anecdotal report

\begin{quote}
\url{http://techcrunch.com/2016/06/14/scala-is-the-new-golden-child}
\end{quote}

\noindent
The official Scala compiler can be downloaded from

\begin{quote}
\url{http://www.scala-lang.org}
\end{quote}

\noindent
I found a convenient IDE for writing Scala programs is Microsoft's
\textit{Visual Studio Code} (VS Code) which runs under MacOSX, Linux and
obviously Windows. It can be downloaded for free from

\begin{quote}
\url{https://code.visualstudio.com}
\end{quote}

\noindent
and should already come pre-installed in the Department (together with
the Scala compiler). VS Code is far from perfect, however it includes a
\textit{Marketplace} from which a multitude of extensions can be
downloaded that make editing and running Scala code a little easier (see
Figure~\ref{vscode} for my setup).

\begin{figure}[t]
\begin{boxedminipage}{\textwidth}  
\begin{center}  
\includegraphics[scale=0.15]{../pics/vscode.png}\\[-10mm]\mbox{}
\end{center}
\caption{My personal installation of VS Code includes the following
packages from Marketplace: Scala Syntax (official), Code Runner, Code
Spell Checker, Rewrap and Subtle Match Brackets. I have also bound 
the keys \keys{Ctrl} \keys{Ret} to the action
``Run-Selected-Text-In-Active-Terminal'' in order to quickly evaluate
small code snippets in the Scala REPL.\label{vscode}}
\end{boxedminipage}
\end{figure}  

What I like most about VS Code is that it provides easy access to the
Scala REPL. But if you prefer another editor for coding, it is also
painless to work with Scala completely on the command line (as you might
have done with \texttt{g++} in the earlier part of PEP). For the
lazybones among us, there is even an online editor and environment for
developing and running Scala programs called \textit{ScalaFiddle}, which
requires zero setup (assuming you have a browser handy). You can access
it from: 
 
\begin{quote}
\url{https://scalafiddle.io}\medskip
\end{quote}
  

Scala can be used with the heavy-duty IDEs Eclipse and IntelliJ.
A ready-made Scala bundle for Eclipse is available from

\begin{quote}
\url{http://scala-ide.org/download/sdk.html}
\end{quote}

\noindent
Also IntelliJ includes plugins for Scala. \textbf{BUT}, I do \textbf{not}
recommend the usage of either Eclipse or IntelliJ for PEP: these IDEs
seem to make your life harder, rather than easier, for the small
programs we will write in this module. They are really meant to be used
when you have a million-lines codebase, rather than our
``toy-programs''\ldots{}for example why on earth am I required to create a
completely new project with several subdirectories when I just want to
try out 20-lines of Scala code? Your mileage may vary though. ;o)

\subsection*{Why Functional Programming?}

Before we go on, let me explain a bit more why we want to inflict upon
you another programming language. You hopefully have mastered Java and
C++\ldots{}the world should be your oyster, no? Well, it is not that
easy. We do require Scala in PEP, but actually we do not religiously
care whether you learn Scala---after all it is just a programming
language (albeit a nifty one IMHO). What we do care about is that you
learn about \textit{functional programming}. Scala is just the vehicle
for that. Still, you need to learn Scala well enough to get good marks
in PEP, but functional programming could equally be taught with Haskell,
F\#, SML, Ocaml, Kotlin, Clojure, Scheme, Elm and many other functional
programming languages. %Your friendly lecturer just happens to like Scala
%and the Department agreed that it is a good idea to inflict Scala upon
%you.

Very likely writing programs in a functional programming language is
quite different from what you are  used to in your study so far. It
might even be totally alien to you. The reason is that functional
programming seems to go against the core principles of
\textit{imperative programming} (which is what you do in Java and C++
for example). The main idea of imperative programming  is that you have
some form of ``state'' in your program and you continuously change this
state by issuing some commands (e.g.~updating a field in an array or
object, adding one to a variable and so on). The classic example for
this style of programming are \texttt{for}-loops, for example

\begin{lstlisting}[language=C,numbers=none]
for (int i = 10; i < 20; i++) { 
      //...Do something interesting with i...
}
\end{lstlisting}

\noindent Here the integer variable \texttt{i} embodies the state, which
is first set to \texttt{10} and then increased by one in each
loop-iteration until it reaches \texttt{20} at which point the loop
exits. When this code is compiled and actually runs, there will be some
dedicated space reserved for \texttt{i} in memory. This space of
typically 32 bits contains \texttt{i}'s current value\ldots\texttt{10}
at the beginning, and then the content will be updated by, or
overwritten with, some new content in every iteration. The main point
here is that this kind of updating, or manipulating, memory is
25.806\ldots or \textbf{THE ROOT OF ALL EVIL}!!

\begin{center}
\includegraphics[scale=0.25]{../pics/root-of-all-evil.png}
\end{center}  


\noindent
\ldots{}Well, it is perfectly benign if you have a sequential program
that gets run instruction by instruction...nicely one after another.
This kind of running code uses a single core of your CPU and goes as
fast as your CPU frequency, also called clock-speed, allows. The problem
is that this clock-speed has not much increased over the past decade and
no dramatic increases are predicted for any time soon. So you are a bit
stuck, unlike previous generations of developers who could rely upon the
fact that every 2 years or so their code would run twice as fast (in
ideal circumstances) because the clock-speed of their CPUs got twice as
fast. This unfortunately does not happen any more nowadays. To get you
out of this dreadful situation, CPU producers pile more and more
cores into CPUs in order to make them more powerful and potentially make
software faster. The task for you as developer is to take somehow
advantage of these cores by running as much of your code as possible in
parallel on as many cores you have available (typically 4 in modern
laptops and sometimes much more on high-end machines). In this
situation, \textit{mutable} variables like \texttt{i} above are evil, or
at least a major nuisance: Because if you want to distribute some of the
loop-iterations over the cores that are currently idle in your system,
you need to be extremely careful about who can read and overwrite
the variable \texttt{i}.\footnote{If you are of the mistaken belief that nothing
nasty can happen to \texttt{i} inside the \texttt{for}-loop, then you
need to go back over the C++ material.} Especially the writing operation
is critical because you do not want that conflicting writes mess about
with \texttt{i}. Take my word: an untold amount of misery has arisen
from this problem. The catch is that if you try to solve this problem in
C++ or Java, and be as defensive as possible about reads and writes to
\texttt{i}, then you need to synchronise access to it. The result is that
your program more often than not waits more than it runs, thereby
defeating the point of trying to run the program in parallel in the
first place. If you are less defensive, then usually all hell breaks
loose by seemingly obtaining random results. And forget the idea of
being able to debug such code.

The central idea of functional programming is to eliminate any state
from programs---or at least from the ``interesting bits''. Because then
it is easy to parallelize the resulting programs: if you do not have any
state, then once created, all memory content stays unchanged and reads
to such memory are absolutely safe without the need of any
synchronisation. An example is given in Figure~\ref{mand} where in the
absence of the annoying state, Scala makes it very easy to calculate the
Mandelbrot set on as many cores of your CPU as possible. Why is it so
easy in this example? Because each pixel in the Mandelbrot set can be
calculated independently and the calculation does not need to update any
variable. It is so easy in fact that going from the sequential version
of the Mandelbrot program to the parallel version can be achieved by
adding just eight characters---in two places you have to add
\texttt{.par}. Try the same in C++ or Java!

\begin{figure}[p]
\begin{boxedminipage}{\textwidth}

A Scala program for generating pretty pictures of the Mandelbrot set 
(\url{https://en.wikipedia.org/wiki/Mandelbrot_set},
\url{https://www.youtube.com/watch?v=aSg2Db3jF_4}):
\begin{center}    
\begin{tabular}{c}  
\includegraphics[scale=0.12]{../pics/mand1.png}
\end{tabular}
\end{center}

\begin{center}
\begin{tabular}{@{}p{0.45\textwidth}|p{0.45\textwidth}@{}}
  \bf sequential version: & \bf parallel version (on 4 cores):\smallskip\\

  {\hfill\includegraphics[scale=0.12]{../pics/mand4.png}\hfill} &
  {\hfill\includegraphics[scale=0.12]{../pics/mand3.png}\hfill} \\

{\footnotesize\begin{lstlisting}[xleftmargin=-1mm]
for (y <- (0 until H)) {
  for (x <- (0 until W)) {
    
    val c = start + 
      (x * d_x + y * d_y * i)
    val iters = iterations(c, max) 
    val col = 
      if (iters == max) black 
      else colours(iters % 16)

    pixel(x, y, col)
  }
  viewer.updateUI()
}   
\end{lstlisting}}   
& 
{\footnotesize\begin{lstlisting}[xleftmargin=0mm]
for (y <- (0 until H)/*@\keys{\texttt{.par}}@*/) {
  for (x <- (0 until W)/*@\keys{\texttt{.par}}@*/) {
      
    val c = start + 
      (x * d_x + y * d_y * i)
    val iters = iterations(c, max) 
    val col = 
      if (iters == max) black 
      else colours(iters % 16)
  
    pixel(x, y, col)
  }
  viewer.updateUI()
}   
\end{lstlisting}}\\

\centering\includegraphics[scale=0.5]{../pics/cpu2.png} &
\centering\includegraphics[scale=0.5]{../pics/cpu1.png}
\end{tabular}
\end{center}
\caption{The main ``loops'' creating the picture of the Mandelbrot set. The parallel version
only differs in \texttt{.par} added to the ``ranges'' of the x-y-coordinates. As can be
seen from the CPU loads: in the sequential versions there is a lower peak for an
extended period, while in the parallel version there is a short sharp burst for 
essentially the same workload.
\label{mand}} 
\end{boxedminipage}
\end{figure}  

But remember this easy parallelisation of code requires that we
have no state in our programs\ldots{}that is no counters like
\texttt{i} in \texttt{for}-loops. You might then ask, how do I write
loops without such counters? Well, teaching you that this is possible is
one of the main points of the Scala-part in PEP. I can assure you it is
possible, but you have to get your head around it. Once you have
mastered this, it will be fun to have no state in your programs (a side
product is that it much easier to debug state-less code and also more
often than not easier to understand). So have fun with
Scala!\footnote{If you are still not convinced about the function
programming ``thing'', there are a few more arguments: a lot of research
in programming languages happens to take place in functional programming
languages. This has resulted in ultra-useful features such as
pattern-matching, strong type-systems, lazyness, implicits, algebraic
datatypes  to name a few. Imperative languages seem to often lag behind
in adopting them: I know, for example, that Java will at some point in
the future support pattern-matching, which has been used in SML for at
least 40(!) years. See
\url{http://cr.openjdk.java.net/~briangoetz/amber/pattern-match.html}.
Also Rust, a C-like programming language that has been developed since
2010 and is gaining quite some interest, borrows many ideas from
functional programming from yesteryear.}


\subsection*{The Very Basics}

One advantage of Scala over Java is that it includes an interpreter (a
REPL, or
\underline{R}ead-\underline{E}val-\underline{P}rint-\underline{L}oop)
with which you can run and test small code snippets without the need
of a compiler. This helps a lot with interactively developing
programs. It is my preferred way of writing small Scala
programs. Once you installed Scala, you can start the interpreter by
typing on the command line:

\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
$ scala
Welcome to Scala 2.12.6 (Java HotSpot(TM) 64-Bit Server VM, Java 9).
Type in expressions for evaluation. Or try :help.

scala>
\end{lstlisting}%$

\noindent The precise response may vary depending
on the version and platform where you installed Scala. At the Scala
prompt you can type things like \code{2 + 3}\;\keys{Ret} and
the output will be

\begin{lstlisting}[numbers=none]
scala> 2 + 3
res0: Int = 5
\end{lstlisting}

\noindent The answer means that he result of the addition is of type
\code{Int} and the actual result is 5; \code{res0} is a name that
Scala gives automatically to the result. You can reuse this name later
on, for example

\begin{lstlisting}[numbers=none]
scala> res0 + 4
res1: Int = 9
\end{lstlisting}

\noindent
Another classic example you can try out is

\begin{lstlisting}[numbers=none]
scala> print("hello world")
hello world
\end{lstlisting}

\noindent Note that in this case there is no result. The
reason is that \code{print} does not actually produce a result
(there is no \code{resX} and no type), rather it is a
function that causes the \emph{side-effect} of printing out a
string. Once you are more familiar with the functional
programming-style, you will know what the difference is
between a function that returns a result, like addition, and a
function that causes a side-effect, like \code{print}. We
shall come back to this point later, but if you are curious
now, the latter kind of functions always has \code{Unit} as
return type. It is just not printed by Scala. 

You can try more examples with the Scala REPL, but feel free to
first guess what the result is (not all answers by Scala are obvious):

\begin{lstlisting}[numbers=none]
scala> 2 + 2
scala> 1 / 2
scala> 1.0 / 2
scala> 1 / 2.0
scala> 1 / 0
scala> 1.0 / 0.0
scala> true == false
scala> true && false
scala> 1 > 1.0
scala> "12345".length
scala> List(1,2,1).size
scala> Set(1,2,1).size
\end{lstlisting}\smallskip

\noindent
Please take the Scala REPL seriously: If you want to take advantage of my
reference implementation for the assignments, you will need to be
able to ``play around'' with it!

\subsection*{Standalone Scala Apps}

If you want to write a stand-alone app in Scala, you can
implement an object that is an instance of \code{App}, say

\begin{lstlisting}[numbers=none]
object Hello extends App {
    println("hello world")
}
\end{lstlisting}

\noindent save it in a file, for example {\tt hello-world.scala}, and
then run the compiler (\texttt{scalac}) and start the runtime
environment (\texttt{scala}):

\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
$ scalac hello-world.scala
$ scala Hello
hello world
\end{lstlisting}

\noindent
Like Java, Scala targets the JVM and consequently
Scala programs can also be executed by the bog-standard Java
Runtime. This only requires the inclusion of {\tt
scala-library.jar}, which on my computer can be done as
follows:

\begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
$ scalac hello-world.scala
$ java -cp /usr/local/src/scala/lib/scala-library.jar:. Hello
hello world
\end{lstlisting}

\noindent You might need to adapt the path to where you have
installed Scala.

\subsection*{Values}

In the lectures I will try to avoid as much as possible the term
\emph{variables} familiar from other programming languages. The reason
is that Scala has \emph{values}, which can be seen as abbreviations of
larger expressions. For example

\begin{lstlisting}[numbers=none]
scala> val x = 42
x: Int = 42

scala> val y = 3 + 4
y: Int = 7

scala> val z = x / y
z: Int = 6
\end{lstlisting}

\noindent
Why the kerfuffle about values? Well, values are \emph{immutable}. You 
cannot change their value after you defined them. If you try to reassign
\code{z} above, Scala will yell at you:

\begin{lstlisting}[numbers=none]
scala> z = 9
error: reassignment to val
       z = 9
         ^
\end{lstlisting}

\noindent
So it would be a bit absurd to call values as variables...you cannot
change them; they cannot vary. You might think you can re-assign them like

\begin{lstlisting}[numbers=none]
scala> val x = 42
scala> val z = x / 7
scala> val x = 70
scala> println(z) 
\end{lstlisting}

\noindent but try to guess what Scala will print out 
for \code{z}?  Will it be \code{6} or \code{10}? A final word about
values: Try to stick to the convention that names of values should be
lower case, like \code{x}, \code{y}, \code{foo41} and so on. Upper-case
names you should reserve for what is called \emph{constructors}.


\subsection*{Function Definitions}

We do functional programming! So defining functions will be our main occupation.
As an example, a function named \code{f} taking a single argument of type 
\code{Int} can be defined in Scala as follows:

\begin{lstlisting}[numbers=none]
def f(x: Int) : String = ...EXPR...
\end{lstlisting} 

\noindent
This function returns the value resulting from evaluating the expression
\code{EXPR} (whatever is substituted for this). The result will be
of type \code{String}. It is a good habit to include this information
about the return type always. Simple examples of Scala functions are:

\begin{lstlisting}[numbers=none]
def incr(x: Int) : Int = x + 1
def double(x: Int) : Int = x + x
def square(x: Int) : Int = x * x
\end{lstlisting}

\noindent
The general scheme for a function is

\begin{lstlisting}[numbers=none]
def fname(arg1: ty1, arg2: ty2,..., argn: tyn): rty = {
  BODY
}
\end{lstlisting}

\noindent
where each argument requires its type and the result type of the
function, \code{rty}, should be given. If the body of the function is
more complex, then it can be enclosed in braces, like above. If it it
is just a simple expression, like \code{x + 1}, you can omit the
braces. Very often functions are recursive (call themselves) like
the venerable factorial function:

\begin{lstlisting}[numbers=none]
def fact(n: Int): Int = 
  if (n == 0) 1 else n * fact(n - 1)
\end{lstlisting}

\noindent
Note that Scala does not have a \code{then}-keyword in an \code{if}-statement.
  
\subsection*{Loops, or better the Absence thereof}

Coming from Java or C++, you might be surprised that Scala does
not really have loops. It has instead, what is in functional
programming called, \emph{maps}. To illustrate how they work,
let us assume you have a list of numbers from 1 to 8 and want to
build the list of squares. The list of numbers from 1 to 8 
can be constructed in Scala as follows:

\begin{lstlisting}[numbers=none]
scala> (1 to 8).toList
res1: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8)
\end{lstlisting}

\noindent Generating from this list, the list of squares in a
programming language such as Java, you would assume the list
is given as a kind of array. You would then iterate, or loop,
an index over this array and replace each entry in the array
by the square. Right? In Scala, and in other functional
programming languages, you use maps to achieve the same. 

A map essentially takes a function that describes how each
element is transformed (for example squared) and a list over
which this function should work. There are two forms to
express such maps in Scala. The first way is called a
\emph{for-comprehension}. Squaring the numbers from 1 to 8
would look as follows:

\begin{lstlisting}[numbers=none]
scala> for (n <- (1 to 8).toList) yield n * n
res2: List[Int] = List(1, 4, 9, 16, 25, 36, 49, 64)
\end{lstlisting}

\noindent The important keywords are \code{for} and
\code{yield}. This for-comprehension roughly states that from
the list of numbers we draw \code{n}s and compute the result
of \code{n * n}. As you can see, we specified the list where
each \code{n} comes from, namely \code{(1 to 8).toList}, and
how each element needs to be transformed. This can also be
expressed in a second way in Scala by using directly
\code{map}s as follows:

\begin{lstlisting}[numbers=none]
scala> (1 to 8).toList.map(n => n * n)
res3 = List(1, 4, 9, 16, 25, 36, 49, 64)
\end{lstlisting}

\noindent In this way, the expression \code{n => n * n} stands
for the function that calculates the square (this is how the
\code{n}s are transformed). This expression for functions
might remind you of your lessons about the lambda-calculus
where this would have been written as $\lambda n.\,n * n$. It
might not be obvious, but for-comprehensions are just
syntactic sugar: when compiling, Scala translates
for-comprehensions into equivalent maps. This even works
when for-comprehensions get more complicated (see below).

The very charming feature of Scala is that such maps or
for-comprehensions can be written for any kind of data
collection, such as lists, sets, vectors, options and so on.
For example if we instead compute the reminders modulo 3 of
this list, we can write

\begin{lstlisting}[numbers=none]
scala> (1 to 8).toList.map(n => n % 3)
res4 = List(1, 2, 0, 1, 2, 0, 1, 2)
\end{lstlisting}

\noindent If we, however, transform the numbers 1 to 8 not
into a list, but into a set, and then compute the reminders
modulo 3 we obtain

\begin{lstlisting}[numbers=none]
scala> (1 to 8).toSet[Int].map(n => n % 3)
res5 = Set(2, 1, 0)
\end{lstlisting}

\noindent This is the correct result for sets, as there are
only three equivalence classes of integers modulo 3. Note that
in this example we need to ``help'' Scala to transform the
numbers into a set of integers by explicitly annotating the
type \code{Int}. Since maps and for-comprehensions are
just syntactic variants of each other, the latter can also be
written as

\begin{lstlisting}[numbers=none]
scala> for (n <- (1 to 8).toSet[Int]) yield n % 3
res5 = Set(2, 1, 0)
\end{lstlisting}

For-comprehensions can also be nested and the selection of 
elements can be guarded. For example if we want to pair up
the numbers 1 to 4 with the letters a to c, we can write

\begin{lstlisting}[numbers=none]
scala> for (n <- (1 to 4).toList; 
            m <- ('a' to 'c').toList) yield (n, m)
res6 = List((1,a), (1,b), (1,c), (2,a), (2,b), (2,c), 
            (3,a), (3,b), (3,c), (4,a), (4,b), (4,c))
\end{lstlisting}

\noindent 
Or if we want to find all pairs of numbers between 1 and 3
where the sum is an even number, we can write

\begin{lstlisting}[numbers=none]
scala> for (n <- (1 to 3).toList; 
            m <- (1 to 3).toList;
            if (n + m) % 2 == 0) yield (n, m)
res7 = List((1,1), (1,3), (2,2), (3,1), (3,3))
\end{lstlisting}

\noindent The \code{if}-condition in the for-comprehension
filters out all pairs where the sum is not even.

While hopefully this all looks reasonable, there is one
complication: In the examples above we always wanted to
transform one list into another list (e.g.~list of squares),
or one set into another set (set of numbers into set of
reminders modulo 3). What happens if we just want to print out
a list of integers? Then actually the for-comprehension
needs to be modified. The reason is that \code{print}, you
guessed it, does not produce any result, but only produces
what is in the functional-programming-lingo called a
side-effect. Printing out the list of numbers from 1 to 5
would look as follows

\begin{lstlisting}[numbers=none]
scala> for (n <- (1 to 5).toList) print(n)
12345
\end{lstlisting}

\noindent
where you need to omit the keyword \code{yield}. You can
also do more elaborate calculations such as

\begin{lstlisting}[numbers=none]
scala> for (n <- (1 to 5).toList) {
  val square_n = n * n
  println(s"$n * $n = $square_n") 
}
1 * 1 = 1
2 * 2 = 4
3 * 3 = 9
4 * 4 = 16
5 * 5 = 25
\end{lstlisting}%$

\noindent In this code I use a variable assignment (\code{val
square_n = ...} ) and also what is called in Scala a
\emph{string interpolation}, written \code{s"..."}. The latter
is for printing out an equation. It allows me to refer to the
integer values \code{n} and \code{square\_n} inside a string.
This is very convenient for printing out ``things''. 

The corresponding map construction for functions with 
side-effects is in Scala called \code{foreach}. So you 
could also write


\begin{lstlisting}[numbers=none]
scala> (1 to 5).toList.foreach(n => print(n))
12345
\end{lstlisting}


\noindent or even just

\begin{lstlisting}[numbers=none]
scala> (1 to 5).toList.foreach(print)
12345
\end{lstlisting}

\noindent Again I hope this reminds you a bit of your
lambda-calculus lessons, where an explanation is given why
both forms produce the same result.


If you want to find out more about maps and functions with
side-effects, you can ponder about the response Scala gives if
you replace \code{foreach} by \code{map} in the expression
above. Scala will still allow \code{map} with side-effect
functions, but then reacts with a slightly interesting result.

\subsection*{Types}

In most functional programming languages, types play an
important role. Scala is such a language. You have already
seen built-in types, like \code{Int}, \code{Boolean},
\code{String} and \code{BigInt}, but also user-defined ones,
like \code{Rexp}. Unfortunately, types can be a thorny
subject, especially in Scala. For example, why do we need to
give the type to \code{toSet[Int]}, but not to \code{toList}?
The reason is the power of Scala, which sometimes means it
cannot infer all necessary typing information. At the
beginning while getting familiar with Scala, I recommend a
``play-it-by-ear-approach'' to types. Fully understanding
type-systems, especially complicated ones like in Scala, can
take a module on their own.\footnote{Still, such a study can
be a rewarding training: If you are in the business of
designing new programming languages, you will not be able to
turn a blind eye to types. They essentially help programmers
to avoid common programming errors and help with maintaining
code.}

In Scala, types are needed whenever you define an inductive
datatype and also whenever you define functions (their
arguments and their results need a type). Base types are types
that do not take any (type)arguments, for example \code{Int}
and \code{String}. Compound types take one or more arguments,
which as seen earlier need to be given in angle-brackets, for
example \code{List[Int]} or \code{Set[List[String]]} or 
\code{Map[Int, Int]}.

There are a few special type-constructors that fall outside
this pattern. One is for tuples, where the type is written
with parentheses. For example 

\begin{lstlisting}[ numbers=none]
(Int, Int, String)
\end{lstlisting}

\noindent is for a triple (a tuple with three components---two
integers and a string). Tuples are helpful if you want to
define functions with multiple results, say the function
returning the quotient and reminder of two numbers. For this
you might define:


\begin{lstlisting}[ numbers=none]
def quo_rem(m: Int, n: Int) : (Int, Int) = (m / n, m % n)
\end{lstlisting}


\noindent Since this function returns a pair of integers, its
return type needs to be of type \code{(Int, Int)}.
Incidentally, this is also the input type of this function.
Notice this function takes \emph{two} arguments, namely
\code{m} and \code{n}, both of which are integers. They are
``packaged'' in a pair. Consequently the complete type of
\code{quo_rem} is

\begin{lstlisting}[ numbers=none]
(Int, Int) => (Int, Int)
\end{lstlisting}

Another special type-constructor is for functions, written as
the arrow \code{=>}. For example, the type \code{Int =>
String} is for a function that takes an integer as input
argument and produces a string as result. A function of this
type is for instance

\begin{lstlisting}[numbers=none]
def mk_string(n: Int) : String = n match {
  case 0 => "zero"
  case 1 => "one"
  case 2 => "two"
  case _ => "many" 
} 
\end{lstlisting}

\noindent It takes an integer as input argument and returns a
string. Unlike other functional programming languages, there
is in Scala no easy way to find out the types of existing
functions, except by looking into the documentation

\begin{quote}
\url{http://www.scala-lang.org/api/current/}
\end{quote}

The function arrow can also be iterated, as in 
\code{Int => String => Boolean}. This is the type for a function
taking an integer as first argument and a string as second,
and the result of the function is a boolean. Though silly, a
function of this type would be


\begin{lstlisting}[numbers=none]
def chk_string(n: Int)(s: String) : Boolean = 
  mk_string(n) == s
\end{lstlisting}


\noindent which checks whether the integer \code{n}
corresponds to the name \code{s} given by the function
\code{mk\_string}. Notice the unusual way of specifying the
arguments of this function: the arguments are given one after
the other, instead of being in a pair (what would be the type
of this function then?). This way of specifying the arguments
can be useful, for example in situations like this

\begin{lstlisting}[numbers=none]
scala> List("one", "two", "three", "many").map(chk_string(2))
res4 = List(false, true, false, false)

scala> List("one", "two", "three", "many").map(chk_string(3))
res5 = List(false, false, false, true)
\end{lstlisting}

\noindent In each case we can give to \code{map} a specialised
version of \code{chk_string}---once specialised to 2 and once
to 3. This kind of ``specialising'' a function is called
\emph{partial application}---we have not yet given to this
function all arguments it needs, but only some of them.

Coming back to the type \code{Int => String => Boolean}. The
rule about such function types is that the right-most type
specifies what the function returns (a boolean in this case).
The types before that specify how many arguments the function
expects and what their type is (in this case two arguments,
one of type \code{Int} and another of type \code{String}).
Given this rule, what kind of function has type
\mbox{\code{(Int => String) => Boolean}}? Well, it returns a
boolean. More interestingly, though, it only takes a single
argument (because of the parentheses). The single argument
happens to be another function (taking an integer as input and
returning a string). Remember that \code{mk_string} is just 
such a function. So how can we use it? For this define
the somewhat silly function \code{apply_3}:

\begin{lstlisting}[numbers=none]
def apply_3(f: Int => String): Bool = f(3) == "many"

scala> apply_3(mk_string)
res6 = true
\end{lstlisting}

You might ask: Apart from silly functions like above, what is
the point of having functions as input arguments to other
functions? In Java there is indeed no need of this kind of
feature: at least in the past it did not allow such
constructions. I think, the point of Java 8 is to lift this
restriction. But in all functional programming languages,
including Scala, it is really essential to allow functions as
input argument. Above you already seen \code{map} and
\code{foreach} which need this. Consider the functions
\code{print} and \code{println}, which both print out strings,
but the latter adds a line break. You can call \code{foreach}
with either of them and thus changing how, for example, five
numbers are printed.


\begin{lstlisting}[numbers=none]
scala> (1 to 5).toList.foreach(print)
12345
scala> (1 to 5).toList.foreach(println)
1
2
3
4
5
\end{lstlisting}


\noindent This is actually one of the main design principles
in functional programming. You have generic functions like
\code{map} and \code{foreach} that can traverse data containers,
like lists or sets. They then take a function to specify what
should be done with each element during the traversal. This
requires that the generic traversal functions can cope with
any kind of function (not just functions that, for example,
take as input an integer and produce a string like above).
This means we cannot fix the type of the generic traversal
functions, but have to keep them
\emph{polymorphic}.\footnote{Another interesting topic about
types, but we omit it here for the sake of brevity.} 

There is one more type constructor that is rather special. It
is called \code{Unit}. Recall that \code{Boolean} has two
values, namely \code{true} and \code{false}. This can be used,
for example, to test something and decide whether the test
succeeds or not. In contrast the type \code{Unit} has only a
single value, written \code{()}. This seems like a completely
useless type and return value for a function, but is actually
quite useful. It indicates when the function does not return
any result. The purpose of these functions is to cause
something being written on the screen or written into a file,
for example. This is what is called they cause some effect on 
the side, namely a new content displayed on the screen or some
new data in a file. Scala uses the \code{Unit} type to indicate
that a function does not have a result, but potentially causes
some side-effect. Typical examples are the printing functions, 
like \code{print}.


% \subsection*{Cool Stuff}

% The first wow-moment I had with Scala was when I came across
% the following code-snippet for reading a web-page. 


% \begin{lstlisting}[ numbers=none]
% import io.Source
% val url = """http://www.inf.kcl.ac.uk/staff/urbanc/"""
% Source.fromURL(url)("ISO-8859-1").take(10000).mkString
% \end{lstlisting}


% \noindent These three lines return a string containing the
% HTML-code of my webpage. It actually already does something
% more sophisticated, namely only returns the first 10000
% characters of a webpage in case it is too large. Why is that
% code-snippet of any interest? Well, try implementing
% reading-from-a-webpage in Java. I also like the possibility of
% triple-quoting strings, which I have only seen in Scala so
% far. The idea behind this is that in such a string all
% characters are interpreted literally---there are no escaped
% characters, like \verb|\n| for newlines.

% My second wow-moment I had with a feature of Scala that other
% functional programming languages do not have. This feature is
% about implicit type conversions. If you have regular
% expressions and want to use them for language processing you
% often want to recognise keywords in a language, for example
% \code{for},{} \code{if},{} \code{yield} and so on. But the
% basic regular expression \code{CHAR} can only recognise a
% single character. In order to recognise a whole string, like
% \code{for}, you have to put many of those together using
% \code{SEQ}:


% \begin{lstlisting}[numbers=none]
% SEQ(CHAR('f'), SEQ(CHAR('o'), CHAR('r')))
% \end{lstlisting}

% \noindent This gets quickly unreadable when the strings and
% regular expressions get more complicated. In other functional
% programming languages, you can explicitly write a conversion
% function that takes a string, say \dq{\pcode{for}}, and
% generates the regular expression above. But then your code is
% littered with such conversion functions.

% In Scala you can do better by ``hiding'' the conversion
% functions. The keyword for doing this is \code{implicit} and
% it needs a built-in library called 

% \begin{lstlisting}[numbers=none]
% scala.language.implicitConversions
% \end{lstlisting}

% \noindent
% Consider the code


% \begin{lstlisting}[language=Scala]
% import scala.language.implicitConversions

% def charlist2rexp(s: List[Char]) : Rexp = s match {
%   case Nil => EMPTY
%   case c::Nil => CHAR(c)
%   case c::s => SEQ(CHAR(c), charlist2rexp(s))
% }

% implicit def string2rexp(s: String) : Rexp = 
%   charlist2rexp(s.toList)
% \end{lstlisting}


% \noindent where the first seven lines implement a function
% that given a list of characters generates the corresponding
% regular expression. In Lines 9 and 10, this function is used
% for transforming a string into a regular expression. Since the
% \code{string2rexp}-function is declared as \code{implicit},
% the effect will be that whenever Scala expects a regular
% expression, but I only give it a string, it will automatically
% insert a call to the \code{string2rexp}-function. I can now
% write for example

% \begin{lstlisting}[numbers=none]
% scala> ALT("ab", "ac")
% res9 = ALT(SEQ(CHAR(a),CHAR(b)),SEQ(CHAR(a),CHAR(c)))
% \end{lstlisting}

% \noindent Recall that \code{ALT} expects two regular
% expressions as arguments, but I only supply two strings. The
% implicit conversion function will transform the string into a
% regular expression.

% Using implicit definitions, Scala allows me to introduce
% some further syntactic sugar for regular expressions:


% \begin{lstlisting}[ numbers=none]
% implicit def RexpOps(r: Rexp) = new {
%   def | (s: Rexp) = ALT(r, s)
%   def ~ (s: Rexp) = SEQ(r, s)
%   def % = STAR(r)
% }

% implicit def stringOps(s: String) = new {
%   def | (r: Rexp) = ALT(s, r)
%   def | (r: String) = ALT(s, r)
%   def ~ (r: Rexp) = SEQ(s, r)
%   def ~ (r: String) = SEQ(s, r)
%   def % = STAR(s)
% }
% \end{lstlisting}

 
% \noindent This might seem a bit overly complicated, but its effect is
% that I can now write regular expressions such as $ab + ac$ 
% simply as


% \begin{lstlisting}[numbers=none]
% scala> "ab" | "ac"
% res10 = ALT(SEQ(CHAR(a),CHAR(b)),SEQ(CHAR(a),CHAR(c)))
% \end{lstlisting}

 
% \noindent I leave you to figure out what the other
% syntactic sugar in the code above stands for.
 
% One more useful feature of Scala is the ability to define
% functions with varying argument lists. This is a feature that
% is already present in old languages, like C, but seems to have
% been forgotten in the meantime---Java does not have it. In the
% context of regular expressions this feature comes in handy:
% Say you are fed up with writing many alternatives as


% \begin{lstlisting}[numbers=none]
% ALT(..., ALT(..., ALT(..., ...)))
% \end{lstlisting}


% \noindent To make it difficult, you do not know how deep such
% alternatives are nested. So you need something flexible that
% can take as many alternatives as needed. In Scala one can
% achieve this by adding a \code{*} to the type of an argument.
% Consider the code


% \begin{lstlisting}[language=Scala]
% def Alts(rs: List[Rexp]) : Rexp = rs match {
%   case Nil => NULL
%   case r::Nil => r
%   case r::rs => ALT(r, Alts(rs))
% }

% def ALTS(rs: Rexp*) = Alts(rs.toList)
% \end{lstlisting}


% \noindent The function in Lines 1 to 5 takes a list of regular
% expressions and converts it into an appropriate alternative
% regular expression. In Line 7 there is a wrapper for this
% function which uses the feature of varying argument lists. The
% effect of this code  is that I can write the regular
% expression for keywords as


% \begin{lstlisting}[numbers=none]
% ALTS("for", "def", "yield", "implicit", "if", "match", "case")
% \end{lstlisting}


% \noindent Again I leave it to you to find out how much this
% simplifies the regular expression in comparison with if I had
% to write this by hand using only the ``plain'' regular
% expressions from the inductive datatype.

\bigskip\noindent
\textit{More TBD.}

\subsection*{Coursework}

\subsection*{More Info}

There is much more to Scala than I can possibly describe in
this document. Fortunately there are a number of free books
about Scala and of course lots of help online. For example

\begin{itemize}
\item \url{http://www.scala-lang.org/docu/files/ScalaByExample.pdf}
\item \url{http://www.scala-lang.org/docu/files/ScalaTutorial.pdf}
\item \url{https://www.youtube.com/user/ShadowofCatron}
\item \url{http://docs.scala-lang.org/tutorials}
\item \url{https://www.scala-exercises.org}
\item \url{https://twitter.github.io/scala_school}
\end{itemize}
 
\noindent There is also a course at Coursera on Functional
Programming Principles in Scala by Martin Odersky, the main
developer of the Scala language. And a document that explains
Scala for Java programmers

\begin{itemize}
\item \small\url{http://docs.scala-lang.org/tutorials/scala-for-java-programmers.html}
\end{itemize}

While I am quite enthusiastic about Scala, I am also happy to
admit that it has more than its fair share of faults. The
problem seen earlier of having to give an explicit type to
\code{toSet}, but not \code{toList} is one of them. There are
also many ``deep'' ideas about types in Scala, which even to
me as seasoned functional programmer are puzzling. Whilst
implicits are great, they can also be a source of great
headaches, for example consider the code:

\begin{lstlisting}[numbers=none]
scala>  List (1, 2, 3) contains "your mom"
res1: Boolean = false
\end{lstlisting}

\noindent Rather than returning \code{false}, this code should
throw a typing-error. There are also many limitations Scala
inherited from the JVM that can be really annoying. For
example a fixed stack size. One can work around this
particular limitation, but why does one have to?
More such `puzzles' can be found at

\begin{center}
  \url{http://scalapuzzlers.com} and
  \url{http://latkin.org/blog/2017/05/02/when-the-scala-compiler-doesnt-help/}
\end{center}

Even if Scala has been a success in several high-profile
companies, there is also a company (Yammer) that first used
Scala in their production code, but then moved away from it.
Allegedly they did not like the steep learning curve of Scala
and also that new versions of Scala often introduced
incompatibilities in old code. In the past two months
there have also been two forks of the Scala compiler.
It needs to be seen what the future brings for Scala.

%So all in all, Scala might not be a great teaching language,
%but I hope this is mitigated by the fact that I never require
%you to write any Scala code. You only need to be able to read
%it. In the coursework you can use any programming language you
%like. If you want to use Scala for this, then be my guest; if
%you do not want, stick with the language you are most familiar
%with.



\begin{flushright}\it
There are only two kinds of languages: the ones people complain 
about\\ and the ones nobody uses.\smallskip\\
\mbox{}\hfill\small{}---Bjarne Stroustrup (the inventor of C++)
\end{flushright}

\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: