--- a/handouts/scala-ho.tex Sat Sep 06 15:28:59 2014 +0100
+++ b/handouts/scala-ho.tex Sat Sep 06 21:53:41 2014 +0100
@@ -9,7 +9,7 @@
\section*{A Crash-Course on Scala}
Scala is a programming language that combines functional and
-object-oriented programming-styles, and has received in the
+object-oriented programming-styles. It has received in the
last five years or so quite a bit of attention. One reason for
this attention is that, like the Java programming language,
Scala compiles to the Java Virtual Machine (JVM) and therefore
@@ -33,13 +33,13 @@
language you like. I use Scala for showing you code during the
lectures because its functional programming-style allows me to
implement the functions we will discuss with very small
-code-snippets. If I had to do this in Java, for example, I
-would first have to go through heaps of boilerplate code and
-the code-snippets would not look pretty. Since the Scala
-compiler is free, you can download the code-snippets and run
-every example I give. But if you prefer, you can also easily
-translate them into any other functional language, for example
-Haskell, Standard ML, F$^\#$, Ocaml and so on.
+code-snippets. If I had to do this in Java, I would first have
+to go through heaps of boilerplate code and the code-snippets
+would not look pretty. Since the Scala compiler is free, you
+can download the code-snippets and run every example I give.
+But if you prefer, you can also easily translate them into any
+other functional language, for example Haskell, Standard ML,
+F$^\#$, Ocaml and so on.
Developing programs in Scala can be done with the Eclipse IDE
and also with IntelliJ IDE, but for the small programs I will
@@ -65,7 +65,7 @@
where you installed Scala. At the Scala prompt you can type
things like \code{2 + 3} \keys{Ret} and the output will be
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> 2 + 3
res0: Int = 5
\end{lstlisting}
@@ -74,7 +74,7 @@
type \code{Int} and the actual result is 5. Another classic
example you can try out is
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> print("hello world")
hello world
\end{lstlisting}
@@ -94,9 +94,9 @@
If you want to write a stand-alone app in Scala, you can
implement an object that is an instance of \code{App}, say
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
object Hello extends App {
- println ("hello world")
+ println("hello world")
}
\end{lstlisting}
@@ -160,11 +160,11 @@
cases we do have arguments. For example the character regular
expression needs to take as an argument the character it is
supposed to recognise. In Scala, the cases without arguments
-are called \emph{case objects}, while the ones with arguments
-are \emph{case classes}. The corresponding Scala code is as
-follows:
+are called \emph{case objects}, whereas the ones with
+arguments are \emph{case classes}. The corresponding Scala
+code is as follows:
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
abstract class Rexp
case object NULL extends Rexp
case object EMPTY extends Rexp
@@ -190,7 +190,7 @@
in Scala. If you want to assign this regular expression to a
variable, you can use the keyword \code{val} and type
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> val r = ALT(CHAR('a'), CHAR('b'))
r: ALT = ALT(CHAR(a),CHAR(b))
\end{lstlisting}
@@ -213,7 +213,7 @@
different if you want to form a list of regular expressions,
for example
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> val ls = List(ALT(CHAR('a'), CHAR('b')), NULL)
ls: List[Rexp] = List(ALT(CHAR(a),CHAR(b)), NULL)
\end{lstlisting}
@@ -243,7 +243,7 @@
which corresponds to a famous unsolved problem in
mathematics.\footnote{See for example
\url{http://mathworld.wolfram.com/CollatzProblem.html}.}
-Mathematician define this series as:
+Mathematicians define this series as:
\[
collatz_{n + 1} \dn
@@ -254,7 +254,7 @@
\]
\noindent The famous unsolved question is whether this
-series started with any $n > 0$ as $collaz_0$ will always
+series started with any $n > 0$ as $collatz_0$ will always
return to $1$. This is obvious when started with $1$, and also
with $2$, but already needs a bit of head-scratching for the
case of $3$.
@@ -262,25 +262,23 @@
If we want to avoid the head-scratching, we could implement
this as the following function in Scala:
-
-\lstinputlisting[language=Scala,numbers=none]{../progs/collatz.scala}
-
+\lstinputlisting[numbers=none]{../progs/collatz.scala}
\noindent The keyword for function definitions is \code{def}
followed by the name of the function. After that you have a
list of arguments (enclosed in parentheses and separated by
-commas). Each argument in this list needs its type annotated.
-In this case we only have one argument, which is of type
-\code{BigInt}. This type stands in Scala for arbitrary precision
-integers (in case you want to try out the function on really
-big numbers). After the arguments comes the type of what the
-function returns---a Boolean in this case for indicating that
-the function has reached 1. Finally, after the \code{=}
-comes the \emph{body} of the function implementing what the
-function is supposed to do. What the \code{collatz} function
-does should be pretty self-explanatory: the function first
-tests whether \code{n} is equal to 1 in which case it returns
-\code{true} and so on.
+commas). Each argument in this list needs its type to be
+annotated. In this case we only have one argument, which is of
+type \code{BigInt}. This type stands in Scala for arbitrary
+precision integers (in case you want to try out the function
+on really big numbers). After the arguments comes the type of
+what the function returns---a Boolean in this case for
+indicating that the function has reached 1. Finally, after the
+\code{=} comes the \emph{body} of the function implementing
+what the function is supposed to do. What the \code{collatz}
+function does should be pretty self-explanatory: the function
+first tests whether \code{n} is equal to 1 in which case it
+returns \code{true} and so on.
Notice a quirk in Scala's syntax for \code{if}s: The condition
needs to be enclosed in parentheses and the then-case comes
@@ -293,14 +291,13 @@
sequence of \code{if}s. This can be very cumbersome and brittle
if there are many cases. If we wanted to define a function
over regular expressions in Java, for example, which does not
-have pattern-matching, the resulting code would be just
+have pattern-matching, the resulting code would just be
awkward.
Mathematicians already use the power of pattern-matching when
they define the function that takes a regular expression and
produces another regular expression that can recognise the
-reversed strings. The resulting recursive function is often
-defined as follows:
+reversed strings. They define this function as follows:
\begin{center}
\begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l}
@@ -313,17 +310,17 @@
\end{tabular}
\end{center}
-\noindent This function is defined by recursion analysing each
-pattern of what the regular expression could look like. The
-corresponding Scala code looks very similar to this
-definition, thanks to pattern-matching.
+\noindent It is defined by recursion analysing each pattern of
+what the regular expression might look like. The corresponding
+Scala code looks very similar to this definition, thanks to
+pattern-matching.
\lstinputlisting[language=Scala]{../progs/rev.scala}
\noindent The keyword for starting a pattern-match is
-\code{match} followed by a list of \code{case}s. Before the match
-keyword can be another pattern, but often as in the case
-above, it is just a variable you want to pattern-match
+\code{match} followed by a list of \code{case}s. Before the
+match keyword can be another pattern, but often, as in the
+case above, it is just a variable you want to pattern-match
(the \code{r} after \code{=} in Line 1).
Each case in this definition follows the structure of how we
@@ -345,7 +342,7 @@
In Scala each pattern-match can also be guarded as in
-\begin{lstlisting}[language=Scala, numbers=none]
+\begin{lstlisting}[ numbers=none]
case Pattern if Condition => Do_Something
\end{lstlisting}
@@ -356,27 +353,28 @@
\noindent Although in this particular case the pattern-match
-does not improve the code in any way. In cases like \code{rev}
-it is really crucial. The underscore in Line 4 indicates that
-we do not care what the patter8n looks like. Thus this case
-acts like a default case whenever the cases above did not
-match. Cases are always tried out from top to bottom.
+does not improve the code in any way. In cases like
+\code{rev}, however, it is really crucial. The underscore in
+Line 4 indicates that we do not care what the pattern looks
+like. Thus this case acts like a default case whenever the
+cases above did not match. Cases are always tried out from top
+to bottom.
\subsection*{Loops, or the Absence of}
Coming from Java or C, you might be surprised that Scala does
not really have loops. It has instead, what is in functional
-programming called \emph{maps}. To illustrate how they work,
+programming called ,\emph{maps}. To illustrate how they work,
lets assume you have a list of numbers from 1 to 8 and want to
build the list of squares. The list of numbers from 1 to 8
can be constructed in Scala as follows:
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> (1 to 8).toList
res1: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8)
\end{lstlisting}
-\noindent Generating from this list the list of squares in a
+\noindent Generating from this list, the list of squares in a
programming language such as Java, you would assume the list
is given as a kind of array. You would then iterate, or loop,
an index over this array and replace each entry in the array
@@ -390,7 +388,7 @@
\emph{for-comprehension}. Squaring the numbers from 1 to 8
would look in this form as follows:
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> for (n <- (1 to 8).toList) yield n * n
res2: List[Int] = List(1, 4, 9, 16, 25, 36, 49, 64)
\end{lstlisting}
@@ -404,7 +402,7 @@
expressed in a second way in Scala by using directly
\code{map}s as follows:
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> (1 to 8).toList.map(n => n * n)
res3 = List(1, 4, 9, 16, 25, 36, 49, 64)
\end{lstlisting}
@@ -425,7 +423,7 @@
For example if we instead compute the reminders modulo 3 of
this list, we can write
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> (1 to 8).toList.map(n => n % 3)
res4 = List(1, 2, 0, 1, 2, 0, 1, 2)
\end{lstlisting}
@@ -434,7 +432,7 @@
into a list, but into a set, and then compute the reminders
modulo 3 we obtain
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> (1 to 8).toSet[Int].map(n => n % 3)
res5 = Set(2, 1, 0)
\end{lstlisting}
@@ -447,7 +445,7 @@
just syntactic variants of each other, the latter can also be
written as
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> for (n <- (1 to 8).toSet[Int]) yield n % 3
res5 = Set(2, 1, 0)
\end{lstlisting}
@@ -456,9 +454,9 @@
elements can be guarded. For example if we want to pair up
the numbers 1 to 4 with the letters a to c, we can write
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> for (n <- (1 to 4).toList;
- l <- ('a' to 'c').toList) yield (n, l)
+ m <- ('a' to 'c').toList) yield (n, m)
res6 = List((1,a), (1,b), (1,c), (2,a), (2,b), (2,c),
(3,a), (3,b), (3,c), (4,a), (4,b), (4,c))
\end{lstlisting}
@@ -467,7 +465,7 @@
Or if we want to find all pairs of numbers between 1 and 3
where the sum is an even number, we can write
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> for (n <- (1 to 3).toList;
m <- (1 to 3).toList;
if (n + m) % 2 == 0) yield (n, m)
@@ -489,7 +487,7 @@
side-effect. Printing out the list of numbers from 1 to 5
would look as follows
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> for (n <- (1 to 5).toList) print(n)
12345
\end{lstlisting}
@@ -498,7 +496,7 @@
where you need to omit the keyword \code{yield}. You can
also do more elaborate calculations such as
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> for (n <- (1 to 5).toList) {
val square_n = n * n
println(s"$n * $n = $square_n")
@@ -511,18 +509,18 @@
\end{lstlisting}
\noindent In this code I use a variable assignment (\code{val
-square_n = ...} ) and what is called a
-\emph{string interpolation}, written \code{s"..."}, in order to
-print out an equation. The string interpolation allows me to
-refer to the integer values \code{n} and \code{square\_n} inside
-a string. This is very convenient for printing out ``things''.
+square_n = ...} ) and also what is called in Scala a
+\emph{string interpolation}, written \code{s"..."}. The latter
+is for printing out an equation. It allows me to refer to the
+integer values \code{n} and \code{square\_n} inside a string.
+This is very convenient for printing out ``things''.
The corresponding map construction for functions with
side-effects is in Scala called \code{foreach}. So you
could also write
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> (1 to 5).toList.foreach(n => print(n))
12345
\end{lstlisting}
@@ -530,7 +528,7 @@
\noindent or even just
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> (1 to 5).toList.foreach(print)
12345
\end{lstlisting}
@@ -548,7 +546,7 @@
\subsection*{Types}
-In most functional programming languages types play an
+In most functional programming languages, types play an
important role. Scala is such a language. You have already
seen built-in types, like \code{Int}, \code{Boolean},
\code{String} and \code{BigInt}, but also user-defined ones,
@@ -580,7 +578,7 @@
this pattern. One is for tuples, where the type is written
with parentheses. For example
-\begin{lstlisting}[language=Scala, numbers=none]
+\begin{lstlisting}[ numbers=none]
(Int, Int, String)
\end{lstlisting}
@@ -591,7 +589,7 @@
you might define:
-\begin{lstlisting}[language=Scala, numbers=none]
+\begin{lstlisting}[ numbers=none]
def quo_rem(m: Int, n: Int) : (Int, Int) = (m / n, m % n)
\end{lstlisting}
@@ -604,7 +602,7 @@
``packaged'' in a pair. Consequently the complete type of
\code{quo_rem} is
-\begin{lstlisting}[language=Scala, numbers=none]
+\begin{lstlisting}[ numbers=none]
(Int, Int) => (Int, Int)
\end{lstlisting}
@@ -613,7 +611,7 @@
\code{Int => String} is for a function that takes an integer as argument
and produces a string. A function of this type is for instance
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
def mk_string(n: Int) : String = n match {
case 0 => "zero"
case 1 => "one"
@@ -638,7 +636,7 @@
function of this type would be
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
def chk_string(n: Int)(s: String) : Boolean =
mk_string(n) == s
\end{lstlisting}
@@ -652,7 +650,7 @@
of this function then?). This way of specifying the arguments
can be useful, for example in situations like this
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> List("one", "two", "three", "many").map(chk_string(2))
res4 = List(false, true, false, false)
@@ -679,16 +677,16 @@
happens to be another function (taking an integer as input and
returning a string). Remember that \code{mk_string} is just
such a function. So how can we use it? For this define
-the somewhat silly function \code{apply_3}
+the somewhat silly function \code{apply_3}:
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
def apply_3(f: Int => String): Bool = f(3) == "many"
scala> apply_3(mk_string)
res6 = true
\end{lstlisting}
-You might ask, apart from silly functions like above, what is
+You might ask: Apart from silly functions like above, what is
the point of having function as arguments to other functions?
In Java there is indeed no need of this kind of feature. But
in all functional programming languages, including Scala, it
@@ -700,7 +698,7 @@
numbers are printed.
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> (1 to 5).toList.foreach(print)
12345
scala> (1 to 5).toList.foreach(println)
@@ -749,7 +747,7 @@
the following code-snippet for reading a web-page.
-\begin{lstlisting}[language=Scala, numbers=none]
+\begin{lstlisting}[ numbers=none]
import io.Source
val url = """http://www.inf.kcl.ac.uk/staff/urbanc/"""
Source.fromURL(url).take(10000).mkString
@@ -759,43 +757,42 @@
\noindent These three lines return a string containing the
HTML-code of my webpage. It actually already does something
more sophisticated, namely only returns the first 10000
-characters of a webpage in case a ``webpage'' is too large.
-Why is that code-snippet of any interest? Well, try
-implementing reading-from-a-webpage in Java. I also like the
-possibility of triple-quoting strings, which I have only seen
-in Scala so far. The idea behind this is that in such a string
-all characters are interpreted literally---there are no
-escaped characters, like \verb|\n| for newlines.
+characters of a webpage in case it is too large. Why is that
+code-snippet of any interest? Well, try implementing
+reading-from-a-webpage in Java. I also like the possibility of
+triple-quoting strings, which I have only seen in Scala so
+far. The idea behind this is that in such a string all
+characters are interpreted literally---there are no escaped
+characters, like \verb|\n| for newlines.
My second wow-moment I had with a feature of Scala that other
-functional programming languages also do not have. This
-feature is about implicit type conversions. If you have
-regular expressions and want to use them for language
-processing you often want to recognise keywords in a language,
-for example \code{for},{} \code{if},{} \code{yield} and so on. But
-the basic regular expression, \code{CHAR}, can only recognise
-a single character. In order to recognise a whole string, like
-\code{ for}, you have to put many of those together using
+functional programming languages do not have. This feature is
+about implicit type conversions. If you have regular
+expressions and want to use them for language processing you
+often want to recognise keywords in a language, for example
+\code{for},{} \code{if},{} \code{yield} and so on. But the
+basic regular expression \code{CHAR} can only recognise a
+single character. In order to recognise a whole string, like
+\code{for}, you have to put many of those together using
\code{SEQ}:
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
SEQ(CHAR('f'), SEQ(CHAR('o'), CHAR('r')))
\end{lstlisting}
-
\noindent This gets quickly unreadable when the strings and
regular expressions get more complicated. In other functional
programming language, you can explicitly write a conversion
-function that takes a string, say \code{for}, and generates the
-regular expression above. But then your code is littered with
-such conversion function.
+function that takes a string, say \dq{\pcode{for}}, and
+generates the regular expression above. But then your code is
+littered with such conversion functions.
In Scala you can do better by ``hiding'' the conversion
functions. The keyword for doing this is \code{implicit} and
it needs a built-in library called
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala.language.implicitConversions
\end{lstlisting}
@@ -821,27 +818,27 @@
that given a list of characters generates the corresponding
regular expression. In Lines 9 and 10, this function is used
for transforming a string into a regular expression. Since the
-\code{string2rexp}-function is declared as \code{implicit} the
-effect will be that whenever Scala expects a regular
+\code{string2rexp}-function is declared as \code{implicit},
+the effect will be that whenever Scala expects a regular
expression, but I only give it a string, it will automatically
insert a call to the \code{string2rexp}-function. I can now
write for example
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> ALT("ab", "ac")
res9 = ALT(SEQ(CHAR(a),CHAR(b)),SEQ(CHAR(a),CHAR(c)))
\end{lstlisting}
-\noindent \code{ALT} expects two regular expressions
-as arguments, but I only supply two strings. The implicit
-conversion function will transform the string into
-a regular expression.
+\noindent Recall that \code{ALT} expects two regular
+expressions as arguments, but I only supply two strings. The
+implicit conversion function will transform the string into a
+regular expression.
Using implicit definitions, Scala allows me to introduce
some further syntactic sugar for regular expressions:
-\begin{lstlisting}[language=Scala, numbers=none]
+\begin{lstlisting}[ numbers=none]
implicit def RexpOps(r: Rexp) = new {
def | (s: Rexp) = ALT(r, s)
def ~ (s: Rexp) = SEQ(r, s)
@@ -863,7 +860,7 @@
even simpler as
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> "ab" | "ac"
res10 = ALT(SEQ(CHAR(a),CHAR(b)),SEQ(CHAR(a),CHAR(c)))
\end{lstlisting}
@@ -873,14 +870,14 @@
syntactic sugar in the code above stands for.
One more useful feature of Scala is the ability to define
-functions with variable argument lists. This is a feature that
+functions with varying argument lists. This is a feature that
is already present in old languages, like C, but seems to have
been forgotten in the meantime---Java does not have it. In the
context of regular expressions this feature comes in handy:
Say you are fed up with writing many alternatives as
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
ALT(..., ALT(..., ALT(..., ...)))
\end{lstlisting}
@@ -911,14 +908,14 @@
expression for keywords as
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
ALTS("for", "def", "yield", "implicit", "if", "match", "case")
\end{lstlisting}
-\noindent Again I leave you to it to find out how much this
-simplifies the regular expression in comparison if I had to
-write this by hand using only the ``plain'' regular
+\noindent Again I leave it to you to find out how much this
+simplifies the regular expression in comparison with if I had
+to write this by hand using only the ``plain'' regular
expressions from the inductive datatype.
\subsection*{More Info}
@@ -946,13 +943,11 @@
implicits are great, they can also be a source of great
headaches, for example consider the code:
-
-\begin{lstlisting}[language=Scala,numbers=none]
+\begin{lstlisting}[numbers=none]
scala> List (1, 2, 3) contains "your mom"
res1: Boolean = false
\end{lstlisting}
-
\noindent Rather than returning \code{false}, this code should
throw a typing-error. There are also many limitations Scala
inherited from the JVM that can be really annoying. For
@@ -964,7 +959,9 @@
Scala in their production code, but then moved away from it.
Allegedly they did not like the steep learning curve of Scala
and also that new versions of Scala often introduced
-incompatibilities in old code.
+incompatibilities in old code. In the past two months
+there have also been two forks of the Scala compiler.
+It needs to be seen what the future brings for Scala.
So all in all, Scala might not be a great teaching language,
but I hope this is mitigated by the fact that I never require