updated
authorChristian Urban <christian dot urban at kcl dot ac dot uk>
Tue, 26 Aug 2014 17:26:06 +0100
changeset 229 00c4fda3d6c5
parent 228 4df4404455d0
child 230 0fd668d7b619
updated
handouts/scala-ho.pdf
handouts/scala-ho.tex
Binary file handouts/scala-ho.pdf has changed
--- a/handouts/scala-ho.tex	Sun Aug 24 10:49:21 2014 +0100
+++ b/handouts/scala-ho.tex	Tue Aug 26 17:26:06 2014 +0100
@@ -6,7 +6,7 @@
 \usepackage{amsmath}
 \usepackage{../langs}
 \usepackage{mathpazo}
-\usepackage[scaled=.95]{helvet}
+\usepackage[scaled=.9]{helvet}
 
 \newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
 \definecolor{codegray}{gray}{0.9}
@@ -19,14 +19,14 @@
 Scala is programming language that combines functional and
 object-oriented programming-styles, and has received in the
 last five years quite a bit of attention. One reason for this
-attention is that, like the Java programming language, it
+attention is that, like the Java programming language, Scala
 compiles to the Java Virtual Machine (JVM) and therefore can
 run under MacOSX, Linux and Windows.\footnote{There are also
 experimental backends for Android and JavaScript.} Unlike
 Java, however, Scala often allows programmers to write concise
-and elegant code; some therefore say Scala is the much better
-Java. If you want to try it out, the Scala compiler can be
-downloaded from
+and elegant code. Some therefore say Scala is the much better
+Java. If you want to try it out yourself, the Scala compiler
+can be downloaded from
 
 \begin{quote}
 \url{http://www.scala-lang.org}
@@ -36,22 +36,22 @@
 \emph{any} part of the programming coursework in \emph{any}
 programming language you like. I use Scala for showing you
 code during the lectures because its functional
-programming-style allows me to implement some of the functions
-we will discuss with very small and elegant code. Since the
-compiler is free, you can download it and run every example I
-give. But if you prefer, you can also translate the examples
+programming-style allows me to implement the functions we will
+discuss with very small code-snippets. Since the compiler is
+free, you can download them and run every example I give. But
+if you prefer, you can also easily translate the code-snippets
 into any other functional language, for example Haskell, ML,
 F\# and so on.
 
 Writing programs in Scala can be done with the Eclipse IDE and
-also with IntelliJ, but for the small programs we will look at
-the Emacs-editor id good for me and I will run programs on the
-command line. One advantage of Scala over Java is that it
-includes an interpreter (a REPL, or Read-Eval-Print-Loop) with
-which you can run and test small code-snippets without the
-need of the compiler. This helps a lot for interactively
-developing programs. Once you installed Scala correctly, you
-can start the interpreter by typing
+also with IntelliJ, but for the small programs I will develop
+the good old Emacs-editor is adequate for me and I will run
+the programs on the command line. One advantage of Scala over Java
+is that it includes an interpreter (a REPL, or
+Read-Eval-Print-Loop) with which you can run and test small
+code-snippets without the need of the compiler. This helps a
+lot with interactively developing programs. Once you installed
+Scala correctly, you can start the interpreter by typing
 
 
 \begin{alltt}
@@ -63,7 +63,7 @@
 scala>
 \end{alltt}
 
-\noindent The precise output may vary due to the platform
+\noindent The precise response may vary due to the platform
 where you installed Scala. At the scala prompt you can type
 things like {\tt 2 + 3} \keys{Ret} and the output will be
 
@@ -74,30 +74,63 @@
 
 \noindent indicating that the result of the addition is of
 type {\tt Int} and the actual result is {\tt 5}. Another
-example you can type in is
+classic example you can try out is
 
 \begin{alltt}
 scala> print ("hello world")
 hello world
 \end{alltt}
 
-\noindent which prints out a string. Note that in this case
-there is no result: the reason is that {\tt print} does not
-actually produce a result (there is no {\tt res\_}), rather it
-is a function that causes the \emph{side-effect} of printing
-out a string. Once you are more familiar with the functional
-programming-style, you will know what the difference is
-between a function that returns a result, like addition, and a
-function that causes a side-effect, like {\tt print}. We shall
-come back to this later, but if you are curious, the latter
+\noindent Note that in this case there is no result: the
+reason is that {\tt print} does not actually produce a result
+(there is no {\tt resXX}), rather it is a function that causes
+the \emph{side-effect} of printing out a string. Once you are
+more familiar with the functional programming-style, you will
+know what the difference is between a function that returns a
+result, like addition, and a function that causes a
+side-effect, like {\tt print}. We shall come back to this
+point in due course, but if you are curious now, the latter
 kind of functions always have as return type {\tt Unit}.
 
+If you want to write a stand-alone app, you can implement
+an object that is an instance of {\tt App}, say
+
+\begin{quote}
+\begin{lstlisting}[language=Scala]
+object Hello extends App {
+    println ("hello world")
+}
+\end{lstlisting}
+\end{quote}
+
+\noindent save it in a file, say {\tt hellow-world.scala}, and
+then run the compiler and runtime environment:
+
+\begin{alltt}
+$ scalac hello-world.scala
+$ scala Hello
+hello world
+\end{alltt}
+
+As mentioned above, Scala targets the JVM and
+consequently Scala programs can also be executed by the
+bog-standard Java runtime. This only requires the inclusion of
+{\tt scala-library.jar}, which on my computer can be done as
+follows:
+
+\begin{alltt}
+$ scalac hello-world.scala
+$ java -cp /usr/local/src/scala/lib/scala-library.jar:. Hello
+hello world
+\end{alltt}
+
+
 \subsection*{Inductive Datatypes}
 
-The elegance and conciseness of Scala programs stems often
-from the fact that inductive datatypes can be easily defined.
-For example in ``every-day Mathematics'' we would define
-regular expressions simply by the grammar
+The elegance and conciseness of Scala programs are often a
+result of inductive datatypes that can be easily defined. For
+example in ``every-day mathematics'' we would define regular
+expressions simply by giving the grammar
 
 \begin{center}
 \begin{tabular}{r@{\hspace{2mm}}r@{\hspace{2mm}}l@{\hspace{13mm}}l}
@@ -112,17 +145,18 @@
 
 \noindent This grammar specifies what regular expressions are
 (essentially a kind of tree-structure with three kinds of
-inner nodes and three kinds of leave nodes). If you are
+inner nodes---sequence, alternative and star---and three kinds
+of leave nodes---null, empty and character). If you are
 familiar with Java, it might be an instructive exercise to
 define this kind of inductive datatypes in
 Java.\footnote{Happy programming! ;o)}
 
 Implementing the regular expressions from above in Scala is
 very simple: It first requires an \emph{abstract class}, say,
-{\tt Rexp}. This will act as type for regular expressions.
+{\tt Rexp}. This will act as the type for regular expressions.
 Second, it requires some instances. The cases for
 $\varnothing$ and $\epsilon$ do not have any arguments, while
-in the other cases we do have arguments. For example the
+in all the other cases we do have arguments. For example the
 character regular expression needs to take as an argument the
 character it is supposed to recognise. In Scala, the cases
 without arguments are called \emph{case objects}, while the
@@ -141,57 +175,435 @@
 \end{lstlisting}
 \end{quote}
 
-\noindent Given the grammar above, I hope you can see the
-underlying pattern. In order to be an instance of {\tt Rexp},
-each case object or case class needs to extend {\tt Rexp}. If
-you want to play with such definitions, feel free to define
-for example binary trees.
+\noindent In order to be an instance of {\tt Rexp}, each case
+object and case class needs to extend {\tt Rexp}. Given the
+grammar above, I hope you can see the underlying pattern. If
+you want to play further with such definitions, feel free to
+define for example binary trees.
 
-Once you make a definition like the one above, you can 
-represent, say, the regular expression for $a + b$ as
-{\tt ALT(CHAR('a'), CHAR('b'))}. If you want to assign 
-this regular expression to a variable, you can just type
+Once you make a definition like the one above, you can
+represent, for example, the regular expression for $a + b$ in
+Scala as {\tt ALT(CHAR('a'), CHAR('b'))}. Expressions such as
+{\tt 'a'} stand for ASCII characters. If you want to assign
+this regular expression to a variable, you can use the keyword
+{\tt val} and type
 
 \begin{alltt}
 scala> val r = ALT(CHAR('a'), CHAR('b'))
 r: ALT = ALT(CHAR(a),CHAR(b))
 \end{alltt}
 
-\noindent In order to make such assignments there is no
-constructor need in the class (like in Java). However, if
-there is the need, you can of course define such a constructor
-in Scala. 
+\noindent As you can see, in order to make such assignments,
+no constructor is required in the class (as in Java). However,
+if there is the need for some non-standard initialisation, you
+can of course define such a constructor in Scala. But we omit
+this here. 
 
-Note that Scala says the variable {\tt r} is of type {\tt
-ALT}, not {\tt Rexp}. Scala always tries to find the most
-general type that is needed for a variable, but does not
-``over-generalise''. In this case there is no need to give
-{\tt r} the more general type of {\tt Rexp}. This is different
-if you want to form a list of regular expressions, for example
+Note that Scala in its response says the variable {\tt r} is
+of type {\tt ALT}, not {\tt Rexp}. This might be a bit
+unexpected, but can be explained as follows: Scala always
+tries to find the most general type that is needed for a
+variable or expression, but does not ``over-generalise''. In
+this case there is no need to give {\tt r} the more general
+type of {\tt Rexp}. This is different if you want to form a
+list of regular expressions, for example
 
 \begin{alltt}
 scala> val ls = List(ALT(CHAR('a'), CHAR('b')), NULL)
 ls: List[Rexp] = List(ALT(CHAR(a),CHAR(b)), NULL)
 \end{alltt}
 
-\noindent In this case Scala needs to assign a type to the
-regular expressions, so that it is compatible with the fact
-that list can only contain elements of a single type, in this
-case this is {\tt Rexp}.\footnote{If you type in this example,
-you will notice that the type contains some further
-information, but lets ignore this for the moment.} Note that if a type takes another
-type as argument, this is written for example as
-{\tt List[Rexp]}.
+\noindent In this case Scala needs to assign a common type to
+the regular expressions, so that it is compatible with the
+fact that lists can only contain elements of a single type, in
+this case the type is {\tt Rexp}.\footnote{If you type in this
+example, you will notice that the type contains some further
+information, but lets ignore this for the moment.} 
+
+For types like {\tt List[...]} the general rule is that when a
+type takes another type as argument, then this is written in
+angle-brackets. This can also contain nested types as in {\tt
+List[Set[Rexp]]}, which is a list of sets each of which
+contains regular expressions.
 
 \subsection*{Functions and Pattern-Matching}
 
+I mentioned above that Scala is a very elegant programming
+language for the code we will write in this module. This
+elegance mainly stems from the fact that functions can be
+implemented very easily in Scala. Lets first consider a
+problem from number theory, called the \emph{Collatz-series},
+which corresponds to a famous unsolved problem in
+mathematics.\footnote{See for example
+\url{http://mathworld.wolfram.com/CollatzProblem.html}.}
+Mathematician define this series as:
+
+\[
+collatz_{n + 1} \dn 
+\begin{cases}
+\frac{1}{2} * collatz_n & \text{if $collatz_n$ is even}\\
+3 * collatz_n + 1 & \text{if $collatz_n$ is odd}
+\end{cases}
+\]
+
+\noindent The famous unsolved question is whether this
+series started with any $n > 0$ as $collaz_0$ will always
+return to $1$. This is obvious when started with $1$, and also
+with $2$, but already needs a bit of head-scratching for the
+case of $3$.
+
+If we want to avoid the head-scratching, we could implement
+this as the following function in Scala:
+
+\begin{quote}
+\lstinputlisting[language=Scala]{../progs/collatz.scala}
+\end{quote}
+
+\noindent The keyword for function definitions is {\tt def}
+followed by the name of the function. After that you have a
+list of arguments (enclosed in parentheses and separated by
+commas). Each argument in this list needs its type annotated.
+In this case we only have one argument, which is of type {\tt
+BigInt}. This type stands for arbitrary precision integers.
+After the arguments comes the type of what the function
+returns---a Boolean in this case for indicating that the
+function has reached {\tt 1}. Finally, after the {\tt =} comes
+the \emph{body} of the function implementing what the function
+is supposed to do. What the {\tt collatz} function does should
+be pretty self-explanatory: the function first tests whether
+{\tt n} is equal to $1$ in which case it returns {\tt true}
+and so on.
+
+Notice a quirk in Scala's syntax for {\tt if}s: The condition
+needs to be enclosed in parentheses and the then-case comes
+right after the condition---there is no {\tt then} keyword in
+Scala.
+
+The real power of Scala comes, however, from the ability to
+define functions by \emph{pattern matching}. In the {\tt
+collatz} function above we need to test each case using a
+sequence of {\tt if}s. This can be very cumbersome and brittle
+if there are many cases. If we wanted to define a function
+over regular expressions in say Java, which does not have
+pattern-matching, the resulting code would just be awkward.
+
+Mathematicians already use the power of pattern-matching, for
+example, when they define the function that takes a regular
+expression and produces another regular expression that can
+recognise the reversed strings. The resulting recursive
+function is often defined as follows:
+
+\begin{center}
+\begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l}
+$rev(\varnothing)$   & $\dn$ & $\varnothing$\\
+$rev(\epsilon)$      & $\dn$ & $\epsilon$\\
+$rev(c)$             & $\dn$ & $c$\\
+$rev(r_1 + r_2)$     & $\dn$ & $rev(r_1) + rev(r_2)$\\
+$rev(r_1 \cdot r_2)$ & $\dn$ & $rev(r_2) \cdot rev(r_1)$\\
+$rev(r^*)$                   & $\dn$ & $rev(r)^*$\\
+\end{tabular}
+\end{center}
+
+\noindent The corresponding Scala code looks very similar
+to this definition, thanks to pattern-matching.
 
 
+\begin{quote}
+\lstinputlisting[language=Scala]{../progs/rev.scala}
+\end{quote}
+
+\noindent The keyword for starting a pattern-match is {\tt
+match} followed by a list of {\tt case}s. Before the match
+can be another pattern, but often as in the case above, 
+it is just a variable you want to pattern-match.
+
+In the {\tt rev}-function above, each case follows the
+structure of how we defined regular expressions as inductive
+datatype. For example the case in Line 3 you can read as: if
+the regular expression {\tt r} is of the form {\tt EMPTY} then
+do whatever follows the {\tt =>} (in this case just return
+{\tt EMPTY}). Line 5 reads as: if the regular expression
+{\tt r} is of the form {\tt ALT(r1, r2)}, where the
+left-branch of the alternative is matched by the variable {\tt
+r1} and the right-branch by {\tt r2} then do ``something''.
+The ``something'' can now use the variables {\tt r1} and {\tt
+r2} from the match. 
+
+If you want to play with this function, call, it for 
+example, with the regular expression $ab + ac$. This regular
+expression can recognise the strings $ab$ and $ac$. The 
+function {\tt rev} produces the result $ba + ca$, which
+can recognise the reversed strings $ba$ and $ca$.
+
+In Scala each pattern-match can also be guarded as in
+
+\begin{quote}
+\begin{lstlisting}[language=Scala, numbers=none]
+case Pattern if Condition => Do_Something
+\end{lstlisting}
+\end{quote}
+
+\noindent This allows us, for example, to re-write the {\tt
+collatz}-function from above as follows:
+ 
+\begin{quote}
+\lstinputlisting[language=Scala]{../progs/collatz2.scala}
+\end{quote}
+
+\noindent Although in this case the pattern-match does not
+improve the code in any way. The underscore in the last case
+indicates that we do not care what the pattern looks like. 
+Thus Line 4 acts like a default case whenever the cases above 
+did not match. Cases are always tried out from top to bottom.
+ 
+\subsection*{Loops}
+
+Coming from Java or C, you might be surprised that Scala does
+not really have loops. It has instead, what is in functional
+programming called \emph{maps}. To illustrate how they work,
+lets assume you have a list of numbers from 1 to 10 and want to
+build the list of squares. The list of numbers from 1 to 10 
+can be constructed in Scala as follows:
+
+\begin{alltt}
+scala> (1 to 10).toList
+res1: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
+\end{alltt}
+
+\noindent Generating from this list the list of squares in a
+non-functional language (e.g.~Java), you would assume the list
+is given as a kind of array. You would then iterate, or loop,
+an index over this array and replace each entry in the array
+by the square. Right? In Scala, and in other functional
+programming languages, you use maps to achieve the same. 
+
+Maps essentially take a function that describes how each
+element is transformed (for example squaring) and a list over
+which this function should work. There are two forms to
+express maps in Scala. The first way is in a {\tt
+for}-construction. Squaring the numbers from 1 to 10 would
+look in this form as follows:
+
+\begin{alltt}
+scala> for (n <- (1 to 10).toList) yield n * n
+res2: List[Int] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
+\end{alltt}
+
+\noindent The keywords are {\tt for} and {\tt yield}. This
+{\tt for}-construction roughly says that from the list of
+numbers we draw {\tt n}s and compute the result of {\tt n *
+n}. In consequence we specified the list where each {\tt n}
+comes from, namely {\tt (1 to 10).toList}, and how each
+element needs to be transformed. This can also be expressed
+in a second way by using directly {\tt map} as follows:
+
+\begin{alltt}
+scala> (1 to 10).toList.map(n => n * n)
+res3 = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
+\end{alltt}
+
+\noindent In this way the expression {\tt n => n * n} stands
+for the function that calculates the square (this is how the
+{\tt n}s are transformed). This expression for functions might
+remind you on your lessons about the lambda-calculus where
+this would have been written as $\lambda n.\,n * n$.
+
+It might not be obvious, but {\tt for}-constructions are just
+syntactic sugar: when compiling, Scala translates them into
+equivalent maps.
+
+
+The very charming feature of Scala is that such maps or {\tt
+for}-constructions can be written for any kind of data
+collection, such as lists, sets, vectors and so on. For
+example if we instead compute the reminders modulo $3$ of this
+list, we can write
+
+\begin{alltt}
+scala> (1 to 10).toList.map(n => n \% 3)
+res4 = List(1, 2, 0, 1, 2, 0, 1, 2, 0, 1)
+\end{alltt}
+
+\noindent If we, however, transform the numbers 1 to 10 not
+into a list, but into a set, and then compute the reminders
+modulo $3$ we obtain
+
+\begin{alltt}
+scala> (1 to 10).toSet[Int].map(n => n \% 3)
+res5 = Set(2, 1, 0)
+\end{alltt}
+
+\noindent This is the correct result for sets, as there are
+only 3 equivalence classes of integers modulo 3. Note that we
+need to ``help'' Scala in this example to transform the
+numbers into a set of integers by explicitly annotating the
+type {\tt Int}. Since maps and {\tt for}-construction are just
+syntactic variants of each other, the latter can also be
+written as
+
+\begin{alltt}
+scala> for (n <- (1 to 10).toSet[Int]) yield n \% 3
+res5 = Set(2, 1, 0)
+\end{alltt}
+
+While hopefully this all looks reasonable, there is one
+complication: In the examples above we always wanted to
+transform one list into another list (e.g.~list of squares),
+or one set into another set (set of numbers into set of
+reminders modulo 3). What happens if we just want to print out
+a list of integers? Then actually the {\tt for}-construction
+needs to be modified. The reason is that {\tt print}, you
+guessed it, does not produce any result, but only produces, in
+the functional-programming-lingo, a side-effect. Printing out
+the list of numbers from 1 to 5 would look as follows
+
+\begin{alltt}
+scala> for (n <- (1 to 5).toList) println(n)
+1
+2
+3
+4
+5
+\end{alltt}
+
+\noindent
+where you need to omit the keyword {\tt yield}. You can
+also do more elaborate calculations such as
+
+\begin{alltt}
+scala> for (n <- (1 to 5).toList) \{
+  val square_n = n * n
+  println(s"$n * $n = $square_n") 
+\}
+1 * 1 = 1
+2 * 2 = 4
+3 * 3 = 9
+4 * 4 = 16
+5 * 5 = 25
+\end{alltt}
+
+\noindent
+In this code I use a \emph{string interpolation},
+written {\tt s"..."} in order to print out an equation.
+This string interpolation allows me to refer to the integer 
+values {\tt n} and {\tt square\_n} inside a string. This
+is very convenient for printing out ``things''. 
+
+The corresponding map construction for functions with 
+side-effexts is in Scala called {\tt foreach}. So you 
+could also write
+
+\begin{alltt}
+scala> (1 to 5).toList.foreach(n => println(n))
+1
+2
+3
+4
+5
+\end{alltt}
+
+\noindent or even just
+
+\begin{alltt}
+scala> (1 to 5).toList.foreach(println)
+1
+2
+3
+4
+5
+\end{alltt}
+
+\noindent If you want to find out more maps and functions
+with side-efffects, you can ponder about the response Scala
+gives if you replace {\tt foreach} by {\tt map} in the
+expression above. Scala will still allow {\tt map}, but
+then reacts with an interesting result.
 
 \subsection*{Types}
 
+In most functional programming languages types play an
+important role. Scala is such a language. You have already
+seen built-in types, like {\tt Int}, {\tt Boolean}, {\tt
+String} and {\tt BigInt}, but also user-defined ones, like {\tt Rexp}.
+Unfortunately, types can be a thorny subject, especially in
+Scala. For example, why do we need to give the type to {\tt
+toSet[Int]} but not to {\tt toList}? The reason is the power
+of Scala, which sometimes means it cannot infer all necessary
+typing information. At the beginning while getting familiar
+with Scala, I recommend a ``play-it-by-ear-approach'' to
+types. Fully understanding type-systems, especially compicated
+ones like in Scala, can take a module on their
+own.\footnote{Still, such a study can be a rewarding training:
+If you are in the business of designing new programming
+languages, you will not be able to turn a blind eye to types.
+They essentially help programmers to avoid common programming
+errors and help with maintaining code.}
+
+In Scala, types are needed whenever you define an inductive
+datatype and also whenever you define functions (their
+arguments and their results need a type). Base types are types
+that do not take any (type)arguments, for example {\tt Int}
+and {\tt String}. Compound types take one or more arguments,
+which need to be given in angle-brackets, for example {\tt
+List[Int]} or {\tt Set[List[String]]} or {\tt Map[Int, Int]}.
+
+There are e few special type-constructors. One is for tuples,
+which is written with parentheses. For example {\tt (Int, Int,
+String)} for a triple consisting of two integers and a string.
+Tuples are helpful if you want to define a function with
+multiple results, say the function returning the quotient and
+reminder of two numbers. For this you might define:
+
+\begin{alltt}
+def quo_rem(m: Int, n: Int) : (Int, Int) = (m \verb|/| n, m \% n)
+\end{alltt}
+
+\noindent
+Since this function returns a pair of integers, its type
+needs to be {\tt (Int, Int)}. 
+
+Another special type-constructor is for functions, written
+{\tt =>}. For example, the type {\tt Int => String} stands 
+for a function that takes an integer as argument and produces 
+a string. A function of this type is for instance
+
+\begin{quote}
+\begin{lstlisting}[language=Scala]
+def mk_string(n: Int) : String = n match {
+  case 0 => "zero"
+  case 1 => "one"
+  case 2 => "two"
+  case _ => "many" 
+} 
+\end{lstlisting}
+\end{quote}
+
+\noindent Unlike other functional programming languages, there
+is no easy way to find out the types of existing functions
+except for looking into the documentation
+
+\begin{quote}
+\url{http://www.scala-lang.org/api/current/}
+\end{quote}
+
+\noindent The function arrow can also be iterated, as in {\tt
+Int => String => Boolean}. This is the type for a function
+taking an integer as first argument and a string as second,
+and the result of the function is a boolean. Though silly, a
+function of this type is
+
+\begin{quote}
+\begin{lstlisting}[language=Scala]
+def chk_string(n: Int, s: String) : Boolean = 
+  mk_string(n) == s
+\end{lstlisting}
+\end{quote}
+
+\noindent
+
 \subsection*{Cool Stuff}
 
+\subsection*{More Info}
 
 \end{document}