handouts/scala-ho.tex
changeset 123 556cd74cbba9
child 167 349d706586ef
equal deleted inserted replaced
122:90dd9c6162b3 123:556cd74cbba9
       
     1 \documentclass{article}
       
     2 \usepackage{../style}
       
     3 \usepackage{../langs}
       
     4 \usepackage{marvosym}
       
     5 
       
     6 %cheat sheet
       
     7 %http://worldline.github.io/scala-cheatsheet/
       
     8 
       
     9 \begin{document}
       
    10 
       
    11 \section*{A Crash-Course on Scala}
       
    12 
       
    13 Scala is a programming language that combines functional and
       
    14 object-oriented programming-styles. It has received quite a
       
    15 bit of attention in the last five years or so. One reason for
       
    16 this attention is that, like the Java programming language,
       
    17 Scala compiles to the Java Virtual Machine (JVM) and therefore
       
    18 Scala programs can run under MacOSX, Linux and
       
    19 Windows.\footnote{There are also experimental backends for
       
    20 Android and JavaScript; and also work is under way to have a
       
    21 native compiler, see \url{https://github.com/scala-native/scala-native}.} Unlike Java, however, Scala often
       
    22 allows programmers to write very concise and elegant code.
       
    23 Some therefore say: Scala is the much better Java. A number of
       
    24 companies, The Guardian, Twitter, Coursera, FourSquare,
       
    25 LinkedIn to name a few, either use Scala exclusively in
       
    26 production code, or at least to some substantial degree. It
       
    27 also seems to be useful in job-interviews (in Data Science)
       
    28 according to this annectotical report
       
    29 
       
    30 \begin{quote}
       
    31 \url{https://techcrunch.com/2016/06/14/scala-is-the-new-golden-child/}
       
    32 \end{quote}
       
    33 
       
    34 \noindent
       
    35 If you want to try out Scala yourself, the official Scala compiler can be
       
    36 downloaded from
       
    37 
       
    38 \begin{quote}
       
    39 \url{http://www.scala-lang.org}
       
    40 \end{quote}
       
    41 
       
    42 \noindent
       
    43 A ready-made bundle with the Eclipse IDE is at
       
    44 
       
    45 \begin{quote}
       
    46 \url{http://scala-ide.org/download/sdk.html}
       
    47 \end{quote}
       
    48 
       
    49 Why do I use Scala in the AFL module? Actually, you can do
       
    50 \emph{any} part of the coursework in \emph{any} programming
       
    51 language you like. I use Scala for showing you code during the
       
    52 lectures because its functional programming-style allows me to
       
    53 implement the functions we will discuss with very small
       
    54 code-snippets. If I had to do this in Java, I would first have
       
    55 to go through heaps of boilerplate code and the code-snippets
       
    56 would not look pretty. Since the Scala compiler is free, you
       
    57 can download the code-snippets and run every example I give.
       
    58 But if you prefer, you can also easily translate them into any
       
    59 other functional language, for example Haskell, Swift,
       
    60 Standard ML, F$^\#$, Ocaml and so on.
       
    61 
       
    62 Developing programs in Scala can be done with the Eclipse IDE
       
    63 and also with the IntelliJ IDE, but for the small programs I will
       
    64 develop the good old Emacs-editor is adequate for me and I
       
    65 will run the programs on the command line. One advantage of
       
    66 Scala over Java is that it includes an interpreter (a REPL, or
       
    67 \underline{R}ead-\underline{E}val-\underline{P}rint-\underline{L}oop)
       
    68 with which you can run and test small code-snippets without
       
    69 the need of the compiler. This helps a lot with interactively
       
    70 developing programs. Once you installed Scala, you can start
       
    71 the interpreter by typing on the command line:
       
    72 
       
    73 \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
       
    74 $ scala
       
    75 Welcome to Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM).
       
    76 Type in expressions for evaluation. Or try :help.
       
    77 
       
    78 scala>
       
    79 \end{lstlisting}
       
    80 
       
    81 \noindent Of course the precise response may vary due to the
       
    82 version and platform where you installed Scala. At the Scala
       
    83 prompt you can type things like \code{2 + 3} \keys{Ret} and
       
    84 the output will be
       
    85 
       
    86 \begin{lstlisting}[numbers=none]
       
    87 scala> 2 + 3
       
    88 res0: Int = 5
       
    89 \end{lstlisting}
       
    90 
       
    91 \noindent indicating that the result of the addition is of
       
    92 type \code{Int} and the actual result is 5. Another classic
       
    93 example you can try out is
       
    94 
       
    95 \begin{lstlisting}[numbers=none]
       
    96 scala> print("hello world")
       
    97 hello world
       
    98 \end{lstlisting}
       
    99 
       
   100 \noindent Note that in this case there is no result. The
       
   101 reason is that \code{print} does not actually produce a result
       
   102 (there is no \code{resXX} and no type), rather it is a
       
   103 function that causes the \emph{side-effect} of printing out a
       
   104 string. Once you are more familiar with the functional
       
   105 programming-style, you will know what the difference is
       
   106 between a function that returns a result, like addition, and a
       
   107 function that causes a side-effect, like \code{print}. We
       
   108 shall come back to this point later, but if you are curious
       
   109 now, the latter kind of functions always has \code{Unit} as
       
   110 return type.
       
   111 
       
   112 If you want to write a stand-alone app in Scala, you can
       
   113 implement an object that is an instance of \code{App}, say
       
   114 
       
   115 \begin{lstlisting}[numbers=none]
       
   116 object Hello extends App {
       
   117     println("hello world")
       
   118 }
       
   119 \end{lstlisting}
       
   120 
       
   121 \noindent save it in a file, say {\tt hello-world.scala}, and
       
   122 then run the compiler and runtime environment:
       
   123 
       
   124 \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
       
   125 $ scalac hello-world.scala
       
   126 $ scala Hello
       
   127 hello world
       
   128 \end{lstlisting}
       
   129 
       
   130 As mentioned above, Scala targets the JVM and consequently
       
   131 Scala programs can also be executed by the bog-standard Java
       
   132 Runtime. This only requires the inclusion of {\tt
       
   133 scala-library.jar}, which on my computer can be done as
       
   134 follows:
       
   135 
       
   136 \begin{lstlisting}[language={},numbers=none,basicstyle=\ttfamily\small]
       
   137 $ scalac hello-world.scala
       
   138 $ java -cp /usr/local/src/scala/lib/scala-library.jar:. Hello
       
   139 hello world
       
   140 \end{lstlisting}
       
   141 
       
   142 \noindent You might need to adapt the path to where you have
       
   143 installed Scala.
       
   144 
       
   145 \subsection*{Inductive Datatypes}
       
   146 
       
   147 The elegance and conciseness of Scala programs are often a
       
   148 result of inductive datatypes that can be easily defined in
       
   149 Scala. For example in ``every-day mathematics'' we define
       
   150 regular expressions simply by giving the grammar
       
   151 
       
   152 \begin{center}
       
   153 \begin{tabular}{r@{\hspace{2mm}}r@{\hspace{2mm}}l@{\hspace{13mm}}l}
       
   154   $r$ & $::=$ &   $\ZERO$            & null\\
       
   155         & $\mid$ & $\ONE$            & empty string\\
       
   156         & $\mid$ & $c$               & single character\\
       
   157         & $\mid$ & $r_1 \cdot r_2$   & sequence\\
       
   158         & $\mid$ & $r_1 + r_2$       & alternative / choice\\
       
   159         & $\mid$ & $r^\star$             & star (zero or more)\\
       
   160   \end{tabular}
       
   161 \end{center}
       
   162 
       
   163 \noindent This grammar specifies what regular expressions are
       
   164 (essentially a kind of tree-structure with three kinds of
       
   165 inner nodes---sequence, alternative and star---and three kinds
       
   166 of leave nodes---null, empty and character). If you are
       
   167 familiar with Java, it might be an instructive exercise to
       
   168 define this kind of inductive datatypes in Java\footnote{Happy
       
   169 programming! \Smiley} and then compare it with how it can be
       
   170 implemented in Scala.
       
   171 
       
   172 Implementing the regular expressions from above in Scala is
       
   173 actually very simple: It first requires an \emph{abstract
       
   174 class}, say, \code{Rexp}. This will act as the type for
       
   175 regular expressions. Second, it requires a case for each
       
   176 clause in the grammar. The cases for $\ZERO$ and $\ONE$ do not
       
   177 have any arguments, while in all the other cases we do have
       
   178 arguments. For example the character regular expression needs
       
   179 to take as an argument the character it is supposed to
       
   180 recognise. In Scala, the cases without arguments are called
       
   181 \emph{case objects}, whereas the ones with arguments are
       
   182 \emph{case classes}. The corresponding Scala code is as
       
   183 follows:
       
   184 
       
   185 \begin{lstlisting}[numbers=none]
       
   186 abstract class Rexp 
       
   187 case object ZERO extends Rexp
       
   188 case object ONE extends Rexp
       
   189 case class CHAR (c: Char) extends Rexp
       
   190 case class SEQ (r1: Rexp, r2: Rexp) extends Rexp 
       
   191 case class ALT (r1: Rexp, r2: Rexp) extends Rexp 
       
   192 case class STAR (r: Rexp) extends Rexp 
       
   193 \end{lstlisting}
       
   194 
       
   195 \noindent In order to be an instance of \code{Rexp}, each case
       
   196 object and case class needs to extend \code{Rexp}. Given the
       
   197 grammar above, I hope you can see the underlying pattern. If
       
   198 you want to play further with such definitions of inductive
       
   199 datatypes, feel free to define for example binary trees.
       
   200 
       
   201 Once you make a definition like the one above in Scala, you
       
   202 can represent the regular expression for $a + b$, for example,
       
   203 as \code{ALT(CHAR('a'), CHAR('b'))}. Expressions such as
       
   204 \code{'a'} stand for ASCII characters, though in the output
       
   205 syntax, as you can see below, the quotes are omitted. In a
       
   206 later section we will see how we can support the more
       
   207 mathematical infix notation for regular expression operators
       
   208 in Scala. If you want to assign this regular expression to a
       
   209 variable, you can use the keyword \code{val} and type
       
   210 
       
   211 \begin{lstlisting}[numbers=none]
       
   212 scala> val r = ALT(CHAR('a'), CHAR('b'))
       
   213 r: ALT = ALT(CHAR(a),CHAR(b))
       
   214 \end{lstlisting}
       
   215 
       
   216 \noindent As you can see, in order to make such assignments,
       
   217 no \code{new} or constructor is required in the class (as in
       
   218 Java). However, if there is the need for some non-standard
       
   219 initialisation, you can of course define such a constructor in
       
   220 Scala too. But we omit such ``tricks'' here. 
       
   221 
       
   222 Note that Scala in its response says the variable \code{r} is
       
   223 of type \code{ALT}, not \code{Rexp}. This might be a bit
       
   224 unexpected, but can be explained as follows: Scala always
       
   225 tries to find the most general type that is needed for a
       
   226 variable or expression, but does not ``over-generalise''. In
       
   227 our definition the type \code{Rexp} is more general than
       
   228 \code{ALT}, since it is the abstract class for all regular
       
   229 expressions. But in this particular case there is no need to
       
   230 give \code{r} the more general type of \code{Rexp}. This is
       
   231 different if you want to form a list of regular expressions,
       
   232 for example
       
   233 
       
   234 \begin{lstlisting}[numbers=none]
       
   235 scala> val ls = List(ALT(CHAR('a'), CHAR('b')), ZERO)
       
   236 ls: List[Rexp] = List(ALT(CHAR(a),CHAR(b)), ZERO)
       
   237 \end{lstlisting}
       
   238 
       
   239 \noindent In this case, Scala needs to assign a common type to
       
   240 the regular expressions so that it is compatible with the
       
   241 fact that lists can only contain elements of a single type. In
       
   242 this case the first common type is \code{Rexp}.\footnote{If you
       
   243 type in this example, you will notice that the type contains
       
   244 some further information, but let us ignore this for the
       
   245 moment.} 
       
   246 
       
   247 For compound types like \code{List[...]}, the general rule is
       
   248 that when a type takes another type as argument, then this
       
   249 argument type is written in angle-brackets. This can also
       
   250 contain nested types as in \code{List[Set[Rexp]]}, which is a
       
   251 list of sets each of which contains regular expressions.
       
   252 
       
   253 \subsection*{Functions and Pattern-Matching}
       
   254 
       
   255 I mentioned above that Scala is a very elegant programming
       
   256 language for the code we will write in this module. This
       
   257 elegance mainly stems from the fact that in addition to
       
   258 inductive datatypes, also functions can be implemented very
       
   259 easily in Scala. To show you this, let us first consider a
       
   260 problem from number theory, called the \emph{Collatz-series},
       
   261 which corresponds to a famous unsolved problem in
       
   262 mathematics.\footnote{See for example
       
   263 \url{http://mathworld.wolfram.com/CollatzProblem.html}.}
       
   264 Mathematicians define this series as:
       
   265 
       
   266 \[
       
   267 collatz_{n + 1} \dn 
       
   268 \begin{cases}
       
   269 \frac{1}{2} * collatz_n & \text{if $collatz_n$ is even}\\
       
   270 3 * collatz_n + 1 & \text{if $collatz_n$ is odd}
       
   271 \end{cases}
       
   272 \]
       
   273 
       
   274 \noindent The famous unsolved question is whether this
       
   275 series started with any $n > 0$ as $collatz_0$ will always
       
   276 return to $1$. This is obvious when started with $1$, and also
       
   277 with $2$, but already needs a bit of head-scratching for the
       
   278 case of $3$.
       
   279 
       
   280 If we want to avoid the head-scratching, we could implement
       
   281 this as the following function in Scala:
       
   282 
       
   283 \lstinputlisting[numbers=none]{../progs/collatz.scala}
       
   284 
       
   285 \noindent The keyword for function definitions is \code{def}
       
   286 followed by the name of the function. After that you have a
       
   287 list of arguments (enclosed in parentheses and separated by
       
   288 commas). Each argument in this list needs its type to be
       
   289 annotated. In this case we only have one argument, which is of
       
   290 type \code{BigInt}. This type stands in Scala for arbitrary
       
   291 precision integers (in case you want to try out the function
       
   292 on really big numbers). After the arguments comes the type of
       
   293 what the function returns---a Boolean in this case for
       
   294 indicating that the function has reached 1. Finally, after the
       
   295 \code{=} comes the \emph{body} of the function implementing
       
   296 what the function is supposed to do. What the \code{collatz}
       
   297 function does should be pretty self-explanatory: the function
       
   298 first tests whether \code{n} is equal to 1 in which case it
       
   299 returns \code{true} and so on.
       
   300 
       
   301 Notice the quirk in Scala's syntax for \code{if}s: The condition
       
   302 needs to be enclosed in parentheses and the then-case comes
       
   303 right after the condition---there is no \code{then} keyword in
       
   304 Scala.
       
   305 
       
   306 The real power of Scala comes, however, from the ability to
       
   307 define functions by \emph{pattern matching}. In the
       
   308 \code{collatz} function above we need to test each case using a
       
   309 sequence of \code{if}s. This can be very cumbersome and brittle
       
   310 if there are many cases. If we wanted to define a function
       
   311 over regular expressions in Java, for example, which does not
       
   312 have pattern-matching, the resulting code would just be
       
   313 awkward.
       
   314 
       
   315 Mathematicians already use the power of pattern-matching when
       
   316 they define the function that takes a regular expression and
       
   317 produces another regular expression that can recognise the
       
   318 reversed strings. They define this function as follows:
       
   319 
       
   320 \begin{center}
       
   321 \begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l}
       
   322 $rev(\ZERO)$   & $\dn$ & $\ZERO$\\
       
   323 $rev(\ONE)$      & $\dn$ & $\ONE$\\
       
   324 $rev(c)$             & $\dn$ & $c$\\
       
   325 $rev(r_1 + r_2)$     & $\dn$ & $rev(r_1) + rev(r_2)$\\
       
   326 $rev(r_1 \cdot r_2)$ & $\dn$ & $rev(r_2) \cdot rev(r_1)$\\
       
   327 $rev(r^*)$                   & $\dn$ & $rev(r)^*$\\
       
   328 \end{tabular}
       
   329 \end{center}
       
   330 
       
   331 \noindent It is defined by recursion analysing each pattern of
       
   332 what the regular expression might look like. The corresponding
       
   333 Scala code looks very similar to this definition, thanks to
       
   334 pattern-matching.
       
   335 
       
   336 %%\lstinputlisting[language=Scala]{../progs/rev.scala}
       
   337 
       
   338 \noindent The keyword for starting a pattern-match is
       
   339 \code{match} followed by a list of \code{case}s. Before the
       
   340 match keyword can be another pattern, but often, as in the
       
   341 case above, it is just a variable you want to pattern-match
       
   342 (the \code{r} after \code{=} in Line 1).
       
   343 
       
   344 Each case in this definition follows the structure of how we
       
   345 defined regular expressions as inductive datatype. For example
       
   346 the case in Line 3 you can read as: if the regular expression
       
   347 \code{r} is of the form \code{EMPTY} then do whatever follows
       
   348 the \code{=>} (in this case just return \code{EMPTY}). Line 5
       
   349 reads as: if the regular expression \code{r} is of the form
       
   350 \code{ALT(r1, r2)}, where the left-branch of the alternative is
       
   351 matched by the variable \code{r1} and the right-branch by
       
   352 \code{r2} then do ``something''. The ``something'' can now use the
       
   353 variables \code{r1} and \code{r2} from the match. 
       
   354 
       
   355 If you want to play with this function, call it for example
       
   356 with the regular expression $ab + ac$. This regular expression
       
   357 can recognise the strings $ab$ and $ac$. The function 
       
   358 \code{rev} produces $ba + ca$, which can recognise the reversed
       
   359 strings $ba$ and $ca$.
       
   360 
       
   361 In Scala each pattern-match can also be guarded as in
       
   362 
       
   363 \begin{lstlisting}[ numbers=none]
       
   364 case Pattern if Condition => Do_Something
       
   365 \end{lstlisting}
       
   366 
       
   367 \noindent This allows us, for example, to re-write the 
       
   368 \code{collatz}-function from above as follows:
       
   369  
       
   370 %%\lstinputlisting[language=Scala]{../progs/collatz2.scala}
       
   371 
       
   372 
       
   373 \noindent Although in this particular case the pattern-match
       
   374 does not improve the code in any way. In cases like
       
   375 \code{rev}, however, it is really crucial. The underscore in
       
   376 Line 4 indicates that we do not care what the pattern looks
       
   377 like. Thus this case acts like a default case whenever the
       
   378 cases above did not match. Cases are always tried out from top
       
   379 to bottom.
       
   380  
       
   381 \subsection*{Loops, or better the Absence thereof}
       
   382 
       
   383 Coming from Java or C, you might be surprised that Scala does
       
   384 not really have loops. It has instead, what is in functional
       
   385 programming called, \emph{maps}. To illustrate how they work,
       
   386 let us assume you have a list of numbers from 1 to 8 and want to
       
   387 build the list of squares. The list of numbers from 1 to 8 
       
   388 can be constructed in Scala as follows:
       
   389 
       
   390 \begin{lstlisting}[numbers=none]
       
   391 scala> (1 to 8).toList
       
   392 res1: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8)
       
   393 \end{lstlisting}
       
   394 
       
   395 \noindent Generating from this list, the list of squares in a
       
   396 programming language such as Java, you would assume the list
       
   397 is given as a kind of array. You would then iterate, or loop,
       
   398 an index over this array and replace each entry in the array
       
   399 by the square. Right? In Scala, and in other functional
       
   400 programming languages, you use maps to achieve the same. 
       
   401 
       
   402 A map essentially takes a function that describes how each
       
   403 element is transformed (for example squared) and a list over
       
   404 which this function should work. There are two forms to
       
   405 express such maps in Scala. The first way is called a
       
   406 \emph{for-comprehension}. Squaring the numbers from 1 to 8
       
   407 would look in this form as follows:
       
   408 
       
   409 \begin{lstlisting}[numbers=none]
       
   410 scala> for (n <- (1 to 8).toList) yield n * n
       
   411 res2: List[Int] = List(1, 4, 9, 16, 25, 36, 49, 64)
       
   412 \end{lstlisting}
       
   413 
       
   414 \noindent The important keywords are \code{for} and
       
   415 \code{yield}. This for-comprehension roughly states that from
       
   416 the list of numbers we draw \code{n}s and compute the result
       
   417 of \code{n * n}. As you can see, we specified the list where
       
   418 each \code{n} comes from, namely \code{(1 to 8).toList}, and
       
   419 how each element needs to be transformed. This can also be
       
   420 expressed in a second way in Scala by using directly
       
   421 \code{map}s as follows:
       
   422 
       
   423 \begin{lstlisting}[numbers=none]
       
   424 scala> (1 to 8).toList.map(n => n * n)
       
   425 res3 = List(1, 4, 9, 16, 25, 36, 49, 64)
       
   426 \end{lstlisting}
       
   427 
       
   428 \noindent In this way, the expression \code{n => n * n} stands
       
   429 for the function that calculates the square (this is how the
       
   430 \code{n}s are transformed). This expression for functions
       
   431 might remind you of your lessons about the lambda-calculus
       
   432 where this would have been written as $\lambda n.\,n * n$. It
       
   433 might not be obvious, but for-comprehensions are just
       
   434 syntactic sugar: when compiling, Scala translates
       
   435 for-comprehensions into equivalent maps. This even works
       
   436 when for-comprehensions get more complicated (see below).
       
   437 
       
   438 The very charming feature of Scala is that such maps or
       
   439 for-comprehensions can be written for any kind of data
       
   440 collection, such as lists, sets, vectors, options and so on.
       
   441 For example if we instead compute the reminders modulo 3 of
       
   442 this list, we can write
       
   443 
       
   444 \begin{lstlisting}[numbers=none]
       
   445 scala> (1 to 8).toList.map(n => n % 3)
       
   446 res4 = List(1, 2, 0, 1, 2, 0, 1, 2)
       
   447 \end{lstlisting}
       
   448 
       
   449 \noindent If we, however, transform the numbers 1 to 8 not
       
   450 into a list, but into a set, and then compute the reminders
       
   451 modulo 3 we obtain
       
   452 
       
   453 \begin{lstlisting}[numbers=none]
       
   454 scala> (1 to 8).toSet[Int].map(n => n % 3)
       
   455 res5 = Set(2, 1, 0)
       
   456 \end{lstlisting}
       
   457 
       
   458 \noindent This is the correct result for sets, as there are
       
   459 only three equivalence classes of integers modulo 3. Note that
       
   460 in this example we need to ``help'' Scala to transform the
       
   461 numbers into a set of integers by explicitly annotating the
       
   462 type \code{Int}. Since maps and for-comprehensions are
       
   463 just syntactic variants of each other, the latter can also be
       
   464 written as
       
   465 
       
   466 \begin{lstlisting}[numbers=none]
       
   467 scala> for (n <- (1 to 8).toSet[Int]) yield n % 3
       
   468 res5 = Set(2, 1, 0)
       
   469 \end{lstlisting}
       
   470 
       
   471 For-comprehensions can also be nested and the selection of 
       
   472 elements can be guarded. For example if we want to pair up
       
   473 the numbers 1 to 4 with the letters a to c, we can write
       
   474 
       
   475 \begin{lstlisting}[numbers=none]
       
   476 scala> for (n <- (1 to 4).toList; 
       
   477             m <- ('a' to 'c').toList) yield (n, m)
       
   478 res6 = List((1,a), (1,b), (1,c), (2,a), (2,b), (2,c), 
       
   479             (3,a), (3,b), (3,c), (4,a), (4,b), (4,c))
       
   480 \end{lstlisting}
       
   481 
       
   482 \noindent 
       
   483 Or if we want to find all pairs of numbers between 1 and 3
       
   484 where the sum is an even number, we can write
       
   485 
       
   486 \begin{lstlisting}[numbers=none]
       
   487 scala> for (n <- (1 to 3).toList; 
       
   488             m <- (1 to 3).toList;
       
   489             if (n + m) % 2 == 0) yield (n, m)
       
   490 res7 = List((1,1), (1,3), (2,2), (3,1), (3,3))
       
   491 \end{lstlisting}
       
   492 
       
   493 \noindent The \code{if}-condition in the for-comprehension
       
   494 filters out all pairs where the sum is not even.
       
   495 
       
   496 While hopefully this all looks reasonable, there is one
       
   497 complication: In the examples above we always wanted to
       
   498 transform one list into another list (e.g.~list of squares),
       
   499 or one set into another set (set of numbers into set of
       
   500 reminders modulo 3). What happens if we just want to print out
       
   501 a list of integers? Then actually the for-comprehension
       
   502 needs to be modified. The reason is that \code{print}, you
       
   503 guessed it, does not produce any result, but only produces
       
   504 what is in the functional-programming-lingo called a
       
   505 side-effect. Printing out the list of numbers from 1 to 5
       
   506 would look as follows
       
   507 
       
   508 \begin{lstlisting}[numbers=none]
       
   509 scala> for (n <- (1 to 5).toList) print(n)
       
   510 12345
       
   511 \end{lstlisting}
       
   512 
       
   513 \noindent
       
   514 where you need to omit the keyword \code{yield}. You can
       
   515 also do more elaborate calculations such as
       
   516 
       
   517 \begin{lstlisting}[numbers=none]
       
   518 scala> for (n <- (1 to 5).toList) {
       
   519   val square_n = n * n
       
   520   println(s"$n * $n = $square_n") 
       
   521 }
       
   522 1 * 1 = 1
       
   523 2 * 2 = 4
       
   524 3 * 3 = 9
       
   525 4 * 4 = 16
       
   526 5 * 5 = 25
       
   527 \end{lstlisting}
       
   528 
       
   529 \noindent In this code I use a variable assignment (\code{val
       
   530 square_n = ...} ) and also what is called in Scala a
       
   531 \emph{string interpolation}, written \code{s"..."}. The latter
       
   532 is for printing out an equation. It allows me to refer to the
       
   533 integer values \code{n} and \code{square\_n} inside a string.
       
   534 This is very convenient for printing out ``things''. 
       
   535 
       
   536 The corresponding map construction for functions with 
       
   537 side-effects is in Scala called \code{foreach}. So you 
       
   538 could also write
       
   539 
       
   540 
       
   541 \begin{lstlisting}[numbers=none]
       
   542 scala> (1 to 5).toList.foreach(n => print(n))
       
   543 12345
       
   544 \end{lstlisting}
       
   545 
       
   546 
       
   547 \noindent or even just
       
   548 
       
   549 \begin{lstlisting}[numbers=none]
       
   550 scala> (1 to 5).toList.foreach(print)
       
   551 12345
       
   552 \end{lstlisting}
       
   553 
       
   554 \noindent Again I hope this reminds you a bit of your
       
   555 lambda-calculus lessons, where an explanation is given why
       
   556 both forms produce the same result.
       
   557 
       
   558 
       
   559 If you want to find out more about maps and functions with
       
   560 side-effects, you can ponder about the response Scala gives if
       
   561 you replace \code{foreach} by \code{map} in the expression
       
   562 above. Scala will still allow \code{map} with side-effect
       
   563 functions, but then reacts with a slightly interesting result.
       
   564 
       
   565 \subsection*{Types}
       
   566 
       
   567 In most functional programming languages, types play an
       
   568 important role. Scala is such a language. You have already
       
   569 seen built-in types, like \code{Int}, \code{Boolean},
       
   570 \code{String} and \code{BigInt}, but also user-defined ones,
       
   571 like \code{Rexp}. Unfortunately, types can be a thorny
       
   572 subject, especially in Scala. For example, why do we need to
       
   573 give the type to \code{toSet[Int]}, but not to \code{toList}?
       
   574 The reason is the power of Scala, which sometimes means it
       
   575 cannot infer all necessary typing information. At the
       
   576 beginning while getting familiar with Scala, I recommend a
       
   577 ``play-it-by-ear-approach'' to types. Fully understanding
       
   578 type-systems, especially complicated ones like in Scala, can
       
   579 take a module on their own.\footnote{Still, such a study can
       
   580 be a rewarding training: If you are in the business of
       
   581 designing new programming languages, you will not be able to
       
   582 turn a blind eye to types. They essentially help programmers
       
   583 to avoid common programming errors and help with maintaining
       
   584 code.}
       
   585 
       
   586 In Scala, types are needed whenever you define an inductive
       
   587 datatype and also whenever you define functions (their
       
   588 arguments and their results need a type). Base types are types
       
   589 that do not take any (type)arguments, for example \code{Int}
       
   590 and \code{String}. Compound types take one or more arguments,
       
   591 which as seen earlier need to be given in angle-brackets, for
       
   592 example \code{List[Int]} or \code{Set[List[String]]} or 
       
   593 \code{Map[Int, Int]}.
       
   594 
       
   595 There are a few special type-constructors that fall outside
       
   596 this pattern. One is for tuples, where the type is written
       
   597 with parentheses. For example 
       
   598 
       
   599 \begin{lstlisting}[ numbers=none]
       
   600 (Int, Int, String)
       
   601 \end{lstlisting}
       
   602 
       
   603 \noindent is for a triple (a tuple with three components---two
       
   604 integers and a string). Tuples are helpful if you want to
       
   605 define functions with multiple results, say the function
       
   606 returning the quotient and reminder of two numbers. For this
       
   607 you might define:
       
   608 
       
   609 
       
   610 \begin{lstlisting}[ numbers=none]
       
   611 def quo_rem(m: Int, n: Int) : (Int, Int) = (m / n, m % n)
       
   612 \end{lstlisting}
       
   613 
       
   614 
       
   615 \noindent Since this function returns a pair of integers, its
       
   616 return type needs to be of type \code{(Int, Int)}.
       
   617 Incidentally, this is also the input type of this function.
       
   618 Notice this function takes \emph{two} arguments, namely
       
   619 \code{m} and \code{n}, both of which are integers. They are
       
   620 ``packaged'' in a pair. Consequently the complete type of
       
   621 \code{quo_rem} is
       
   622 
       
   623 \begin{lstlisting}[ numbers=none]
       
   624 (Int, Int) => (Int, Int)
       
   625 \end{lstlisting}
       
   626 
       
   627 Another special type-constructor is for functions, written as
       
   628 the arrow \code{=>}. For example, the type \code{Int =>
       
   629 String} is for a function that takes an integer as input
       
   630 argument and produces a string as result. A function of this
       
   631 type is for instance
       
   632 
       
   633 \begin{lstlisting}[numbers=none]
       
   634 def mk_string(n: Int) : String = n match {
       
   635   case 0 => "zero"
       
   636   case 1 => "one"
       
   637   case 2 => "two"
       
   638   case _ => "many" 
       
   639 } 
       
   640 \end{lstlisting}
       
   641 
       
   642 \noindent It takes an integer as input argument and returns a
       
   643 string. Unlike other functional programming languages, there
       
   644 is in Scala no easy way to find out the types of existing
       
   645 functions, except by looking into the documentation
       
   646 
       
   647 \begin{quote}
       
   648 \url{http://www.scala-lang.org/api/current/}
       
   649 \end{quote}
       
   650 
       
   651 The function arrow can also be iterated, as in 
       
   652 \code{Int => String => Boolean}. This is the type for a function
       
   653 taking an integer as first argument and a string as second,
       
   654 and the result of the function is a boolean. Though silly, a
       
   655 function of this type would be
       
   656 
       
   657 
       
   658 \begin{lstlisting}[numbers=none]
       
   659 def chk_string(n: Int)(s: String) : Boolean = 
       
   660   mk_string(n) == s
       
   661 \end{lstlisting}
       
   662 
       
   663 
       
   664 \noindent which checks whether the integer \code{n}
       
   665 corresponds to the name \code{s} given by the function
       
   666 \code{mk\_string}. Notice the unusual way of specifying the
       
   667 arguments of this function: the arguments are given one after
       
   668 the other, instead of being in a pair (what would be the type
       
   669 of this function then?). This way of specifying the arguments
       
   670 can be useful, for example in situations like this
       
   671 
       
   672 \begin{lstlisting}[numbers=none]
       
   673 scala> List("one", "two", "three", "many").map(chk_string(2))
       
   674 res4 = List(false, true, false, false)
       
   675 
       
   676 scala> List("one", "two", "three", "many").map(chk_string(3))
       
   677 res5 = List(false, false, false, true)
       
   678 \end{lstlisting}
       
   679 
       
   680 \noindent In each case we can give to \code{map} a specialised
       
   681 version of \code{chk_string}---once specialised to 2 and once
       
   682 to 3. This kind of ``specialising'' a function is called
       
   683 \emph{partial application}---we have not yet given to this
       
   684 function all arguments it needs, but only some of them.
       
   685 
       
   686 Coming back to the type \code{Int => String => Boolean}. The
       
   687 rule about such function types is that the right-most type
       
   688 specifies what the function returns (a boolean in this case).
       
   689 The types before that specify how many arguments the function
       
   690 expects and what their type is (in this case two arguments,
       
   691 one of type \code{Int} and another of type \code{String}).
       
   692 Given this rule, what kind of function has type
       
   693 \mbox{\code{(Int => String) => Boolean}}? Well, it returns a
       
   694 boolean. More interestingly, though, it only takes a single
       
   695 argument (because of the parentheses). The single argument
       
   696 happens to be another function (taking an integer as input and
       
   697 returning a string). Remember that \code{mk_string} is just 
       
   698 such a function. So how can we use it? For this define
       
   699 the somewhat silly function \code{apply_3}:
       
   700 
       
   701 \begin{lstlisting}[numbers=none]
       
   702 def apply_3(f: Int => String): Bool = f(3) == "many"
       
   703 
       
   704 scala> apply_3(mk_string)
       
   705 res6 = true
       
   706 \end{lstlisting}
       
   707 
       
   708 You might ask: Apart from silly functions like above, what is
       
   709 the point of having functions as input arguments to other
       
   710 functions? In Java there is indeed no need of this kind of
       
   711 feature: at least in the past it did not allow such
       
   712 constructions. I think, the point of Java 8 is to lift this
       
   713 restriction. But in all functional programming languages,
       
   714 including Scala, it is really essential to allow functions as
       
   715 input argument. Above you already seen \code{map} and
       
   716 \code{foreach} which need this. Consider the functions
       
   717 \code{print} and \code{println}, which both print out strings,
       
   718 but the latter adds a line break. You can call \code{foreach}
       
   719 with either of them and thus changing how, for example, five
       
   720 numbers are printed.
       
   721 
       
   722 
       
   723 \begin{lstlisting}[numbers=none]
       
   724 scala> (1 to 5).toList.foreach(print)
       
   725 12345
       
   726 scala> (1 to 5).toList.foreach(println)
       
   727 1
       
   728 2
       
   729 3
       
   730 4
       
   731 5
       
   732 \end{lstlisting}
       
   733 
       
   734 
       
   735 \noindent This is actually one of the main design principles
       
   736 in functional programming. You have generic functions like
       
   737 \code{map} and \code{foreach} that can traverse data containers,
       
   738 like lists or sets. They then take a function to specify what
       
   739 should be done with each element during the traversal. This
       
   740 requires that the generic traversal functions can cope with
       
   741 any kind of function (not just functions that, for example,
       
   742 take as input an integer and produce a string like above).
       
   743 This means we cannot fix the type of the generic traversal
       
   744 functions, but have to keep them
       
   745 \emph{polymorphic}.\footnote{Another interestic topic about
       
   746 types, but we omit it here for the sake of brevity.} 
       
   747 
       
   748 There is one more type constructor that is rather special. It
       
   749 is called \code{Unit}. Recall that \code{Boolean} has two
       
   750 values, namely \code{true} and \code{false}. This can be used,
       
   751 for example, to test something and decide whether the test
       
   752 succeeds or not. In contrast the type \code{Unit} has only a
       
   753 single value, written \code{()}. This seems like a completely
       
   754 useless type and return value for a function, but is actually
       
   755 quite useful. It indicates when the function does not return
       
   756 any result. The purpose of these functions is to cause
       
   757 something being written on the screen or written into a file,
       
   758 for example. This is what is called they cause some effect on 
       
   759 the side, namely a new content displayed on the screen or some
       
   760 new data in a file. Scala uses the \code{Unit} type to indicate
       
   761 that a function does not have a result, but potentially causes
       
   762 some side-effect. Typical examples are the printing functions, 
       
   763 like \code{print}.
       
   764 
       
   765 
       
   766 \subsection*{Cool Stuff}
       
   767 
       
   768 The first wow-moment I had with Scala was when I came across
       
   769 the following code-snippet for reading a web-page. 
       
   770 
       
   771 
       
   772 \begin{lstlisting}[ numbers=none]
       
   773 import io.Source
       
   774 val url = """http://www.inf.kcl.ac.uk/staff/urbanc/"""
       
   775 Source.fromURL(url)("ISO-8859-1").take(10000).mkString
       
   776 \end{lstlisting}
       
   777 
       
   778 
       
   779 \noindent These three lines return a string containing the
       
   780 HTML-code of my webpage. It actually already does something
       
   781 more sophisticated, namely only returns the first 10000
       
   782 characters of a webpage in case it is too large. Why is that
       
   783 code-snippet of any interest? Well, try implementing
       
   784 reading-from-a-webpage in Java. I also like the possibility of
       
   785 triple-quoting strings, which I have only seen in Scala so
       
   786 far. The idea behind this is that in such a string all
       
   787 characters are interpreted literally---there are no escaped
       
   788 characters, like \verb|\n| for newlines.
       
   789 
       
   790 My second wow-moment I had with a feature of Scala that other
       
   791 functional programming languages do not have. This feature is
       
   792 about implicit type conversions. If you have regular
       
   793 expressions and want to use them for language processing you
       
   794 often want to recognise keywords in a language, for example
       
   795 \code{for},{} \code{if},{} \code{yield} and so on. But the
       
   796 basic regular expression \code{CHAR} can only recognise a
       
   797 single character. In order to recognise a whole string, like
       
   798 \code{for}, you have to put many of those together using
       
   799 \code{SEQ}:
       
   800 
       
   801 
       
   802 \begin{lstlisting}[numbers=none]
       
   803 SEQ(CHAR('f'), SEQ(CHAR('o'), CHAR('r')))
       
   804 \end{lstlisting}
       
   805 
       
   806 \noindent This gets quickly unreadable when the strings and
       
   807 regular expressions get more complicated. In other functional
       
   808 programming languages, you can explicitly write a conversion
       
   809 function that takes a string, say \dq{\pcode{for}}, and
       
   810 generates the regular expression above. But then your code is
       
   811 littered with such conversion functions.
       
   812 
       
   813 In Scala you can do better by ``hiding'' the conversion
       
   814 functions. The keyword for doing this is \code{implicit} and
       
   815 it needs a built-in library called 
       
   816 
       
   817 \begin{lstlisting}[numbers=none]
       
   818 scala.language.implicitConversions
       
   819 \end{lstlisting}
       
   820 
       
   821 \noindent
       
   822 Consider the code
       
   823 
       
   824 
       
   825 \begin{lstlisting}[language=Scala]
       
   826 import scala.language.implicitConversions
       
   827 
       
   828 def charlist2rexp(s: List[Char]) : Rexp = s match {
       
   829   case Nil => EMPTY
       
   830   case c::Nil => CHAR(c)
       
   831   case c::s => SEQ(CHAR(c), charlist2rexp(s))
       
   832 }
       
   833 
       
   834 implicit def string2rexp(s: String) : Rexp = 
       
   835   charlist2rexp(s.toList)
       
   836 \end{lstlisting}
       
   837 
       
   838 
       
   839 \noindent where the first seven lines implement a function
       
   840 that given a list of characters generates the corresponding
       
   841 regular expression. In Lines 9 and 10, this function is used
       
   842 for transforming a string into a regular expression. Since the
       
   843 \code{string2rexp}-function is declared as \code{implicit},
       
   844 the effect will be that whenever Scala expects a regular
       
   845 expression, but I only give it a string, it will automatically
       
   846 insert a call to the \code{string2rexp}-function. I can now
       
   847 write for example
       
   848 
       
   849 \begin{lstlisting}[numbers=none]
       
   850 scala> ALT("ab", "ac")
       
   851 res9 = ALT(SEQ(CHAR(a),CHAR(b)),SEQ(CHAR(a),CHAR(c)))
       
   852 \end{lstlisting}
       
   853 
       
   854 \noindent Recall that \code{ALT} expects two regular
       
   855 expressions as arguments, but I only supply two strings. The
       
   856 implicit conversion function will transform the string into a
       
   857 regular expression.
       
   858 
       
   859 Using implicit definitions, Scala allows me to introduce
       
   860 some further syntactic sugar for regular expressions:
       
   861 
       
   862 
       
   863 \begin{lstlisting}[ numbers=none]
       
   864 implicit def RexpOps(r: Rexp) = new {
       
   865   def | (s: Rexp) = ALT(r, s)
       
   866   def ~ (s: Rexp) = SEQ(r, s)
       
   867   def % = STAR(r)
       
   868 }
       
   869 
       
   870 implicit def stringOps(s: String) = new {
       
   871   def | (r: Rexp) = ALT(s, r)
       
   872   def | (r: String) = ALT(s, r)
       
   873   def ~ (r: Rexp) = SEQ(s, r)
       
   874   def ~ (r: String) = SEQ(s, r)
       
   875   def % = STAR(s)
       
   876 }
       
   877 \end{lstlisting}
       
   878 
       
   879  
       
   880 \noindent This might seem a bit overly complicated, but its effect is
       
   881 that I can now write regular expressions such as $ab + ac$ 
       
   882 simply as
       
   883 
       
   884 
       
   885 \begin{lstlisting}[numbers=none]
       
   886 scala> "ab" | "ac"
       
   887 res10 = ALT(SEQ(CHAR(a),CHAR(b)),SEQ(CHAR(a),CHAR(c)))
       
   888 \end{lstlisting}
       
   889 
       
   890  
       
   891 \noindent I leave you to figure out what the other
       
   892 syntactic sugar in the code above stands for.
       
   893  
       
   894 One more useful feature of Scala is the ability to define
       
   895 functions with varying argument lists. This is a feature that
       
   896 is already present in old languages, like C, but seems to have
       
   897 been forgotten in the meantime---Java does not have it. In the
       
   898 context of regular expressions this feature comes in handy:
       
   899 Say you are fed up with writing many alternatives as
       
   900 
       
   901 
       
   902 \begin{lstlisting}[numbers=none]
       
   903 ALT(..., ALT(..., ALT(..., ...)))
       
   904 \end{lstlisting}
       
   905 
       
   906 
       
   907 \noindent To make it difficult, you do not know how deep such
       
   908 alternatives are nested. So you need something flexible that
       
   909 can take as many alternatives as needed. In Scala one can
       
   910 achieve this by adding a \code{*} to the type of an argument.
       
   911 Consider the code
       
   912 
       
   913 
       
   914 \begin{lstlisting}[language=Scala]
       
   915 def Alts(rs: List[Rexp]) : Rexp = rs match {
       
   916   case Nil => NULL
       
   917   case r::Nil => r
       
   918   case r::rs => ALT(r, Alts(rs))
       
   919 }
       
   920 
       
   921 def ALTS(rs: Rexp*) = Alts(rs.toList)
       
   922 \end{lstlisting}
       
   923 
       
   924 
       
   925 \noindent The function in Lines 1 to 5 takes a list of regular
       
   926 expressions and converts it into an appropriate alternative
       
   927 regular expression. In Line 7 there is a wrapper for this
       
   928 function which uses the feature of varying argument lists. The
       
   929 effect of this code  is that I can write the regular
       
   930 expression for keywords as
       
   931 
       
   932 
       
   933 \begin{lstlisting}[numbers=none]
       
   934 ALTS("for", "def", "yield", "implicit", "if", "match", "case")
       
   935 \end{lstlisting}
       
   936 
       
   937 
       
   938 \noindent Again I leave it to you to find out how much this
       
   939 simplifies the regular expression in comparison with if I had
       
   940 to write this by hand using only the ``plain'' regular
       
   941 expressions from the inductive datatype.
       
   942 
       
   943 \subsection*{More Info}
       
   944 
       
   945 There is much more to Scala than I can possibly describe in
       
   946 this document. Fortunately there are a number of free books
       
   947 about Scala and of course lots of help online. For example
       
   948 
       
   949 \begin{itemize}
       
   950 \item \url{http://www.scala-lang.org/docu/files/ScalaByExample.pdf}
       
   951 \item \url{http://www.scala-lang.org/docu/files/ScalaTutorial.pdf}
       
   952 \item \url{https://www.youtube.com/user/ShadowofCatron}
       
   953 \item \url{http://docs.scala-lang.org/tutorials}
       
   954 \item \url{https://www.scala-exercises.org}
       
   955 \end{itemize}
       
   956 
       
   957 \noindent There is also a course at Coursera on Functional
       
   958 Programming Principles in Scala by Martin Odersky, the main
       
   959 developer of the Scala language. And a document that explains
       
   960 Scala for Java programmers
       
   961 
       
   962 \begin{itemize}
       
   963 \item \small\url{http://docs.scala-lang.org/tutorials/scala-for-java-programmers.html}
       
   964 \end{itemize}
       
   965 
       
   966 While I am quite enthusiastic about Scala, I am also happy to
       
   967 admit that it has more than its fair share of faults. The
       
   968 problem seen earlier of having to give an explicit type to
       
   969 \code{toSet}, but not \code{toList} is one of them. There are
       
   970 also many ``deep'' ideas about types in Scala, which even to
       
   971 me as seasoned functional programmer are puzzling. Whilst
       
   972 implicits are great, they can also be a source of great
       
   973 headaches, for example consider the code:
       
   974 
       
   975 \begin{lstlisting}[numbers=none]
       
   976 scala>  List (1, 2, 3) contains "your mom"
       
   977 res1: Boolean = false
       
   978 \end{lstlisting}
       
   979 
       
   980 \noindent Rather than returning \code{false}, this code should
       
   981 throw a typing-error. There are also many limitations Scala
       
   982 inherited from the JVM that can be really annoying. For
       
   983 example a fixed stack size. One can work around this
       
   984 particular limitation, but why does one have to?
       
   985 More such `puzzles' can be found at
       
   986 
       
   987 \begin{center}
       
   988   \url{http://scalapuzzlers.com} and
       
   989   \url{http://latkin.org/blog/2017/05/02/when-the-scala-compiler-doesnt-help/}
       
   990 \end{center}
       
   991 
       
   992 Even if Scala has been a success in several high-profile
       
   993 companies, there is also a company (Yammer) that first used
       
   994 Scala in their production code, but then moved away from it.
       
   995 Allegedly they did not like the steep learning curve of Scala
       
   996 and also that new versions of Scala often introduced
       
   997 incompatibilities in old code. In the past two months
       
   998 there have also been two forks of the Scala compiler.
       
   999 It needs to be seen what the future brings for Scala.
       
  1000 
       
  1001 So all in all, Scala might not be a great teaching language,
       
  1002 but I hope this is mitigated by the fact that I never require
       
  1003 you to write any Scala code. You only need to be able to read
       
  1004 it. In the coursework you can use any programming language you
       
  1005 like. If you want to use Scala for this, then be my guest; if
       
  1006 you do not want, stick with the language you are most familiar
       
  1007 with.
       
  1008 
       
  1009 
       
  1010 
       
  1011 \end{document}
       
  1012 
       
  1013 %%% Local Variables: 
       
  1014 %%% mode: latex
       
  1015 %%% TeX-master: t
       
  1016 %%% End: