|     19 readable and understandable), but other times we need pedantic |     19 readable and understandable), but other times we need pedantic | 
|     20 precision for actual programs. |     20 precision for actual programs. | 
|     21  |     21  | 
|     22 \subsubsection*{Characters and Strings} |     22 \subsubsection*{Characters and Strings} | 
|     23  |     23  | 
|     24 The most important concept in this module are strings. Strings |     24 The most basic concept in this module are strings. Strings | 
|     25 are composed of \defn{characters}. While characters are surely |     25 are composed of \defn{characters}. While characters are surely | 
|     26 a familiar concept, we will make one subtle distinction in |     26 a familiar concept, we will make one subtle distinction in | 
|     27 this module. If we want to refer to concrete characters, like |     27 this module. If we want to refer to concrete characters, like | 
|     28 \code{a}, \code{b}, \code{c} and so on, we use a typewriter font. |     28 \code{a}, \code{b}, \code{c} and so on, we will use a typewriter font. | 
|     29 Accordingly if we want to refer to the concrete characters of |     29 Accordingly if we want to refer to the concrete characters of | 
|     30 my email address we shall write |     30 my email address we shall write | 
|     31  |     31  | 
|     32 \begin{center} |     32 \begin{center} | 
|     33 \pcode{christian.urban@kcl.ac.uk} |     33 \pcode{christian.urban@kcl.ac.uk} | 
|     53 \noindent In this string, we do not really care what the |     53 \noindent In this string, we do not really care what the | 
|     54 characters stand for, except we do care about the fact that |     54 characters stand for, except we do care about the fact that | 
|     55 for example the character $a$ is not equal to $b$ and so on. |     55 for example the character $a$ is not equal to $b$ and so on. | 
|     56 Why do I make this distinction? Because we often need to |     56 Why do I make this distinction? Because we often need to | 
|     57 define functions using variables ranging over characters. We |     57 define functions using variables ranging over characters. We | 
|     58 need to somehow say this is a variable, say $c$, ranging over |     58 need to somehow say ``this-is-a-variable'' and give it a name.  | 
|     59 characters, while this is the actual character \pcode{c}. |     59 In such cases we use the italic font. | 
|         |     60  | 
|     60  |     61  | 
|     61 An \defn{alphabet} is a (non-empty) finite set of characters. |     62 An \defn{alphabet} is a (non-empty) finite set of characters. | 
|     62 Often the letter $\Sigma$ is used to refer to an alphabet. For |     63 Often the letter $\Sigma$ is used to refer to an alphabet. For | 
|     63 example the ASCII characters \pcode{a} to \pcode{z} form an |     64 example the ASCII characters \pcode{a} to \pcode{z} form an | 
|     64 alphabet. The digits $0$ to $9$ are another alphabet. The |     65 alphabet. The digits $0$ to $9$ are another alphabet. The | 
|     76 typed programming languages, such as Scala, strings have a special |     77 typed programming languages, such as Scala, strings have a special | 
|     77 type---namely \pcode{String} which is different from the type |     78 type---namely \pcode{String} which is different from the type | 
|     78 for lists of characters. This is because strings can be |     79 for lists of characters. This is because strings can be | 
|     79 efficiently represented in memory, unlike lists. Since |     80 efficiently represented in memory, unlike lists. Since | 
|     80 \code{String} and the type of lists of characters |     81 \code{String} and the type of lists of characters | 
|     81 (\code{List[Char]}) are not the same, we need to explicitly |     82 (namely \code{List[Char]}) are not the same, we need to explicitly | 
|     82 coerce elements between the two types, for example |     83 coerce elements between the two types, for example | 
|     83  |     84  | 
|     84 \begin{lstlisting}[numbers=none] |     85 \begin{lstlisting}[numbers=none] | 
|     85 scala> "abc".toList |     86 scala> "abc".toList | 
|     86 res01: List[Char] = List(a, b, c) |     87 res01: List[Char] = List(a, b, c) | 
|    102 say \dq{\textit{ello}} when making definitions about strings. |    103 say \dq{\textit{ello}} when making definitions about strings. | 
|    103 There are also some subtleties with the empty string, |    104 There are also some subtleties with the empty string, | 
|    104 sometimes written as \dq{} but also as the empty list of |    105 sometimes written as \dq{} but also as the empty list of | 
|    105 characters $[\,]$.\footnote{In the literature you can also |    106 characters $[\,]$.\footnote{In the literature you can also | 
|    106 often find that $\varepsilon$ or $\lambda$ is used to |    107 often find that $\varepsilon$ or $\lambda$ is used to | 
|    107 represent the empty string.}  |    108 represent the empty string. But we are not going to use this notation.}  | 
|    108  |    109  | 
|    109 Two strings, say $s_1$ and $s_2$, can be \defn{concatenated}, |    110 Two strings, say $s_1$ and $s_2$, can be \defn{concatenated}, | 
|    110 which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as |    111 which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as | 
|    111 lists of characters, then $@$ is the list-append function. |    112 lists of characters, then $@$ is the list-append function. | 
|    112 Suppose we are given two strings \dq{\textit{foo}} and |    113 Suppose we are given two strings \dq{\textit{foo}} and | 
|    113 \dq{\textit{bar}}, then their concatenation, written |    114 \dq{\textit{bar}}, then their concatenation, written | 
|    114 \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives |    115 \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives | 
|    115 \dq{\textit{foobar}}. But as said above, we will often |    116 \dq{\textit{foobar}}. But as said above, we will often | 
|    116 simplify our life and just drop the double quotes whenever it |    117 simplify our life and just drop the double quotes whenever it | 
|    117 is clear we are talking about strings. So we will often just |    118 is clear we are talking about strings. So we will just | 
|    118 write \textit{foo}, \textit{bar}, \textit{foobar} or |    119 write \textit{foo}, \textit{bar}, \textit{foobar}  | 
|    119 \textit{foo $@$ bar}. |    120 \textit{foo $@$ bar} and so on. | 
|    120  |    121  | 
|    121 Occasionally we will use the notation $a^n$ for strings, which stands |    122 Occasionally we will use the notation $a^n$ for strings, which stands | 
|    122 for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is a string that |    123 for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is a string that | 
|    123 has some number of $a$s followed by the same number of $b$s.  A simple |    124 has some number of $a$s followed by the same number of $b$s. | 
|    124 property of string concatenation is \emph{associativity}, meaning |    125 Confusingly, in Scala the notation is ``times'' for this opration. | 
|         |    126 So you can write  | 
|         |    127  | 
|         |    128 \begin{lstlisting}[numbers=none] | 
|         |    129 scala> "a" * 13 | 
|         |    130 val res02: String = aaaaaaaaaaaaa | 
|         |    131 \end{lstlisting} | 
|         |    132  | 
|         |    133 \noindent | 
|         |    134 A simple property of string concatenation is \emph{associativity}, meaning | 
|    125  |    135  | 
|    126 \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]   |    136 \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]   | 
|    127  |    137  | 
|    128 \noindent are always equal strings. The empty string behaves |    138 \noindent are always equal strings. The empty string behaves | 
|    129 like a \emph{unit element}, therefore |    139 like a \emph{unit element}, therefore | 
|    183 A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\}  |    193 A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\}  | 
|    184 \end{eqnarray*} |    194 \end{eqnarray*} | 
|    185  |    195  | 
|    186 \noindent In general set comprehensions are of the form |    196 \noindent In general set comprehensions are of the form | 
|    187 $\{a\;|\;P\}$ which stands for the set of all elements $a$ |    197 $\{a\;|\;P\}$ which stands for the set of all elements $a$ | 
|    188 (from some set) for which some property $P$ holds. |    198 (from some set) for which some property $P$ holds. If programming | 
|         |    199 is more your-kind-of-thing, you might recognise the similarities | 
|         |    200 with for-comprehensions, for example for the silly set above you | 
|         |    201 could write | 
|         |    202  | 
|         |    203 \begin{lstlisting}[numbers=none] | 
|         |    204 scala> for (n <- (0 to 10).toSet; if n * n < 9) yield n | 
|         |    205 val res03: Set[Int] = Set(0, 1, 2) | 
|         |    206 \end{lstlisting} | 
|         |    207  | 
|         |    208 \noindent | 
|         |    209 This is pretty much the same as $\{n\;|\;  n \in \mathbb{N} \wedge n^2 < 9\}$ | 
|         |    210 just in Scala syntax. | 
|    189   |    211   | 
|    190 For defining sets, we will also often use the notion of the |    212 For defining sets, we will also often use the notion of the | 
|    191 ``big union''. An example is as follows: |    213 ``big union''. An example is as follows: | 
|    192  |    214  | 
|    193 \begin{equation}\label{bigunion} |    215 \begin{equation}\label{bigunion} | 
|    232  |    254  | 
|    233 \noindent |    255 \noindent | 
|    234 contain actually the same amount of elements. Does this make sense? |    256 contain actually the same amount of elements. Does this make sense? | 
|    235 Though this might all look strange, infinite sets will be a |    257 Though this might all look strange, infinite sets will be a | 
|    236 topic that is very relevant to the material of this module. It tells |    258 topic that is very relevant to the material of this module. It tells | 
|    237 us what we can compute with a computer (actually algorithm) and what |    259 us what we can compute with a computer (actually an algorithm) and what | 
|    238 we cannot. But during the first 9 lectures we can go by without this |    260 we cannot. But during the first 9 lectures we can go by without this | 
|    239 ``weird'' stuff. End of aside.\smallskip |    261 ``weird'' stuff. End of aside.\smallskip | 
|    240  |    262  | 
|    241 Another important notion in this module are \defn{languages}, which |    263 Another important notion in this module are \defn{languages}, which | 
|    242 are sets of strings. One of the main goals for us will be how to |    264 are sets of strings. One of the main goals for us will be how to | 
|    246 hardness is meant in terms of Turing decidable, for example.} |    268 hardness is meant in terms of Turing decidable, for example.} | 
|    247 Note that the language containing the empty string $\{\dq{}\}$ |    269 Note that the language containing the empty string $\{\dq{}\}$ | 
|    248 is not equal to $\varnothing$, the empty language (or empty |    270 is not equal to $\varnothing$, the empty language (or empty | 
|    249 set): The former contains one element, namely \dq{} (also |    271 set): The former contains one element, namely \dq{} (also | 
|    250 written $[\,]$), but the latter does not contain any |    272 written $[\,]$), but the latter does not contain any | 
|    251 element. |    273 element at all! Make sure you see the difference. | 
|    252  |    274  | 
|    253 For languages we define the operation of \defn{language |    275 For languages we define the operation of \defn{language | 
|    254 concatenation}, written like in the string case as $A @ B$: |    276 concatenation}, written like in the string case as $A @ B$: | 
|    255  |    277  | 
|    256 \begin{equation}\label{langconc} |    278 \begin{equation}\label{langconc} | 
|    265  |    287  | 
|    266 \[ |    288 \[ | 
|    267 \{abzzz, abqq, abr, aczzz, acqq, acr\} |    289 \{abzzz, abqq, abr, aczzz, acqq, acr\} | 
|    268 \]  |    290 \]  | 
|    269  |    291  | 
|    270 \noindent Recall the properties for string concatenation. For |    292 \noindent The cool thing about Scala is that we can define language | 
|         |    293 concatenation very elegantly as | 
|         |    294  | 
|         |    295 \begin{lstlisting}[numbers=none] | 
|         |    296 def concat(A: Set[String], B: Set[String]) = | 
|         |    297   for (x <- A ; y <- B) yield x ++ y | 
|         |    298 \end{lstlisting} | 
|         |    299  | 
|         |    300 \noindent | 
|         |    301 where \code{++} is string concatenation in Scala. | 
|         |    302  | 
|         |    303 Recall the properties for string concatenation. For | 
|    271 language concatenation we have the following properties |    304 language concatenation we have the following properties | 
|    272  |    305  | 
|    273 \begin{center} |    306 \begin{center} | 
|    274 \begin{tabular}{ll} |    307 \begin{tabular}{ll} | 
|    275 associativity: & $(A @ B) @ C = A @ (B @ C)$\\ |    308 associativity: & $(A @ B) @ C = A @ (B @ C)$\\ | 
|    327 what $\{[]\}\star$ and $\varnothing\star$ are? |    362 what $\{[]\}\star$ and $\varnothing\star$ are? | 
|    328  |    363  | 
|    329 Recall that an alphabet is often referred to by the letter |    364 Recall that an alphabet is often referred to by the letter | 
|    330 $\Sigma$. We can now write for the set of \emph{all} strings |    365 $\Sigma$. We can now write for the set of \emph{all} strings | 
|    331 over this alphabet as $\Sigma\star$. In doing so we also include the |    366 over this alphabet as $\Sigma\star$. In doing so we also include the | 
|    332 empty string as a possible string over $\Sigma$. So if $\Sigma |    367 empty string as a possible string (over $\Sigma$). Assuming $\Sigma | 
|    333 = \{a, b\}$, then $\Sigma\star$ is |    368 = \{a, b\}$, then $\Sigma\star$ is | 
|    334  |    369  | 
|    335 \[ |    370 \[ | 
|    336 \{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\} |    371 \{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\} | 
|    337 \] |    372 \] | 
|    338  |    373  | 
|    339 \noindent or in other words all strings containing $a$s and |    374 \noindent or in words all strings containing $a$s and | 
|    340 $b$s only, plus the empty string. |    375 $b$s only, plus the empty string. | 
|    341  |    376 \bigskip | 
|         |    377  | 
|         |    378 \noindent | 
|         |    379 Thanks for making it until here! There are also some personal conventions | 
|         |    380 about regular expressions. But I will explain them in the handout for the | 
|         |    381 first week. An exercise you can do: Implement the power operation for languages | 
|         |    382 and try out some examples. | 
|    342 \end{document} |    383 \end{document} | 
|    343  |    384  | 
|    344 %%% Local Variables:  |    385 %%% Local Variables:  | 
|    345 %%% mode: latex |    386 %%% mode: latex | 
|    346 %%% TeX-master: t |    387 %%% TeX-master: t |