diff -r fc2e3609d5e5 -r d3e477fe6c66 handouts/notation.tex --- a/handouts/notation.tex Mon Jul 20 10:06:43 2020 +0100 +++ b/handouts/notation.tex Mon Jul 20 16:46:30 2020 +0100 @@ -5,7 +5,7 @@ \begin{document} -\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018} +\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018, 2020} \section*{A Crash-Course on Notation} @@ -21,11 +21,11 @@ \subsubsection*{Characters and Strings} -The most important concept in this module are strings. Strings +The most basic concept in this module are strings. Strings are composed of \defn{characters}. While characters are surely a familiar concept, we will make one subtle distinction in this module. If we want to refer to concrete characters, like -\code{a}, \code{b}, \code{c} and so on, we use a typewriter font. +\code{a}, \code{b}, \code{c} and so on, we will use a typewriter font. Accordingly if we want to refer to the concrete characters of my email address we shall write @@ -55,8 +55,9 @@ for example the character $a$ is not equal to $b$ and so on. Why do I make this distinction? Because we often need to define functions using variables ranging over characters. We -need to somehow say this is a variable, say $c$, ranging over -characters, while this is the actual character \pcode{c}. +need to somehow say ``this-is-a-variable'' and give it a name. +In such cases we use the italic font. + An \defn{alphabet} is a (non-empty) finite set of characters. Often the letter $\Sigma$ is used to refer to an alphabet. For @@ -78,7 +79,7 @@ for lists of characters. This is because strings can be efficiently represented in memory, unlike lists. Since \code{String} and the type of lists of characters -(\code{List[Char]}) are not the same, we need to explicitly +(namely \code{List[Char]}) are not the same, we need to explicitly coerce elements between the two types, for example \begin{lstlisting}[numbers=none] @@ -104,7 +105,7 @@ sometimes written as \dq{} but also as the empty list of characters $[\,]$.\footnote{In the literature you can also often find that $\varepsilon$ or $\lambda$ is used to -represent the empty string.} +represent the empty string. But we are not going to use this notation.} Two strings, say $s_1$ and $s_2$, can be \defn{concatenated}, which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as @@ -114,14 +115,23 @@ \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives \dq{\textit{foobar}}. But as said above, we will often simplify our life and just drop the double quotes whenever it -is clear we are talking about strings. So we will often just -write \textit{foo}, \textit{bar}, \textit{foobar} or -\textit{foo $@$ bar}. +is clear we are talking about strings. So we will just +write \textit{foo}, \textit{bar}, \textit{foobar} +\textit{foo $@$ bar} and so on. Occasionally we will use the notation $a^n$ for strings, which stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is a string that -has some number of $a$s followed by the same number of $b$s. A simple -property of string concatenation is \emph{associativity}, meaning +has some number of $a$s followed by the same number of $b$s. +Confusingly, in Scala the notation is ``times'' for this opration. +So you can write + +\begin{lstlisting}[numbers=none] +scala> "a" * 13 +val res02: String = aaaaaaaaaaaaa +\end{lstlisting} + +\noindent +A simple property of string concatenation is \emph{associativity}, meaning \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\] @@ -185,7 +195,19 @@ \noindent In general set comprehensions are of the form $\{a\;|\;P\}$ which stands for the set of all elements $a$ -(from some set) for which some property $P$ holds. +(from some set) for which some property $P$ holds. If programming +is more your-kind-of-thing, you might recognise the similarities +with for-comprehensions, for example for the silly set above you +could write + +\begin{lstlisting}[numbers=none] +scala> for (n <- (0 to 10).toSet; if n * n < 9) yield n +val res03: Set[Int] = Set(0, 1, 2) +\end{lstlisting} + +\noindent +This is pretty much the same as $\{n\;|\; n \in \mathbb{N} \wedge n^2 < 9\}$ +just in Scala syntax. For defining sets, we will also often use the notion of the ``big union''. An example is as follows: @@ -234,7 +256,7 @@ contain actually the same amount of elements. Does this make sense? Though this might all look strange, infinite sets will be a topic that is very relevant to the material of this module. It tells -us what we can compute with a computer (actually algorithm) and what +us what we can compute with a computer (actually an algorithm) and what we cannot. But during the first 9 lectures we can go by without this ``weird'' stuff. End of aside.\smallskip @@ -248,7 +270,7 @@ is not equal to $\varnothing$, the empty language (or empty set): The former contains one element, namely \dq{} (also written $[\,]$), but the latter does not contain any -element. +element at all! Make sure you see the difference. For languages we define the operation of \defn{language concatenation}, written like in the string case as $A @ B$: @@ -267,7 +289,18 @@ \{abzzz, abqq, abr, aczzz, acqq, acr\} \] -\noindent Recall the properties for string concatenation. For +\noindent The cool thing about Scala is that we can define language +concatenation very elegantly as + +\begin{lstlisting}[numbers=none] +def concat(A: Set[String], B: Set[String]) = + for (x <- A ; y <- B) yield x ++ y +\end{lstlisting} + +\noindent +where \code{++} is string concatenation in Scala. + +Recall the properties for string concatenation. For language concatenation we have the following properties \begin{center} @@ -280,8 +313,9 @@ \end{center} \noindent Note the difference in the last two lines: the empty -set behaves like $0$ for multiplication and the set $\{[]\}$ -like $1$ for multiplication ($n * 1 = n$ and $n * 0 = 0$). +set behaves like $0$ for multiplication, whereas the set $\{[]\}$ +behaves like $1$ for multiplication ($n * 1 = n$ and $n * 0 = 0$). +Again this is a subtletly you need to get compfortable with. Using the operation of language concatenation, we can define a \defn{language power} operation as follows: @@ -307,7 +341,8 @@ \end{equation} \noindent This star operation is often also called -\emph{Kleene-star}. Unfolding the definition in~\eqref{star} +\emph{Kleene-star} after the mathematician/computer scientist +Stephen Cole Kleene. Unfolding the definition in~\eqref{star} gives \[ @@ -329,16 +364,22 @@ Recall that an alphabet is often referred to by the letter $\Sigma$. We can now write for the set of \emph{all} strings over this alphabet as $\Sigma\star$. In doing so we also include the -empty string as a possible string over $\Sigma$. So if $\Sigma +empty string as a possible string (over $\Sigma$). Assuming $\Sigma = \{a, b\}$, then $\Sigma\star$ is \[ \{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\} \] -\noindent or in other words all strings containing $a$s and +\noindent or in words all strings containing $a$s and $b$s only, plus the empty string. +\bigskip +\noindent +Thanks for making it until here! There are also some personal conventions +about regular expressions. But I will explain them in the handout for the +first week. An exercise you can do: Implement the power operation for languages +and try out some examples. \end{document} %%% Local Variables: