handouts/notation.tex
author Christian Urban <christian.urban@kcl.ac.uk>
Sat, 09 Nov 2024 06:31:45 +0000
changeset 973 f25d338d16c9
parent 875 49d21814a633
permissions -rw-r--r--
updated
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     1
\documentclass{article}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     2
\usepackage{../style}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     3
\usepackage{../langs}
830
dbf9d710ce65 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 736
diff changeset
     4
\usepackage{../graphics}
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     5
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     6
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
     7
\begin{document}
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
     8
\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018, 2020}
505
5b9cf7fbd51a updated
Christian Urban <urbanc@in.tum.de>
parents: 502
diff changeset
     9
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    10
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    11
\section*{A Crash-Course on Notation}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    12
875
49d21814a633 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 830
diff changeset
    13
There are an innumerable number of books available on compilers, automata theory
476
d922cc83b70c updated
Christian Urban <urbanc@in.tum.de>
parents: 404
diff changeset
    14
and formal languages. Unfortunately, they often use their own
d922cc83b70c updated
Christian Urban <urbanc@in.tum.de>
parents: 404
diff changeset
    15
notational conventions and their own symbols. This handout is meant to
502
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
    16
clarify some of the notation I will use. I apologise in advance that
476
d922cc83b70c updated
Christian Urban <urbanc@in.tum.de>
parents: 404
diff changeset
    17
sometimes I will be a bit fuzzy\ldots the problem is that often we
d922cc83b70c updated
Christian Urban <urbanc@in.tum.de>
parents: 404
diff changeset
    18
want to have convenience in our mathematical definitions (to make them
d922cc83b70c updated
Christian Urban <urbanc@in.tum.de>
parents: 404
diff changeset
    19
readable and understandable), but other times we need pedantic
d922cc83b70c updated
Christian Urban <urbanc@in.tum.de>
parents: 404
diff changeset
    20
precision for actual programs.
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    21
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    22
\subsubsection*{Characters and Strings}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    23
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
    24
The most basic concept in this module are strings. Strings
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    25
are composed of \defn{characters}. While characters are surely
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    26
a familiar concept, we will make one subtle distinction in
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    27
this module. If we want to refer to concrete characters, like
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
    28
\code{a}, \code{b}, \code{c} and so on, we will use a typewriter font.
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    29
Accordingly if we want to refer to the concrete characters of
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    30
my email address we shall write
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    31
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    32
\begin{center}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    33
\pcode{christian.urban@kcl.ac.uk}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    34
\end{center}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    35
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    36
\noindent If we also need to explicitly indicate the ``space''
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    37
character, we write \VS{}\hspace{1mm}. For example
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    38
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    39
\begin{center}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    40
\tt{}hello\VS\hspace{0.5mm}world
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    41
\end{center}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    42
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    43
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    44
\noindent But often we do not care which particular characters
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    45
we use. In such cases we use the italic font and write $a$,
332
4755ad4b457b updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 266
diff changeset
    46
$b$, $c$ and so on for characters. Therefore if we need a
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    47
representative string, we might write
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    48
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    49
\[
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    50
abracadabra
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    51
\]
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    52
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    53
\noindent In this string, we do not really care what the
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    54
characters stand for, except we do care about the fact that
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    55
for example the character $a$ is not equal to $b$ and so on.
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    56
Why do I make this distinction? Because we often need to
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    57
define functions using variables ranging over characters. We
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
    58
need to somehow say ``this-is-a-variable'' and give it a name. 
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
    59
In such cases we use the italic font.
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
    60
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    61
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    62
An \defn{alphabet} is a (non-empty) finite set of characters.
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    63
Often the letter $\Sigma$ is used to refer to an alphabet. For
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    64
example the ASCII characters \pcode{a} to \pcode{z} form an
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    65
alphabet. The digits $0$ to $9$ are another alphabet. The
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    66
Greek letters $\alpha$ to $\omega$ also form an alphabet. If
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    67
nothing else is specified, we usually assume the alphabet
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    68
consists of just the lower-case letters $a$, $b$, \ldots, $z$.
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    69
Sometimes, however, we explicitly want to restrict strings to
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    70
contain only the letters $a$ and $b$, for example. In this
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
    71
case we will state that the alphabet is the set $\{a, b\}$. 
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    72
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    73
\defn{Strings} are lists of characters. Unfortunately, there
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    74
are many ways how we can write down strings. In programming
496
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
    75
languages, they are usually written as \dq{\texttt{hello}} where the
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    76
double quotes indicate that we are dealing with a string. In
496
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
    77
typed programming languages, such as Scala, strings have a special
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    78
type---namely \pcode{String} which is different from the type
502
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
    79
for lists of characters. This is because strings can be
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
    80
efficiently represented in memory, unlike lists. Since
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
    81
\code{String} and the type of lists of characters
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
    82
(namely \code{List[Char]}) are not the same, we need to explicitly
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    83
coerce elements between the two types, for example
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    84
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    85
\begin{lstlisting}[numbers=none]
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    86
scala> "abc".toList
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    87
res01: List[Char] = List(a, b, c)
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    88
\end{lstlisting}
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    89
502
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
    90
\noindent
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
    91
However, we do not want to do this kind of explicit coercion in our
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
    92
pencil-and-paper, everyday arguments.  So in our (mathematical)
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
    93
definitions we regard strings as lists of characters and we will also
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
    94
write \dq{$hello$} as list
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    95
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    96
\[
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
    97
[\text{\it h, e, l, l, o}] \qquad\text{or simply}\qquad \textit{hello}
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    98
\]
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
    99
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   100
\noindent The important point is that we can always decompose
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   101
such strings. For example, we will often consider the first
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   102
character of a string, say $h$, and the ``rest'' of a string
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   103
say \dq{\textit{ello}} when making definitions about strings.
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   104
There are also some subtleties with the empty string,
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   105
sometimes written as \dq{} but also as the empty list of
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   106
characters $[\,]$.\footnote{In the literature you can also
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   107
often find that $\varepsilon$ or $\lambda$ is used to
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   108
represent the empty string. But we are not going to use this notation.} 
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   109
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   110
Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   111
which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   112
lists of characters, then $@$ is the list-append function.
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   113
Suppose we are given two strings \dq{\textit{foo}} and
502
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
   114
\dq{\textit{bar}}, then their concatenation, written
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   115
\dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   116
\dq{\textit{foobar}}. But as said above, we will often
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   117
simplify our life and just drop the double quotes whenever it
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   118
is clear we are talking about strings. So we will just
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   119
write \textit{foo}, \textit{bar}, \textit{foobar} 
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   120
\textit{foo $@$ bar} and so on.
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   121
496
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   122
Occasionally we will use the notation $a^n$ for strings, which stands
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   123
for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is a string that
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   124
has some number of $a$s followed by the same number of $b$s.
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   125
Confusingly, in Scala the notation is ``times'' for this opration.
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   126
So you can write 
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   127
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   128
\begin{lstlisting}[numbers=none]
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   129
scala> "a" * 13
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   130
val res02: String = aaaaaaaaaaaaa
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   131
\end{lstlisting}
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   132
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   133
\noindent
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   134
A simple property of string concatenation is \emph{associativity}, meaning
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   135
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   136
\[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]  
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   137
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   138
\noindent are always equal strings. The empty string behaves
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   139
like a \emph{unit element}, therefore
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   140
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   141
\[s \,@\, [] = [] \,@\, s = s\]
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   142
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   143
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   144
\subsubsection*{Sets and Languages}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   145
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   146
We will use the familiar operations $\cup$, $\cap$, $\subset$
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   147
and $\subseteq$ for sets. For the empty set we will either
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   148
write $\varnothing$ or $\{\,\}$. The set containing the
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   149
natural numbers $1$, $2$ and $3$, for example, we will write
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   150
with curly braces as
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   151
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   152
\[
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   153
\{1, 2, 3\}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   154
\]
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   155
496
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   156
\noindent The notation $\in$ means \emph{element of}, so $1 \in \{1,
502
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
   157
2, 3\}$ is true and $4 \in \{1, 2, 3\}$ is false.  Note that the
496
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   158
\emph{list} $[1, 2, 3]$ is something different from the \emph{set}
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   159
$\{1, 2, 3\}$: in the former we care about the order and potentially
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   160
several occurrences of a number; while with the latter we do not.
502
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
   161
Also sets can potentially have infinitely many elements, whereas lists
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
   162
cannot. For example
496
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   163
the set of all natural numbers $\{0, 1, 2, \ldots\}$ is infinite. This
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   164
set is often also abbreviated as $\mathbb{N}$. Lists can be very large, but they cannot contain infinitely many elements.
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   165
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   166
We can define sets by giving all their elements, for example $\{0, 1\}$ for
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   167
the set containing just $0$ and $1$. But often we need to use \defn{set
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   168
  comprehensions} to define sets. For example the set of all \emph{even}
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   169
natural numbers can be defined as
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   170
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   171
\[
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   172
\{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   173
\]
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   174
  
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   175
\noindent Set comprehensions consist of a ``base set'' (in this case
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   176
all the natural numbers) and a predicate (here eveness).
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   177
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   178
Though silly, but the set $\{0, 1, 2\}$ could also be
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   179
defined by the following set comprehension
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   180
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   181
\[
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   182
\{n\;|\;  n \in \mathbb{N} \wedge n^2 < 9\}
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   183
\]
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   184
496
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   185
\noindent Can you see why this defines the set $\{0, 1, 2\}$?  Notice
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   186
that set comprehensions are quite powerful constructions. For example they
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   187
could be used to define set union,
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   188
set intersection and set difference:
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   189
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   190
\begin{eqnarray*}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   191
A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   192
A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   193
A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} 
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   194
\end{eqnarray*}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   195
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   196
\noindent In general set comprehensions are of the form
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   197
$\{a\;|\;P\}$ which stands for the set of all elements $a$
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   198
(from some set) for which some property $P$ holds. If programming
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   199
is more your-kind-of-thing, you might recognise the similarities
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   200
with for-comprehensions, for example for the silly set above you
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   201
could write
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   202
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   203
\begin{lstlisting}[numbers=none]
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   204
scala> for (n <- (0 to 10).toSet; if n * n < 9) yield n
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   205
val res03: Set[Int] = Set(0, 1, 2)
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   206
\end{lstlisting}
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   207
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   208
\noindent
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   209
This is pretty much the same as $\{n\;|\;  n \in \mathbb{N} \wedge n^2 < 9\}$
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   210
just in Scala syntax.
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   211
 
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   212
For defining sets, we will also often use the notion of the
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   213
``big union''. An example is as follows:
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   214
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   215
\begin{equation}\label{bigunion}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   216
\bigcup_{0\le n}\; \{n^2, n^2 + 1\}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   217
\end{equation}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   218
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   219
\noindent which is the set of all squares and their immediate
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   220
successors, so
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   221
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   222
\[
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   223
\{0, 1, 2, 4, 5, 9, 10, 16, 17, \ldots\}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   224
\]
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   225
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   226
\noindent A big union is a sequence of unions which are 
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   227
indexed typically by a natural number. So the big union in
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   228
\eqref{bigunion} could equally be written as
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   229
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   230
\[
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   231
\{0, 1\} \cup \{1, 2\} \cup \{4, 5\} \cup \{9, 10\} \cup 
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   232
\ldots
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   233
\]
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   234
830
dbf9d710ce65 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 736
diff changeset
   235
\noindent but using the big union notation is more concise.\medskip
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   236
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   237
As an aside: While this stuff about sets might all look trivial or
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   238
even needlessly pedantic, \emph{Nature} is never simple. If you want
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   239
to be amazed how complicated sets can get, watch out for the last
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   240
lecture just before Christmas where I want to convince you of the fact
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   241
that some sets are more infinite than other sets. Yes, you read
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   242
correctly, there can be sets that are ``more infinite'' than
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   243
others. If you think this is obvious: say you have the infinite set
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   244
$\mathbb{N}\backslash\{0\} = \{1, 2, 3, 4, \ldots\}$ which is all the
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   245
natural numbers except $0$, and then compare it to the set
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   246
$\{0, 1, 2, 3, 4, \ldots\}$ which contains the $0$ and all other
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   247
numbers. If you think, the second must be more infinite\ldots{} well,
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   248
then think again. Because the two infinite sets
496
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   249
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   250
\begin{center}
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   251
  $\{1, 2, 3, 4, \ldots\}$ and
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   252
  $\{0, 1, 2, 3, 4, \ldots\}$
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   253
\end{center}
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   254
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   255
\noindent
830
dbf9d710ce65 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 736
diff changeset
   256
contain actually the same amount of elements. Does this make sense to you?
dbf9d710ce65 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 736
diff changeset
   257
If yes, good. If not, then something to learn about.
dbf9d710ce65 updated
Christian Urban <christian.urban@kcl.ac.uk>
parents: 736
diff changeset
   258
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   259
Though this might all look strange, infinite sets will be a
496
5c9de27a5b30 updated
Christian Urban <urbanc@in.tum.de>
parents: 476
diff changeset
   260
topic that is very relevant to the material of this module. It tells
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   261
us what we can compute with a computer (actually an algorithm) and what
502
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
   262
we cannot. But during the first 9 lectures we can go by without this
550
71fc4a7a7039 updated
Christian Urban <urbanc@in.tum.de>
parents: 505
diff changeset
   263
``weird'' stuff. End of aside.\smallskip
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   264
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   265
Another important notion in this module are \defn{languages}, which
398
c8ce95067c1a updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 332
diff changeset
   266
are sets of strings. One of the main goals for us will be how to
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   267
(formally) specify languages and to find out whether a string
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   268
is in a language or not.\footnote{You might wish to ponder
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   269
whether this is in general a hard or easy problem, where
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   270
hardness is meant in terms of Turing decidable, for example.}
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   271
Note that the language containing the empty string $\{\dq{}\}$
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   272
is not equal to $\varnothing$, the empty language (or empty
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   273
set): The former contains one element, namely \dq{} (also
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   274
written $[\,]$), but the latter does not contain any
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   275
element at all! Make sure you see the difference.
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   276
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   277
For languages we define the operation of \defn{language
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   278
concatenation}, written like in the string case as $A @ B$:
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   279
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   280
\begin{equation}\label{langconc}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   281
A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   282
\end{equation}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   283
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   284
\noindent Be careful to understand the difference: the $@$
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   285
in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers 
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   286
to the concatenation of two languages (or sets of strings).
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   287
As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   288
then $A \,@\, B$ is the language
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   289
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   290
\[
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   291
\{abzzz, abqq, abr, aczzz, acqq, acr\}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   292
\] 
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   293
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   294
\noindent The cool thing about Scala is that we can define language
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   295
concatenation very elegantly as
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   296
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   297
\begin{lstlisting}[numbers=none]
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   298
def concat(A: Set[String], B: Set[String]) =
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   299
  for (x <- A ; y <- B) yield x ++ y
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   300
\end{lstlisting}
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   301
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   302
\noindent
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   303
where \code{++} is string concatenation in Scala.
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   304
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   305
Recall the properties for string concatenation. For
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   306
language concatenation we have the following properties
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   307
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   308
\begin{center}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   309
\begin{tabular}{ll}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   310
associativity: & $(A @ B) @ C = A @ (B @ C)$\\
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   311
unit element:  & $A \,@\, \{[]\} = \{[]\} \,@\, A = A$\\
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   312
zero element:  & $A \,@\, \varnothing = \varnothing \,@\, A = 
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   313
\varnothing$
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   314
\end{tabular}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   315
\end{center}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   316
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   317
\noindent Note the difference in the last two lines: the empty
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   318
set behaves like $0$ for multiplication, whereas the set $\{[]\}$
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   319
behaves like $1$ for multiplication ($n * 1 = n$ and $n * 0 = 0$).
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   320
Again this is a subtletly you need to get compfortable with.
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   321
502
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
   322
Using the operation of language concatenation, we can define a
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   323
\defn{language power} operation as follows:
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   324
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   325
\begin{eqnarray*}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   326
A^0     & \dn & \{[]\}\\
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   327
A^{n+1} & \dn & A \,@\, A^n
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   328
\end{eqnarray*}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   329
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   330
\noindent This definition is by recursion on natural numbers.
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   331
Note carefully that the zero-case is not defined as the empty
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   332
set, but the set containing the empty string. So no matter
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   333
what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   334
another hint about a connection between the $@$-operation and
502
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
   335
multiplication: How is $x^n$ defined in mathematics and what is
242
35104ee14f87 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 241
diff changeset
   336
$x^0$?)
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   337
242
35104ee14f87 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 241
diff changeset
   338
Next we can define the \defn{star operation} for languages:
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   339
$A\star$ is the union of all powers of $A$, or short
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   340
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   341
\begin{equation}\label{star}
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   342
A\star \dn \bigcup_{0\le n}\; A^n
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   343
\end{equation}
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   344
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   345
\noindent This star operation is often also called
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   346
\emph{Kleene-star} after the mathematician/computer scientist
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   347
Stephen Cole Kleene. Unfolding the definition in~\eqref{star}
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   348
gives
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   349
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   350
\[
502
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
   351
A\star \dn A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   352
\]
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   353
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   354
\noindent
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   355
which is equal to 
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   356
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   357
\[
502
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
   358
A\star \dn \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   359
\]
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   360
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   361
\noindent We can see that the empty string is always in $A\star$,
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   362
no matter what $A$ is. This is because $[] \in A^0$. To make
241
10f02605a46a polished
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 239
diff changeset
   363
sure you understand these definitions, I leave you to answer
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   364
what $\{[]\}\star$ and $\varnothing\star$ are?
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   365
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   366
Recall that an alphabet is often referred to by the letter
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   367
$\Sigma$. We can now write for the set of \emph{all} strings
502
bf1c472b244e updated
Christian Urban <urbanc@in.tum.de>
parents: 496
diff changeset
   368
over this alphabet as $\Sigma\star$. In doing so we also include the
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   369
empty string as a possible string (over $\Sigma$). Assuming $\Sigma
404
245d302791c7 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 398
diff changeset
   370
= \{a, b\}$, then $\Sigma\star$ is
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   371
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   372
\[
246
baf41b05210f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 242
diff changeset
   373
\{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   374
\]
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   375
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   376
\noindent or in words all strings containing $a$s and
246
baf41b05210f updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 242
diff changeset
   377
$b$s only, plus the empty string.
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   378
\bigskip
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   379
736
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   380
\noindent
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   381
Thanks for making it until here! There are also some personal conventions
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   382
about regular expressions. But I will explain them in the handout for the
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   383
first week. An exercise you can do: Implement the power operation for languages
d3e477fe6c66 updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents: 550
diff changeset
   384
and try out some examples.
239
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   385
\end{document}
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   386
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   387
%%% Local Variables: 
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   388
%%% mode: latex
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   389
%%% TeX-master: t
68d98140b90b added notation handout
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
diff changeset
   390
%%% End: