| author | Christian Urban <christian.urban@kcl.ac.uk> | 
| Mon, 20 Oct 2025 22:18:21 +0200 | |
| changeset 1012 | fbe868b97e33 | 
| parent 993 | 29db0cb761c2 | 
| permissions | -rw-r--r-- | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
1  | 
\documentclass{article}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
2  | 
\usepackage{../style}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
3  | 
\usepackage{../langs}
 | 
| 830 | 4  | 
\usepackage{../graphics}
 | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
5  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
6  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
7  | 
\begin{document}
 | 
| 993 | 8  | 
\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018, 2020, 2025}
 | 
| 505 | 9  | 
|
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
10  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
11  | 
\section*{A Crash-Course on Notation}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
12  | 
|
| 981 | 13  | 
There are innumerable books available on compilers, automata theory  | 
| 476 | 14  | 
and formal languages. Unfortunately, they often use their own  | 
15  | 
notational conventions and their own symbols. This handout is meant to  | 
|
| 502 | 16  | 
clarify some of the notation I will use. I apologise in advance that  | 
| 476 | 17  | 
sometimes I will be a bit fuzzy\ldots the problem is that often we  | 
18  | 
want to have convenience in our mathematical definitions (to make them  | 
|
19  | 
readable and understandable), but other times we need pedantic  | 
|
20  | 
precision for actual programs.  | 
|
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
21  | 
|
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
22  | 
\subsubsection*{Characters and Strings}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
23  | 
|
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
24  | 
The most basic concept in this module are strings. Strings  | 
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
25  | 
are composed of \defn{characters}. While characters are surely
 | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
26  | 
a familiar concept, we will make one subtle distinction in  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
27  | 
this module. If we want to refer to concrete characters, like  | 
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
28  | 
\code{a}, \code{b}, \code{c} and so on, we will use a typewriter font.
 | 
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
29  | 
Accordingly if we want to refer to the concrete characters of  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
30  | 
my email address we shall write  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
31  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
32  | 
\begin{center}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
33  | 
\pcode{christian.urban@kcl.ac.uk}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
34  | 
\end{center}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
35  | 
|
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
36  | 
\noindent If we also need to explicitly indicate the ``space''  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
37  | 
character, we write \VS{}\hspace{1mm}. For example
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
38  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
39  | 
\begin{center}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
40  | 
\tt{}hello\VS\hspace{0.5mm}world
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
41  | 
\end{center}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
42  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
43  | 
|
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
44  | 
\noindent But often we do not care which particular characters  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
45  | 
we use. In such cases we use the italic font and write $a$,  | 
| 
332
 
4755ad4b457b
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
266 
diff
changeset
 | 
46  | 
$b$, $c$ and so on for characters. Therefore if we need a  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
47  | 
representative string, we might write  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
48  | 
|
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
49  | 
\[  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
50  | 
abracadabra  | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
51  | 
\]  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
52  | 
|
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
53  | 
\noindent In this string, we do not really care what the  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
54  | 
characters stand for, except we do care about the fact that  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
55  | 
for example the character $a$ is not equal to $b$ and so on.  | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
56  | 
Why do I make this distinction? Because we often need to  | 
| 
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
57  | 
define functions using variables ranging over characters. We  | 
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
58  | 
need to somehow say ``this-is-a-variable'' and give it a name.  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
59  | 
In such cases we use the italic font.  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
60  | 
|
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
61  | 
|
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
62  | 
An \defn{alphabet} is a (non-empty) finite set of characters.
 | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
63  | 
Often the letter $\Sigma$ is used to refer to an alphabet. For  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
64  | 
example the ASCII characters \pcode{a} to \pcode{z} form an
 | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
65  | 
alphabet. The digits $0$ to $9$ are another alphabet. The  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
66  | 
Greek letters $\alpha$ to $\omega$ also form an alphabet. If  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
67  | 
nothing else is specified, we usually assume the alphabet  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
68  | 
consists of just the lower-case letters $a$, $b$, \ldots, $z$.  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
69  | 
Sometimes, however, we explicitly want to restrict strings to  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
70  | 
contain only the letters $a$ and $b$, for example. In this  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
71  | 
case we will state that the alphabet is the set $\{a, b\}$. 
 | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
72  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
73  | 
\defn{Strings} are lists of characters. Unfortunately, there
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
74  | 
are many ways how we can write down strings. In programming  | 
| 496 | 75  | 
languages, they are usually written as \dq{\texttt{hello}} where the
 | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
76  | 
double quotes indicate that we are dealing with a string. In  | 
| 496 | 77  | 
typed programming languages, such as Scala, strings have a special  | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
78  | 
type---namely \pcode{String} which is different from the type
 | 
| 502 | 79  | 
for lists of characters. This is because strings can be  | 
80  | 
efficiently represented in memory, unlike lists. Since  | 
|
81  | 
\code{String} and the type of lists of characters
 | 
|
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
82  | 
(namely \code{List[Char]}) are not the same, we need to explicitly
 | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
83  | 
coerce elements between the two types, for example  | 
| 
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
84  | 
|
| 
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
85  | 
\begin{lstlisting}[numbers=none]
 | 
| 
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
86  | 
scala> "abc".toList  | 
| 
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
87  | 
res01: List[Char] = List(a, b, c)  | 
| 
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
88  | 
\end{lstlisting}
 | 
| 
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
89  | 
|
| 502 | 90  | 
\noindent  | 
91  | 
However, we do not want to do this kind of explicit coercion in our  | 
|
92  | 
pencil-and-paper, everyday arguments. So in our (mathematical)  | 
|
| 550 | 93  | 
definitions we regard strings as lists of characters and we will also  | 
94  | 
write \dq{$hello$} as list
 | 
|
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
95  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
96  | 
\[  | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
97  | 
[\text{\it h, e, l, l, o}] \qquad\text{or simply}\qquad \textit{hello}
 | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
98  | 
\]  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
99  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
100  | 
\noindent The important point is that we can always decompose  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
101  | 
such strings. For example, we will often consider the first  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
102  | 
character of a string, say $h$, and the ``rest'' of a string  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
103  | 
say \dq{\textit{ello}} when making definitions about strings.
 | 
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
104  | 
There are also some subtleties with the empty string,  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
105  | 
sometimes written as \dq{} but also as the empty list of
 | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
106  | 
characters $[\,]$.\footnote{In the literature you can also
 | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
107  | 
often find that $\varepsilon$ or $\lambda$ is used to  | 
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
108  | 
represent the empty string. But we are not going to use this notation.}  | 
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
109  | 
|
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
110  | 
Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
 | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
111  | 
which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as  | 
| 
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
112  | 
lists of characters, then $@$ is the list-append function.  | 
| 
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
113  | 
Suppose we are given two strings \dq{\textit{foo}} and
 | 
| 502 | 114  | 
\dq{\textit{bar}}, then their concatenation, written
 | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
115  | 
\dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
 | 
| 
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
116  | 
\dq{\textit{foobar}}. But as said above, we will often
 | 
| 
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
117  | 
simplify our life and just drop the double quotes whenever it  | 
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
118  | 
is clear we are talking about strings. So we will just  | 
| 981 | 119  | 
write \textit{foo}, \textit{bar}, \textit{foobar}, 
 | 
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
120  | 
\textit{foo $@$ bar} and so on.
 | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
121  | 
|
| 496 | 122  | 
Occasionally we will use the notation $a^n$ for strings, which stands  | 
123  | 
for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is a string that
 | 
|
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
124  | 
has some number of $a$s followed by the same number of $b$s.  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
125  | 
Confusingly, in Scala the notation is ``times'' for this opration.  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
126  | 
So you can write  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
127  | 
|
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
128  | 
\begin{lstlisting}[numbers=none]
 | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
129  | 
scala> "a" * 13  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
130  | 
val res02: String = aaaaaaaaaaaaa  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
131  | 
\end{lstlisting}
 | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
132  | 
|
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
133  | 
\noindent  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
134  | 
A simple property of string concatenation is \emph{associativity}, meaning
 | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
135  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
136  | 
\[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
137  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
138  | 
\noindent are always equal strings. The empty string behaves  | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
139  | 
like a \emph{unit element}, therefore
 | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
140  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
141  | 
\[s \,@\, [] = [] \,@\, s = s\]  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
142  | 
|
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
143  | 
|
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
144  | 
\subsubsection*{Sets and Languages}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
145  | 
|
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
146  | 
We will use the familiar operations $\cup$, $\cap$, $\subset$  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
147  | 
and $\subseteq$ for sets. For the empty set we will either  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
148  | 
write $\varnothing$ or $\{\,\}$. The set containing the
 | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
149  | 
natural numbers $1$, $2$ and $3$, for example, we will write  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
150  | 
with curly braces as  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
151  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
152  | 
\[  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
153  | 
\{1, 2, 3\}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
154  | 
\]  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
155  | 
|
| 496 | 156  | 
\noindent The notation $\in$ means \emph{element of}, so $1 \in \{1,
 | 
| 502 | 157  | 
2, 3\}$ is true and $4 \in \{1, 2, 3\}$ is false.  Note that the
 | 
| 496 | 158  | 
\emph{list} $[1, 2, 3]$ is something different from the \emph{set}
 | 
159  | 
$\{1, 2, 3\}$: in the former we care about the order and potentially
 | 
|
160  | 
several occurrences of a number; while with the latter we do not.  | 
|
| 502 | 161  | 
Also sets can potentially have infinitely many elements, whereas lists  | 
162  | 
cannot. For example  | 
|
| 496 | 163  | 
the set of all natural numbers $\{0, 1, 2, \ldots\}$ is infinite. This
 | 
164  | 
set is often also abbreviated as $\mathbb{N}$. Lists can be very large, but they cannot contain infinitely many elements.
 | 
|
165  | 
||
| 550 | 166  | 
We can define sets by giving all their elements, for example $\{0, 1\}$ for
 | 
167  | 
the set containing just $0$ and $1$. But often we need to use \defn{set
 | 
|
168  | 
  comprehensions} to define sets. For example the set of all \emph{even}
 | 
|
169  | 
natural numbers can be defined as  | 
|
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
170  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
171  | 
\[  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
172  | 
\{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
173  | 
\]  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
174  | 
|
| 550 | 175  | 
\noindent Set comprehensions consist of a ``base set'' (in this case  | 
176  | 
all the natural numbers) and a predicate (here eveness).  | 
|
177  | 
||
178  | 
Though silly, but the set $\{0, 1, 2\}$ could also be
 | 
|
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
179  | 
defined by the following set comprehension  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
180  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
181  | 
\[  | 
| 550 | 182  | 
\{n\;|\;  n \in \mathbb{N} \wedge n^2 < 9\}
 | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
183  | 
\]  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
184  | 
|
| 496 | 185  | 
\noindent Can you see why this defines the set $\{0, 1, 2\}$?  Notice
 | 
| 550 | 186  | 
that set comprehensions are quite powerful constructions. For example they  | 
187  | 
could be used to define set union,  | 
|
188  | 
set intersection and set difference:  | 
|
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
189  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
190  | 
\begin{eqnarray*}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
191  | 
A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
192  | 
A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
193  | 
A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} 
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
194  | 
\end{eqnarray*}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
195  | 
|
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
196  | 
\noindent In general set comprehensions are of the form  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
197  | 
$\{a\;|\;P\}$ which stands for the set of all elements $a$
 | 
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
198  | 
(from some set) for which some property $P$ holds. If programming  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
199  | 
is more your-kind-of-thing, you might recognise the similarities  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
200  | 
with for-comprehensions, for example for the silly set above you  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
201  | 
could write  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
202  | 
|
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
203  | 
\begin{lstlisting}[numbers=none]
 | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
204  | 
scala> for (n <- (0 to 10).toSet; if n * n < 9) yield n  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
205  | 
val res03: Set[Int] = Set(0, 1, 2)  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
206  | 
\end{lstlisting}
 | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
207  | 
|
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
208  | 
\noindent  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
209  | 
This is pretty much the same as $\{n\;|\;  n \in \mathbb{N} \wedge n^2 < 9\}$
 | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
210  | 
just in Scala syntax.  | 
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
211  | 
|
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
212  | 
For defining sets, we will also often use the notion of the  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
213  | 
``big union''. An example is as follows:  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
214  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
215  | 
\begin{equation}\label{bigunion}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
216  | 
\bigcup_{0\le n}\; \{n^2, n^2 + 1\}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
217  | 
\end{equation}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
218  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
219  | 
\noindent which is the set of all squares and their immediate  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
220  | 
successors, so  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
221  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
222  | 
\[  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
223  | 
\{0, 1, 2, 4, 5, 9, 10, 16, 17, \ldots\}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
224  | 
\]  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
225  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
226  | 
\noindent A big union is a sequence of unions which are  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
227  | 
indexed typically by a natural number. So the big union in  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
228  | 
\eqref{bigunion} could equally be written as
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
229  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
230  | 
\[  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
231  | 
\{0, 1\} \cup \{1, 2\} \cup \{4, 5\} \cup \{9, 10\} \cup 
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
232  | 
\ldots  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
233  | 
\]  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
234  | 
|
| 830 | 235  | 
\noindent but using the big union notation is more concise.\medskip  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
236  | 
|
| 550 | 237  | 
As an aside: While this stuff about sets might all look trivial or  | 
238  | 
even needlessly pedantic, \emph{Nature} is never simple. If you want
 | 
|
239  | 
to be amazed how complicated sets can get, watch out for the last  | 
|
240  | 
lecture just before Christmas where I want to convince you of the fact  | 
|
241  | 
that some sets are more infinite than other sets. Yes, you read  | 
|
242  | 
correctly, there can be sets that are ``more infinite'' than  | 
|
243  | 
others. If you think this is obvious: say you have the infinite set  | 
|
244  | 
$\mathbb{N}\backslash\{0\} = \{1, 2, 3, 4, \ldots\}$ which is all the
 | 
|
245  | 
natural numbers except $0$, and then compare it to the set  | 
|
246  | 
$\{0, 1, 2, 3, 4, \ldots\}$ which contains the $0$ and all other
 | 
|
247  | 
numbers. If you think, the second must be more infinite\ldots{} well,
 | 
|
248  | 
then think again. Because the two infinite sets  | 
|
| 496 | 249  | 
|
250  | 
\begin{center}
 | 
|
251  | 
  $\{1, 2, 3, 4, \ldots\}$ and
 | 
|
252  | 
  $\{0, 1, 2, 3, 4, \ldots\}$
 | 
|
253  | 
\end{center}
 | 
|
254  | 
||
255  | 
\noindent  | 
|
| 830 | 256  | 
contain actually the same amount of elements. Does this make sense to you?  | 
| 981 | 257  | 
If yes, good. If not, then something to learn about.  | 
| 830 | 258  | 
|
| 981 | 259  | 
Though this might all look strange, infinite sets will be a topic that  | 
260  | 
is very relevant to the material of this module. It tells us what we  | 
|
261  | 
can compute with a computer (actually an algorithm) and what we  | 
|
262  | 
cannot. But during the first 9 lectures we can go by without this  | 
|
263  | 
``weird'' stuff. \textbf{Update:} Unfortunately we have now so much
 | 
|
264  | 
material about compilers in the module that I needed to drop this  | 
|
265  | 
lecture about infinite sets. This is really a pity because notions  | 
|
266  | 
like infinity (and decidability) play important roles in compilers as  | 
|
267  | 
soon as one goes beyond the basics. Fortunately you can get pretty much  | 
|
268  | 
the same  | 
|
269  | 
material from very slick videos produced by the Youtube channel Veritasium  | 
|
270  | 
(\textit{How An Infinite Hotel Ran Out Of
 | 
|
271  | 
  Room}).\video{https://www.youtube.com/watch?v=OxGsU8oIWjY} End of
 | 
|
272  | 
aside.\smallskip  | 
|
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
273  | 
|
| 
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
274  | 
Another important notion in this module are \defn{languages}, which
 | 
| 
398
 
c8ce95067c1a
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
332 
diff
changeset
 | 
275  | 
are sets of strings. One of the main goals for us will be how to  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
276  | 
(formally) specify languages and to find out whether a string  | 
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
277  | 
is in a language or not.\footnote{You might wish to ponder
 | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
278  | 
whether this is in general a hard or easy problem, where  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
279  | 
hardness is meant in terms of Turing decidable, for example.}  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
280  | 
Note that the language containing the empty string $\{\dq{}\}$
 | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
281  | 
is not equal to $\varnothing$, the empty language (or empty  | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
282  | 
set): The former contains one element, namely \dq{} (also
 | 
| 
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
283  | 
written $[\,]$), but the latter does not contain any  | 
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
284  | 
element at all! Make sure you see the difference.  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
285  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
286  | 
For languages we define the operation of \defn{language
 | 
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
287  | 
concatenation}, written like in the string case as $A @ B$:  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
288  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
289  | 
\begin{equation}\label{langconc}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
290  | 
A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
291  | 
\end{equation}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
292  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
293  | 
\noindent Be careful to understand the difference: the $@$  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
294  | 
in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
295  | 
to the concatenation of two languages (or sets of strings).  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
296  | 
As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,
 | 
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
297  | 
then $A \,@\, B$ is the language  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
298  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
299  | 
\[  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
300  | 
\{abzzz, abqq, abr, aczzz, acqq, acr\}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
301  | 
\]  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
302  | 
|
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
303  | 
\noindent The cool thing about Scala is that we can define language  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
304  | 
concatenation very elegantly as  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
305  | 
|
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
306  | 
\begin{lstlisting}[numbers=none]
 | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
307  | 
def concat(A: Set[String], B: Set[String]) =  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
308  | 
for (x <- A ; y <- B) yield x ++ y  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
309  | 
\end{lstlisting}
 | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
310  | 
|
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
311  | 
\noindent  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
312  | 
where \code{++} is string concatenation in Scala.
 | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
313  | 
|
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
314  | 
Recall the properties for string concatenation. For  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
315  | 
language concatenation we have the following properties  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
316  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
317  | 
\begin{center}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
318  | 
\begin{tabular}{ll}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
319  | 
associativity: & $(A @ B) @ C = A @ (B @ C)$\\  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
320  | 
unit element:  & $A \,@\, \{[]\} = \{[]\} \,@\, A = A$\\
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
321  | 
zero element: & $A \,@\, \varnothing = \varnothing \,@\, A =  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
322  | 
\varnothing$  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
323  | 
\end{tabular}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
324  | 
\end{center}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
325  | 
|
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
326  | 
\noindent Note the difference in the last two lines: the empty  | 
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
327  | 
set behaves like $0$ for multiplication, whereas the set $\{[]\}$
 | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
328  | 
behaves like $1$ for multiplication ($n * 1 = n$ and $n * 0 = 0$).  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
329  | 
Again this is a subtletly you need to get compfortable with.  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
330  | 
|
| 502 | 331  | 
Using the operation of language concatenation, we can define a  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
332  | 
\defn{language power} operation as follows:
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
333  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
334  | 
\begin{eqnarray*}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
335  | 
A^0     & \dn & \{[]\}\\
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
336  | 
A^{n+1} & \dn & A \,@\, A^n
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
337  | 
\end{eqnarray*}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
338  | 
|
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
339  | 
\noindent This definition is by recursion on natural numbers.  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
340  | 
Note carefully that the zero-case is not defined as the empty  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
341  | 
set, but the set containing the empty string. So no matter  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
342  | 
what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
343  | 
another hint about a connection between the $@$-operation and  | 
| 502 | 344  | 
multiplication: How is $x^n$ defined in mathematics and what is  | 
| 
242
 
35104ee14f87
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
241 
diff
changeset
 | 
345  | 
$x^0$?)  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
346  | 
|
| 
242
 
35104ee14f87
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
241 
diff
changeset
 | 
347  | 
Next we can define the \defn{star operation} for languages:
 | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
348  | 
$A\star$ is the union of all powers of $A$, or short  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
349  | 
|
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
350  | 
\begin{equation}\label{star}
 | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
351  | 
A\star \dn \bigcup_{0\le n}\; A^n
 | 
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
352  | 
\end{equation}
 | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
353  | 
|
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
354  | 
\noindent This star operation is often also called  | 
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
355  | 
\emph{Kleene-star} after the mathematician/computer scientist
 | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
356  | 
Stephen Cole Kleene. Unfolding the definition in~\eqref{star}
 | 
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
357  | 
gives  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
358  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
359  | 
\[  | 
| 502 | 360  | 
A\star \dn A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
361  | 
\]  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
362  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
363  | 
\noindent  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
364  | 
which is equal to  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
365  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
366  | 
\[  | 
| 502 | 367  | 
A\star \dn \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
 | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
368  | 
\]  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
369  | 
|
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
370  | 
\noindent We can see that the empty string is always in $A\star$,  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
371  | 
no matter what $A$ is. This is because $[] \in A^0$. To make  | 
| 
241
 
10f02605a46a
polished
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
239 
diff
changeset
 | 
372  | 
sure you understand these definitions, I leave you to answer  | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
373  | 
what $\{[]\}\star$ and $\varnothing\star$ are?
 | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
374  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
375  | 
Recall that an alphabet is often referred to by the letter  | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
376  | 
$\Sigma$. We can now write for the set of \emph{all} strings
 | 
| 502 | 377  | 
over this alphabet as $\Sigma\star$. In doing so we also include the  | 
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
378  | 
empty string as a possible string (over $\Sigma$). Assuming $\Sigma  | 
| 
404
 
245d302791c7
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
398 
diff
changeset
 | 
379  | 
= \{a, b\}$, then $\Sigma\star$ is
 | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
380  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
381  | 
\[  | 
| 
246
 
baf41b05210f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
242 
diff
changeset
 | 
382  | 
\{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}
 | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
383  | 
\]  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
384  | 
|
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
385  | 
\noindent or in words all strings containing $a$s and  | 
| 
246
 
baf41b05210f
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
242 
diff
changeset
 | 
386  | 
$b$s only, plus the empty string.  | 
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
387  | 
\bigskip  | 
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
388  | 
|
| 
736
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
389  | 
\noindent  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
390  | 
Thanks for making it until here! There are also some personal conventions  | 
| 
 
0494995fd979
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
550 
diff
changeset
 | 
391  | 
about regular expressions. But I will explain them in the handout for the  | 
| 981 | 392  | 
first week. An exercise you can do: Implement the power operation  | 
393  | 
for languages in your preferred programming language and try out some examples.  | 
|
| 
239
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
394  | 
\end{document}
 | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
395  | 
|
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
396  | 
%%% Local Variables:  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
397  | 
%%% mode: latex  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
398  | 
%%% TeX-master: t  | 
| 
 
68d98140b90b
added notation handout
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents:  
diff
changeset
 | 
399  | 
%%% End:  |