\documentclass{article}+ −
\usepackage{../style}+ −
\usepackage{../langs}+ −
+ −
+ −
+ −
\begin{document}+ −
\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018}+ −
+ −
+ −
\section*{A Crash-Course on Notation}+ −
+ −
There are innumerable books available about compilers, automata theory+ −
and formal languages. Unfortunately, they often use their own+ −
notational conventions and their own symbols. This handout is meant to+ −
clarify some of the notation I will use. I apologise in advance that+ −
sometimes I will be a bit fuzzy\ldots the problem is that often we+ −
want to have convenience in our mathematical definitions (to make them+ −
readable and understandable), but other times we need pedantic+ −
precision for actual programs.+ −
+ −
\subsubsection*{Characters and Strings}+ −
+ −
The most important concept in this module are strings. Strings+ −
are composed of \defn{characters}. While characters are surely+ −
a familiar concept, we will make one subtle distinction in+ −
this module. If we want to refer to concrete characters, like+ −
\code{a}, \code{b}, \code{c} and so on, we use a typewriter font.+ −
Accordingly if we want to refer to the concrete characters of+ −
my email address we shall write+ −
+ −
\begin{center}+ −
\pcode{christian.urban@kcl.ac.uk}+ −
\end{center}+ −
+ −
\noindent If we also need to explicitly indicate the ``space''+ −
character, we write \VS{}\hspace{1mm}. For example+ −
+ −
\begin{center}+ −
\tt{}hello\VS\hspace{0.5mm}world+ −
\end{center}+ −
+ −
+ −
\noindent But often we do not care which particular characters+ −
we use. In such cases we use the italic font and write $a$,+ −
$b$, $c$ and so on for characters. Therefore if we need a+ −
representative string, we might write+ −
+ −
\[+ −
abracadabra+ −
\]+ −
+ −
\noindent In this string, we do not really care what the+ −
characters stand for, except we do care about the fact that+ −
for example the character $a$ is not equal to $b$ and so on.+ −
Why do I make this distinction? Because we often need to+ −
define functions using variables ranging over characters. We+ −
need to somehow say this is a variable, say $c$, ranging over+ −
characters, while this is the actual character \pcode{c}.+ −
+ −
An \defn{alphabet} is a (non-empty) finite set of characters.+ −
Often the letter $\Sigma$ is used to refer to an alphabet. For+ −
example the ASCII characters \pcode{a} to \pcode{z} form an+ −
alphabet. The digits $0$ to $9$ are another alphabet. The+ −
Greek letters $\alpha$ to $\omega$ also form an alphabet. If+ −
nothing else is specified, we usually assume the alphabet+ −
consists of just the lower-case letters $a$, $b$, \ldots, $z$.+ −
Sometimes, however, we explicitly want to restrict strings to+ −
contain only the letters $a$ and $b$, for example. In this+ −
case we will state that the alphabet is the set $\{a, b\}$. + −
+ −
\defn{Strings} are lists of characters. Unfortunately, there+ −
are many ways how we can write down strings. In programming+ −
languages, they are usually written as \dq{\texttt{hello}} where the+ −
double quotes indicate that we are dealing with a string. In+ −
typed programming languages, such as Scala, strings have a special+ −
type---namely \pcode{String} which is different from the type+ −
for lists of characters. This is because strings can be+ −
efficiently represented in memory, unlike lists. Since+ −
\code{String} and the type of lists of characters+ −
(\code{List[Char]}) are not the same, we need to explicitly+ −
coerce elements between the two types, for example+ −
+ −
\begin{lstlisting}[numbers=none]+ −
scala> "abc".toList+ −
res01: List[Char] = List(a, b, c)+ −
\end{lstlisting}+ −
+ −
\noindent+ −
However, we do not want to do this kind of explicit coercion in our+ −
pencil-and-paper, everyday arguments. So in our (mathematical)+ −
definitions we regard strings as lists of characters and we will also+ −
write \dq{$hello$} as list+ −
+ −
\[+ −
[\text{\it h, e, l, l, o}] \qquad\text{or simply}\qquad \textit{hello}+ −
\]+ −
+ −
\noindent The important point is that we can always decompose+ −
such strings. For example, we will often consider the first+ −
character of a string, say $h$, and the ``rest'' of a string+ −
say \dq{\textit{ello}} when making definitions about strings.+ −
There are also some subtleties with the empty string,+ −
sometimes written as \dq{} but also as the empty list of+ −
characters $[\,]$.\footnote{In the literature you can also+ −
often find that $\varepsilon$ or $\lambda$ is used to+ −
represent the empty string.} + −
+ −
Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},+ −
which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as+ −
lists of characters, then $@$ is the list-append function.+ −
Suppose we are given two strings \dq{\textit{foo}} and+ −
\dq{\textit{bar}}, then their concatenation, written+ −
\dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives+ −
\dq{\textit{foobar}}. But as said above, we will often+ −
simplify our life and just drop the double quotes whenever it+ −
is clear we are talking about strings. So we will often just+ −
write \textit{foo}, \textit{bar}, \textit{foobar} or+ −
\textit{foo $@$ bar}.+ −
+ −
Occasionally we will use the notation $a^n$ for strings, which stands+ −
for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is a string that+ −
has some number of $a$s followed by the same number of $b$s. A simple+ −
property of string concatenation is \emph{associativity}, meaning+ −
+ −
\[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\] + −
+ −
\noindent are always equal strings. The empty string behaves+ −
like a \emph{unit element}, therefore+ −
+ −
\[s \,@\, [] = [] \,@\, s = s\]+ −
+ −
+ −
\subsubsection*{Sets and Languages}+ −
+ −
We will use the familiar operations $\cup$, $\cap$, $\subset$+ −
and $\subseteq$ for sets. For the empty set we will either+ −
write $\varnothing$ or $\{\,\}$. The set containing the+ −
natural numbers $1$, $2$ and $3$, for example, we will write+ −
with curly braces as+ −
+ −
\[+ −
\{1, 2, 3\}+ −
\]+ −
+ −
\noindent The notation $\in$ means \emph{element of}, so $1 \in \{1,+ −
2, 3\}$ is true and $4 \in \{1, 2, 3\}$ is false. Note that the+ −
\emph{list} $[1, 2, 3]$ is something different from the \emph{set}+ −
$\{1, 2, 3\}$: in the former we care about the order and potentially+ −
several occurrences of a number; while with the latter we do not.+ −
Also sets can potentially have infinitely many elements, whereas lists+ −
cannot. For example+ −
the set of all natural numbers $\{0, 1, 2, \ldots\}$ is infinite. This+ −
set is often also abbreviated as $\mathbb{N}$. Lists can be very large, but they cannot contain infinitely many elements.+ −
+ −
We can define sets by giving all their elements, for example $\{0, 1\}$ for+ −
the set containing just $0$ and $1$. But often we need to use \defn{set+ −
comprehensions} to define sets. For example the set of all \emph{even}+ −
natural numbers can be defined as+ −
+ −
\[+ −
\{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}+ −
\]+ −
+ −
\noindent Set comprehensions consist of a ``base set'' (in this case+ −
all the natural numbers) and a predicate (here eveness).+ −
+ −
Though silly, but the set $\{0, 1, 2\}$ could also be+ −
defined by the following set comprehension+ −
+ −
\[+ −
\{n\;|\; n \in \mathbb{N} \wedge n^2 < 9\}+ −
\]+ −
+ −
\noindent Can you see why this defines the set $\{0, 1, 2\}$? Notice+ −
that set comprehensions are quite powerful constructions. For example they+ −
could be used to define set union,+ −
set intersection and set difference:+ −
+ −
\begin{eqnarray*}+ −
A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\+ −
A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\+ −
A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} + −
\end{eqnarray*}+ −
+ −
\noindent In general set comprehensions are of the form+ −
$\{a\;|\;P\}$ which stands for the set of all elements $a$+ −
(from some set) for which some property $P$ holds.+ −
+ −
For defining sets, we will also often use the notion of the+ −
``big union''. An example is as follows:+ −
+ −
\begin{equation}\label{bigunion}+ −
\bigcup_{0\le n}\; \{n^2, n^2 + 1\}+ −
\end{equation}+ −
+ −
\noindent which is the set of all squares and their immediate+ −
successors, so+ −
+ −
\[+ −
\{0, 1, 2, 4, 5, 9, 10, 16, 17, \ldots\}+ −
\]+ −
+ −
\noindent A big union is a sequence of unions which are + −
indexed typically by a natural number. So the big union in+ −
\eqref{bigunion} could equally be written as+ −
+ −
\[+ −
\{0, 1\} \cup \{1, 2\} \cup \{4, 5\} \cup \{9, 10\} \cup + −
\ldots+ −
\]+ −
+ −
\noindent but using the big union notation is more concise.+ −
+ −
As an aside: While this stuff about sets might all look trivial or+ −
even needlessly pedantic, \emph{Nature} is never simple. If you want+ −
to be amazed how complicated sets can get, watch out for the last+ −
lecture just before Christmas where I want to convince you of the fact+ −
that some sets are more infinite than other sets. Yes, you read+ −
correctly, there can be sets that are ``more infinite'' than+ −
others. If you think this is obvious: say you have the infinite set+ −
$\mathbb{N}\backslash\{0\} = \{1, 2, 3, 4, \ldots\}$ which is all the+ −
natural numbers except $0$, and then compare it to the set+ −
$\{0, 1, 2, 3, 4, \ldots\}$ which contains the $0$ and all other+ −
numbers. If you think, the second must be more infinite\ldots{} well,+ −
then think again. Because the two infinite sets+ −
+ −
\begin{center}+ −
$\{1, 2, 3, 4, \ldots\}$ and+ −
$\{0, 1, 2, 3, 4, \ldots\}$+ −
\end{center}+ −
+ −
\noindent+ −
contain actually the same amount of elements. Does this make sense?+ −
Though this might all look strange, infinite sets will be a+ −
topic that is very relevant to the material of this module. It tells+ −
us what we can compute with a computer (actually algorithm) and what+ −
we cannot. But during the first 9 lectures we can go by without this+ −
``weird'' stuff. End of aside.\smallskip+ −
+ −
Another important notion in this module are \defn{languages}, which+ −
are sets of strings. One of the main goals for us will be how to+ −
(formally) specify languages and to find out whether a string+ −
is in a language or not.\footnote{You might wish to ponder+ −
whether this is in general a hard or easy problem, where+ −
hardness is meant in terms of Turing decidable, for example.}+ −
Note that the language containing the empty string $\{\dq{}\}$+ −
is not equal to $\varnothing$, the empty language (or empty+ −
set): The former contains one element, namely \dq{} (also+ −
written $[\,]$), but the latter does not contain any+ −
element.+ −
+ −
For languages we define the operation of \defn{language+ −
concatenation}, written like in the string case as $A @ B$:+ −
+ −
\begin{equation}\label{langconc}+ −
A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}+ −
\end{equation}+ −
+ −
\noindent Be careful to understand the difference: the $@$+ −
in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers + −
to the concatenation of two languages (or sets of strings).+ −
As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,+ −
then $A \,@\, B$ is the language+ −
+ −
\[+ −
\{abzzz, abqq, abr, aczzz, acqq, acr\}+ −
\] + −
+ −
\noindent Recall the properties for string concatenation. For+ −
language concatenation we have the following properties+ −
+ −
\begin{center}+ −
\begin{tabular}{ll}+ −
associativity: & $(A @ B) @ C = A @ (B @ C)$\\+ −
unit element: & $A \,@\, \{[]\} = \{[]\} \,@\, A = A$\\+ −
zero element: & $A \,@\, \varnothing = \varnothing \,@\, A = + −
\varnothing$+ −
\end{tabular}+ −
\end{center}+ −
+ −
\noindent Note the difference in the last two lines: the empty+ −
set behaves like $0$ for multiplication and the set $\{[]\}$+ −
like $1$ for multiplication ($n * 1 = n$ and $n * 0 = 0$).+ −
+ −
Using the operation of language concatenation, we can define a+ −
\defn{language power} operation as follows:+ −
+ −
\begin{eqnarray*}+ −
A^0 & \dn & \{[]\}\\+ −
A^{n+1} & \dn & A \,@\, A^n+ −
\end{eqnarray*}+ −
+ −
\noindent This definition is by recursion on natural numbers.+ −
Note carefully that the zero-case is not defined as the empty+ −
set, but the set containing the empty string. So no matter+ −
what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is+ −
another hint about a connection between the $@$-operation and+ −
multiplication: How is $x^n$ defined in mathematics and what is+ −
$x^0$?)+ −
+ −
Next we can define the \defn{star operation} for languages:+ −
$A\star$ is the union of all powers of $A$, or short+ −
+ −
\begin{equation}\label{star}+ −
A\star \dn \bigcup_{0\le n}\; A^n+ −
\end{equation}+ −
+ −
\noindent This star operation is often also called+ −
\emph{Kleene-star}. Unfolding the definition in~\eqref{star}+ −
gives+ −
+ −
\[+ −
A\star \dn A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots+ −
\]+ −
+ −
\noindent+ −
which is equal to + −
+ −
\[+ −
A\star \dn \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots+ −
\]+ −
+ −
\noindent We can see that the empty string is always in $A\star$,+ −
no matter what $A$ is. This is because $[] \in A^0$. To make+ −
sure you understand these definitions, I leave you to answer+ −
what $\{[]\}\star$ and $\varnothing\star$ are?+ −
+ −
Recall that an alphabet is often referred to by the letter+ −
$\Sigma$. We can now write for the set of \emph{all} strings+ −
over this alphabet as $\Sigma\star$. In doing so we also include the+ −
empty string as a possible string over $\Sigma$. So if $\Sigma+ −
= \{a, b\}$, then $\Sigma\star$ is+ −
+ −
\[+ −
\{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}+ −
\]+ −
+ −
\noindent or in other words all strings containing $a$s and+ −
$b$s only, plus the empty string.+ −
+ −
\end{document}+ −
+ −
%%% Local Variables: + −
%%% mode: latex+ −
%%% TeX-master: t+ −
%%% End: + −