--- a/handouts/notation.tex Sat Sep 06 21:53:41 2014 +0100
+++ b/handouts/notation.tex Sun Sep 07 08:37:44 2014 +0100
@@ -8,20 +8,26 @@
\section*{A Crash-Course on Notation}
+There are innumerable books available about automata and
+formal languages. Unfortunately, they often use their own
+notational conventions and their own symbols. This handout is
+meant to clarify some of the notation I will use.
+
\subsubsection*{Characters and Strings}
-In this module we will often use \defn{characters}. While they
-are surely familiar, we will make one subtle distinction. If
-we want to refer to concrete characters, like \code{a},
-\code{b} and so on, we use a typewriter font. So if we want to
-refer to the concrete characters of my email address we shall
-write
+The most important concept in this module are strings. Strings
+are composed of \defn{characters}. While characters are surely
+a familiar concept, we will make one subtle distinction in
+this module. If we want to refer to concrete characters, like
+\code{a}, \code{b} and so on, we use a typewriter font.
+Accordingly if we want to refer to the concrete characters of
+my email address we shall write
\begin{center}
\pcode{christian.urban@kcl.ac.uk}
\end{center}
-\noindent If we need to explicitly indicate the ``space''
+\noindent If we also need to explicitly indicate the ``space''
character, we write \VS{}\hspace{1mm}. For example
\begin{center}
@@ -29,35 +35,36 @@
\end{center}
-\noindent But often we do not care
-about which characters we use. In such cases we us the italic
-font and write $a$, $b$ and so on. So if we need a
+\noindent But often we do not care which particular characters
+we use. In such cases we use the italic font and write $a$,
+$b$ and so on for characters. Therefore if we need a
representative string, we might write
\begin{equation}\label{abracadabra}
abracadabra
\end{equation}
-\noindent We do not really care what the characters stand for,
-except we do care about is that for example the character $a$
-is not equal to $b$.
+\noindent In this string, we do not really care what the
+characters stand for, except we do care about the fact that
+for example the character $a$ is not equal to $b$ and so on.
-An \defn{alphabet} is a finite set of characters. Often the
-letter $\Sigma$ is used to refer to an alphabet. For example
-the ASCII characters \pcode{a} to \pcode{z} form an alphabet.
-The digits $0$ to $9$ are another alphabet. If nothing else is
-specified, we usually assume the alphabet consists of just the
-lower-case letters $a$, $b$, \ldots, $z$. Sometimes, however,
-we explicitly restrict strings to contain, for example, only
-the letters $a$ and $b$. In this case we say the alphabet is
-the set $\{a, b\}$.
+An \defn{alphabet} is a (non-empty) finite set of characters.
+Often the letter $\Sigma$ is used to refer to an alphabet. For
+example the ASCII characters \pcode{a} to \pcode{z} form an
+alphabet. The digits $0$ to $9$ are another alphabet. The
+Greek letters $\alpha$ to $\omega$ also form an alphabet. If
+nothing else is specified, we usually assume the alphabet
+consists of just the lower-case letters $a$, $b$, \ldots, $z$.
+Sometimes, however, we explicitly want to restrict strings to
+contain only the letters $a$ and $b$, for example. In this
+case we will state that the alphabet is the set $\{a, b\}$.
\defn{Strings} are lists of characters. Unfortunately, there
are many ways how we can write down strings. In programming
languages, they are usually written as \dq{$hello$} where the
-double quotes indicate that we dealing with a string. But
-since, strings are lists of characters we could also write
-this string as
+double quotes indicate that we are dealing with a string. But
+since we regard strings as lists of characters we could also
+write this string as
\[
[\text{\it h, e, l, l, o}]
@@ -67,18 +74,21 @@
such strings. For example, we will often consider the first
character of a string, say $h$, and the ``rest'' of a string
say \dq{\textit{ello}} when making definitions about strings.
-There are some subtleties with the empty string, sometimes
-written as \dq{} but also as the empty list of characters
-$[\,]$. Two strings, for example $s_1$ and $s_2$, can be
-\emph{concatenated}, which we write as $s_1 @ s_2$. Suppose we
-are given two strings \dq{\textit{foo}} and \dq{\textit{bar}},
-then their concatenation, writen
-\dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
-\dq{\textit{foobar}}. Often we will simplify our life and just
-drop the double quotes whenever it is clear we are talking
-about strings, writing as already in \eqref{abracadabra} just
-\textit{foo}, \textit{bar}, \textit{foobar} or \textit{foo $@$
-bar}.
+There are also some subtleties with the empty string,
+sometimes written as \dq{} but also as the empty list of
+characters $[\,]$.\footnote{In the literature you can also
+often find that $\varepsilon$ or $\lambda$ is used to
+represent the empty string.}
+
+Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
+which we write as $s_1 @ s_2$. Suppose we are given two
+strings \dq{\textit{foo}} and \dq{\textit{bar}}, then their
+concatenation, writen \dq{\textit{foo}} $@$ \dq{\textit{bar}},
+gives \dq{\textit{foobar}}. Often we will simplify our life
+and just drop the double quotes whenever it is clear we are
+talking about strings, writing as already in
+\eqref{abracadabra} just \textit{foo}, \textit{bar},
+\textit{foobar} or \textit{foo $@$ bar}.
Some simple properties of string concatenation hold. For
example the concatenation operation is \emph{associative},
@@ -91,6 +101,11 @@
\[s \,@\, [] = [] \,@\, s = s\]
+
+Occasionally we will use the notation $a^n$ for strings, which
+stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is
+a string that has as many $a$s as $b$s.
+
While for us strings are just lists of characters, programming
languages often differentiate between the two concepts. In
Scala, for example, there is the type of \code{String} and the
@@ -106,10 +121,11 @@
\subsubsection*{Sets and Languages}
-We will use the familiar operations $\cup$ and $\cap$ for
-sets. For the empty set we will either write $\varnothing$ or
-$\{\,\}$. The set containing, for example, the natural numbers
-$1$, $2$ and $3$ we will write as
+We will use the familiar operations $\cup$, $\cap$, $\subset$
+and $\subseteq$ for sets. For the empty set we will either
+write $\varnothing$ or $\{\,\}$. The set containing the
+natural numbers $1$, $2$ and $3$, for example, we will write
+with curly braces as
\[
\{1, 2, 3\}
@@ -120,9 +136,9 @@
Sets can potentially have infinitely many elements. For
example the set of all natural numbers $\{0, 1, 2, \ldots\}$
is infinite. This set is often also abbreviated as
-$\mathbb{N}$. We can define sets by giving all elements, like
-$\{0, 1\}$, but also by \defn{set comprehensions}. For example
-the set of all even natural numbers can be defined as
+$\mathbb{N}$. We can define sets by giving all elements, for
+example $\{0, 1\}$, but also by \defn{set comprehensions}. For
+example the set of all even natural numbers can be defined as
\[
\{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
@@ -144,8 +160,12 @@
A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\}
\end{eqnarray*}
-\noindent For defining sets, we will also often use the notion
-of the ``big union''. An example is as follows:
+\noindent In general set comprehensions are of the form
+$\{a\;|\;P\}$ which stands for the set of all elements $a$
+(from some set) for which some property $P$ holds.
+
+For defining sets, we will also often use the notion of the
+``big union''. An example is as follows:
\begin{equation}\label{bigunion}
\bigcup_{0\le n}\; \{n^2, n^2 + 1\}
@@ -169,17 +189,20 @@
\noindent but using the big union notation is more concise.
-An important notion in this module are \defn{Languages}, which
+An important notion in this module are \defn{languages}, which
are sets of strings. The main goal for us will be how to
(formally) specify languages and to find out whether a string
-is in a language or not. Note that the language containing the
-empty string $\{\dq{}\}$ is not equal to the empty language
-(or empty set): The former contains one element, namely \dq{}
-(also written $[\,]$), but the latter does not contain any.
-
+is in a language or not.\footnote{You might wish to ponder
+whether this is in general a hard or easy problem, where
+hardness is meant in terms of Turing decidable, for example.}
+Note that the language containing the empty string $\{\dq{}\}$
+is not equal to $\varnothing$, the empty language (or empty
+set): The former contains one element, namely \dq{} (also
+written $[\,]$), but the latter does not contain any
+element.
For languages we define the operation of \defn{language
-concatenation}, written $A @ B$:
+concatenation}, written like in the string case as $A @ B$:
\begin{equation}\label{langconc}
A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}
@@ -189,8 +212,7 @@
in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers
to the concatenation of two languages (or sets of strings).
As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,
-then $A \,@\, B$ is
-
+then $A \,@\, B$ is the language
\[
\{abzzz, abqq, abr, aczzz, acqq, acr\}
@@ -208,9 +230,9 @@
\end{tabular}
\end{center}
-\noindent Note the difference: the empty set behaves like $0$
-for multiplication and the set $\{[]\}$ like $1$ for
-multiplication.
+\noindent Note the difference in the last two lines: the empty
+set behaves like $0$ for multiplication and the set $\{[]\}$
+like $1$ for multiplication.
Following the language concatenation, we can define a
\defn{language power} operation as follows:
@@ -220,7 +242,7 @@
A^{n+1} & \dn & A \,@\, A^n
\end{eqnarray*}
-\noindent This definition is by induction on natural numbers.
+\noindent This definition is by recursion on natural numbers.
Note carefully that the zero-case is not defined as the empty
set, but the set containing the empty string. So no matter
what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is
@@ -230,12 +252,13 @@
Next we can define the \defn{star operation} for languages:
$A^*$ is the union of all powers of $A$, or short
-\[
+\begin{equation}\label{star}
A^* \dn \bigcup_{0\le n}\; A^n
-\]
+\end{equation}
-\noindent
-Unfolding this definition
+\noindent This star operation is often also called
+\emph{Kleene-star}. Unfolding the definition in \eqref{star}
+gives
\[
A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots
@@ -248,16 +271,16 @@
\{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
\]
-\noindent we can see that the empty string is always in $A^*$,
+\noindent We can see that the empty string is always in $A^*$,
no matter what $A$ is. This is because $[] \in A^0$. To make
-sure you understand these definition, I leave you to answer
+sure you understand these definitions, I leave you to answer
what $\{[]\}^*$ and $\varnothing^*$ are.
Recall that an alphabet is often referred to by the letter
$\Sigma$. We can now write for the set of all strings over
this alphabet $\Sigma^*$. In doing so we also include the
empty string as a possible string over $\Sigma$. So if
-$\Sigma = \{a, b\}$ then $\Sigma^*$ is
+$\Sigma = \{a, b\}$, then $\Sigma^*$ is
\[
\{[], a, b, ab, ba, aaa, aab, aba, abb, baa, bab, \ldots\}