handouts/notation.tex
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Fri, 25 Sep 2015 20:59:24 +0100
changeset 332 4755ad4b457b
parent 266 ae039d6ae3f2
child 398 c8ce95067c1a
permissions -rw-r--r--
updated

\documentclass{article}
\usepackage{../style}
\usepackage{../langs}



\begin{document}

\section*{A Crash-Course on Notation}

There are innumerable books available about automata and
formal languages. Unfortunately, they often use their own
notational conventions and their own symbols. This handout is
meant to clarify some of the notation I will use.

\subsubsection*{Characters and Strings}

The most important concept in this module are strings. Strings
are composed of \defn{characters}. While characters are surely
a familiar concept, we will make one subtle distinction in
this module. If we want to refer to concrete characters, like
\code{a}, \code{b}, \code{c} and so on, we use a typewriter font.
Accordingly if we want to refer to the concrete characters of
my email address we shall write

\begin{center}
\pcode{christian.urban@kcl.ac.uk}
\end{center}

\noindent If we also need to explicitly indicate the ``space''
character, we write \VS{}\hspace{1mm}. For example

\begin{center}
\tt{}hello\VS\hspace{0.5mm}world
\end{center}


\noindent But often we do not care which particular characters
we use. In such cases we use the italic font and write $a$,
$b$, $c$ and so on for characters. Therefore if we need a
representative string, we might write

\begin{equation}\label{abracadabra}
abracadabra
\end{equation}

\noindent In this string, we do not really care what the
characters stand for, except we do care about the fact that
for example the character $a$ is not equal to $b$ and so on.

An \defn{alphabet} is a (non-empty) finite set of characters.
Often the letter $\Sigma$ is used to refer to an alphabet. For
example the ASCII characters \pcode{a} to \pcode{z} form an
alphabet. The digits $0$ to $9$ are another alphabet. The
Greek letters $\alpha$ to $\omega$ also form an alphabet. If
nothing else is specified, we usually assume the alphabet
consists of just the lower-case letters $a$, $b$, \ldots, $z$.
Sometimes, however, we explicitly want to restrict strings to
contain only the letters $a$ and $b$, for example. In this
case we will state that the alphabet is the set $\{a, b\}$. 

\defn{Strings} are lists of characters. Unfortunately, there
are many ways how we can write down strings. In programming
languages, they are usually written as \dq{$hello$} where the
double quotes indicate that we are dealing with a string. But
since we regard strings as lists of characters we could also
write this string as

\[
[\text{\it h, e, l, l, o}] \;\;\text{or simply}\;\; \textit{hello}
\]

\noindent The important point is that we can always decompose
such strings. For example, we will often consider the first
character of a string, say $h$, and the ``rest'' of a string
say \dq{\textit{ello}} when making definitions about strings.
There are also some subtleties with the empty string,
sometimes written as \dq{} but also as the empty list of
characters $[\,]$.\footnote{In the literature you can also
often find that $\varepsilon$ or $\lambda$ is used to
represent the empty string.} 

Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
which we write as $s_1 @ s_2$. Suppose we are given two
strings \dq{\textit{foo}} and \dq{\textit{bar}}, then their
concatenation, writen \dq{\textit{foo}} $@$ \dq{\textit{bar}},
gives \dq{\textit{foobar}}. Often we will simplify our life
and just drop the double quotes whenever it is clear we are
talking about strings, writing as already in
\eqref{abracadabra} just \textit{foo}, \textit{bar},
\textit{foobar} or \textit{foo $@$ bar}.

Some simple properties of string concatenation hold. For
example the concatenation operation is \emph{associative},
meaning

\[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]  

\noindent are always equal strings. The empty string behaves
like a unit element, therefore

\[s \,@\, [] = [] \,@\, s = s\]


Occasionally we will use the notation $a^n$ for strings, which 
stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is
a string that has as many $a$s as $b$s.

Note however that while for us strings are just lists of
characters, programming languages often differentiate between
the two concepts. In Scala, for example, there is the type of
\code{String} and the type of lists of characters,
\code{List[Char]}. They are not the same and we need to
explicitly coerce elements between the two types, for example

\begin{lstlisting}[numbers=none]
scala> "abc".toList
res01: List[Char] = List(a, b, c)
\end{lstlisting}


\subsubsection*{Sets and Languages}

We will use the familiar operations $\cup$, $\cap$, $\subset$
and $\subseteq$ for sets. For the empty set we will either
write $\varnothing$ or $\{\,\}$. The set containing the
natural numbers $1$, $2$ and $3$, for example, we will write
with curly braces as

\[
\{1, 2, 3\}
\]

\noindent The notation $\in$ means \emph{element of}, so $1
\in \{1, 2, 3\}$ is true and $4 \in \{1, 2, 3\}$ is false.
Sets can potentially have infinitely many elements. For
example the set of all natural numbers $\{0, 1, 2, \ldots\}$
is infinite. This set is often also abbreviated as
$\mathbb{N}$. We can define sets by giving all elements, for
example $\{0, 1\}$, but also by \defn{set comprehensions}. For
example the set of all even natural numbers can be defined as

\[
\{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
\]
  
\noindent Though silly, but the set $\{0, 1, 2\}$ could also be
defined by the following set comprehension

\[
\{n\;|\; n^2 < 9 \wedge n \in \mathbb{N}\}
\]

\noindent Notice that set comprehensions could be used
to define set union, intersection and difference:

\begin{eqnarray*}
A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\
A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\
A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} 
\end{eqnarray*}

\noindent In general set comprehensions are of the form
$\{a\;|\;P\}$ which stands for the set of all elements $a$
(from some set) for which some property $P$ holds.
 
For defining sets, we will also often use the notion of the
``big union''. An example is as follows:

\begin{equation}\label{bigunion}
\bigcup_{0\le n}\; \{n^2, n^2 + 1\}
\end{equation}

\noindent which is the set of all squares and their immediate
successors, so

\[
\{0, 1, 2, 4, 5, 9, 10, 16, 17, \ldots\}
\]

\noindent A big union is a sequence of unions which are 
indexed typically by a natural number. So the big union in
\eqref{bigunion} could equally be written as

\[
\{0, 1\} \cup \{1, 2\} \cup \{4, 5\} \cup \{9, 10\} \cup 
\ldots
\]

\noindent but using the big union notation is more concise.

An important notion in this module are \defn{languages}, which
are sets of strings. The main goal for us will be how to
(formally) specify languages and to find out whether a string
is in a language or not.\footnote{You might wish to ponder
whether this is in general a hard or easy problem, where
hardness is meant in terms of Turing decidable, for example.}
Note that the language containing the empty string $\{\dq{}\}$
is not equal to $\varnothing$, the empty language (or empty
set): The former contains one element, namely \dq{} (also
written $[\,]$), but the latter does not contain any
element.

For languages we define the operation of \defn{language
concatenation}, written like in the string case as $A @ B$:

\begin{equation}\label{langconc}
A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}
\end{equation}

\noindent Be careful to understand the difference: the $@$
in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers 
to the concatenation of two languages (or sets of strings).
As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,
then $A \,@\, B$ is the language

\[
\{abzzz, abqq, abr, aczzz, acqq, acr\}
\] 

\noindent Recall the properties for string concatenation. For
language concatenation we have the following properties

\begin{center}
\begin{tabular}{ll}
associativity: & $(A @ B) @ C = A @ (B @ C)$\\
unit element:  & $A \,@\, \{[]\} = \{[]\} \,@\, A = A$\\
zero element:  & $A \,@\, \varnothing = \varnothing \,@\, A = 
\varnothing$
\end{tabular}
\end{center}

\noindent Note the difference in the last two lines: the empty
set behaves like $0$ for multiplication and the set $\{[]\}$
like $1$ for multiplication ($n * 1 = n$ and $n * 0 = 0$).

Following the language concatenation, we can define a
\defn{language power} operation as follows:

\begin{eqnarray*}
A^0     & \dn & \{[]\}\\
A^{n+1} & \dn & A \,@\, A^n
\end{eqnarray*}

\noindent This definition is by recursion on natural numbers.
Note carefully that the zero-case is not defined as the empty
set, but the set containing the empty string. So no matter
what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is
another hint about a connection between the $@$-operation and
multiplication: How is $x^n$ defined recursively and what is
$x^0$?)

Next we can define the \defn{star operation} for languages:
$A^*$ is the union of all powers of $A$, or short

\begin{equation}\label{star}
A^* \dn \bigcup_{0\le n}\; A^n
\end{equation}

\noindent This star operation is often also called
\emph{Kleene-star}. Unfolding the definition in \eqref{star}
gives

\[
A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots
\]

\noindent
which is equal to 

\[
\{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
\]

\noindent We can see that the empty string is always in $A^*$,
no matter what $A$ is. This is because $[] \in A^0$. To make
sure you understand these definitions, I leave you to answer
what $\{[]\}^*$ and $\varnothing^*$ are. 

Recall that an alphabet is often referred to by the letter
$\Sigma$. We can now write for the set of all strings over
this alphabet $\Sigma^*$. In doing so we also include the 
empty string as a possible string over $\Sigma$. So if
$\Sigma = \{a, b\}$, then $\Sigma^*$ is

\[
\{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}
\]

\noindent or in other words all strings containing $a$s and
$b$s only, plus the empty string.

\end{document}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: