handouts/notation.tex
changeset 241 10f02605a46a
parent 239 68d98140b90b
child 242 35104ee14f87
--- a/handouts/notation.tex	Sat Sep 06 21:53:41 2014 +0100
+++ b/handouts/notation.tex	Sun Sep 07 08:37:44 2014 +0100
@@ -8,20 +8,26 @@
 
 \section*{A Crash-Course on Notation}
 
+There are innumerable books available about automata and
+formal languages. Unfortunately, they often use their own
+notational conventions and their own symbols. This handout is
+meant to clarify some of the notation I will use.
+
 \subsubsection*{Characters and Strings}
 
-In this module we will often use \defn{characters}. While they
-are surely familiar, we will make one subtle distinction. If
-we want to refer to concrete characters, like \code{a},
-\code{b} and so on, we use a typewriter font. So if we want to
-refer to the concrete characters of my email address we shall
-write
+The most important concept in this module are strings. Strings
+are composed of \defn{characters}. While characters are surely
+a familiar concept, we will make one subtle distinction in
+this module. If we want to refer to concrete characters, like
+\code{a}, \code{b} and so on, we use a typewriter font.
+Accordingly if we want to refer to the concrete characters of
+my email address we shall write
 
 \begin{center}
 \pcode{christian.urban@kcl.ac.uk}
 \end{center}
 
-\noindent If we need to explicitly indicate the ``space''
+\noindent If we also need to explicitly indicate the ``space''
 character, we write \VS{}\hspace{1mm}. For example
 
 \begin{center}
@@ -29,35 +35,36 @@
 \end{center}
 
 
-\noindent But often we do not care
-about which characters we use. In such cases we us the italic
-font and write $a$, $b$ and so on. So if we need a
+\noindent But often we do not care which particular characters
+we use. In such cases we use the italic font and write $a$,
+$b$ and so on for characters. Therefore if we need a
 representative string, we might write
 
 \begin{equation}\label{abracadabra}
 abracadabra
 \end{equation}
 
-\noindent We do not really care what the characters stand for,
-except we do care about is that for example the character $a$
-is not equal to $b$.
+\noindent In this string, we do not really care what the
+characters stand for, except we do care about the fact that
+for example the character $a$ is not equal to $b$ and so on.
 
-An \defn{alphabet} is a finite set of characters. Often the
-letter $\Sigma$ is used to refer to an alphabet. For example
-the ASCII characters \pcode{a} to \pcode{z} form an alphabet.
-The digits $0$ to $9$ are another alphabet. If nothing else is
-specified, we usually assume the alphabet consists of just the
-lower-case letters $a$, $b$, \ldots, $z$. Sometimes, however,
-we explicitly restrict strings to contain, for example, only
-the letters $a$ and $b$. In this case we say the alphabet is
-the set $\{a, b\}$. 
+An \defn{alphabet} is a (non-empty) finite set of characters.
+Often the letter $\Sigma$ is used to refer to an alphabet. For
+example the ASCII characters \pcode{a} to \pcode{z} form an
+alphabet. The digits $0$ to $9$ are another alphabet. The
+Greek letters $\alpha$ to $\omega$ also form an alphabet. If
+nothing else is specified, we usually assume the alphabet
+consists of just the lower-case letters $a$, $b$, \ldots, $z$.
+Sometimes, however, we explicitly want to restrict strings to
+contain only the letters $a$ and $b$, for example. In this
+case we will state that the alphabet is the set $\{a, b\}$. 
 
 \defn{Strings} are lists of characters. Unfortunately, there
 are many ways how we can write down strings. In programming
 languages, they are usually written as \dq{$hello$} where the
-double quotes indicate that we dealing with a string. But
-since, strings are lists of characters we could also write
-this string as
+double quotes indicate that we are dealing with a string. But
+since we regard strings as lists of characters we could also
+write this string as
 
 \[
 [\text{\it h, e, l, l, o}]
@@ -67,18 +74,21 @@
 such strings. For example, we will often consider the first
 character of a string, say $h$, and the ``rest'' of a string
 say \dq{\textit{ello}} when making definitions about strings.
-There are some subtleties with the empty string, sometimes
-written as \dq{} but also as the empty list of characters
-$[\,]$. Two strings, for example $s_1$ and $s_2$, can be
-\emph{concatenated}, which we write as $s_1 @ s_2$. Suppose we
-are given two strings \dq{\textit{foo}} and \dq{\textit{bar}},
-then their concatenation, writen
-\dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
-\dq{\textit{foobar}}. Often we will simplify our life and just
-drop the double quotes whenever it is clear we are talking
-about strings, writing as already in \eqref{abracadabra} just
-\textit{foo}, \textit{bar}, \textit{foobar} or \textit{foo $@$
-bar}.
+There are also some subtleties with the empty string,
+sometimes written as \dq{} but also as the empty list of
+characters $[\,]$.\footnote{In the literature you can also
+often find that $\varepsilon$ or $\lambda$ is used to
+represent the empty string.} 
+
+Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
+which we write as $s_1 @ s_2$. Suppose we are given two
+strings \dq{\textit{foo}} and \dq{\textit{bar}}, then their
+concatenation, writen \dq{\textit{foo}} $@$ \dq{\textit{bar}},
+gives \dq{\textit{foobar}}. Often we will simplify our life
+and just drop the double quotes whenever it is clear we are
+talking about strings, writing as already in
+\eqref{abracadabra} just \textit{foo}, \textit{bar},
+\textit{foobar} or \textit{foo $@$ bar}.
 
 Some simple properties of string concatenation hold. For
 example the concatenation operation is \emph{associative},
@@ -91,6 +101,11 @@
 
 \[s \,@\, [] = [] \,@\, s = s\]
 
+
+Occasionally we will use the notation $a^n$ for strings, which 
+stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is
+a string that has as many $a$s as $b$s.
+
 While for us strings are just lists of characters, programming
 languages often differentiate between the two concepts. In
 Scala, for example, there is the type of \code{String} and the
@@ -106,10 +121,11 @@
 
 \subsubsection*{Sets and Languages}
 
-We will use the familiar operations $\cup$ and $\cap$ for
-sets. For the empty set we will either write $\varnothing$ or
-$\{\,\}$. The set containing, for example, the natural numbers
-$1$, $2$ and $3$ we will write as
+We will use the familiar operations $\cup$, $\cap$, $\subset$
+and $\subseteq$ for sets. For the empty set we will either
+write $\varnothing$ or $\{\,\}$. The set containing the
+natural numbers $1$, $2$ and $3$, for example, we will write
+with curly braces as
 
 \[
 \{1, 2, 3\}
@@ -120,9 +136,9 @@
 Sets can potentially have infinitely many elements. For
 example the set of all natural numbers $\{0, 1, 2, \ldots\}$
 is infinite. This set is often also abbreviated as
-$\mathbb{N}$. We can define sets by giving all elements, like
-$\{0, 1\}$, but also by \defn{set comprehensions}. For example
-the set of all even natural numbers can be defined as
+$\mathbb{N}$. We can define sets by giving all elements, for
+example $\{0, 1\}$, but also by \defn{set comprehensions}. For
+example the set of all even natural numbers can be defined as
 
 \[
 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
@@ -144,8 +160,12 @@
 A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} 
 \end{eqnarray*}
 
-\noindent For defining sets, we will also often use the notion
-of the ``big union''. An example is as follows:
+\noindent In general set comprehensions are of the form
+$\{a\;|\;P\}$ which stands for the set of all elements $a$
+(from some set) for which some property $P$ holds.
+ 
+For defining sets, we will also often use the notion of the
+``big union''. An example is as follows:
 
 \begin{equation}\label{bigunion}
 \bigcup_{0\le n}\; \{n^2, n^2 + 1\}
@@ -169,17 +189,20 @@
 
 \noindent but using the big union notation is more concise.
 
-An important notion in this module are \defn{Languages}, which
+An important notion in this module are \defn{languages}, which
 are sets of strings. The main goal for us will be how to
 (formally) specify languages and to find out whether a string
-is in a language or not. Note that the language containing the
-empty string $\{\dq{}\}$ is not equal to the empty language
-(or empty set): The former contains one element, namely \dq{}
-(also written $[\,]$), but the latter does not contain any.
-
+is in a language or not.\footnote{You might wish to ponder
+whether this is in general a hard or easy problem, where
+hardness is meant in terms of Turing decidable, for example.}
+Note that the language containing the empty string $\{\dq{}\}$
+is not equal to $\varnothing$, the empty language (or empty
+set): The former contains one element, namely \dq{} (also
+written $[\,]$), but the latter does not contain any
+element.
 
 For languages we define the operation of \defn{language
-concatenation}, written $A @ B$:
+concatenation}, written like in the string case as $A @ B$:
 
 \begin{equation}\label{langconc}
 A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}
@@ -189,8 +212,7 @@
 in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers 
 to the concatenation of two languages (or sets of strings).
 As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,
-then $A \,@\, B$ is
-
+then $A \,@\, B$ is the language
 
 \[
 \{abzzz, abqq, abr, aczzz, acqq, acr\}
@@ -208,9 +230,9 @@
 \end{tabular}
 \end{center}
 
-\noindent Note the difference: the empty set behaves like $0$
-for multiplication and the set $\{[]\}$ like $1$ for
-multiplication.
+\noindent Note the difference in the last two lines: the empty
+set behaves like $0$ for multiplication and the set $\{[]\}$
+like $1$ for multiplication.
 
 Following the language concatenation, we can define a
 \defn{language power} operation as follows:
@@ -220,7 +242,7 @@
 A^{n+1} & \dn & A \,@\, A^n
 \end{eqnarray*}
 
-\noindent This definition is by induction on natural numbers.
+\noindent This definition is by recursion on natural numbers.
 Note carefully that the zero-case is not defined as the empty
 set, but the set containing the empty string. So no matter
 what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is
@@ -230,12 +252,13 @@
 Next we can define the \defn{star operation} for languages: 
 $A^*$ is the union of all powers of $A$, or short
 
-\[
+\begin{equation}\label{star}
 A^* \dn \bigcup_{0\le n}\; A^n
-\]
+\end{equation}
 
-\noindent
-Unfolding this definition
+\noindent This star operation is often also called
+\emph{Kleene-star}. Unfolding the definition in \eqref{star}
+gives
 
 \[
 A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots
@@ -248,16 +271,16 @@
 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
 \]
 
-\noindent we can see that the empty string is always in $A^*$,
+\noindent We can see that the empty string is always in $A^*$,
 no matter what $A$ is. This is because $[] \in A^0$. To make
-sure you understand these definition, I leave you to answer
+sure you understand these definitions, I leave you to answer
 what $\{[]\}^*$ and $\varnothing^*$ are. 
 
 Recall that an alphabet is often referred to by the letter
 $\Sigma$. We can now write for the set of all strings over
 this alphabet $\Sigma^*$. In doing so we also include the 
 empty string as a possible string over $\Sigma$. So if
-$\Sigma = \{a, b\}$ then $\Sigma^*$ is
+$\Sigma = \{a, b\}$, then $\Sigma^*$ is
 
 \[
 \{[], a, b, ab, ba, aaa, aab, aba, abb, baa, bab, \ldots\}