afl-material: comparison handouts/notation.tex

equal deleted inserted replaced

-:de4f6382590a
+:10f02605a46a
 \begin{document}
 \section*{A Crash-Course on Notation}
+There are innumerable books available about automata and
+formal languages. Unfortunately, they often use their own
+notational conventions and their own symbols. This handout is
+meant to clarify some of the notation I will use.
 \subsubsection*{Characters and Strings}
-In this module we will often use \defn{characters}. While they
+The most important concept in this module are strings. Strings
-are surely familiar, we will make one subtle distinction. If
+are composed of \defn{characters}. While characters are surely
-we want to refer to concrete characters, like \code{a},
+a familiar concept, we will make one subtle distinction in
-\code{b} and so on, we use a typewriter font. So if we want to
+this module. If we want to refer to concrete characters, like
-refer to the concrete characters of my email address we shall
+\code{a}, \code{b} and so on, we use a typewriter font.
-write
+Accordingly if we want to refer to the concrete characters of
+my email address we shall write
 \begin{center}
 \pcode{christian.urban@kcl.ac.uk}
 \end{center}
-\noindent If we need to explicitly indicate the ``space''
+\noindent If we also need to explicitly indicate the ``space''
 character, we write \VS{}\hspace{1mm}. For example
 \begin{center}
 \tt{}hello\VS\hspace{0.5mm}world
 \end{center}
-\noindent But often we do not care
+\noindent But often we do not care which particular characters
-about which characters we use. In such cases we us the italic
+we use. In such cases we use the italic font and write $a$,
-font and write $a$, $b$ and so on. So if we need a
+$b$ and so on for characters. Therefore if we need a
 representative string, we might write
 \begin{equation}\label{abracadabra}
 abracadabra
 \end{equation}
-\noindent We do not really care what the characters stand for,
+\noindent In this string, we do not really care what the
-except we do care about is that for example the character $a$
+characters stand for, except we do care about the fact that
-is not equal to $b$.
+for example the character $a$ is not equal to $b$ and so on.
-An \defn{alphabet} is a finite set of characters. Often the
+An \defn{alphabet} is a (non-empty) finite set of characters.
-letter $\Sigma$ is used to refer to an alphabet. For example
+Often the letter $\Sigma$ is used to refer to an alphabet. For
-the ASCII characters \pcode{a} to \pcode{z} form an alphabet.
+example the ASCII characters \pcode{a} to \pcode{z} form an
-The digits $0$ to $9$ are another alphabet. If nothing else is
+alphabet. The digits $0$ to $9$ are another alphabet. The
-specified, we usually assume the alphabet consists of just the
+Greek letters $\alpha$ to $\omega$ also form an alphabet. If
-lower-case letters $a$, $b$, \ldots, $z$. Sometimes, however,
+nothing else is specified, we usually assume the alphabet
-we explicitly restrict strings to contain, for example, only
+consists of just the lower-case letters $a$, $b$, \ldots, $z$.
-the letters $a$ and $b$. In this case we say the alphabet is
+Sometimes, however, we explicitly want to restrict strings to
-the set $\{a, b\}$.
+contain only the letters $a$ and $b$, for example. In this
+case we will state that the alphabet is the set $\{a, b\}$.
 \defn{Strings} are lists of characters. Unfortunately, there
 are many ways how we can write down strings. In programming
 languages, they are usually written as \dq{$hello$} where the
-double quotes indicate that we dealing with a string. But
+double quotes indicate that we are dealing with a string. But
-since, strings are lists of characters we could also write
+since we regard strings as lists of characters we could also
-this string as
+write this string as
 \[
 [\text{\it h, e, l, l, o}]
 \]
 \noindent The important point is that we can always decompose
 such strings. For example, we will often consider the first
 character of a string, say $h$, and the ``rest'' of a string
 say \dq{\textit{ello}} when making definitions about strings.
-There are some subtleties with the empty string, sometimes
+There are also some subtleties with the empty string,
-written as \dq{} but also as the empty list of characters
+sometimes written as \dq{} but also as the empty list of
-$[\,]$. Two strings, for example $s_1$ and $s_2$, can be
+characters $[\,]$.\footnote{In the literature you can also
-\emph{concatenated}, which we write as $s_1 @ s_2$. Suppose we
+often find that $\varepsilon$ or $\lambda$ is used to
-are given two strings \dq{\textit{foo}} and \dq{\textit{bar}},
+represent the empty string.}
-then their concatenation, writen
-\dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
+Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
-\dq{\textit{foobar}}. Often we will simplify our life and just
+which we write as $s_1 @ s_2$. Suppose we are given two
-drop the double quotes whenever it is clear we are talking
+strings \dq{\textit{foo}} and \dq{\textit{bar}}, then their
-about strings, writing as already in \eqref{abracadabra} just
+concatenation, writen \dq{\textit{foo}} $@$ \dq{\textit{bar}},
-\textit{foo}, \textit{bar}, \textit{foobar} or \textit{foo $@$
+gives \dq{\textit{foobar}}. Often we will simplify our life
-bar}.
+and just drop the double quotes whenever it is clear we are
+talking about strings, writing as already in
+\eqref{abracadabra} just \textit{foo}, \textit{bar},
+\textit{foobar} or \textit{foo $@$ bar}.
 Some simple properties of string concatenation hold. For
 example the concatenation operation is \emph{associative},
 meaning
 \noindent are always equal strings. The empty string behaves
 like a unit element, therefore
 \[s \,@\, [] = [] \,@\, s = s\]
+Occasionally we will use the notation $a^n$ for strings, which
+stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is
+a string that has as many $a$s as $b$s.
 While for us strings are just lists of characters, programming
 languages often differentiate between the two concepts. In
 Scala, for example, there is the type of \code{String} and the
 type of lists of characters,  \code{List[Char]}. They are not
 \end{lstlisting}
 \subsubsection*{Sets and Languages}
-We will use the familiar operations $\cup$ and $\cap$ for
+We will use the familiar operations $\cup$, $\cap$, $\subset$
-sets. For the empty set we will either write $\varnothing$ or
+and $\subseteq$ for sets. For the empty set we will either
-$\{\,\}$. The set containing, for example, the natural numbers
+write $\varnothing$ or $\{\,\}$. The set containing the
-$1$, $2$ and $3$ we will write as
+natural numbers $1$, $2$ and $3$, for example, we will write
+with curly braces as
 \[
 \{1, 2, 3\}
 \]
 \noindent The notation $\in$ means \emph{element of}, so $1
 \in \{1, 2, 3\}$ is true and $3 \in \{1, 2, 3\}$ is false.
 Sets can potentially have infinitely many elements. For
 example the set of all natural numbers $\{0, 1, 2, \ldots\}$
 is infinite. This set is often also abbreviated as
-$\mathbb{N}$. We can define sets by giving all elements, like
+$\mathbb{N}$. We can define sets by giving all elements, for
-$\{0, 1\}$, but also by \defn{set comprehensions}. For example
+example $\{0, 1\}$, but also by \defn{set comprehensions}. For
-the set of all even natural numbers can be defined as
+example the set of all even natural numbers can be defined as
 \[
 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
 \]
 A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\
 A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\
 A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\}
 \end{eqnarray*}
-\noindent For defining sets, we will also often use the notion
+\noindent In general set comprehensions are of the form
-of the ``big union''. An example is as follows:
+$\{a\;|\;P\}$ which stands for the set of all elements $a$
+(from some set) for which some property $P$ holds.
+For defining sets, we will also often use the notion of the
+``big union''. An example is as follows:
 \begin{equation}\label{bigunion}
 \bigcup_{0\le n}\; \{n^2, n^2 + 1\}
 \end{equation}
 \ldots
 \]
 \noindent but using the big union notation is more concise.
-An important notion in this module are \defn{Languages}, which
+An important notion in this module are \defn{languages}, which
 are sets of strings. The main goal for us will be how to
 (formally) specify languages and to find out whether a string
-is in a language or not. Note that the language containing the
+is in a language or not.\footnote{You might wish to ponder
-empty string $\{\dq{}\}$ is not equal to the empty language
+whether this is in general a hard or easy problem, where
-(or empty set): The former contains one element, namely \dq{}
+hardness is meant in terms of Turing decidable, for example.}
-(also written $[\,]$), but the latter does not contain any.
+Note that the language containing the empty string $\{\dq{}\}$
+is not equal to $\varnothing$, the empty language (or empty
+set): The former contains one element, namely \dq{} (also
+written $[\,]$), but the latter does not contain any
+element.
 For languages we define the operation of \defn{language
-concatenation}, written $A @ B$:
+concatenation}, written like in the string case as $A @ B$:
 \begin{equation}\label{langconc}
 A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}
 \end{equation}
 \noindent Be careful to understand the difference: the $@$
 in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers
 to the concatenation of two languages (or sets of strings).
 As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,
-then $A \,@\, B$ is
+then $A \,@\, B$ is the language
 \[
 \{abzzz, abqq, abr, aczzz, acqq, acr\}
 \]
 zero element:  & $A \,@\, \varnothing = \varnothing \,@\, A =
 \varnothing$
 \end{tabular}
 \end{center}
-\noindent Note the difference: the empty set behaves like $0$
+\noindent Note the difference in the last two lines: the empty
-for multiplication and the set $\{[]\}$ like $1$ for
+set behaves like $0$ for multiplication and the set $\{[]\}$
-multiplication.
+like $1$ for multiplication.
 Following the language concatenation, we can define a
 \defn{language power} operation as follows:
 \begin{eqnarray*}
 A^0     & \dn & \{[]\}\\
 A^{n+1} & \dn & A \,@\, A^n
 \end{eqnarray*}
-\noindent This definition is by induction on natural numbers.
+\noindent This definition is by recursion on natural numbers.
 Note carefully that the zero-case is not defined as the empty
 set, but the set containing the empty string. So no matter
 what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is
 another hint about a connection between the $@$-operation and
 multiplication: How is $x^n$ defined and what is $x^0$?)
 Next we can define the \defn{star operation} for languages:
 $A^*$ is the union of all powers of $A$, or short
-\[
+\begin{equation}\label{star}
 A^* \dn \bigcup_{0\le n}\; A^n
-\]
+\end{equation}
-\noindent
+\noindent This star operation is often also called
-Unfolding this definition
+\emph{Kleene-star}. Unfolding the definition in \eqref{star}
+gives
 \[
 A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots
 \]
 \[
 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
 \]
-\noindent we can see that the empty string is always in $A^*$,
+\noindent We can see that the empty string is always in $A^*$,
 no matter what $A$ is. This is because $[] \in A^0$. To make
-sure you understand these definition, I leave you to answer
+sure you understand these definitions, I leave you to answer
 what $\{[]\}^*$ and $\varnothing^*$ are.
 Recall that an alphabet is often referred to by the letter
 $\Sigma$. We can now write for the set of all strings over
 this alphabet $\Sigma^*$. In doing so we also include the
 empty string as a possible string over $\Sigma$. So if
-$\Sigma = \{a, b\}$ then $\Sigma^*$ is
+$\Sigma = \{a, b\}$, then $\Sigma^*$ is
 \[
 \{[], a, b, ab, ba, aaa, aab, aba, abb, baa, bab, \ldots\}
 \]

changeset 241	10f02605a46a
parent 239	68d98140b90b
child 242	35104ee14f87