diff -r de4f6382590a -r 10f02605a46a handouts/notation.tex --- a/handouts/notation.tex Sat Sep 06 21:53:41 2014 +0100 +++ b/handouts/notation.tex Sun Sep 07 08:37:44 2014 +0100 @@ -8,20 +8,26 @@ \section*{A Crash-Course on Notation} +There are innumerable books available about automata and +formal languages. Unfortunately, they often use their own +notational conventions and their own symbols. This handout is +meant to clarify some of the notation I will use. + \subsubsection*{Characters and Strings} -In this module we will often use \defn{characters}. While they -are surely familiar, we will make one subtle distinction. If -we want to refer to concrete characters, like \code{a}, -\code{b} and so on, we use a typewriter font. So if we want to -refer to the concrete characters of my email address we shall -write +The most important concept in this module are strings. Strings +are composed of \defn{characters}. While characters are surely +a familiar concept, we will make one subtle distinction in +this module. If we want to refer to concrete characters, like +\code{a}, \code{b} and so on, we use a typewriter font. +Accordingly if we want to refer to the concrete characters of +my email address we shall write \begin{center} \pcode{christian.urban@kcl.ac.uk} \end{center} -\noindent If we need to explicitly indicate the ``space'' +\noindent If we also need to explicitly indicate the ``space'' character, we write \VS{}\hspace{1mm}. For example \begin{center} @@ -29,35 +35,36 @@ \end{center} -\noindent But often we do not care -about which characters we use. In such cases we us the italic -font and write $a$, $b$ and so on. So if we need a +\noindent But often we do not care which particular characters +we use. In such cases we use the italic font and write $a$, +$b$ and so on for characters. Therefore if we need a representative string, we might write \begin{equation}\label{abracadabra} abracadabra \end{equation} -\noindent We do not really care what the characters stand for, -except we do care about is that for example the character $a$ -is not equal to $b$. +\noindent In this string, we do not really care what the +characters stand for, except we do care about the fact that +for example the character $a$ is not equal to $b$ and so on. -An \defn{alphabet} is a finite set of characters. Often the -letter $\Sigma$ is used to refer to an alphabet. For example -the ASCII characters \pcode{a} to \pcode{z} form an alphabet. -The digits $0$ to $9$ are another alphabet. If nothing else is -specified, we usually assume the alphabet consists of just the -lower-case letters $a$, $b$, \ldots, $z$. Sometimes, however, -we explicitly restrict strings to contain, for example, only -the letters $a$ and $b$. In this case we say the alphabet is -the set $\{a, b\}$. +An \defn{alphabet} is a (non-empty) finite set of characters. +Often the letter $\Sigma$ is used to refer to an alphabet. For +example the ASCII characters \pcode{a} to \pcode{z} form an +alphabet. The digits $0$ to $9$ are another alphabet. The +Greek letters $\alpha$ to $\omega$ also form an alphabet. If +nothing else is specified, we usually assume the alphabet +consists of just the lower-case letters $a$, $b$, \ldots, $z$. +Sometimes, however, we explicitly want to restrict strings to +contain only the letters $a$ and $b$, for example. In this +case we will state that the alphabet is the set $\{a, b\}$. \defn{Strings} are lists of characters. Unfortunately, there are many ways how we can write down strings. In programming languages, they are usually written as \dq{$hello$} where the -double quotes indicate that we dealing with a string. But -since, strings are lists of characters we could also write -this string as +double quotes indicate that we are dealing with a string. But +since we regard strings as lists of characters we could also +write this string as \[ [\text{\it h, e, l, l, o}] @@ -67,18 +74,21 @@ such strings. For example, we will often consider the first character of a string, say $h$, and the ``rest'' of a string say \dq{\textit{ello}} when making definitions about strings. -There are some subtleties with the empty string, sometimes -written as \dq{} but also as the empty list of characters -$[\,]$. Two strings, for example $s_1$ and $s_2$, can be -\emph{concatenated}, which we write as $s_1 @ s_2$. Suppose we -are given two strings \dq{\textit{foo}} and \dq{\textit{bar}}, -then their concatenation, writen -\dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives -\dq{\textit{foobar}}. Often we will simplify our life and just -drop the double quotes whenever it is clear we are talking -about strings, writing as already in \eqref{abracadabra} just -\textit{foo}, \textit{bar}, \textit{foobar} or \textit{foo $@$ -bar}. +There are also some subtleties with the empty string, +sometimes written as \dq{} but also as the empty list of +characters $[\,]$.\footnote{In the literature you can also +often find that $\varepsilon$ or $\lambda$ is used to +represent the empty string.} + +Two strings, say $s_1$ and $s_2$, can be \defn{concatenated}, +which we write as $s_1 @ s_2$. Suppose we are given two +strings \dq{\textit{foo}} and \dq{\textit{bar}}, then their +concatenation, writen \dq{\textit{foo}} $@$ \dq{\textit{bar}}, +gives \dq{\textit{foobar}}. Often we will simplify our life +and just drop the double quotes whenever it is clear we are +talking about strings, writing as already in +\eqref{abracadabra} just \textit{foo}, \textit{bar}, +\textit{foobar} or \textit{foo $@$ bar}. Some simple properties of string concatenation hold. For example the concatenation operation is \emph{associative}, @@ -91,6 +101,11 @@ \[s \,@\, [] = [] \,@\, s = s\] + +Occasionally we will use the notation $a^n$ for strings, which +stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is +a string that has as many $a$s as $b$s. + While for us strings are just lists of characters, programming languages often differentiate between the two concepts. In Scala, for example, there is the type of \code{String} and the @@ -106,10 +121,11 @@ \subsubsection*{Sets and Languages} -We will use the familiar operations $\cup$ and $\cap$ for -sets. For the empty set we will either write $\varnothing$ or -$\{\,\}$. The set containing, for example, the natural numbers -$1$, $2$ and $3$ we will write as +We will use the familiar operations $\cup$, $\cap$, $\subset$ +and $\subseteq$ for sets. For the empty set we will either +write $\varnothing$ or $\{\,\}$. The set containing the +natural numbers $1$, $2$ and $3$, for example, we will write +with curly braces as \[ \{1, 2, 3\} @@ -120,9 +136,9 @@ Sets can potentially have infinitely many elements. For example the set of all natural numbers $\{0, 1, 2, \ldots\}$ is infinite. This set is often also abbreviated as -$\mathbb{N}$. We can define sets by giving all elements, like -$\{0, 1\}$, but also by \defn{set comprehensions}. For example -the set of all even natural numbers can be defined as +$\mathbb{N}$. We can define sets by giving all elements, for +example $\{0, 1\}$, but also by \defn{set comprehensions}. For +example the set of all even natural numbers can be defined as \[ \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\} @@ -144,8 +160,12 @@ A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} \end{eqnarray*} -\noindent For defining sets, we will also often use the notion -of the ``big union''. An example is as follows: +\noindent In general set comprehensions are of the form +$\{a\;|\;P\}$ which stands for the set of all elements $a$ +(from some set) for which some property $P$ holds. + +For defining sets, we will also often use the notion of the +``big union''. An example is as follows: \begin{equation}\label{bigunion} \bigcup_{0\le n}\; \{n^2, n^2 + 1\} @@ -169,17 +189,20 @@ \noindent but using the big union notation is more concise. -An important notion in this module are \defn{Languages}, which +An important notion in this module are \defn{languages}, which are sets of strings. The main goal for us will be how to (formally) specify languages and to find out whether a string -is in a language or not. Note that the language containing the -empty string $\{\dq{}\}$ is not equal to the empty language -(or empty set): The former contains one element, namely \dq{} -(also written $[\,]$), but the latter does not contain any. - +is in a language or not.\footnote{You might wish to ponder +whether this is in general a hard or easy problem, where +hardness is meant in terms of Turing decidable, for example.} +Note that the language containing the empty string $\{\dq{}\}$ +is not equal to $\varnothing$, the empty language (or empty +set): The former contains one element, namely \dq{} (also +written $[\,]$), but the latter does not contain any +element. For languages we define the operation of \defn{language -concatenation}, written $A @ B$: +concatenation}, written like in the string case as $A @ B$: \begin{equation}\label{langconc} A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\} @@ -189,8 +212,7 @@ in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers to the concatenation of two languages (or sets of strings). As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$, -then $A \,@\, B$ is - +then $A \,@\, B$ is the language \[ \{abzzz, abqq, abr, aczzz, acqq, acr\} @@ -208,9 +230,9 @@ \end{tabular} \end{center} -\noindent Note the difference: the empty set behaves like $0$ -for multiplication and the set $\{[]\}$ like $1$ for -multiplication. +\noindent Note the difference in the last two lines: the empty +set behaves like $0$ for multiplication and the set $\{[]\}$ +like $1$ for multiplication. Following the language concatenation, we can define a \defn{language power} operation as follows: @@ -220,7 +242,7 @@ A^{n+1} & \dn & A \,@\, A^n \end{eqnarray*} -\noindent This definition is by induction on natural numbers. +\noindent This definition is by recursion on natural numbers. Note carefully that the zero-case is not defined as the empty set, but the set containing the empty string. So no matter what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is @@ -230,12 +252,13 @@ Next we can define the \defn{star operation} for languages: $A^*$ is the union of all powers of $A$, or short -\[ +\begin{equation}\label{star} A^* \dn \bigcup_{0\le n}\; A^n -\] +\end{equation} -\noindent -Unfolding this definition +\noindent This star operation is often also called +\emph{Kleene-star}. Unfolding the definition in \eqref{star} +gives \[ A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots @@ -248,16 +271,16 @@ \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots \] -\noindent we can see that the empty string is always in $A^*$, +\noindent We can see that the empty string is always in $A^*$, no matter what $A$ is. This is because $[] \in A^0$. To make -sure you understand these definition, I leave you to answer +sure you understand these definitions, I leave you to answer what $\{[]\}^*$ and $\varnothing^*$ are. Recall that an alphabet is often referred to by the letter $\Sigma$. We can now write for the set of all strings over this alphabet $\Sigma^*$. In doing so we also include the empty string as a possible string over $\Sigma$. So if -$\Sigma = \{a, b\}$ then $\Sigma^*$ is +$\Sigma = \{a, b\}$, then $\Sigma^*$ is \[ \{[], a, b, ab, ba, aaa, aab, aba, abb, baa, bab, \ldots\}