\documentclass{article}\usepackage{../style}\usepackage{../langs}\begin{document}\section*{A Crash-Course on Notation}There are innumerable books available about automata andformal languages. Unfortunately, they often use their ownnotational conventions and their own symbols. This handout ismeant to clarify some of the notation I will use.\subsubsection*{Characters and Strings}The most important concept in this module are strings. Stringsare composed of \defn{characters}. While characters are surelya familiar concept, we will make one subtle distinction inthis module. If we want to refer to concrete characters, like\code{a}, \code{b} and so on, we use a typewriter font.Accordingly if we want to refer to the concrete characters ofmy email address we shall write\begin{center}\pcode{christian.urban@kcl.ac.uk}\end{center}\noindent If we also need to explicitly indicate the ``space''character, we write \VS{}\hspace{1mm}. For example\begin{center}\tt{}hello\VS\hspace{0.5mm}world\end{center}\noindent But often we do not care which particular characterswe use. In such cases we use the italic font and write $a$,$b$ and so on for characters. Therefore if we need arepresentative string, we might write\begin{equation}\label{abracadabra}abracadabra\end{equation}\noindent In this string, we do not really care what thecharacters stand for, except we do care about the fact thatfor example the character $a$ is not equal to $b$ and so on.An \defn{alphabet} is a (non-empty) finite set of characters.Often the letter $\Sigma$ is used to refer to an alphabet. Forexample the ASCII characters \pcode{a} to \pcode{z} form analphabet. The digits $0$ to $9$ are another alphabet. TheGreek letters $\alpha$ to $\omega$ also form an alphabet. Ifnothing else is specified, we usually assume the alphabetconsists of just the lower-case letters $a$, $b$, \ldots, $z$.Sometimes, however, we explicitly want to restrict strings tocontain only the letters $a$ and $b$, for example. In thiscase we will state that the alphabet is the set $\{a, b\}$. \defn{Strings} are lists of characters. Unfortunately, thereare many ways how we can write down strings. In programminglanguages, they are usually written as \dq{$hello$} where thedouble quotes indicate that we are dealing with a string. Butsince we regard strings as lists of characters we could alsowrite this string as\[[\text{\it h, e, l, l, o}]\]\noindent The important point is that we can always decomposesuch strings. For example, we will often consider the firstcharacter of a string, say $h$, and the ``rest'' of a stringsay \dq{\textit{ello}} when making definitions about strings.There are also some subtleties with the empty string,sometimes written as \dq{} but also as the empty list ofcharacters $[\,]$.\footnote{In the literature you can alsooften find that $\varepsilon$ or $\lambda$ is used torepresent the empty string.} Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},which we write as $s_1 @ s_2$. Suppose we are given twostrings \dq{\textit{foo}} and \dq{\textit{bar}}, then theirconcatenation, writen \dq{\textit{foo}} $@$ \dq{\textit{bar}},gives \dq{\textit{foobar}}. Often we will simplify our lifeand just drop the double quotes whenever it is clear we aretalking about strings, writing as already in\eqref{abracadabra} just \textit{foo}, \textit{bar},\textit{foobar} or \textit{foo $@$ bar}.Some simple properties of string concatenation hold. Forexample the concatenation operation is \emph{associative},meaning\[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\] \noindent are always equal strings. The empty string behaveslike a unit element, therefore\[s \,@\, [] = [] \,@\, s = s\]Occasionally we will use the notation $a^n$ for strings, which stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ isa string that has as many $a$s as $b$s.While for us strings are just lists of characters, programminglanguages often differentiate between the two concepts. InScala, for example, there is the type of \code{String} and thetype of lists of characters, \code{List[Char]}. They are notthe same and we need to explicitly coerce elements between thetwo types, for example\begin{lstlisting}[numbers=none]scala> "abc".toListres01: List[Char] = List(a, b, c)\end{lstlisting}\subsubsection*{Sets and Languages}We will use the familiar operations $\cup$, $\cap$, $\subset$and $\subseteq$ for sets. For the empty set we will eitherwrite $\varnothing$ or $\{\,\}$. The set containing thenatural numbers $1$, $2$ and $3$, for example, we will writewith curly braces as\[\{1, 2, 3\}\]\noindent The notation $\in$ means \emph{element of}, so $1\in \{1, 2, 3\}$ is true and $3 \in \{1, 2, 3\}$ is false.Sets can potentially have infinitely many elements. Forexample the set of all natural numbers $\{0, 1, 2, \ldots\}$is infinite. This set is often also abbreviated as$\mathbb{N}$. We can define sets by giving all elements, forexample $\{0, 1\}$, but also by \defn{set comprehensions}. Forexample the set of all even natural numbers can be defined as\[\{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}\]\noindent Though silly, but the set $\{0, 1, 2\}$ could also bedefined by the following set comprehension\[\{n\;|\; n^2 < 9 \wedge n \in \mathbb{N}\}\]\noindent Notice that set comprehensions could be usedto define set union, intersection and difference:\begin{eqnarray*}A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} \end{eqnarray*}\noindent In general set comprehensions are of the form$\{a\;|\;P\}$ which stands for the set of all elements $a$(from some set) for which some property $P$ holds.For defining sets, we will also often use the notion of the``big union''. An example is as follows:\begin{equation}\label{bigunion}\bigcup_{0\le n}\; \{n^2, n^2 + 1\}\end{equation}\noindent which is the set of all squares and their immediatesuccessors, so\[\{0, 1, 2, 4, 5, 9, 10, 16, 17, \ldots\}\]\noindent A big union is a sequence of unions which are indexed typically by a natural number. So the big union in\eqref{bigunion} could equally be written as\[\{0, 1\} \cup \{1, 2\} \cup \{4, 5\} \cup \{9, 10\} \cup \ldots\]\noindent but using the big union notation is more concise.An important notion in this module are \defn{languages}, whichare sets of strings. The main goal for us will be how to(formally) specify languages and to find out whether a stringis in a language or not.\footnote{You might wish to ponderwhether this is in general a hard or easy problem, wherehardness is meant in terms of Turing decidable, for example.}Note that the language containing the empty string $\{\dq{}\}$is not equal to $\varnothing$, the empty language (or emptyset): The former contains one element, namely \dq{} (alsowritten $[\,]$), but the latter does not contain anyelement.For languages we define the operation of \defn{languageconcatenation}, written like in the string case as $A @ B$:\begin{equation}\label{langconc}A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}\end{equation}\noindent Be careful to understand the difference: the $@$in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers to the concatenation of two languages (or sets of strings).As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,then $A \,@\, B$ is the language\[\{abzzz, abqq, abr, aczzz, acqq, acr\}\] \noindent Recall the properties for string concatenation. Forlanguage concatenation we have the following properties\begin{center}\begin{tabular}{ll}associativity: & $(A @ B) @ C = A @ (B @ C)$\\unit element: & $A \,@\, \{[]\} = \{[]\} \,@\, A = A$\\zero element: & $A \,@\, \varnothing = \varnothing \,@\, A = \varnothing$\end{tabular}\end{center}\noindent Note the difference in the last two lines: the emptyset behaves like $0$ for multiplication and the set $\{[]\}$like $1$ for multiplication.Following the language concatenation, we can define a\defn{language power} operation as follows:\begin{eqnarray*}A^0 & \dn & \{[]\}\\A^{n+1} & \dn & A \,@\, A^n\end{eqnarray*}\noindent This definition is by recursion on natural numbers.Note carefully that the zero-case is not defined as the emptyset, but the set containing the empty string. So no matterwhat the set $A$ is, $A^0$ will always be $\{[]\}$. (There isanother hint about a connection between the $@$-operation andmultiplication: How is $x^n$ defined and what is $x^0$?)Next we can define the \defn{star operation} for languages: $A^*$ is the union of all powers of $A$, or short\begin{equation}\label{star}A^* \dn \bigcup_{0\le n}\; A^n\end{equation}\noindent This star operation is often also called\emph{Kleene-star}. Unfolding the definition in \eqref{star}gives\[A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots\]\noindentwhich is equal to \[\{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots\]\noindent We can see that the empty string is always in $A^*$,no matter what $A$ is. This is because $[] \in A^0$. To makesure you understand these definitions, I leave you to answerwhat $\{[]\}^*$ and $\varnothing^*$ are. Recall that an alphabet is often referred to by the letter$\Sigma$. We can now write for the set of all strings overthis alphabet $\Sigma^*$. In doing so we also include the empty string as a possible string over $\Sigma$. So if$\Sigma = \{a, b\}$, then $\Sigma^*$ is\[\{[], a, b, ab, ba, aaa, aab, aba, abb, baa, bab, \ldots\}\]\noindent or in other words all strings containing $a$s and $b$s only.\end{document}%%% Local Variables: %%% mode: latex%%% TeX-master: t%%% End: