\documentclass{article}\usepackage{../style}\usepackage{../langs}\begin{document}\section*{A Crash-Course on Notation}There are innumerable books available about compilers, automata theoryand formal languages. Unfortunately, they often use their ownnotational conventions and their own symbols. This handout is meant toclarify some of the notation I will use. I appologise in advance thatsometimes I will be a bit fuzzy\ldots the problem is that often wewant to have convenience in our mathematical definitions (to make themreadable and understandable), but other times we need pedanticprecision for actual programs.\subsubsection*{Characters and Strings}The most important concept in this module are strings. Stringsare composed of \defn{characters}. While characters are surelya familiar concept, we will make one subtle distinction inthis module. If we want to refer to concrete characters, like\code{a}, \code{b}, \code{c} and so on, we use a typewriter font.Accordingly if we want to refer to the concrete characters ofmy email address we shall write\begin{center}\pcode{christian.urban@kcl.ac.uk}\end{center}\noindent If we also need to explicitly indicate the ``space''character, we write \VS{}\hspace{1mm}. For example\begin{center}\tt{}hello\VS\hspace{0.5mm}world\end{center}\noindent But often we do not care which particular characterswe use. In such cases we use the italic font and write $a$,$b$, $c$ and so on for characters. Therefore if we need arepresentative string, we might write\[abracadabra\]\noindent In this string, we do not really care what thecharacters stand for, except we do care about the fact thatfor example the character $a$ is not equal to $b$ and so on.Why do I make this distinction? Because we often need todefine functions using variables ranging over characters. Weneed to somehow say this is a variable, say $c$, ranging overcharacters, while this is the atual character \pcode{c}.An \defn{alphabet} is a (non-empty) finite set of characters.Often the letter $\Sigma$ is used to refer to an alphabet. Forexample the ASCII characters \pcode{a} to \pcode{z} form analphabet. The digits $0$ to $9$ are another alphabet. TheGreek letters $\alpha$ to $\omega$ also form an alphabet. Ifnothing else is specified, we usually assume the alphabetconsists of just the lower-case letters $a$, $b$, \ldots, $z$.Sometimes, however, we explicitly want to restrict strings tocontain only the letters $a$ and $b$, for example. In thiscase we will state that the alphabet is the set $\{a, b\}$. \defn{Strings} are lists of characters. Unfortunately, thereare many ways how we can write down strings. In programminglanguages, they are usually written as \dq{$hello$} where thedouble quotes indicate that we are dealing with a string. Inprogramming languages, such as Scala, strings have a specialtype---namely \pcode{String} which is different from the typefor lists of chatacters. This is because strings can beefficiently represented in memory, unlike general lists. Since\code{String} and the type of lists of characters,\code{List[Char]} are not the same, we need to explicitlycoerce elements between the two types, for example\begin{lstlisting}[numbers=none]scala> "abc".toListres01: List[Char] = List(a, b, c)\end{lstlisting}\noindent Since in our (mathematical) definitions we regardstrings as lists of characters, we will also write\dq{$hello$} as\[[\text{\it h, e, l, l, o}] \qquad\text{or simply}\qquad \textit{hello}\]\noindent The important point is that we can always decomposesuch strings. For example, we will often consider the firstcharacter of a string, say $h$, and the ``rest'' of a stringsay \dq{\textit{ello}} when making definitions about strings.There are also some subtleties with the empty string,sometimes written as \dq{} but also as the empty list ofcharacters $[\,]$.\footnote{In the literature you can alsooften find that $\varepsilon$ or $\lambda$ is used torepresent the empty string.} Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ aslists of characters, then $@$ is the list-append function.Suppose we are given two strings \dq{\textit{foo}} and\dq{\textit{bar}}, then their concatenation, writen\dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives\dq{\textit{foobar}}. But as said above, we will oftensimplify our life and just drop the double quotes whenever itis clear we are talking about strings, So we will often justwrite \textit{foo}, \textit{bar}, \textit{foobar} or\textit{foo $@$ bar}.Occasionally we will use the notation $a^n$ for strings, which stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ isa string that has as many $a$s by as many $b$s.A simple property of string concatenation is \emph{associativity}, meaning\[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\] \noindent are always equal strings. The empty string behaveslike a \emph{unit element}, therefore\[s \,@\, [] = [] \,@\, s = s\]\subsubsection*{Sets and Languages}We will use the familiar operations $\cup$, $\cap$, $\subset$and $\subseteq$ for sets. For the empty set we will eitherwrite $\varnothing$ or $\{\,\}$. The set containing thenatural numbers $1$, $2$ and $3$, for example, we will writewith curly braces as\[\{1, 2, 3\}\]\noindent The notation $\in$ means \emph{element of}, so $1\in \{1, 2, 3\}$ is true and $4 \in \{1, 2, 3\}$ is false.Sets can potentially have infinitely many elements. Forexample the set of all natural numbers $\{0, 1, 2, \ldots\}$is infinite. This set is often also abbreviated as$\mathbb{N}$. We can define sets by giving all elements, forexample $\{0, 1\}$ for the set containing just $0$ and $1$,but also by \defn{set comprehensions}. For example the set ofall even natural numbers can be defined as\[\{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}\]\noindent Though silly, but the set $\{0, 1, 2\}$ could also bedefined by the following set comprehension\[\{n\;|\; n^2 < 9 \wedge n \in \mathbb{N}\}\]\noindent Notice that set comprehensions could be usedto define set union, intersection and difference:\begin{eqnarray*}A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} \end{eqnarray*}\noindent In general set comprehensions are of the form$\{a\;|\;P\}$ which stands for the set of all elements $a$(from some set) for which some property $P$ holds.For defining sets, we will also often use the notion of the``big union''. An example is as follows:\begin{equation}\label{bigunion}\bigcup_{0\le n}\; \{n^2, n^2 + 1\}\end{equation}\noindent which is the set of all squares and their immediatesuccessors, so\[\{0, 1, 2, 4, 5, 9, 10, 16, 17, \ldots\}\]\noindent A big union is a sequence of unions which are indexed typically by a natural number. So the big union in\eqref{bigunion} could equally be written as\[\{0, 1\} \cup \{1, 2\} \cup \{4, 5\} \cup \{9, 10\} \cup \ldots\]\noindent but using the big union notation is more concise.While this stuff about sete might all look trivial or evenneedlessly pedantic, \emph{Nature} is never simple. If youwant to be amazed how complicated sets can get, watch out forthe last lecture just before Christmas where I want you toconvince you of the fact that some sets are more infinite thanothers. Actually that will be a fact that is very relevant tothe material of this module.Another important notion in this module are \defn{languages}, whichare sets of strings. One of the main goals for us will be how to(formally) specify languages and to find out whether a stringis in a language or not.\footnote{You might wish to ponderwhether this is in general a hard or easy problem, wherehardness is meant in terms of Turing decidable, for example.}Note that the language containing the empty string $\{\dq{}\}$is not equal to $\varnothing$, the empty language (or emptyset): The former contains one element, namely \dq{} (alsowritten $[\,]$), but the latter does not contain anyelement.For languages we define the operation of \defn{languageconcatenation}, written like in the string case as $A @ B$:\begin{equation}\label{langconc}A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}\end{equation}\noindent Be careful to understand the difference: the $@$in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers to the concatenation of two languages (or sets of strings).As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,then $A \,@\, B$ is the language\[\{abzzz, abqq, abr, aczzz, acqq, acr\}\] \noindent Recall the properties for string concatenation. Forlanguage concatenation we have the following properties\begin{center}\begin{tabular}{ll}associativity: & $(A @ B) @ C = A @ (B @ C)$\\unit element: & $A \,@\, \{[]\} = \{[]\} \,@\, A = A$\\zero element: & $A \,@\, \varnothing = \varnothing \,@\, A = \varnothing$\end{tabular}\end{center}\noindent Note the difference in the last two lines: the emptyset behaves like $0$ for multiplication and the set $\{[]\}$like $1$ for multiplication ($n * 1 = n$ and $n * 0 = 0$).Following the language concatenation, we can define a\defn{language power} operation as follows:\begin{eqnarray*}A^0 & \dn & \{[]\}\\A^{n+1} & \dn & A \,@\, A^n\end{eqnarray*}\noindent This definition is by recursion on natural numbers.Note carefully that the zero-case is not defined as the emptyset, but the set containing the empty string. So no matterwhat the set $A$ is, $A^0$ will always be $\{[]\}$. (There isanother hint about a connection between the $@$-operation andmultiplication: How is $x^n$ defined recursively and what is$x^0$?)Next we can define the \defn{star operation} for languages:$A\star$ is the union of all powers of $A$, or short\begin{equation}\label{star}A\star \dn \bigcup_{0\le n}\; A^n\end{equation}\noindent This star operation is often also called\emph{Kleene-star}. Unfolding the definition in \eqref{star}gives\[A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots\]\noindentwhich is equal to \[\{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots\]\noindent We can see that the empty string is always in $A\star$,no matter what $A$ is. This is because $[] \in A^0$. To makesure you understand these definitions, I leave you to answerwhat $\{[]\}\star$ and $\varnothing\star$ are?Recall that an alphabet is often referred to by the letter$\Sigma$. We can now write for the set of \emph{all} stringsover this alphabet $\Sigma\star$. In doing so we also include theempty string as a possible string over $\Sigma$. So if $\Sigma= \{a, b\}$, then $\Sigma\star$ is\[\{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}\]\noindent or in other words all strings containing $a$s and$b$s only, plus the empty string.\end{document}%%% Local Variables: %%% mode: latex%%% TeX-master: t%%% End: