\documentclass{article}\usepackage{../style}\usepackage{../langs}\usepackage{../graphics}\begin{document}\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018, 2020}\section*{A Crash-Course on Notation}There are an innumerable number of books available on compilers, automata theoryand formal languages. Unfortunately, they often use their ownnotational conventions and their own symbols. This handout is meant toclarify some of the notation I will use. I apologise in advance thatsometimes I will be a bit fuzzy\ldots the problem is that often wewant to have convenience in our mathematical definitions (to make themreadable and understandable), but other times we need pedanticprecision for actual programs.\subsubsection*{Characters and Strings}The most basic concept in this module are strings. Stringsare composed of \defn{characters}. While characters are surelya familiar concept, we will make one subtle distinction inthis module. If we want to refer to concrete characters, like\code{a}, \code{b}, \code{c} and so on, we will use a typewriter font.Accordingly if we want to refer to the concrete characters ofmy email address we shall write\begin{center}\pcode{christian.urban@kcl.ac.uk}\end{center}\noindent If we also need to explicitly indicate the ``space''character, we write \VS{}\hspace{1mm}. For example\begin{center}\tt{}hello\VS\hspace{0.5mm}world\end{center}\noindent But often we do not care which particular characterswe use. In such cases we use the italic font and write $a$,$b$, $c$ and so on for characters. Therefore if we need arepresentative string, we might write\[abracadabra\]\noindent In this string, we do not really care what thecharacters stand for, except we do care about the fact thatfor example the character $a$ is not equal to $b$ and so on.Why do I make this distinction? Because we often need todefine functions using variables ranging over characters. Weneed to somehow say ``this-is-a-variable'' and give it a name. In such cases we use the italic font.An \defn{alphabet} is a (non-empty) finite set of characters.Often the letter $\Sigma$ is used to refer to an alphabet. Forexample the ASCII characters \pcode{a} to \pcode{z} form analphabet. The digits $0$ to $9$ are another alphabet. TheGreek letters $\alpha$ to $\omega$ also form an alphabet. Ifnothing else is specified, we usually assume the alphabetconsists of just the lower-case letters $a$, $b$, \ldots, $z$.Sometimes, however, we explicitly want to restrict strings tocontain only the letters $a$ and $b$, for example. In thiscase we will state that the alphabet is the set $\{a, b\}$. \defn{Strings} are lists of characters. Unfortunately, thereare many ways how we can write down strings. In programminglanguages, they are usually written as \dq{\texttt{hello}} where thedouble quotes indicate that we are dealing with a string. Intyped programming languages, such as Scala, strings have a specialtype---namely \pcode{String} which is different from the typefor lists of characters. This is because strings can beefficiently represented in memory, unlike lists. Since\code{String} and the type of lists of characters(namely \code{List[Char]}) are not the same, we need to explicitlycoerce elements between the two types, for example\begin{lstlisting}[numbers=none]scala> "abc".toListres01: List[Char] = List(a, b, c)\end{lstlisting}\noindentHowever, we do not want to do this kind of explicit coercion in ourpencil-and-paper, everyday arguments. So in our (mathematical)definitions we regard strings as lists of characters and we will alsowrite \dq{$hello$} as list\[[\text{\it h, e, l, l, o}] \qquad\text{or simply}\qquad \textit{hello}\]\noindent The important point is that we can always decomposesuch strings. For example, we will often consider the firstcharacter of a string, say $h$, and the ``rest'' of a stringsay \dq{\textit{ello}} when making definitions about strings.There are also some subtleties with the empty string,sometimes written as \dq{} but also as the empty list ofcharacters $[\,]$.\footnote{In the literature you can alsooften find that $\varepsilon$ or $\lambda$ is used torepresent the empty string. But we are not going to use this notation.} Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ aslists of characters, then $@$ is the list-append function.Suppose we are given two strings \dq{\textit{foo}} and\dq{\textit{bar}}, then their concatenation, written\dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives\dq{\textit{foobar}}. But as said above, we will oftensimplify our life and just drop the double quotes whenever itis clear we are talking about strings. So we will justwrite \textit{foo}, \textit{bar}, \textit{foobar} \textit{foo $@$ bar} and so on.Occasionally we will use the notation $a^n$ for strings, which standsfor the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is a string thathas some number of $a$s followed by the same number of $b$s.Confusingly, in Scala the notation is ``times'' for this opration.So you can write \begin{lstlisting}[numbers=none]scala> "a" * 13val res02: String = aaaaaaaaaaaaa\end{lstlisting}\noindentA simple property of string concatenation is \emph{associativity}, meaning\[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\] \noindent are always equal strings. The empty string behaveslike a \emph{unit element}, therefore\[s \,@\, [] = [] \,@\, s = s\]\subsubsection*{Sets and Languages}We will use the familiar operations $\cup$, $\cap$, $\subset$and $\subseteq$ for sets. For the empty set we will eitherwrite $\varnothing$ or $\{\,\}$. The set containing thenatural numbers $1$, $2$ and $3$, for example, we will writewith curly braces as\[\{1, 2, 3\}\]\noindent The notation $\in$ means \emph{element of}, so $1 \in \{1,2, 3\}$ is true and $4 \in \{1, 2, 3\}$ is false. Note that the\emph{list} $[1, 2, 3]$ is something different from the \emph{set}$\{1, 2, 3\}$: in the former we care about the order and potentiallyseveral occurrences of a number; while with the latter we do not.Also sets can potentially have infinitely many elements, whereas listscannot. For examplethe set of all natural numbers $\{0, 1, 2, \ldots\}$ is infinite. Thisset is often also abbreviated as $\mathbb{N}$. Lists can be very large, but they cannot contain infinitely many elements.We can define sets by giving all their elements, for example $\{0, 1\}$ forthe set containing just $0$ and $1$. But often we need to use \defn{set comprehensions} to define sets. For example the set of all \emph{even}natural numbers can be defined as\[\{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}\]\noindent Set comprehensions consist of a ``base set'' (in this caseall the natural numbers) and a predicate (here eveness).Though silly, but the set $\{0, 1, 2\}$ could also bedefined by the following set comprehension\[\{n\;|\; n \in \mathbb{N} \wedge n^2 < 9\}\]\noindent Can you see why this defines the set $\{0, 1, 2\}$? Noticethat set comprehensions are quite powerful constructions. For example theycould be used to define set union,set intersection and set difference:\begin{eqnarray*}A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} \end{eqnarray*}\noindent In general set comprehensions are of the form$\{a\;|\;P\}$ which stands for the set of all elements $a$(from some set) for which some property $P$ holds. If programmingis more your-kind-of-thing, you might recognise the similaritieswith for-comprehensions, for example for the silly set above youcould write\begin{lstlisting}[numbers=none]scala> for (n <- (0 to 10).toSet; if n * n < 9) yield nval res03: Set[Int] = Set(0, 1, 2)\end{lstlisting}\noindentThis is pretty much the same as $\{n\;|\; n \in \mathbb{N} \wedge n^2 < 9\}$just in Scala syntax.For defining sets, we will also often use the notion of the``big union''. An example is as follows:\begin{equation}\label{bigunion}\bigcup_{0\le n}\; \{n^2, n^2 + 1\}\end{equation}\noindent which is the set of all squares and their immediatesuccessors, so\[\{0, 1, 2, 4, 5, 9, 10, 16, 17, \ldots\}\]\noindent A big union is a sequence of unions which are indexed typically by a natural number. So the big union in\eqref{bigunion} could equally be written as\[\{0, 1\} \cup \{1, 2\} \cup \{4, 5\} \cup \{9, 10\} \cup \ldots\]\noindent but using the big union notation is more concise.\medskipAs an aside: While this stuff about sets might all look trivial oreven needlessly pedantic, \emph{Nature} is never simple. If you wantto be amazed how complicated sets can get, watch out for the lastlecture just before Christmas where I want to convince you of the factthat some sets are more infinite than other sets. Yes, you readcorrectly, there can be sets that are ``more infinite'' thanothers. If you think this is obvious: say you have the infinite set$\mathbb{N}\backslash\{0\} = \{1, 2, 3, 4, \ldots\}$ which is all thenatural numbers except $0$, and then compare it to the set$\{0, 1, 2, 3, 4, \ldots\}$ which contains the $0$ and all othernumbers. If you think, the second must be more infinite\ldots{} well,then think again. Because the two infinite sets\begin{center} $\{1, 2, 3, 4, \ldots\}$ and $\{0, 1, 2, 3, 4, \ldots\}$\end{center}\noindentcontain actually the same amount of elements. Does this make sense to you?If yes, good. If not, then something to learn about.Though this might all look strange, infinite sets will be atopic that is very relevant to the material of this module. It tellsus what we can compute with a computer (actually an algorithm) and whatwe cannot. But during the first 9 lectures we can go by without this``weird'' stuff. End of aside.\smallskipAnother important notion in this module are \defn{languages}, whichare sets of strings. One of the main goals for us will be how to(formally) specify languages and to find out whether a stringis in a language or not.\footnote{You might wish to ponderwhether this is in general a hard or easy problem, wherehardness is meant in terms of Turing decidable, for example.}Note that the language containing the empty string $\{\dq{}\}$is not equal to $\varnothing$, the empty language (or emptyset): The former contains one element, namely \dq{} (alsowritten $[\,]$), but the latter does not contain anyelement at all! Make sure you see the difference.For languages we define the operation of \defn{languageconcatenation}, written like in the string case as $A @ B$:\begin{equation}\label{langconc}A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}\end{equation}\noindent Be careful to understand the difference: the $@$in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers to the concatenation of two languages (or sets of strings).As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,then $A \,@\, B$ is the language\[\{abzzz, abqq, abr, aczzz, acqq, acr\}\] \noindent The cool thing about Scala is that we can define languageconcatenation very elegantly as\begin{lstlisting}[numbers=none]def concat(A: Set[String], B: Set[String]) = for (x <- A ; y <- B) yield x ++ y\end{lstlisting}\noindentwhere \code{++} is string concatenation in Scala.Recall the properties for string concatenation. Forlanguage concatenation we have the following properties\begin{center}\begin{tabular}{ll}associativity: & $(A @ B) @ C = A @ (B @ C)$\\unit element: & $A \,@\, \{[]\} = \{[]\} \,@\, A = A$\\zero element: & $A \,@\, \varnothing = \varnothing \,@\, A = \varnothing$\end{tabular}\end{center}\noindent Note the difference in the last two lines: the emptyset behaves like $0$ for multiplication, whereas the set $\{[]\}$behaves like $1$ for multiplication ($n * 1 = n$ and $n * 0 = 0$).Again this is a subtletly you need to get compfortable with.Using the operation of language concatenation, we can define a\defn{language power} operation as follows:\begin{eqnarray*}A^0 & \dn & \{[]\}\\A^{n+1} & \dn & A \,@\, A^n\end{eqnarray*}\noindent This definition is by recursion on natural numbers.Note carefully that the zero-case is not defined as the emptyset, but the set containing the empty string. So no matterwhat the set $A$ is, $A^0$ will always be $\{[]\}$. (There isanother hint about a connection between the $@$-operation andmultiplication: How is $x^n$ defined in mathematics and what is$x^0$?)Next we can define the \defn{star operation} for languages:$A\star$ is the union of all powers of $A$, or short\begin{equation}\label{star}A\star \dn \bigcup_{0\le n}\; A^n\end{equation}\noindent This star operation is often also called\emph{Kleene-star} after the mathematician/computer scientistStephen Cole Kleene. Unfolding the definition in~\eqref{star}gives\[A\star \dn A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots\]\noindentwhich is equal to \[A\star \dn \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots\]\noindent We can see that the empty string is always in $A\star$,no matter what $A$ is. This is because $[] \in A^0$. To makesure you understand these definitions, I leave you to answerwhat $\{[]\}\star$ and $\varnothing\star$ are?Recall that an alphabet is often referred to by the letter$\Sigma$. We can now write for the set of \emph{all} stringsover this alphabet as $\Sigma\star$. In doing so we also include theempty string as a possible string (over $\Sigma$). Assuming $\Sigma= \{a, b\}$, then $\Sigma\star$ is\[\{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}\]\noindent or in words all strings containing $a$s and$b$s only, plus the empty string.\bigskip\noindentThanks for making it until here! There are also some personal conventionsabout regular expressions. But I will explain them in the handout for thefirst week. An exercise you can do: Implement the power operation for languagesand try out some examples.\end{document}%%% Local Variables: %%% mode: latex%%% TeX-master: t%%% End: