--- a/handouts/ho01.tex Wed Jun 01 13:04:06 2016 +0100
+++ b/handouts/ho01.tex Tue Jun 14 11:41:48 2016 +0100
@@ -34,7 +34,7 @@
This module is about text processing, be it for web-crawlers,
compilers, dictionaries, DNA-data and so on. When looking for
-a particular string in a large text we can use the
+a particular string, like $abc$ in a large text we can use the
Knuth-Morris-Pratt algorithm, which is currently the most
efficient general string search algorithm. But often we do
\emph{not} just look for a particular string, but for string
@@ -67,14 +67,14 @@
\noindent where the first part matches one or more lowercase
letters (\pcode{a-z}), digits (\pcode{0-9}), underscores, dots
-or hyphens. The \pcode{+} ensures the ``one or more''. Then
-comes the \pcode{@}-sign, followed by the domain name which
-must be one or more lowercase letters, digits, underscores,
-dots or hyphens. Note there cannot be an underscore in the
-domain name. Finally there must be a dot followed by the
-toplevel domain. This toplevel domain must be 2 to 6 lowercase
-letters including the dot. Example strings which follow this
-pattern are:
+and hyphens. The \pcode{+} at the end of the brackets ensures
+the ``one or more''. Then comes the \pcode{@}-sign, followed
+by the domain name which must be one or more lowercase
+letters, digits, underscores, dots or hyphens. Note there
+cannot be an underscore in the domain name. Finally there must
+be a dot followed by the toplevel domain. This toplevel domain
+must be 2 to 6 lowercase letters including the dot. Example
+strings which follow this pattern are:
\begin{lstlisting}[language={},numbers=none,keywordstyle=\color{black}]
niceandsimple@example.org
@@ -113,10 +113,11 @@
are \pcode{x}, \pcode{foo}, \pcode{foo_bar_1}, \pcode{A_very_42_long_object_name},
but not \pcode{_i} and also not \pcode{4you}.
-Many programming language offer libraries that can be used to
+Many programming languages offer libraries that can be used to
validate such strings against regular expressions. Also there
are some common, and I am sure very familiar, ways of how to
-construct regular expressions. For example in Scala we have:
+construct regular expressions. For example in Scala we have
+a library implementing the following regular expressions:
\begin{center}
\begin{tabular}{lp{9cm}}
@@ -177,10 +178,11 @@
Regular expressions were introduced by Kleene in the 1950ies
and they have been object of intense study since then. They
-are nowadays pretty much ubiquitous in computer science. I am
-sure you have come across them before. Why on earth then is
-there any interest in studying them again in depth in this
-module? Well, one answer is in the following graph about
+are nowadays pretty much ubiquitous in computer science. There
+are many libraries implementing regular expressions. I am sure
+you have come across them before (remember PRA?). Why on earth
+then is there any interest in studying them again in depth in
+this module? Well, one answer is in the following graph about
regular expression matching in Python and in Ruby.
\begin{center}
@@ -214,7 +216,7 @@
Ruby is even slightly worse.\footnote{In this example Ruby
uses the slightly different regular expression
\texttt{a?a?a?...a?a?aaa...aa}, where the \texttt{a?} and
-\texttt{a} each occur $n$ times. More test results can be
+\texttt{a} each occur $n$ times. More such test cases can be
found at \url{http://www.computerbytesman.com/redos/}.}
Admittedly, this regular expression is carefully chosen to
exhibit this exponential behaviour, but similar ones occur
@@ -285,7 +287,7 @@
\noindent Because we overload our notation, there are some
subtleties you should be aware of. When regular expressions
-are referred to then $\ZERO$ (in bold font) does not stand for
+are referred to, then $\ZERO$ (in bold font) does not stand for
the number zero: rather it is a particular pattern that does
not match any string. Similarly, in the context of regular
expressions, $\ONE$ does not stand for the number one but for
@@ -368,7 +370,7 @@
If you prefer to think in terms of the implementation
of regular expressions in Scala, the constructors and
classes relate as follows\footnote{More about Scala is
-in the handout about A Crash-Course on Scala.}
+in the handout about \emph{A Crash-Course on Scala}.}
\begin{center}
\begin{tabular}{rcl}
@@ -663,7 +665,8 @@
\end{minipage}
\end{center}
-\caption{Nothing that can be said this\ldots\label{monster}}
+\caption{Nothing that can be said about this regular
+expression\ldots\label{monster}}
\end{figure}
--- a/handouts/notation.tex Wed Jun 01 13:04:06 2016 +0100
+++ b/handouts/notation.tex Tue Jun 14 11:41:48 2016 +0100
@@ -8,10 +8,15 @@
\section*{A Crash-Course on Notation}
-There are innumerable books available about compiler, automata
-and formal languages. Unfortunately, they often use their own
-notational conventions and their own symbols. This handout is
-meant to clarify some of the notation I will use.
+There are innumerable books available about compilers,
+automata theory and formal languages. Unfortunately, they
+often use their own notational conventions and their own
+symbols. This handout is meant to clarify some of the notation
+I will use. I appologise in advance that sometimes I will be a
+bit fuzzy\ldots the problem is that often we want to have
+convenience in our mathematical definitions (to make them
+readable and understandable), but sometimes we need precision
+for actual programs.
\subsubsection*{Characters and Strings}
@@ -40,13 +45,17 @@
$b$, $c$ and so on for characters. Therefore if we need a
representative string, we might write
-\begin{equation}\label{abracadabra}
+\[
abracadabra
-\end{equation}
+\]
\noindent In this string, we do not really care what the
characters stand for, except we do care about the fact that
for example the character $a$ is not equal to $b$ and so on.
+Why do I make this distinction? Because we often need to
+define functions using variables ranging over characters. We
+need to somehow say this is a variable, say $c$, ranging over
+characters, while this is the atual character \pcode{c}.
An \defn{alphabet} is a (non-empty) finite set of characters.
Often the letter $\Sigma$ is used to refer to an alphabet. For
@@ -62,12 +71,26 @@
\defn{Strings} are lists of characters. Unfortunately, there
are many ways how we can write down strings. In programming
languages, they are usually written as \dq{$hello$} where the
-double quotes indicate that we are dealing with a string. But
-since we regard strings as lists of characters we could also
-write this string as
+double quotes indicate that we are dealing with a string. In
+programming languages, such as Scala, strings have a special
+type---namely \pcode{String} which is different from the type
+for lists of chatacters. This is because strings can be
+efficiently represented in memory, unlike general lists. Since
+\code{String} and the type of lists of characters,
+\code{List[Char]} are not the same, we need to explicitly
+coerce elements between the two types, for example
+
+\begin{lstlisting}[numbers=none]
+scala> "abc".toList
+res01: List[Char] = List(a, b, c)
+\end{lstlisting}
+
+\noindent Since in our (mathematical) definitions we regard
+strings as lists of characters, we will also write
+\dq{$hello$} as
\[
-[\text{\it h, e, l, l, o}] \;\;\text{or simply}\;\; \textit{hello}
+[\text{\it h, e, l, l, o}] \qquad\text{or simply}\qquad \textit{hello}
\]
\noindent The important point is that we can always decompose
@@ -81,44 +104,32 @@
represent the empty string.}
Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
-which we write as $s_1 @ s_2$. Suppose we are given two
-strings \dq{\textit{foo}} and \dq{\textit{bar}}, then their
-concatenation, writen \dq{\textit{foo}} $@$ \dq{\textit{bar}},
-gives \dq{\textit{foobar}}. Often we will simplify our life
-and just drop the double quotes whenever it is clear we are
-talking about strings, writing as already in
-\eqref{abracadabra} just \textit{foo}, \textit{bar},
-\textit{foobar} or \textit{foo $@$ bar}.
+which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as
+lists of characters, then $@$ is the list-append function.
+Suppose we are given two strings \dq{\textit{foo}} and
+\dq{\textit{bar}}, then their concatenation, writen
+\dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
+\dq{\textit{foobar}}. But as said above, we will often
+simplify our life and just drop the double quotes whenever it
+is clear we are talking about strings, So we will often just
+write \textit{foo}, \textit{bar}, \textit{foobar} or
+\textit{foo $@$ bar}.
-Some simple properties of string concatenation hold. For
-example the concatenation operation is \emph{associative},
-meaning
+Occasionally we will use the notation $a^n$ for strings, which
+stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is
+a string that has as many $a$s by as many $b$s.
+
+A simple property of string concatenation is
+\emph{associativity}, meaning
\[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]
\noindent are always equal strings. The empty string behaves
-like a unit element, therefore
+like a \emph{unit element}, therefore
\[s \,@\, [] = [] \,@\, s = s\]
-Occasionally we will use the notation $a^n$ for strings, which
-stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is
-a string that has as many $a$s as $b$s.
-
-Note however that while for us strings are just lists of
-characters, programming languages often differentiate between
-the two concepts. In Scala, for example, there is the type of
-\code{String} and the type of lists of characters,
-\code{List[Char]}. They are not the same and we need to
-explicitly coerce elements between the two types, for example
-
-\begin{lstlisting}[numbers=none]
-scala> "abc".toList
-res01: List[Char] = List(a, b, c)
-\end{lstlisting}
-
-
\subsubsection*{Sets and Languages}
We will use the familiar operations $\cup$, $\cap$, $\subset$
@@ -137,8 +148,9 @@
example the set of all natural numbers $\{0, 1, 2, \ldots\}$
is infinite. This set is often also abbreviated as
$\mathbb{N}$. We can define sets by giving all elements, for
-example $\{0, 1\}$, but also by \defn{set comprehensions}. For
-example the set of all even natural numbers can be defined as
+example $\{0, 1\}$ for the set containing just $0$ and $1$,
+but also by \defn{set comprehensions}. For example the set of
+all even natural numbers can be defined as
\[
\{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
@@ -189,7 +201,15 @@
\noindent but using the big union notation is more concise.
-An important notion in this module are \defn{languages}, which
+While this stuff about sete might all look trivial or even
+needlessly pedantic, \emph{Nature} is never simple. If you
+want to be amazed how complicated sets can get, watch out for
+the last lecture just before Christmas where I want you to
+convince you of the fact that some sets are more infinite than
+others. Actually that will be a fact that is very relevant to
+the material of this module.
+
+Another important notion in this module are \defn{languages}, which
are sets of strings. One of the main goals for us will be how to
(formally) specify languages and to find out whether a string
is in a language or not.\footnote{You might wish to ponder
@@ -251,10 +271,10 @@
$x^0$?)
Next we can define the \defn{star operation} for languages:
-$A^*$ is the union of all powers of $A$, or short
+$A\star$ is the union of all powers of $A$, or short
\begin{equation}\label{star}
-A^* \dn \bigcup_{0\le n}\; A^n
+A\star \dn \bigcup_{0\le n}\; A^n
\end{equation}
\noindent This star operation is often also called
@@ -272,16 +292,16 @@
\{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
\]
-\noindent We can see that the empty string is always in $A^*$,
+\noindent We can see that the empty string is always in $A\star$,
no matter what $A$ is. This is because $[] \in A^0$. To make
sure you understand these definitions, I leave you to answer
-what $\{[]\}^*$ and $\varnothing^*$ are.
+what $\{[]\}\star$ and $\varnothing\star$ are?
Recall that an alphabet is often referred to by the letter
-$\Sigma$. We can now write for the set of all strings over
-this alphabet $\Sigma^*$. In doing so we also include the
-empty string as a possible string over $\Sigma$. So if
-$\Sigma = \{a, b\}$, then $\Sigma^*$ is
+$\Sigma$. We can now write for the set of \emph{all} strings
+over this alphabet $\Sigma\star$. In doing so we also include the
+empty string as a possible string over $\Sigma$. So if $\Sigma
+= \{a, b\}$, then $\Sigma\star$ is
\[
\{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}