handouts/notation.tex
changeset 404 245d302791c7
parent 398 c8ce95067c1a
child 476 d922cc83b70c
equal deleted inserted replaced
403:564f7584eff1 404:245d302791c7
     6 
     6 
     7 \begin{document}
     7 \begin{document}
     8 
     8 
     9 \section*{A Crash-Course on Notation}
     9 \section*{A Crash-Course on Notation}
    10 
    10 
    11 There are innumerable books available about compiler, automata
    11 There are innumerable books available about compilers,
    12 and formal languages. Unfortunately, they often use their own
    12 automata theory and formal languages. Unfortunately, they
    13 notational conventions and their own symbols. This handout is
    13 often use their own notational conventions and their own
    14 meant to clarify some of the notation I will use.
    14 symbols. This handout is meant to clarify some of the notation
       
    15 I will use. I appologise in advance that sometimes I will be a
       
    16 bit fuzzy\ldots the problem is that often we want to have
       
    17 convenience in our mathematical definitions (to make them
       
    18 readable and understandable), but sometimes we need precision
       
    19 for actual programs.
    15 
    20 
    16 \subsubsection*{Characters and Strings}
    21 \subsubsection*{Characters and Strings}
    17 
    22 
    18 The most important concept in this module are strings. Strings
    23 The most important concept in this module are strings. Strings
    19 are composed of \defn{characters}. While characters are surely
    24 are composed of \defn{characters}. While characters are surely
    38 \noindent But often we do not care which particular characters
    43 \noindent But often we do not care which particular characters
    39 we use. In such cases we use the italic font and write $a$,
    44 we use. In such cases we use the italic font and write $a$,
    40 $b$, $c$ and so on for characters. Therefore if we need a
    45 $b$, $c$ and so on for characters. Therefore if we need a
    41 representative string, we might write
    46 representative string, we might write
    42 
    47 
    43 \begin{equation}\label{abracadabra}
    48 \[
    44 abracadabra
    49 abracadabra
    45 \end{equation}
    50 \]
    46 
    51 
    47 \noindent In this string, we do not really care what the
    52 \noindent In this string, we do not really care what the
    48 characters stand for, except we do care about the fact that
    53 characters stand for, except we do care about the fact that
    49 for example the character $a$ is not equal to $b$ and so on.
    54 for example the character $a$ is not equal to $b$ and so on.
       
    55 Why do I make this distinction? Because we often need to
       
    56 define functions using variables ranging over characters. We
       
    57 need to somehow say this is a variable, say $c$, ranging over
       
    58 characters, while this is the atual character \pcode{c}.
    50 
    59 
    51 An \defn{alphabet} is a (non-empty) finite set of characters.
    60 An \defn{alphabet} is a (non-empty) finite set of characters.
    52 Often the letter $\Sigma$ is used to refer to an alphabet. For
    61 Often the letter $\Sigma$ is used to refer to an alphabet. For
    53 example the ASCII characters \pcode{a} to \pcode{z} form an
    62 example the ASCII characters \pcode{a} to \pcode{z} form an
    54 alphabet. The digits $0$ to $9$ are another alphabet. The
    63 alphabet. The digits $0$ to $9$ are another alphabet. The
    60 case we will state that the alphabet is the set $\{a, b\}$. 
    69 case we will state that the alphabet is the set $\{a, b\}$. 
    61 
    70 
    62 \defn{Strings} are lists of characters. Unfortunately, there
    71 \defn{Strings} are lists of characters. Unfortunately, there
    63 are many ways how we can write down strings. In programming
    72 are many ways how we can write down strings. In programming
    64 languages, they are usually written as \dq{$hello$} where the
    73 languages, they are usually written as \dq{$hello$} where the
    65 double quotes indicate that we are dealing with a string. But
    74 double quotes indicate that we are dealing with a string. In
    66 since we regard strings as lists of characters we could also
    75 programming languages, such as Scala, strings have a special
    67 write this string as
    76 type---namely \pcode{String} which is different from the type
    68 
    77 for lists of chatacters. This is because strings can be
    69 \[
    78 efficiently represented in memory, unlike general lists. Since
    70 [\text{\it h, e, l, l, o}] \;\;\text{or simply}\;\; \textit{hello}
    79 \code{String} and the type of lists of characters,
       
    80 \code{List[Char]} are not the same, we need to explicitly
       
    81 coerce elements between the two types, for example
       
    82 
       
    83 \begin{lstlisting}[numbers=none]
       
    84 scala> "abc".toList
       
    85 res01: List[Char] = List(a, b, c)
       
    86 \end{lstlisting}
       
    87 
       
    88 \noindent Since in our (mathematical) definitions we regard
       
    89 strings as lists of characters, we will also write
       
    90 \dq{$hello$} as
       
    91 
       
    92 \[
       
    93 [\text{\it h, e, l, l, o}] \qquad\text{or simply}\qquad \textit{hello}
    71 \]
    94 \]
    72 
    95 
    73 \noindent The important point is that we can always decompose
    96 \noindent The important point is that we can always decompose
    74 such strings. For example, we will often consider the first
    97 such strings. For example, we will often consider the first
    75 character of a string, say $h$, and the ``rest'' of a string
    98 character of a string, say $h$, and the ``rest'' of a string
    79 characters $[\,]$.\footnote{In the literature you can also
   102 characters $[\,]$.\footnote{In the literature you can also
    80 often find that $\varepsilon$ or $\lambda$ is used to
   103 often find that $\varepsilon$ or $\lambda$ is used to
    81 represent the empty string.} 
   104 represent the empty string.} 
    82 
   105 
    83 Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
   106 Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
    84 which we write as $s_1 @ s_2$. Suppose we are given two
   107 which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as
    85 strings \dq{\textit{foo}} and \dq{\textit{bar}}, then their
   108 lists of characters, then $@$ is the list-append function.
    86 concatenation, writen \dq{\textit{foo}} $@$ \dq{\textit{bar}},
   109 Suppose we are given two strings \dq{\textit{foo}} and
    87 gives \dq{\textit{foobar}}. Often we will simplify our life
   110 \dq{\textit{bar}}, then their concatenation, writen
    88 and just drop the double quotes whenever it is clear we are
   111 \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
    89 talking about strings, writing as already in
   112 \dq{\textit{foobar}}. But as said above, we will often
    90 \eqref{abracadabra} just \textit{foo}, \textit{bar},
   113 simplify our life and just drop the double quotes whenever it
    91 \textit{foobar} or \textit{foo $@$ bar}.
   114 is clear we are talking about strings, So we will often just
    92 
   115 write \textit{foo}, \textit{bar}, \textit{foobar} or
    93 Some simple properties of string concatenation hold. For
   116 \textit{foo $@$ bar}.
    94 example the concatenation operation is \emph{associative},
       
    95 meaning
       
    96 
       
    97 \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]  
       
    98 
       
    99 \noindent are always equal strings. The empty string behaves
       
   100 like a unit element, therefore
       
   101 
       
   102 \[s \,@\, [] = [] \,@\, s = s\]
       
   103 
       
   104 
   117 
   105 Occasionally we will use the notation $a^n$ for strings, which 
   118 Occasionally we will use the notation $a^n$ for strings, which 
   106 stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is
   119 stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is
   107 a string that has as many $a$s as $b$s.
   120 a string that has as many $a$s by as many $b$s.
   108 
   121 
   109 Note however that while for us strings are just lists of
   122 A simple property of string concatenation is 
   110 characters, programming languages often differentiate between
   123 \emph{associativity}, meaning
   111 the two concepts. In Scala, for example, there is the type of
   124 
   112 \code{String} and the type of lists of characters,
   125 \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]  
   113 \code{List[Char]}. They are not the same and we need to
   126 
   114 explicitly coerce elements between the two types, for example
   127 \noindent are always equal strings. The empty string behaves
   115 
   128 like a \emph{unit element}, therefore
   116 \begin{lstlisting}[numbers=none]
   129 
   117 scala> "abc".toList
   130 \[s \,@\, [] = [] \,@\, s = s\]
   118 res01: List[Char] = List(a, b, c)
       
   119 \end{lstlisting}
       
   120 
   131 
   121 
   132 
   122 \subsubsection*{Sets and Languages}
   133 \subsubsection*{Sets and Languages}
   123 
   134 
   124 We will use the familiar operations $\cup$, $\cap$, $\subset$
   135 We will use the familiar operations $\cup$, $\cap$, $\subset$
   135 \in \{1, 2, 3\}$ is true and $4 \in \{1, 2, 3\}$ is false.
   146 \in \{1, 2, 3\}$ is true and $4 \in \{1, 2, 3\}$ is false.
   136 Sets can potentially have infinitely many elements. For
   147 Sets can potentially have infinitely many elements. For
   137 example the set of all natural numbers $\{0, 1, 2, \ldots\}$
   148 example the set of all natural numbers $\{0, 1, 2, \ldots\}$
   138 is infinite. This set is often also abbreviated as
   149 is infinite. This set is often also abbreviated as
   139 $\mathbb{N}$. We can define sets by giving all elements, for
   150 $\mathbb{N}$. We can define sets by giving all elements, for
   140 example $\{0, 1\}$, but also by \defn{set comprehensions}. For
   151 example $\{0, 1\}$ for the set containing just $0$ and $1$,
   141 example the set of all even natural numbers can be defined as
   152 but also by \defn{set comprehensions}. For example the set of
       
   153 all even natural numbers can be defined as
   142 
   154 
   143 \[
   155 \[
   144 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
   156 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
   145 \]
   157 \]
   146   
   158   
   187 \ldots
   199 \ldots
   188 \]
   200 \]
   189 
   201 
   190 \noindent but using the big union notation is more concise.
   202 \noindent but using the big union notation is more concise.
   191 
   203 
   192 An important notion in this module are \defn{languages}, which
   204 While this stuff about sete might all look trivial or even
       
   205 needlessly pedantic, \emph{Nature} is never simple. If you
       
   206 want to be amazed how complicated sets can get, watch out for
       
   207 the last lecture just before Christmas where I want you to
       
   208 convince you of the fact that some sets are more infinite than
       
   209 others. Actually that will be a fact that is very relevant to
       
   210 the material of this module.
       
   211 
       
   212 Another important notion in this module are \defn{languages}, which
   193 are sets of strings. One of the main goals for us will be how to
   213 are sets of strings. One of the main goals for us will be how to
   194 (formally) specify languages and to find out whether a string
   214 (formally) specify languages and to find out whether a string
   195 is in a language or not.\footnote{You might wish to ponder
   215 is in a language or not.\footnote{You might wish to ponder
   196 whether this is in general a hard or easy problem, where
   216 whether this is in general a hard or easy problem, where
   197 hardness is meant in terms of Turing decidable, for example.}
   217 hardness is meant in terms of Turing decidable, for example.}
   249 another hint about a connection between the $@$-operation and
   269 another hint about a connection between the $@$-operation and
   250 multiplication: How is $x^n$ defined recursively and what is
   270 multiplication: How is $x^n$ defined recursively and what is
   251 $x^0$?)
   271 $x^0$?)
   252 
   272 
   253 Next we can define the \defn{star operation} for languages:
   273 Next we can define the \defn{star operation} for languages:
   254 $A^*$ is the union of all powers of $A$, or short
   274 $A\star$ is the union of all powers of $A$, or short
   255 
   275 
   256 \begin{equation}\label{star}
   276 \begin{equation}\label{star}
   257 A^* \dn \bigcup_{0\le n}\; A^n
   277 A\star \dn \bigcup_{0\le n}\; A^n
   258 \end{equation}
   278 \end{equation}
   259 
   279 
   260 \noindent This star operation is often also called
   280 \noindent This star operation is often also called
   261 \emph{Kleene-star}. Unfolding the definition in \eqref{star}
   281 \emph{Kleene-star}. Unfolding the definition in \eqref{star}
   262 gives
   282 gives
   270 
   290 
   271 \[
   291 \[
   272 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
   292 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
   273 \]
   293 \]
   274 
   294 
   275 \noindent We can see that the empty string is always in $A^*$,
   295 \noindent We can see that the empty string is always in $A\star$,
   276 no matter what $A$ is. This is because $[] \in A^0$. To make
   296 no matter what $A$ is. This is because $[] \in A^0$. To make
   277 sure you understand these definitions, I leave you to answer
   297 sure you understand these definitions, I leave you to answer
   278 what $\{[]\}^*$ and $\varnothing^*$ are. 
   298 what $\{[]\}\star$ and $\varnothing\star$ are?
   279 
   299 
   280 Recall that an alphabet is often referred to by the letter
   300 Recall that an alphabet is often referred to by the letter
   281 $\Sigma$. We can now write for the set of all strings over
   301 $\Sigma$. We can now write for the set of \emph{all} strings
   282 this alphabet $\Sigma^*$. In doing so we also include the 
   302 over this alphabet $\Sigma\star$. In doing so we also include the
   283 empty string as a possible string over $\Sigma$. So if
   303 empty string as a possible string over $\Sigma$. So if $\Sigma
   284 $\Sigma = \{a, b\}$, then $\Sigma^*$ is
   304 = \{a, b\}$, then $\Sigma\star$ is
   285 
   305 
   286 \[
   306 \[
   287 \{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}
   307 \{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}
   288 \]
   308 \]
   289 
   309