handouts/notation.tex
changeset 504 3390e863d796
parent 502 81d20ccb8801
child 505 ca51b1d4dae5
equal deleted inserted replaced
503:3b9496db3fb9 504:3390e863d796
     6 
     6 
     7 \begin{document}
     7 \begin{document}
     8 
     8 
     9 \section*{A Crash-Course on Notation}
     9 \section*{A Crash-Course on Notation}
    10 
    10 
    11 There are innumerable books available about compilers,
    11 There are innumerable books available about compilers, automata theory
    12 automata theory and formal languages. Unfortunately, they
    12 and formal languages. Unfortunately, they often use their own
    13 often use their own notational conventions and their own
    13 notational conventions and their own symbols. This handout is meant to
    14 symbols. This handout is meant to clarify some of the notation
    14 clarify some of the notation I will use. I apologise in advance that
    15 I will use. I appologise in advance that sometimes I will be a
    15 sometimes I will be a bit fuzzy\ldots the problem is that often we
    16 bit fuzzy\ldots the problem is that often we want to have
    16 want to have convenience in our mathematical definitions (to make them
    17 convenience in our mathematical definitions (to make them
    17 readable and understandable), but other times we need pedantic
    18 readable and understandable), but sometimes we need precision
    18 precision for actual programs.
    19 for actual programs.
       
    20 
    19 
    21 \subsubsection*{Characters and Strings}
    20 \subsubsection*{Characters and Strings}
    22 
    21 
    23 The most important concept in this module are strings. Strings
    22 The most important concept in this module are strings. Strings
    24 are composed of \defn{characters}. While characters are surely
    23 are composed of \defn{characters}. While characters are surely
    53 characters stand for, except we do care about the fact that
    52 characters stand for, except we do care about the fact that
    54 for example the character $a$ is not equal to $b$ and so on.
    53 for example the character $a$ is not equal to $b$ and so on.
    55 Why do I make this distinction? Because we often need to
    54 Why do I make this distinction? Because we often need to
    56 define functions using variables ranging over characters. We
    55 define functions using variables ranging over characters. We
    57 need to somehow say this is a variable, say $c$, ranging over
    56 need to somehow say this is a variable, say $c$, ranging over
    58 characters, while this is the atual character \pcode{c}.
    57 characters, while this is the actual character \pcode{c}.
    59 
    58 
    60 An \defn{alphabet} is a (non-empty) finite set of characters.
    59 An \defn{alphabet} is a (non-empty) finite set of characters.
    61 Often the letter $\Sigma$ is used to refer to an alphabet. For
    60 Often the letter $\Sigma$ is used to refer to an alphabet. For
    62 example the ASCII characters \pcode{a} to \pcode{z} form an
    61 example the ASCII characters \pcode{a} to \pcode{z} form an
    63 alphabet. The digits $0$ to $9$ are another alphabet. The
    62 alphabet. The digits $0$ to $9$ are another alphabet. The
    68 contain only the letters $a$ and $b$, for example. In this
    67 contain only the letters $a$ and $b$, for example. In this
    69 case we will state that the alphabet is the set $\{a, b\}$. 
    68 case we will state that the alphabet is the set $\{a, b\}$. 
    70 
    69 
    71 \defn{Strings} are lists of characters. Unfortunately, there
    70 \defn{Strings} are lists of characters. Unfortunately, there
    72 are many ways how we can write down strings. In programming
    71 are many ways how we can write down strings. In programming
    73 languages, they are usually written as \dq{$hello$} where the
    72 languages, they are usually written as \dq{\texttt{hello}} where the
    74 double quotes indicate that we are dealing with a string. In
    73 double quotes indicate that we are dealing with a string. In
    75 programming languages, such as Scala, strings have a special
    74 typed programming languages, such as Scala, strings have a special
    76 type---namely \pcode{String} which is different from the type
    75 type---namely \pcode{String} which is different from the type
    77 for lists of chatacters. This is because strings can be
    76 for lists of characters. This is because strings can be
    78 efficiently represented in memory, unlike general lists. Since
    77 efficiently represented in memory, unlike lists. Since
    79 \code{String} and the type of lists of characters,
    78 \code{String} and the type of lists of characters
    80 \code{List[Char]} are not the same, we need to explicitly
    79 (\code{List[Char]}) are not the same, we need to explicitly
    81 coerce elements between the two types, for example
    80 coerce elements between the two types, for example
    82 
    81 
    83 \begin{lstlisting}[numbers=none]
    82 \begin{lstlisting}[numbers=none]
    84 scala> "abc".toList
    83 scala> "abc".toList
    85 res01: List[Char] = List(a, b, c)
    84 res01: List[Char] = List(a, b, c)
    86 \end{lstlisting}
    85 \end{lstlisting}
    87 
    86 
    88 \noindent Since in our (mathematical) definitions we regard
    87 \noindent
    89 strings as lists of characters, we will also write
    88 However, we do not want to do this kind of explicit coercion in our
    90 \dq{$hello$} as
    89 pencil-and-paper, everyday arguments.  So in our (mathematical)
       
    90 definitions we regard strings as lists of characters, we will also
       
    91 write \dq{$hello$} as
    91 
    92 
    92 \[
    93 \[
    93 [\text{\it h, e, l, l, o}] \qquad\text{or simply}\qquad \textit{hello}
    94 [\text{\it h, e, l, l, o}] \qquad\text{or simply}\qquad \textit{hello}
    94 \]
    95 \]
    95 
    96 
   105 
   106 
   106 Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
   107 Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
   107 which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as
   108 which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as
   108 lists of characters, then $@$ is the list-append function.
   109 lists of characters, then $@$ is the list-append function.
   109 Suppose we are given two strings \dq{\textit{foo}} and
   110 Suppose we are given two strings \dq{\textit{foo}} and
   110 \dq{\textit{bar}}, then their concatenation, writen
   111 \dq{\textit{bar}}, then their concatenation, written
   111 \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
   112 \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
   112 \dq{\textit{foobar}}. But as said above, we will often
   113 \dq{\textit{foobar}}. But as said above, we will often
   113 simplify our life and just drop the double quotes whenever it
   114 simplify our life and just drop the double quotes whenever it
   114 is clear we are talking about strings, So we will often just
   115 is clear we are talking about strings, So we will often just
   115 write \textit{foo}, \textit{bar}, \textit{foobar} or
   116 write \textit{foo}, \textit{bar}, \textit{foobar} or
   116 \textit{foo $@$ bar}.
   117 \textit{foo $@$ bar}.
   117 
   118 
   118 Occasionally we will use the notation $a^n$ for strings, which 
   119 Occasionally we will use the notation $a^n$ for strings, which stands
   119 stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is
   120 for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is a string that
   120 a string that has as many $a$s by as many $b$s.
   121 has some number of $a$s followed by the same number of $b$s.  A simple
   121 
   122 property of string concatenation is \emph{associativity}, meaning
   122 A simple property of string concatenation is 
       
   123 \emph{associativity}, meaning
       
   124 
   123 
   125 \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]  
   124 \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]  
   126 
   125 
   127 \noindent are always equal strings. The empty string behaves
   126 \noindent are always equal strings. The empty string behaves
   128 like a \emph{unit element}, therefore
   127 like a \emph{unit element}, therefore
   140 
   139 
   141 \[
   140 \[
   142 \{1, 2, 3\}
   141 \{1, 2, 3\}
   143 \]
   142 \]
   144 
   143 
   145 \noindent The notation $\in$ means \emph{element of}, so $1
   144 \noindent The notation $\in$ means \emph{element of}, so $1 \in \{1,
   146 \in \{1, 2, 3\}$ is true and $4 \in \{1, 2, 3\}$ is false.
   145 2, 3\}$ is true and $4 \in \{1, 2, 3\}$ is false.  Note that the
   147 Sets can potentially have infinitely many elements. For
   146 \emph{list} $[1, 2, 3]$ is something different from the \emph{set}
   148 example the set of all natural numbers $\{0, 1, 2, \ldots\}$
   147 $\{1, 2, 3\}$: in the former we care about the order and potentially
   149 is infinite. This set is often also abbreviated as
   148 several occurrences of a number; while with the latter we do not.
   150 $\mathbb{N}$. We can define sets by giving all elements, for
   149 Also sets can potentially have infinitely many elements, whereas lists
   151 example $\{0, 1\}$ for the set containing just $0$ and $1$,
   150 cannot. For example
   152 but also by \defn{set comprehensions}. For example the set of
   151 the set of all natural numbers $\{0, 1, 2, \ldots\}$ is infinite. This
   153 all even natural numbers can be defined as
   152 set is often also abbreviated as $\mathbb{N}$. Lists can be very large, but they cannot contain infinitely many elements.
       
   153 
       
   154 We can define sets by giving all elements, for example $\{0, 1\}$ for
       
   155 the set containing just $0$ and $1$, but also by \defn{set
       
   156   comprehensions}. For example the set of all even natural numbers can
       
   157 be defined as
   154 
   158 
   155 \[
   159 \[
   156 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
   160 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
   157 \]
   161 \]
   158   
   162   
   161 
   165 
   162 \[
   166 \[
   163 \{n\;|\; n^2 < 9 \wedge n \in \mathbb{N}\}
   167 \{n\;|\; n^2 < 9 \wedge n \in \mathbb{N}\}
   164 \]
   168 \]
   165 
   169 
   166 \noindent Notice that set comprehensions could be used
   170 \noindent Can you see why this defines the set $\{0, 1, 2\}$?  Notice
   167 to define set union, intersection and difference:
   171 that set comprehensions could be used to define set union,
       
   172 intersection and difference:
   168 
   173 
   169 \begin{eqnarray*}
   174 \begin{eqnarray*}
   170 A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\
   175 A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\
   171 A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\
   176 A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\
   172 A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} 
   177 A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} 
   199 \ldots
   204 \ldots
   200 \]
   205 \]
   201 
   206 
   202 \noindent but using the big union notation is more concise.
   207 \noindent but using the big union notation is more concise.
   203 
   208 
   204 While this stuff about sete might all look trivial or even
   209 As an aside: While this stuff about sets might all look trivial or even needlessly
   205 needlessly pedantic, \emph{Nature} is never simple. If you
   210 pedantic, \emph{Nature} is never simple. If you want to be amazed how
   206 want to be amazed how complicated sets can get, watch out for
   211 complicated sets can get, watch out for the last lecture just before
   207 the last lecture just before Christmas where I want you to
   212 Christmas where I want to convince you of the fact that some sets are
   208 convince you of the fact that some sets are more infinite than
   213 more infinite than others. Yes, you read correctly, there can be sets
   209 others. Actually that will be a fact that is very relevant to
   214 that are ``more infinite'' then others. If you think this is obvious:
   210 the material of this module.
   215 say you have the infinite set $\mathbb{N}\backslash\{0\} = \{1, 2, 3, 4, \ldots\}$ which is all
       
   216 the natural numbers except $0$, and then compare it to the set
       
   217 $\{0, 1, 2, 3, 4, \ldots\}$ which contains the $0$. If you think,
       
   218 the second must be more infinite\ldots{} well, then think again. Because the two
       
   219 infinite sets
       
   220 
       
   221 \begin{center}
       
   222   $\{1, 2, 3, 4, \ldots\}$ and
       
   223   $\{0, 1, 2, 3, 4, \ldots\}$
       
   224 \end{center}
       
   225 
       
   226 \noindent
       
   227 contain actually the same number of elements. Does this make sense?
       
   228 Though this might all look strange this about infinite sets will be a
       
   229 topic that is very relevant to the material of this module. It tells
       
   230 us what we can compute with a computer (actually algorithm) and what
       
   231 we cannot. But during the first 9 lectures we can go by without this
       
   232 ``weird'' stuff.
   211 
   233 
   212 Another important notion in this module are \defn{languages}, which
   234 Another important notion in this module are \defn{languages}, which
   213 are sets of strings. One of the main goals for us will be how to
   235 are sets of strings. One of the main goals for us will be how to
   214 (formally) specify languages and to find out whether a string
   236 (formally) specify languages and to find out whether a string
   215 is in a language or not.\footnote{You might wish to ponder
   237 is in a language or not.\footnote{You might wish to ponder
   252 
   274 
   253 \noindent Note the difference in the last two lines: the empty
   275 \noindent Note the difference in the last two lines: the empty
   254 set behaves like $0$ for multiplication and the set $\{[]\}$
   276 set behaves like $0$ for multiplication and the set $\{[]\}$
   255 like $1$ for multiplication ($n * 1 = n$ and $n * 0 = 0$).
   277 like $1$ for multiplication ($n * 1 = n$ and $n * 0 = 0$).
   256 
   278 
   257 Following the language concatenation, we can define a
   279 Using the operation of language concatenation, we can define a
   258 \defn{language power} operation as follows:
   280 \defn{language power} operation as follows:
   259 
   281 
   260 \begin{eqnarray*}
   282 \begin{eqnarray*}
   261 A^0     & \dn & \{[]\}\\
   283 A^0     & \dn & \{[]\}\\
   262 A^{n+1} & \dn & A \,@\, A^n
   284 A^{n+1} & \dn & A \,@\, A^n
   265 \noindent This definition is by recursion on natural numbers.
   287 \noindent This definition is by recursion on natural numbers.
   266 Note carefully that the zero-case is not defined as the empty
   288 Note carefully that the zero-case is not defined as the empty
   267 set, but the set containing the empty string. So no matter
   289 set, but the set containing the empty string. So no matter
   268 what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is
   290 what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is
   269 another hint about a connection between the $@$-operation and
   291 another hint about a connection between the $@$-operation and
   270 multiplication: How is $x^n$ defined recursively and what is
   292 multiplication: How is $x^n$ defined in mathematics and what is
   271 $x^0$?)
   293 $x^0$?)
   272 
   294 
   273 Next we can define the \defn{star operation} for languages:
   295 Next we can define the \defn{star operation} for languages:
   274 $A\star$ is the union of all powers of $A$, or short
   296 $A\star$ is the union of all powers of $A$, or short
   275 
   297 
   280 \noindent This star operation is often also called
   302 \noindent This star operation is often also called
   281 \emph{Kleene-star}. Unfolding the definition in \eqref{star}
   303 \emph{Kleene-star}. Unfolding the definition in \eqref{star}
   282 gives
   304 gives
   283 
   305 
   284 \[
   306 \[
   285 A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots
   307 A\star \dn A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots
   286 \]
   308 \]
   287 
   309 
   288 \noindent
   310 \noindent
   289 which is equal to 
   311 which is equal to 
   290 
   312 
   291 \[
   313 \[
   292 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
   314 A\star \dn \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
   293 \]
   315 \]
   294 
   316 
   295 \noindent We can see that the empty string is always in $A\star$,
   317 \noindent We can see that the empty string is always in $A\star$,
   296 no matter what $A$ is. This is because $[] \in A^0$. To make
   318 no matter what $A$ is. This is because $[] \in A^0$. To make
   297 sure you understand these definitions, I leave you to answer
   319 sure you understand these definitions, I leave you to answer
   298 what $\{[]\}\star$ and $\varnothing\star$ are?
   320 what $\{[]\}\star$ and $\varnothing\star$ are?
   299 
   321 
   300 Recall that an alphabet is often referred to by the letter
   322 Recall that an alphabet is often referred to by the letter
   301 $\Sigma$. We can now write for the set of \emph{all} strings
   323 $\Sigma$. We can now write for the set of \emph{all} strings
   302 over this alphabet $\Sigma\star$. In doing so we also include the
   324 over this alphabet as $\Sigma\star$. In doing so we also include the
   303 empty string as a possible string over $\Sigma$. So if $\Sigma
   325 empty string as a possible string over $\Sigma$. So if $\Sigma
   304 = \{a, b\}$, then $\Sigma\star$ is
   326 = \{a, b\}$, then $\Sigma\star$ is
   305 
   327 
   306 \[
   328 \[
   307 \{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}
   329 \{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}