handouts/notation.tex
changeset 239 68d98140b90b
child 241 10f02605a46a
equal deleted inserted replaced
238:527fdb90fffe 239:68d98140b90b
       
     1 \documentclass{article}
       
     2 \usepackage{../style}
       
     3 \usepackage{../langs}
       
     4 
       
     5 
       
     6 
       
     7 \begin{document}
       
     8 
       
     9 \section*{A Crash-Course on Notation}
       
    10 
       
    11 \subsubsection*{Characters and Strings}
       
    12 
       
    13 In this module we will often use \defn{characters}. While they
       
    14 are surely familiar, we will make one subtle distinction. If
       
    15 we want to refer to concrete characters, like \code{a},
       
    16 \code{b} and so on, we use a typewriter font. So if we want to
       
    17 refer to the concrete characters of my email address we shall
       
    18 write
       
    19 
       
    20 \begin{center}
       
    21 \pcode{christian.urban@kcl.ac.uk}
       
    22 \end{center}
       
    23 
       
    24 \noindent If we need to explicitly indicate the ``space''
       
    25 character, we write \VS{}\hspace{1mm}. For example
       
    26 
       
    27 \begin{center}
       
    28 \tt{}hello\VS\hspace{0.5mm}world
       
    29 \end{center}
       
    30 
       
    31 
       
    32 \noindent But often we do not care
       
    33 about which characters we use. In such cases we us the italic
       
    34 font and write $a$, $b$ and so on. So if we need a
       
    35 representative string, we might write
       
    36 
       
    37 \begin{equation}\label{abracadabra}
       
    38 abracadabra
       
    39 \end{equation}
       
    40 
       
    41 \noindent We do not really care what the characters stand for,
       
    42 except we do care about is that for example the character $a$
       
    43 is not equal to $b$.
       
    44 
       
    45 An \defn{alphabet} is a finite set of characters. Often the
       
    46 letter $\Sigma$ is used to refer to an alphabet. For example
       
    47 the ASCII characters \pcode{a} to \pcode{z} form an alphabet.
       
    48 The digits $0$ to $9$ are another alphabet. If nothing else is
       
    49 specified, we usually assume the alphabet consists of just the
       
    50 lower-case letters $a$, $b$, \ldots, $z$. Sometimes, however,
       
    51 we explicitly restrict strings to contain, for example, only
       
    52 the letters $a$ and $b$. In this case we say the alphabet is
       
    53 the set $\{a, b\}$. 
       
    54 
       
    55 \defn{Strings} are lists of characters. Unfortunately, there
       
    56 are many ways how we can write down strings. In programming
       
    57 languages, they are usually written as \dq{$hello$} where the
       
    58 double quotes indicate that we dealing with a string. But
       
    59 since, strings are lists of characters we could also write
       
    60 this string as
       
    61 
       
    62 \[
       
    63 [\text{\it h, e, l, l, o}]
       
    64 \]
       
    65 
       
    66 \noindent The important point is that we can always decompose
       
    67 such strings. For example, we will often consider the first
       
    68 character of a string, say $h$, and the ``rest'' of a string
       
    69 say \dq{\textit{ello}} when making definitions about strings.
       
    70 There are some subtleties with the empty string, sometimes
       
    71 written as \dq{} but also as the empty list of characters
       
    72 $[\,]$. Two strings, for example $s_1$ and $s_2$, can be
       
    73 \emph{concatenated}, which we write as $s_1 @ s_2$. Suppose we
       
    74 are given two strings \dq{\textit{foo}} and \dq{\textit{bar}},
       
    75 then their concatenation, writen
       
    76 \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
       
    77 \dq{\textit{foobar}}. Often we will simplify our life and just
       
    78 drop the double quotes whenever it is clear we are talking
       
    79 about strings, writing as already in \eqref{abracadabra} just
       
    80 \textit{foo}, \textit{bar}, \textit{foobar} or \textit{foo $@$
       
    81 bar}.
       
    82 
       
    83 Some simple properties of string concatenation hold. For
       
    84 example the concatenation operation is \emph{associative},
       
    85 meaning
       
    86 
       
    87 \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]  
       
    88 
       
    89 \noindent are always equal strings. The empty string behaves
       
    90 like a unit element, therefore
       
    91 
       
    92 \[s \,@\, [] = [] \,@\, s = s\]
       
    93 
       
    94 While for us strings are just lists of characters, programming
       
    95 languages often differentiate between the two concepts. In
       
    96 Scala, for example, there is the type of \code{String} and the
       
    97 type of lists of characters,  \code{List[Char]}. They are not
       
    98 the same and we need to explicitly coerce elements between the
       
    99 two types, for example
       
   100 
       
   101 \begin{lstlisting}[numbers=none]
       
   102 scala> "abc".toList
       
   103 res01: List[Char] = List(a, b, c)
       
   104 \end{lstlisting}
       
   105 
       
   106 
       
   107 \subsubsection*{Sets and Languages}
       
   108 
       
   109 We will use the familiar operations $\cup$ and $\cap$ for
       
   110 sets. For the empty set we will either write $\varnothing$ or
       
   111 $\{\,\}$. The set containing, for example, the natural numbers
       
   112 $1$, $2$ and $3$ we will write as
       
   113 
       
   114 \[
       
   115 \{1, 2, 3\}
       
   116 \]
       
   117 
       
   118 \noindent The notation $\in$ means \emph{element of}, so $1
       
   119 \in \{1, 2, 3\}$ is true and $3 \in \{1, 2, 3\}$ is false.
       
   120 Sets can potentially have infinitely many elements. For
       
   121 example the set of all natural numbers $\{0, 1, 2, \ldots\}$
       
   122 is infinite. This set is often also abbreviated as
       
   123 $\mathbb{N}$. We can define sets by giving all elements, like
       
   124 $\{0, 1\}$, but also by \defn{set comprehensions}. For example
       
   125 the set of all even natural numbers can be defined as
       
   126 
       
   127 \[
       
   128 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
       
   129 \]
       
   130   
       
   131 \noindent Though silly, but the set $\{0, 1, 2\}$ could also be
       
   132 defined by the following set comprehension
       
   133 
       
   134 \[
       
   135 \{n\;|\; n^2 < 9 \wedge n \in \mathbb{N}\}
       
   136 \]
       
   137 
       
   138 \noindent Notice that set comprehensions could be used
       
   139 to define set union, intersection and difference:
       
   140 
       
   141 \begin{eqnarray*}
       
   142 A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\
       
   143 A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\
       
   144 A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} 
       
   145 \end{eqnarray*}
       
   146 
       
   147 \noindent For defining sets, we will also often use the notion
       
   148 of the ``big union''. An example is as follows:
       
   149 
       
   150 \begin{equation}\label{bigunion}
       
   151 \bigcup_{0\le n}\; \{n^2, n^2 + 1\}
       
   152 \end{equation}
       
   153 
       
   154 \noindent which is the set of all squares and their immediate
       
   155 successors, so
       
   156 
       
   157 \[
       
   158 \{0, 1, 2, 4, 5, 9, 10, 16, 17, \ldots\}
       
   159 \]
       
   160 
       
   161 \noindent A big union is a sequence of unions which are 
       
   162 indexed typically by a natural number. So the big union in
       
   163 \eqref{bigunion} could equally be written as
       
   164 
       
   165 \[
       
   166 \{0, 1\} \cup \{1, 2\} \cup \{4, 5\} \cup \{9, 10\} \cup 
       
   167 \ldots
       
   168 \]
       
   169 
       
   170 \noindent but using the big union notation is more concise.
       
   171 
       
   172 An important notion in this module are \defn{Languages}, which
       
   173 are sets of strings. The main goal for us will be how to
       
   174 (formally) specify languages and to find out whether a string
       
   175 is in a language or not. Note that the language containing the
       
   176 empty string $\{\dq{}\}$ is not equal to the empty language
       
   177 (or empty set): The former contains one element, namely \dq{}
       
   178 (also written $[\,]$), but the latter does not contain any.
       
   179 
       
   180 
       
   181 For languages we define the operation of \defn{language
       
   182 concatenation}, written $A @ B$:
       
   183 
       
   184 \begin{equation}\label{langconc}
       
   185 A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}
       
   186 \end{equation}
       
   187 
       
   188 \noindent Be careful to understand the difference: the $@$
       
   189 in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers 
       
   190 to the concatenation of two languages (or sets of strings).
       
   191 As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,
       
   192 then $A \,@\, B$ is
       
   193 
       
   194 
       
   195 \[
       
   196 \{abzzz, abqq, abr, aczzz, acqq, acr\}
       
   197 \] 
       
   198 
       
   199 \noindent Recall the properties for string concatenation. For
       
   200 language concatenation we have the following properties
       
   201 
       
   202 \begin{center}
       
   203 \begin{tabular}{ll}
       
   204 associativity: & $(A @ B) @ C = A @ (B @ C)$\\
       
   205 unit element:  & $A \,@\, \{[]\} = \{[]\} \,@\, A = A$\\
       
   206 zero element:  & $A \,@\, \varnothing = \varnothing \,@\, A = 
       
   207 \varnothing$
       
   208 \end{tabular}
       
   209 \end{center}
       
   210 
       
   211 \noindent Note the difference: the empty set behaves like $0$
       
   212 for multiplication and the set $\{[]\}$ like $1$ for
       
   213 multiplication.
       
   214 
       
   215 Following the language concatenation, we can define a
       
   216 \defn{language power} operation as follows:
       
   217 
       
   218 \begin{eqnarray*}
       
   219 A^0     & \dn & \{[]\}\\
       
   220 A^{n+1} & \dn & A \,@\, A^n
       
   221 \end{eqnarray*}
       
   222 
       
   223 \noindent This definition is by induction on natural numbers.
       
   224 Note carefully that the zero-case is not defined as the empty
       
   225 set, but the set containing the empty string. So no matter
       
   226 what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is
       
   227 another hint about a connection between the $@$-operation and
       
   228 multiplication: How is $x^n$ defined and what is $x^0$?)
       
   229 
       
   230 Next we can define the \defn{star operation} for languages: 
       
   231 $A^*$ is the union of all powers of $A$, or short
       
   232 
       
   233 \[
       
   234 A^* \dn \bigcup_{0\le n}\; A^n
       
   235 \]
       
   236 
       
   237 \noindent
       
   238 Unfolding this definition
       
   239 
       
   240 \[
       
   241 A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots
       
   242 \]
       
   243 
       
   244 \noindent
       
   245 which is equal to 
       
   246 
       
   247 \[
       
   248 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
       
   249 \]
       
   250 
       
   251 \noindent we can see that the empty string is always in $A^*$,
       
   252 no matter what $A$ is. This is because $[] \in A^0$. To make
       
   253 sure you understand these definition, I leave you to answer
       
   254 what $\{[]\}^*$ and $\varnothing^*$ are. 
       
   255 
       
   256 Recall that an alphabet is often referred to by the letter
       
   257 $\Sigma$. We can now write for the set of all strings over
       
   258 this alphabet $\Sigma^*$. In doing so we also include the 
       
   259 empty string as a possible string over $\Sigma$. So if
       
   260 $\Sigma = \{a, b\}$ then $\Sigma^*$ is
       
   261 
       
   262 \[
       
   263 \{[], a, b, ab, ba, aaa, aab, aba, abb, baa, bab, \ldots\}
       
   264 \]
       
   265 
       
   266 \noindent or in other words all strings containing $a$s and 
       
   267 $b$s only.
       
   268 
       
   269 \end{document}
       
   270 
       
   271 %%% Local Variables: 
       
   272 %%% mode: latex
       
   273 %%% TeX-master: t
       
   274 %%% End: