handouts/notation.tex
changeset 241 10f02605a46a
parent 239 68d98140b90b
child 242 35104ee14f87
equal deleted inserted replaced
240:de4f6382590a 241:10f02605a46a
     6 
     6 
     7 \begin{document}
     7 \begin{document}
     8 
     8 
     9 \section*{A Crash-Course on Notation}
     9 \section*{A Crash-Course on Notation}
    10 
    10 
       
    11 There are innumerable books available about automata and
       
    12 formal languages. Unfortunately, they often use their own
       
    13 notational conventions and their own symbols. This handout is
       
    14 meant to clarify some of the notation I will use.
       
    15 
    11 \subsubsection*{Characters and Strings}
    16 \subsubsection*{Characters and Strings}
    12 
    17 
    13 In this module we will often use \defn{characters}. While they
    18 The most important concept in this module are strings. Strings
    14 are surely familiar, we will make one subtle distinction. If
    19 are composed of \defn{characters}. While characters are surely
    15 we want to refer to concrete characters, like \code{a},
    20 a familiar concept, we will make one subtle distinction in
    16 \code{b} and so on, we use a typewriter font. So if we want to
    21 this module. If we want to refer to concrete characters, like
    17 refer to the concrete characters of my email address we shall
    22 \code{a}, \code{b} and so on, we use a typewriter font.
    18 write
    23 Accordingly if we want to refer to the concrete characters of
       
    24 my email address we shall write
    19 
    25 
    20 \begin{center}
    26 \begin{center}
    21 \pcode{christian.urban@kcl.ac.uk}
    27 \pcode{christian.urban@kcl.ac.uk}
    22 \end{center}
    28 \end{center}
    23 
    29 
    24 \noindent If we need to explicitly indicate the ``space''
    30 \noindent If we also need to explicitly indicate the ``space''
    25 character, we write \VS{}\hspace{1mm}. For example
    31 character, we write \VS{}\hspace{1mm}. For example
    26 
    32 
    27 \begin{center}
    33 \begin{center}
    28 \tt{}hello\VS\hspace{0.5mm}world
    34 \tt{}hello\VS\hspace{0.5mm}world
    29 \end{center}
    35 \end{center}
    30 
    36 
    31 
    37 
    32 \noindent But often we do not care
    38 \noindent But often we do not care which particular characters
    33 about which characters we use. In such cases we us the italic
    39 we use. In such cases we use the italic font and write $a$,
    34 font and write $a$, $b$ and so on. So if we need a
    40 $b$ and so on for characters. Therefore if we need a
    35 representative string, we might write
    41 representative string, we might write
    36 
    42 
    37 \begin{equation}\label{abracadabra}
    43 \begin{equation}\label{abracadabra}
    38 abracadabra
    44 abracadabra
    39 \end{equation}
    45 \end{equation}
    40 
    46 
    41 \noindent We do not really care what the characters stand for,
    47 \noindent In this string, we do not really care what the
    42 except we do care about is that for example the character $a$
    48 characters stand for, except we do care about the fact that
    43 is not equal to $b$.
    49 for example the character $a$ is not equal to $b$ and so on.
    44 
    50 
    45 An \defn{alphabet} is a finite set of characters. Often the
    51 An \defn{alphabet} is a (non-empty) finite set of characters.
    46 letter $\Sigma$ is used to refer to an alphabet. For example
    52 Often the letter $\Sigma$ is used to refer to an alphabet. For
    47 the ASCII characters \pcode{a} to \pcode{z} form an alphabet.
    53 example the ASCII characters \pcode{a} to \pcode{z} form an
    48 The digits $0$ to $9$ are another alphabet. If nothing else is
    54 alphabet. The digits $0$ to $9$ are another alphabet. The
    49 specified, we usually assume the alphabet consists of just the
    55 Greek letters $\alpha$ to $\omega$ also form an alphabet. If
    50 lower-case letters $a$, $b$, \ldots, $z$. Sometimes, however,
    56 nothing else is specified, we usually assume the alphabet
    51 we explicitly restrict strings to contain, for example, only
    57 consists of just the lower-case letters $a$, $b$, \ldots, $z$.
    52 the letters $a$ and $b$. In this case we say the alphabet is
    58 Sometimes, however, we explicitly want to restrict strings to
    53 the set $\{a, b\}$. 
    59 contain only the letters $a$ and $b$, for example. In this
       
    60 case we will state that the alphabet is the set $\{a, b\}$. 
    54 
    61 
    55 \defn{Strings} are lists of characters. Unfortunately, there
    62 \defn{Strings} are lists of characters. Unfortunately, there
    56 are many ways how we can write down strings. In programming
    63 are many ways how we can write down strings. In programming
    57 languages, they are usually written as \dq{$hello$} where the
    64 languages, they are usually written as \dq{$hello$} where the
    58 double quotes indicate that we dealing with a string. But
    65 double quotes indicate that we are dealing with a string. But
    59 since, strings are lists of characters we could also write
    66 since we regard strings as lists of characters we could also
    60 this string as
    67 write this string as
    61 
    68 
    62 \[
    69 \[
    63 [\text{\it h, e, l, l, o}]
    70 [\text{\it h, e, l, l, o}]
    64 \]
    71 \]
    65 
    72 
    66 \noindent The important point is that we can always decompose
    73 \noindent The important point is that we can always decompose
    67 such strings. For example, we will often consider the first
    74 such strings. For example, we will often consider the first
    68 character of a string, say $h$, and the ``rest'' of a string
    75 character of a string, say $h$, and the ``rest'' of a string
    69 say \dq{\textit{ello}} when making definitions about strings.
    76 say \dq{\textit{ello}} when making definitions about strings.
    70 There are some subtleties with the empty string, sometimes
    77 There are also some subtleties with the empty string,
    71 written as \dq{} but also as the empty list of characters
    78 sometimes written as \dq{} but also as the empty list of
    72 $[\,]$. Two strings, for example $s_1$ and $s_2$, can be
    79 characters $[\,]$.\footnote{In the literature you can also
    73 \emph{concatenated}, which we write as $s_1 @ s_2$. Suppose we
    80 often find that $\varepsilon$ or $\lambda$ is used to
    74 are given two strings \dq{\textit{foo}} and \dq{\textit{bar}},
    81 represent the empty string.} 
    75 then their concatenation, writen
    82 
    76 \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
    83 Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
    77 \dq{\textit{foobar}}. Often we will simplify our life and just
    84 which we write as $s_1 @ s_2$. Suppose we are given two
    78 drop the double quotes whenever it is clear we are talking
    85 strings \dq{\textit{foo}} and \dq{\textit{bar}}, then their
    79 about strings, writing as already in \eqref{abracadabra} just
    86 concatenation, writen \dq{\textit{foo}} $@$ \dq{\textit{bar}},
    80 \textit{foo}, \textit{bar}, \textit{foobar} or \textit{foo $@$
    87 gives \dq{\textit{foobar}}. Often we will simplify our life
    81 bar}.
    88 and just drop the double quotes whenever it is clear we are
       
    89 talking about strings, writing as already in
       
    90 \eqref{abracadabra} just \textit{foo}, \textit{bar},
       
    91 \textit{foobar} or \textit{foo $@$ bar}.
    82 
    92 
    83 Some simple properties of string concatenation hold. For
    93 Some simple properties of string concatenation hold. For
    84 example the concatenation operation is \emph{associative},
    94 example the concatenation operation is \emph{associative},
    85 meaning
    95 meaning
    86 
    96 
    88 
    98 
    89 \noindent are always equal strings. The empty string behaves
    99 \noindent are always equal strings. The empty string behaves
    90 like a unit element, therefore
   100 like a unit element, therefore
    91 
   101 
    92 \[s \,@\, [] = [] \,@\, s = s\]
   102 \[s \,@\, [] = [] \,@\, s = s\]
       
   103 
       
   104 
       
   105 Occasionally we will use the notation $a^n$ for strings, which 
       
   106 stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is
       
   107 a string that has as many $a$s as $b$s.
    93 
   108 
    94 While for us strings are just lists of characters, programming
   109 While for us strings are just lists of characters, programming
    95 languages often differentiate between the two concepts. In
   110 languages often differentiate between the two concepts. In
    96 Scala, for example, there is the type of \code{String} and the
   111 Scala, for example, there is the type of \code{String} and the
    97 type of lists of characters,  \code{List[Char]}. They are not
   112 type of lists of characters,  \code{List[Char]}. They are not
   104 \end{lstlisting}
   119 \end{lstlisting}
   105 
   120 
   106 
   121 
   107 \subsubsection*{Sets and Languages}
   122 \subsubsection*{Sets and Languages}
   108 
   123 
   109 We will use the familiar operations $\cup$ and $\cap$ for
   124 We will use the familiar operations $\cup$, $\cap$, $\subset$
   110 sets. For the empty set we will either write $\varnothing$ or
   125 and $\subseteq$ for sets. For the empty set we will either
   111 $\{\,\}$. The set containing, for example, the natural numbers
   126 write $\varnothing$ or $\{\,\}$. The set containing the
   112 $1$, $2$ and $3$ we will write as
   127 natural numbers $1$, $2$ and $3$, for example, we will write
       
   128 with curly braces as
   113 
   129 
   114 \[
   130 \[
   115 \{1, 2, 3\}
   131 \{1, 2, 3\}
   116 \]
   132 \]
   117 
   133 
   118 \noindent The notation $\in$ means \emph{element of}, so $1
   134 \noindent The notation $\in$ means \emph{element of}, so $1
   119 \in \{1, 2, 3\}$ is true and $3 \in \{1, 2, 3\}$ is false.
   135 \in \{1, 2, 3\}$ is true and $3 \in \{1, 2, 3\}$ is false.
   120 Sets can potentially have infinitely many elements. For
   136 Sets can potentially have infinitely many elements. For
   121 example the set of all natural numbers $\{0, 1, 2, \ldots\}$
   137 example the set of all natural numbers $\{0, 1, 2, \ldots\}$
   122 is infinite. This set is often also abbreviated as
   138 is infinite. This set is often also abbreviated as
   123 $\mathbb{N}$. We can define sets by giving all elements, like
   139 $\mathbb{N}$. We can define sets by giving all elements, for
   124 $\{0, 1\}$, but also by \defn{set comprehensions}. For example
   140 example $\{0, 1\}$, but also by \defn{set comprehensions}. For
   125 the set of all even natural numbers can be defined as
   141 example the set of all even natural numbers can be defined as
   126 
   142 
   127 \[
   143 \[
   128 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
   144 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\}
   129 \]
   145 \]
   130   
   146   
   142 A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\
   158 A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\
   143 A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\
   159 A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\
   144 A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} 
   160 A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} 
   145 \end{eqnarray*}
   161 \end{eqnarray*}
   146 
   162 
   147 \noindent For defining sets, we will also often use the notion
   163 \noindent In general set comprehensions are of the form
   148 of the ``big union''. An example is as follows:
   164 $\{a\;|\;P\}$ which stands for the set of all elements $a$
       
   165 (from some set) for which some property $P$ holds.
       
   166  
       
   167 For defining sets, we will also often use the notion of the
       
   168 ``big union''. An example is as follows:
   149 
   169 
   150 \begin{equation}\label{bigunion}
   170 \begin{equation}\label{bigunion}
   151 \bigcup_{0\le n}\; \{n^2, n^2 + 1\}
   171 \bigcup_{0\le n}\; \{n^2, n^2 + 1\}
   152 \end{equation}
   172 \end{equation}
   153 
   173 
   167 \ldots
   187 \ldots
   168 \]
   188 \]
   169 
   189 
   170 \noindent but using the big union notation is more concise.
   190 \noindent but using the big union notation is more concise.
   171 
   191 
   172 An important notion in this module are \defn{Languages}, which
   192 An important notion in this module are \defn{languages}, which
   173 are sets of strings. The main goal for us will be how to
   193 are sets of strings. The main goal for us will be how to
   174 (formally) specify languages and to find out whether a string
   194 (formally) specify languages and to find out whether a string
   175 is in a language or not. Note that the language containing the
   195 is in a language or not.\footnote{You might wish to ponder
   176 empty string $\{\dq{}\}$ is not equal to the empty language
   196 whether this is in general a hard or easy problem, where
   177 (or empty set): The former contains one element, namely \dq{}
   197 hardness is meant in terms of Turing decidable, for example.}
   178 (also written $[\,]$), but the latter does not contain any.
   198 Note that the language containing the empty string $\{\dq{}\}$
   179 
   199 is not equal to $\varnothing$, the empty language (or empty
       
   200 set): The former contains one element, namely \dq{} (also
       
   201 written $[\,]$), but the latter does not contain any
       
   202 element.
   180 
   203 
   181 For languages we define the operation of \defn{language
   204 For languages we define the operation of \defn{language
   182 concatenation}, written $A @ B$:
   205 concatenation}, written like in the string case as $A @ B$:
   183 
   206 
   184 \begin{equation}\label{langconc}
   207 \begin{equation}\label{langconc}
   185 A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}
   208 A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\}
   186 \end{equation}
   209 \end{equation}
   187 
   210 
   188 \noindent Be careful to understand the difference: the $@$
   211 \noindent Be careful to understand the difference: the $@$
   189 in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers 
   212 in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers 
   190 to the concatenation of two languages (or sets of strings).
   213 to the concatenation of two languages (or sets of strings).
   191 As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,
   214 As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$,
   192 then $A \,@\, B$ is
   215 then $A \,@\, B$ is the language
   193 
       
   194 
   216 
   195 \[
   217 \[
   196 \{abzzz, abqq, abr, aczzz, acqq, acr\}
   218 \{abzzz, abqq, abr, aczzz, acqq, acr\}
   197 \] 
   219 \] 
   198 
   220 
   206 zero element:  & $A \,@\, \varnothing = \varnothing \,@\, A = 
   228 zero element:  & $A \,@\, \varnothing = \varnothing \,@\, A = 
   207 \varnothing$
   229 \varnothing$
   208 \end{tabular}
   230 \end{tabular}
   209 \end{center}
   231 \end{center}
   210 
   232 
   211 \noindent Note the difference: the empty set behaves like $0$
   233 \noindent Note the difference in the last two lines: the empty
   212 for multiplication and the set $\{[]\}$ like $1$ for
   234 set behaves like $0$ for multiplication and the set $\{[]\}$
   213 multiplication.
   235 like $1$ for multiplication.
   214 
   236 
   215 Following the language concatenation, we can define a
   237 Following the language concatenation, we can define a
   216 \defn{language power} operation as follows:
   238 \defn{language power} operation as follows:
   217 
   239 
   218 \begin{eqnarray*}
   240 \begin{eqnarray*}
   219 A^0     & \dn & \{[]\}\\
   241 A^0     & \dn & \{[]\}\\
   220 A^{n+1} & \dn & A \,@\, A^n
   242 A^{n+1} & \dn & A \,@\, A^n
   221 \end{eqnarray*}
   243 \end{eqnarray*}
   222 
   244 
   223 \noindent This definition is by induction on natural numbers.
   245 \noindent This definition is by recursion on natural numbers.
   224 Note carefully that the zero-case is not defined as the empty
   246 Note carefully that the zero-case is not defined as the empty
   225 set, but the set containing the empty string. So no matter
   247 set, but the set containing the empty string. So no matter
   226 what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is
   248 what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is
   227 another hint about a connection between the $@$-operation and
   249 another hint about a connection between the $@$-operation and
   228 multiplication: How is $x^n$ defined and what is $x^0$?)
   250 multiplication: How is $x^n$ defined and what is $x^0$?)
   229 
   251 
   230 Next we can define the \defn{star operation} for languages: 
   252 Next we can define the \defn{star operation} for languages: 
   231 $A^*$ is the union of all powers of $A$, or short
   253 $A^*$ is the union of all powers of $A$, or short
   232 
   254 
   233 \[
   255 \begin{equation}\label{star}
   234 A^* \dn \bigcup_{0\le n}\; A^n
   256 A^* \dn \bigcup_{0\le n}\; A^n
   235 \]
   257 \end{equation}
   236 
   258 
   237 \noindent
   259 \noindent This star operation is often also called
   238 Unfolding this definition
   260 \emph{Kleene-star}. Unfolding the definition in \eqref{star}
       
   261 gives
   239 
   262 
   240 \[
   263 \[
   241 A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots
   264 A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots
   242 \]
   265 \]
   243 
   266 
   246 
   269 
   247 \[
   270 \[
   248 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
   271 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots
   249 \]
   272 \]
   250 
   273 
   251 \noindent we can see that the empty string is always in $A^*$,
   274 \noindent We can see that the empty string is always in $A^*$,
   252 no matter what $A$ is. This is because $[] \in A^0$. To make
   275 no matter what $A$ is. This is because $[] \in A^0$. To make
   253 sure you understand these definition, I leave you to answer
   276 sure you understand these definitions, I leave you to answer
   254 what $\{[]\}^*$ and $\varnothing^*$ are. 
   277 what $\{[]\}^*$ and $\varnothing^*$ are. 
   255 
   278 
   256 Recall that an alphabet is often referred to by the letter
   279 Recall that an alphabet is often referred to by the letter
   257 $\Sigma$. We can now write for the set of all strings over
   280 $\Sigma$. We can now write for the set of all strings over
   258 this alphabet $\Sigma^*$. In doing so we also include the 
   281 this alphabet $\Sigma^*$. In doing so we also include the 
   259 empty string as a possible string over $\Sigma$. So if
   282 empty string as a possible string over $\Sigma$. So if
   260 $\Sigma = \{a, b\}$ then $\Sigma^*$ is
   283 $\Sigma = \{a, b\}$, then $\Sigma^*$ is
   261 
   284 
   262 \[
   285 \[
   263 \{[], a, b, ab, ba, aaa, aab, aba, abb, baa, bab, \ldots\}
   286 \{[], a, b, ab, ba, aaa, aab, aba, abb, baa, bab, \ldots\}
   264 \]
   287 \]
   265 
   288