handouts/notation.tex
changeset 736 d3e477fe6c66
parent 550 71fc4a7a7039
child 830 dbf9d710ce65
equal deleted inserted replaced
735:fc2e3609d5e5 736:d3e477fe6c66
     3 \usepackage{../langs}
     3 \usepackage{../langs}
     4 
     4 
     5 
     5 
     6 
     6 
     7 \begin{document}
     7 \begin{document}
     8 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018}
     8 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018, 2020}
     9 
     9 
    10 
    10 
    11 \section*{A Crash-Course on Notation}
    11 \section*{A Crash-Course on Notation}
    12 
    12 
    13 There are innumerable books available about compilers, automata theory
    13 There are innumerable books available about compilers, automata theory
    19 readable and understandable), but other times we need pedantic
    19 readable and understandable), but other times we need pedantic
    20 precision for actual programs.
    20 precision for actual programs.
    21 
    21 
    22 \subsubsection*{Characters and Strings}
    22 \subsubsection*{Characters and Strings}
    23 
    23 
    24 The most important concept in this module are strings. Strings
    24 The most basic concept in this module are strings. Strings
    25 are composed of \defn{characters}. While characters are surely
    25 are composed of \defn{characters}. While characters are surely
    26 a familiar concept, we will make one subtle distinction in
    26 a familiar concept, we will make one subtle distinction in
    27 this module. If we want to refer to concrete characters, like
    27 this module. If we want to refer to concrete characters, like
    28 \code{a}, \code{b}, \code{c} and so on, we use a typewriter font.
    28 \code{a}, \code{b}, \code{c} and so on, we will use a typewriter font.
    29 Accordingly if we want to refer to the concrete characters of
    29 Accordingly if we want to refer to the concrete characters of
    30 my email address we shall write
    30 my email address we shall write
    31 
    31 
    32 \begin{center}
    32 \begin{center}
    33 \pcode{christian.urban@kcl.ac.uk}
    33 \pcode{christian.urban@kcl.ac.uk}
    53 \noindent In this string, we do not really care what the
    53 \noindent In this string, we do not really care what the
    54 characters stand for, except we do care about the fact that
    54 characters stand for, except we do care about the fact that
    55 for example the character $a$ is not equal to $b$ and so on.
    55 for example the character $a$ is not equal to $b$ and so on.
    56 Why do I make this distinction? Because we often need to
    56 Why do I make this distinction? Because we often need to
    57 define functions using variables ranging over characters. We
    57 define functions using variables ranging over characters. We
    58 need to somehow say this is a variable, say $c$, ranging over
    58 need to somehow say ``this-is-a-variable'' and give it a name. 
    59 characters, while this is the actual character \pcode{c}.
    59 In such cases we use the italic font.
       
    60 
    60 
    61 
    61 An \defn{alphabet} is a (non-empty) finite set of characters.
    62 An \defn{alphabet} is a (non-empty) finite set of characters.
    62 Often the letter $\Sigma$ is used to refer to an alphabet. For
    63 Often the letter $\Sigma$ is used to refer to an alphabet. For
    63 example the ASCII characters \pcode{a} to \pcode{z} form an
    64 example the ASCII characters \pcode{a} to \pcode{z} form an
    64 alphabet. The digits $0$ to $9$ are another alphabet. The
    65 alphabet. The digits $0$ to $9$ are another alphabet. The
    76 typed programming languages, such as Scala, strings have a special
    77 typed programming languages, such as Scala, strings have a special
    77 type---namely \pcode{String} which is different from the type
    78 type---namely \pcode{String} which is different from the type
    78 for lists of characters. This is because strings can be
    79 for lists of characters. This is because strings can be
    79 efficiently represented in memory, unlike lists. Since
    80 efficiently represented in memory, unlike lists. Since
    80 \code{String} and the type of lists of characters
    81 \code{String} and the type of lists of characters
    81 (\code{List[Char]}) are not the same, we need to explicitly
    82 (namely \code{List[Char]}) are not the same, we need to explicitly
    82 coerce elements between the two types, for example
    83 coerce elements between the two types, for example
    83 
    84 
    84 \begin{lstlisting}[numbers=none]
    85 \begin{lstlisting}[numbers=none]
    85 scala> "abc".toList
    86 scala> "abc".toList
    86 res01: List[Char] = List(a, b, c)
    87 res01: List[Char] = List(a, b, c)
   102 say \dq{\textit{ello}} when making definitions about strings.
   103 say \dq{\textit{ello}} when making definitions about strings.
   103 There are also some subtleties with the empty string,
   104 There are also some subtleties with the empty string,
   104 sometimes written as \dq{} but also as the empty list of
   105 sometimes written as \dq{} but also as the empty list of
   105 characters $[\,]$.\footnote{In the literature you can also
   106 characters $[\,]$.\footnote{In the literature you can also
   106 often find that $\varepsilon$ or $\lambda$ is used to
   107 often find that $\varepsilon$ or $\lambda$ is used to
   107 represent the empty string.} 
   108 represent the empty string. But we are not going to use this notation.} 
   108 
   109 
   109 Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
   110 Two strings, say $s_1$ and $s_2$, can be \defn{concatenated},
   110 which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as
   111 which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as
   111 lists of characters, then $@$ is the list-append function.
   112 lists of characters, then $@$ is the list-append function.
   112 Suppose we are given two strings \dq{\textit{foo}} and
   113 Suppose we are given two strings \dq{\textit{foo}} and
   113 \dq{\textit{bar}}, then their concatenation, written
   114 \dq{\textit{bar}}, then their concatenation, written
   114 \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
   115 \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives
   115 \dq{\textit{foobar}}. But as said above, we will often
   116 \dq{\textit{foobar}}. But as said above, we will often
   116 simplify our life and just drop the double quotes whenever it
   117 simplify our life and just drop the double quotes whenever it
   117 is clear we are talking about strings. So we will often just
   118 is clear we are talking about strings. So we will just
   118 write \textit{foo}, \textit{bar}, \textit{foobar} or
   119 write \textit{foo}, \textit{bar}, \textit{foobar} 
   119 \textit{foo $@$ bar}.
   120 \textit{foo $@$ bar} and so on.
   120 
   121 
   121 Occasionally we will use the notation $a^n$ for strings, which stands
   122 Occasionally we will use the notation $a^n$ for strings, which stands
   122 for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is a string that
   123 for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is a string that
   123 has some number of $a$s followed by the same number of $b$s.  A simple
   124 has some number of $a$s followed by the same number of $b$s.
   124 property of string concatenation is \emph{associativity}, meaning
   125 Confusingly, in Scala the notation is ``times'' for this opration.
       
   126 So you can write 
       
   127 
       
   128 \begin{lstlisting}[numbers=none]
       
   129 scala> "a" * 13
       
   130 val res02: String = aaaaaaaaaaaaa
       
   131 \end{lstlisting}
       
   132 
       
   133 \noindent
       
   134 A simple property of string concatenation is \emph{associativity}, meaning
   125 
   135 
   126 \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]  
   136 \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\]  
   127 
   137 
   128 \noindent are always equal strings. The empty string behaves
   138 \noindent are always equal strings. The empty string behaves
   129 like a \emph{unit element}, therefore
   139 like a \emph{unit element}, therefore
   183 A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} 
   193 A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} 
   184 \end{eqnarray*}
   194 \end{eqnarray*}
   185 
   195 
   186 \noindent In general set comprehensions are of the form
   196 \noindent In general set comprehensions are of the form
   187 $\{a\;|\;P\}$ which stands for the set of all elements $a$
   197 $\{a\;|\;P\}$ which stands for the set of all elements $a$
   188 (from some set) for which some property $P$ holds.
   198 (from some set) for which some property $P$ holds. If programming
       
   199 is more your-kind-of-thing, you might recognise the similarities
       
   200 with for-comprehensions, for example for the silly set above you
       
   201 could write
       
   202 
       
   203 \begin{lstlisting}[numbers=none]
       
   204 scala> for (n <- (0 to 10).toSet; if n * n < 9) yield n
       
   205 val res03: Set[Int] = Set(0, 1, 2)
       
   206 \end{lstlisting}
       
   207 
       
   208 \noindent
       
   209 This is pretty much the same as $\{n\;|\;  n \in \mathbb{N} \wedge n^2 < 9\}$
       
   210 just in Scala syntax.
   189  
   211  
   190 For defining sets, we will also often use the notion of the
   212 For defining sets, we will also often use the notion of the
   191 ``big union''. An example is as follows:
   213 ``big union''. An example is as follows:
   192 
   214 
   193 \begin{equation}\label{bigunion}
   215 \begin{equation}\label{bigunion}
   232 
   254 
   233 \noindent
   255 \noindent
   234 contain actually the same amount of elements. Does this make sense?
   256 contain actually the same amount of elements. Does this make sense?
   235 Though this might all look strange, infinite sets will be a
   257 Though this might all look strange, infinite sets will be a
   236 topic that is very relevant to the material of this module. It tells
   258 topic that is very relevant to the material of this module. It tells
   237 us what we can compute with a computer (actually algorithm) and what
   259 us what we can compute with a computer (actually an algorithm) and what
   238 we cannot. But during the first 9 lectures we can go by without this
   260 we cannot. But during the first 9 lectures we can go by without this
   239 ``weird'' stuff. End of aside.\smallskip
   261 ``weird'' stuff. End of aside.\smallskip
   240 
   262 
   241 Another important notion in this module are \defn{languages}, which
   263 Another important notion in this module are \defn{languages}, which
   242 are sets of strings. One of the main goals for us will be how to
   264 are sets of strings. One of the main goals for us will be how to
   246 hardness is meant in terms of Turing decidable, for example.}
   268 hardness is meant in terms of Turing decidable, for example.}
   247 Note that the language containing the empty string $\{\dq{}\}$
   269 Note that the language containing the empty string $\{\dq{}\}$
   248 is not equal to $\varnothing$, the empty language (or empty
   270 is not equal to $\varnothing$, the empty language (or empty
   249 set): The former contains one element, namely \dq{} (also
   271 set): The former contains one element, namely \dq{} (also
   250 written $[\,]$), but the latter does not contain any
   272 written $[\,]$), but the latter does not contain any
   251 element.
   273 element at all! Make sure you see the difference.
   252 
   274 
   253 For languages we define the operation of \defn{language
   275 For languages we define the operation of \defn{language
   254 concatenation}, written like in the string case as $A @ B$:
   276 concatenation}, written like in the string case as $A @ B$:
   255 
   277 
   256 \begin{equation}\label{langconc}
   278 \begin{equation}\label{langconc}
   265 
   287 
   266 \[
   288 \[
   267 \{abzzz, abqq, abr, aczzz, acqq, acr\}
   289 \{abzzz, abqq, abr, aczzz, acqq, acr\}
   268 \] 
   290 \] 
   269 
   291 
   270 \noindent Recall the properties for string concatenation. For
   292 \noindent The cool thing about Scala is that we can define language
       
   293 concatenation very elegantly as
       
   294 
       
   295 \begin{lstlisting}[numbers=none]
       
   296 def concat(A: Set[String], B: Set[String]) =
       
   297   for (x <- A ; y <- B) yield x ++ y
       
   298 \end{lstlisting}
       
   299 
       
   300 \noindent
       
   301 where \code{++} is string concatenation in Scala.
       
   302 
       
   303 Recall the properties for string concatenation. For
   271 language concatenation we have the following properties
   304 language concatenation we have the following properties
   272 
   305 
   273 \begin{center}
   306 \begin{center}
   274 \begin{tabular}{ll}
   307 \begin{tabular}{ll}
   275 associativity: & $(A @ B) @ C = A @ (B @ C)$\\
   308 associativity: & $(A @ B) @ C = A @ (B @ C)$\\
   278 \varnothing$
   311 \varnothing$
   279 \end{tabular}
   312 \end{tabular}
   280 \end{center}
   313 \end{center}
   281 
   314 
   282 \noindent Note the difference in the last two lines: the empty
   315 \noindent Note the difference in the last two lines: the empty
   283 set behaves like $0$ for multiplication and the set $\{[]\}$
   316 set behaves like $0$ for multiplication, whereas the set $\{[]\}$
   284 like $1$ for multiplication ($n * 1 = n$ and $n * 0 = 0$).
   317 behaves like $1$ for multiplication ($n * 1 = n$ and $n * 0 = 0$).
       
   318 Again this is a subtletly you need to get compfortable with.
   285 
   319 
   286 Using the operation of language concatenation, we can define a
   320 Using the operation of language concatenation, we can define a
   287 \defn{language power} operation as follows:
   321 \defn{language power} operation as follows:
   288 
   322 
   289 \begin{eqnarray*}
   323 \begin{eqnarray*}
   305 \begin{equation}\label{star}
   339 \begin{equation}\label{star}
   306 A\star \dn \bigcup_{0\le n}\; A^n
   340 A\star \dn \bigcup_{0\le n}\; A^n
   307 \end{equation}
   341 \end{equation}
   308 
   342 
   309 \noindent This star operation is often also called
   343 \noindent This star operation is often also called
   310 \emph{Kleene-star}. Unfolding the definition in~\eqref{star}
   344 \emph{Kleene-star} after the mathematician/computer scientist
       
   345 Stephen Cole Kleene. Unfolding the definition in~\eqref{star}
   311 gives
   346 gives
   312 
   347 
   313 \[
   348 \[
   314 A\star \dn A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots
   349 A\star \dn A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots
   315 \]
   350 \]
   327 what $\{[]\}\star$ and $\varnothing\star$ are?
   362 what $\{[]\}\star$ and $\varnothing\star$ are?
   328 
   363 
   329 Recall that an alphabet is often referred to by the letter
   364 Recall that an alphabet is often referred to by the letter
   330 $\Sigma$. We can now write for the set of \emph{all} strings
   365 $\Sigma$. We can now write for the set of \emph{all} strings
   331 over this alphabet as $\Sigma\star$. In doing so we also include the
   366 over this alphabet as $\Sigma\star$. In doing so we also include the
   332 empty string as a possible string over $\Sigma$. So if $\Sigma
   367 empty string as a possible string (over $\Sigma$). Assuming $\Sigma
   333 = \{a, b\}$, then $\Sigma\star$ is
   368 = \{a, b\}$, then $\Sigma\star$ is
   334 
   369 
   335 \[
   370 \[
   336 \{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}
   371 \{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\}
   337 \]
   372 \]
   338 
   373 
   339 \noindent or in other words all strings containing $a$s and
   374 \noindent or in words all strings containing $a$s and
   340 $b$s only, plus the empty string.
   375 $b$s only, plus the empty string.
   341 
   376 \bigskip
       
   377 
       
   378 \noindent
       
   379 Thanks for making it until here! There are also some personal conventions
       
   380 about regular expressions. But I will explain them in the handout for the
       
   381 first week. An exercise you can do: Implement the power operation for languages
       
   382 and try out some examples.
   342 \end{document}
   383 \end{document}
   343 
   384 
   344 %%% Local Variables: 
   385 %%% Local Variables: 
   345 %%% mode: latex
   386 %%% mode: latex
   346 %%% TeX-master: t
   387 %%% TeX-master: t