6 |
6 |
7 \begin{document} |
7 \begin{document} |
8 |
8 |
9 \section*{A Crash-Course on Notation} |
9 \section*{A Crash-Course on Notation} |
10 |
10 |
11 There are innumerable books available about compiler, automata |
11 There are innumerable books available about compilers, |
12 and formal languages. Unfortunately, they often use their own |
12 automata theory and formal languages. Unfortunately, they |
13 notational conventions and their own symbols. This handout is |
13 often use their own notational conventions and their own |
14 meant to clarify some of the notation I will use. |
14 symbols. This handout is meant to clarify some of the notation |
|
15 I will use. I appologise in advance that sometimes I will be a |
|
16 bit fuzzy\ldots the problem is that often we want to have |
|
17 convenience in our mathematical definitions (to make them |
|
18 readable and understandable), but sometimes we need precision |
|
19 for actual programs. |
15 |
20 |
16 \subsubsection*{Characters and Strings} |
21 \subsubsection*{Characters and Strings} |
17 |
22 |
18 The most important concept in this module are strings. Strings |
23 The most important concept in this module are strings. Strings |
19 are composed of \defn{characters}. While characters are surely |
24 are composed of \defn{characters}. While characters are surely |
38 \noindent But often we do not care which particular characters |
43 \noindent But often we do not care which particular characters |
39 we use. In such cases we use the italic font and write $a$, |
44 we use. In such cases we use the italic font and write $a$, |
40 $b$, $c$ and so on for characters. Therefore if we need a |
45 $b$, $c$ and so on for characters. Therefore if we need a |
41 representative string, we might write |
46 representative string, we might write |
42 |
47 |
43 \begin{equation}\label{abracadabra} |
48 \[ |
44 abracadabra |
49 abracadabra |
45 \end{equation} |
50 \] |
46 |
51 |
47 \noindent In this string, we do not really care what the |
52 \noindent In this string, we do not really care what the |
48 characters stand for, except we do care about the fact that |
53 characters stand for, except we do care about the fact that |
49 for example the character $a$ is not equal to $b$ and so on. |
54 for example the character $a$ is not equal to $b$ and so on. |
|
55 Why do I make this distinction? Because we often need to |
|
56 define functions using variables ranging over characters. We |
|
57 need to somehow say this is a variable, say $c$, ranging over |
|
58 characters, while this is the atual character \pcode{c}. |
50 |
59 |
51 An \defn{alphabet} is a (non-empty) finite set of characters. |
60 An \defn{alphabet} is a (non-empty) finite set of characters. |
52 Often the letter $\Sigma$ is used to refer to an alphabet. For |
61 Often the letter $\Sigma$ is used to refer to an alphabet. For |
53 example the ASCII characters \pcode{a} to \pcode{z} form an |
62 example the ASCII characters \pcode{a} to \pcode{z} form an |
54 alphabet. The digits $0$ to $9$ are another alphabet. The |
63 alphabet. The digits $0$ to $9$ are another alphabet. The |
60 case we will state that the alphabet is the set $\{a, b\}$. |
69 case we will state that the alphabet is the set $\{a, b\}$. |
61 |
70 |
62 \defn{Strings} are lists of characters. Unfortunately, there |
71 \defn{Strings} are lists of characters. Unfortunately, there |
63 are many ways how we can write down strings. In programming |
72 are many ways how we can write down strings. In programming |
64 languages, they are usually written as \dq{$hello$} where the |
73 languages, they are usually written as \dq{$hello$} where the |
65 double quotes indicate that we are dealing with a string. But |
74 double quotes indicate that we are dealing with a string. In |
66 since we regard strings as lists of characters we could also |
75 programming languages, such as Scala, strings have a special |
67 write this string as |
76 type---namely \pcode{String} which is different from the type |
68 |
77 for lists of chatacters. This is because strings can be |
69 \[ |
78 efficiently represented in memory, unlike general lists. Since |
70 [\text{\it h, e, l, l, o}] \;\;\text{or simply}\;\; \textit{hello} |
79 \code{String} and the type of lists of characters, |
|
80 \code{List[Char]} are not the same, we need to explicitly |
|
81 coerce elements between the two types, for example |
|
82 |
|
83 \begin{lstlisting}[numbers=none] |
|
84 scala> "abc".toList |
|
85 res01: List[Char] = List(a, b, c) |
|
86 \end{lstlisting} |
|
87 |
|
88 \noindent Since in our (mathematical) definitions we regard |
|
89 strings as lists of characters, we will also write |
|
90 \dq{$hello$} as |
|
91 |
|
92 \[ |
|
93 [\text{\it h, e, l, l, o}] \qquad\text{or simply}\qquad \textit{hello} |
71 \] |
94 \] |
72 |
95 |
73 \noindent The important point is that we can always decompose |
96 \noindent The important point is that we can always decompose |
74 such strings. For example, we will often consider the first |
97 such strings. For example, we will often consider the first |
75 character of a string, say $h$, and the ``rest'' of a string |
98 character of a string, say $h$, and the ``rest'' of a string |
79 characters $[\,]$.\footnote{In the literature you can also |
102 characters $[\,]$.\footnote{In the literature you can also |
80 often find that $\varepsilon$ or $\lambda$ is used to |
103 often find that $\varepsilon$ or $\lambda$ is used to |
81 represent the empty string.} |
104 represent the empty string.} |
82 |
105 |
83 Two strings, say $s_1$ and $s_2$, can be \defn{concatenated}, |
106 Two strings, say $s_1$ and $s_2$, can be \defn{concatenated}, |
84 which we write as $s_1 @ s_2$. Suppose we are given two |
107 which we write as $s_1 @ s_2$. If we regard $s_1$ and $s_2$ as |
85 strings \dq{\textit{foo}} and \dq{\textit{bar}}, then their |
108 lists of characters, then $@$ is the list-append function. |
86 concatenation, writen \dq{\textit{foo}} $@$ \dq{\textit{bar}}, |
109 Suppose we are given two strings \dq{\textit{foo}} and |
87 gives \dq{\textit{foobar}}. Often we will simplify our life |
110 \dq{\textit{bar}}, then their concatenation, writen |
88 and just drop the double quotes whenever it is clear we are |
111 \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives |
89 talking about strings, writing as already in |
112 \dq{\textit{foobar}}. But as said above, we will often |
90 \eqref{abracadabra} just \textit{foo}, \textit{bar}, |
113 simplify our life and just drop the double quotes whenever it |
91 \textit{foobar} or \textit{foo $@$ bar}. |
114 is clear we are talking about strings, So we will often just |
92 |
115 write \textit{foo}, \textit{bar}, \textit{foobar} or |
93 Some simple properties of string concatenation hold. For |
116 \textit{foo $@$ bar}. |
94 example the concatenation operation is \emph{associative}, |
|
95 meaning |
|
96 |
|
97 \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\] |
|
98 |
|
99 \noindent are always equal strings. The empty string behaves |
|
100 like a unit element, therefore |
|
101 |
|
102 \[s \,@\, [] = [] \,@\, s = s\] |
|
103 |
|
104 |
117 |
105 Occasionally we will use the notation $a^n$ for strings, which |
118 Occasionally we will use the notation $a^n$ for strings, which |
106 stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is |
119 stands for the string of $n$ repeated $a$s. So $a^{n}b^{n}$ is |
107 a string that has as many $a$s as $b$s. |
120 a string that has as many $a$s by as many $b$s. |
108 |
121 |
109 Note however that while for us strings are just lists of |
122 A simple property of string concatenation is |
110 characters, programming languages often differentiate between |
123 \emph{associativity}, meaning |
111 the two concepts. In Scala, for example, there is the type of |
124 |
112 \code{String} and the type of lists of characters, |
125 \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\] |
113 \code{List[Char]}. They are not the same and we need to |
126 |
114 explicitly coerce elements between the two types, for example |
127 \noindent are always equal strings. The empty string behaves |
115 |
128 like a \emph{unit element}, therefore |
116 \begin{lstlisting}[numbers=none] |
129 |
117 scala> "abc".toList |
130 \[s \,@\, [] = [] \,@\, s = s\] |
118 res01: List[Char] = List(a, b, c) |
|
119 \end{lstlisting} |
|
120 |
131 |
121 |
132 |
122 \subsubsection*{Sets and Languages} |
133 \subsubsection*{Sets and Languages} |
123 |
134 |
124 We will use the familiar operations $\cup$, $\cap$, $\subset$ |
135 We will use the familiar operations $\cup$, $\cap$, $\subset$ |
135 \in \{1, 2, 3\}$ is true and $4 \in \{1, 2, 3\}$ is false. |
146 \in \{1, 2, 3\}$ is true and $4 \in \{1, 2, 3\}$ is false. |
136 Sets can potentially have infinitely many elements. For |
147 Sets can potentially have infinitely many elements. For |
137 example the set of all natural numbers $\{0, 1, 2, \ldots\}$ |
148 example the set of all natural numbers $\{0, 1, 2, \ldots\}$ |
138 is infinite. This set is often also abbreviated as |
149 is infinite. This set is often also abbreviated as |
139 $\mathbb{N}$. We can define sets by giving all elements, for |
150 $\mathbb{N}$. We can define sets by giving all elements, for |
140 example $\{0, 1\}$, but also by \defn{set comprehensions}. For |
151 example $\{0, 1\}$ for the set containing just $0$ and $1$, |
141 example the set of all even natural numbers can be defined as |
152 but also by \defn{set comprehensions}. For example the set of |
|
153 all even natural numbers can be defined as |
142 |
154 |
143 \[ |
155 \[ |
144 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\} |
156 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\} |
145 \] |
157 \] |
146 |
158 |
187 \ldots |
199 \ldots |
188 \] |
200 \] |
189 |
201 |
190 \noindent but using the big union notation is more concise. |
202 \noindent but using the big union notation is more concise. |
191 |
203 |
192 An important notion in this module are \defn{languages}, which |
204 While this stuff about sete might all look trivial or even |
|
205 needlessly pedantic, \emph{Nature} is never simple. If you |
|
206 want to be amazed how complicated sets can get, watch out for |
|
207 the last lecture just before Christmas where I want you to |
|
208 convince you of the fact that some sets are more infinite than |
|
209 others. Actually that will be a fact that is very relevant to |
|
210 the material of this module. |
|
211 |
|
212 Another important notion in this module are \defn{languages}, which |
193 are sets of strings. One of the main goals for us will be how to |
213 are sets of strings. One of the main goals for us will be how to |
194 (formally) specify languages and to find out whether a string |
214 (formally) specify languages and to find out whether a string |
195 is in a language or not.\footnote{You might wish to ponder |
215 is in a language or not.\footnote{You might wish to ponder |
196 whether this is in general a hard or easy problem, where |
216 whether this is in general a hard or easy problem, where |
197 hardness is meant in terms of Turing decidable, for example.} |
217 hardness is meant in terms of Turing decidable, for example.} |
249 another hint about a connection between the $@$-operation and |
269 another hint about a connection between the $@$-operation and |
250 multiplication: How is $x^n$ defined recursively and what is |
270 multiplication: How is $x^n$ defined recursively and what is |
251 $x^0$?) |
271 $x^0$?) |
252 |
272 |
253 Next we can define the \defn{star operation} for languages: |
273 Next we can define the \defn{star operation} for languages: |
254 $A^*$ is the union of all powers of $A$, or short |
274 $A\star$ is the union of all powers of $A$, or short |
255 |
275 |
256 \begin{equation}\label{star} |
276 \begin{equation}\label{star} |
257 A^* \dn \bigcup_{0\le n}\; A^n |
277 A\star \dn \bigcup_{0\le n}\; A^n |
258 \end{equation} |
278 \end{equation} |
259 |
279 |
260 \noindent This star operation is often also called |
280 \noindent This star operation is often also called |
261 \emph{Kleene-star}. Unfolding the definition in \eqref{star} |
281 \emph{Kleene-star}. Unfolding the definition in \eqref{star} |
262 gives |
282 gives |
270 |
290 |
271 \[ |
291 \[ |
272 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots |
292 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots |
273 \] |
293 \] |
274 |
294 |
275 \noindent We can see that the empty string is always in $A^*$, |
295 \noindent We can see that the empty string is always in $A\star$, |
276 no matter what $A$ is. This is because $[] \in A^0$. To make |
296 no matter what $A$ is. This is because $[] \in A^0$. To make |
277 sure you understand these definitions, I leave you to answer |
297 sure you understand these definitions, I leave you to answer |
278 what $\{[]\}^*$ and $\varnothing^*$ are. |
298 what $\{[]\}\star$ and $\varnothing\star$ are? |
279 |
299 |
280 Recall that an alphabet is often referred to by the letter |
300 Recall that an alphabet is often referred to by the letter |
281 $\Sigma$. We can now write for the set of all strings over |
301 $\Sigma$. We can now write for the set of \emph{all} strings |
282 this alphabet $\Sigma^*$. In doing so we also include the |
302 over this alphabet $\Sigma\star$. In doing so we also include the |
283 empty string as a possible string over $\Sigma$. So if |
303 empty string as a possible string over $\Sigma$. So if $\Sigma |
284 $\Sigma = \{a, b\}$, then $\Sigma^*$ is |
304 = \{a, b\}$, then $\Sigma\star$ is |
285 |
305 |
286 \[ |
306 \[ |
287 \{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\} |
307 \{[], a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, \ldots\} |
288 \] |
308 \] |
289 |
309 |