6 |
6 |
7 \begin{document} |
7 \begin{document} |
8 |
8 |
9 \section*{A Crash-Course on Notation} |
9 \section*{A Crash-Course on Notation} |
10 |
10 |
|
11 There are innumerable books available about automata and |
|
12 formal languages. Unfortunately, they often use their own |
|
13 notational conventions and their own symbols. This handout is |
|
14 meant to clarify some of the notation I will use. |
|
15 |
11 \subsubsection*{Characters and Strings} |
16 \subsubsection*{Characters and Strings} |
12 |
17 |
13 In this module we will often use \defn{characters}. While they |
18 The most important concept in this module are strings. Strings |
14 are surely familiar, we will make one subtle distinction. If |
19 are composed of \defn{characters}. While characters are surely |
15 we want to refer to concrete characters, like \code{a}, |
20 a familiar concept, we will make one subtle distinction in |
16 \code{b} and so on, we use a typewriter font. So if we want to |
21 this module. If we want to refer to concrete characters, like |
17 refer to the concrete characters of my email address we shall |
22 \code{a}, \code{b} and so on, we use a typewriter font. |
18 write |
23 Accordingly if we want to refer to the concrete characters of |
|
24 my email address we shall write |
19 |
25 |
20 \begin{center} |
26 \begin{center} |
21 \pcode{christian.urban@kcl.ac.uk} |
27 \pcode{christian.urban@kcl.ac.uk} |
22 \end{center} |
28 \end{center} |
23 |
29 |
24 \noindent If we need to explicitly indicate the ``space'' |
30 \noindent If we also need to explicitly indicate the ``space'' |
25 character, we write \VS{}\hspace{1mm}. For example |
31 character, we write \VS{}\hspace{1mm}. For example |
26 |
32 |
27 \begin{center} |
33 \begin{center} |
28 \tt{}hello\VS\hspace{0.5mm}world |
34 \tt{}hello\VS\hspace{0.5mm}world |
29 \end{center} |
35 \end{center} |
30 |
36 |
31 |
37 |
32 \noindent But often we do not care |
38 \noindent But often we do not care which particular characters |
33 about which characters we use. In such cases we us the italic |
39 we use. In such cases we use the italic font and write $a$, |
34 font and write $a$, $b$ and so on. So if we need a |
40 $b$ and so on for characters. Therefore if we need a |
35 representative string, we might write |
41 representative string, we might write |
36 |
42 |
37 \begin{equation}\label{abracadabra} |
43 \begin{equation}\label{abracadabra} |
38 abracadabra |
44 abracadabra |
39 \end{equation} |
45 \end{equation} |
40 |
46 |
41 \noindent We do not really care what the characters stand for, |
47 \noindent In this string, we do not really care what the |
42 except we do care about is that for example the character $a$ |
48 characters stand for, except we do care about the fact that |
43 is not equal to $b$. |
49 for example the character $a$ is not equal to $b$ and so on. |
44 |
50 |
45 An \defn{alphabet} is a finite set of characters. Often the |
51 An \defn{alphabet} is a (non-empty) finite set of characters. |
46 letter $\Sigma$ is used to refer to an alphabet. For example |
52 Often the letter $\Sigma$ is used to refer to an alphabet. For |
47 the ASCII characters \pcode{a} to \pcode{z} form an alphabet. |
53 example the ASCII characters \pcode{a} to \pcode{z} form an |
48 The digits $0$ to $9$ are another alphabet. If nothing else is |
54 alphabet. The digits $0$ to $9$ are another alphabet. The |
49 specified, we usually assume the alphabet consists of just the |
55 Greek letters $\alpha$ to $\omega$ also form an alphabet. If |
50 lower-case letters $a$, $b$, \ldots, $z$. Sometimes, however, |
56 nothing else is specified, we usually assume the alphabet |
51 we explicitly restrict strings to contain, for example, only |
57 consists of just the lower-case letters $a$, $b$, \ldots, $z$. |
52 the letters $a$ and $b$. In this case we say the alphabet is |
58 Sometimes, however, we explicitly want to restrict strings to |
53 the set $\{a, b\}$. |
59 contain only the letters $a$ and $b$, for example. In this |
|
60 case we will state that the alphabet is the set $\{a, b\}$. |
54 |
61 |
55 \defn{Strings} are lists of characters. Unfortunately, there |
62 \defn{Strings} are lists of characters. Unfortunately, there |
56 are many ways how we can write down strings. In programming |
63 are many ways how we can write down strings. In programming |
57 languages, they are usually written as \dq{$hello$} where the |
64 languages, they are usually written as \dq{$hello$} where the |
58 double quotes indicate that we dealing with a string. But |
65 double quotes indicate that we are dealing with a string. But |
59 since, strings are lists of characters we could also write |
66 since we regard strings as lists of characters we could also |
60 this string as |
67 write this string as |
61 |
68 |
62 \[ |
69 \[ |
63 [\text{\it h, e, l, l, o}] |
70 [\text{\it h, e, l, l, o}] |
64 \] |
71 \] |
65 |
72 |
66 \noindent The important point is that we can always decompose |
73 \noindent The important point is that we can always decompose |
67 such strings. For example, we will often consider the first |
74 such strings. For example, we will often consider the first |
68 character of a string, say $h$, and the ``rest'' of a string |
75 character of a string, say $h$, and the ``rest'' of a string |
69 say \dq{\textit{ello}} when making definitions about strings. |
76 say \dq{\textit{ello}} when making definitions about strings. |
70 There are some subtleties with the empty string, sometimes |
77 There are also some subtleties with the empty string, |
71 written as \dq{} but also as the empty list of characters |
78 sometimes written as \dq{} but also as the empty list of |
72 $[\,]$. Two strings, for example $s_1$ and $s_2$, can be |
79 characters $[\,]$.\footnote{In the literature you can also |
73 \emph{concatenated}, which we write as $s_1 @ s_2$. Suppose we |
80 often find that $\varepsilon$ or $\lambda$ is used to |
74 are given two strings \dq{\textit{foo}} and \dq{\textit{bar}}, |
81 represent the empty string.} |
75 then their concatenation, writen |
82 |
76 \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives |
83 Two strings, say $s_1$ and $s_2$, can be \defn{concatenated}, |
77 \dq{\textit{foobar}}. Often we will simplify our life and just |
84 which we write as $s_1 @ s_2$. Suppose we are given two |
78 drop the double quotes whenever it is clear we are talking |
85 strings \dq{\textit{foo}} and \dq{\textit{bar}}, then their |
79 about strings, writing as already in \eqref{abracadabra} just |
86 concatenation, writen \dq{\textit{foo}} $@$ \dq{\textit{bar}}, |
80 \textit{foo}, \textit{bar}, \textit{foobar} or \textit{foo $@$ |
87 gives \dq{\textit{foobar}}. Often we will simplify our life |
81 bar}. |
88 and just drop the double quotes whenever it is clear we are |
|
89 talking about strings, writing as already in |
|
90 \eqref{abracadabra} just \textit{foo}, \textit{bar}, |
|
91 \textit{foobar} or \textit{foo $@$ bar}. |
82 |
92 |
83 Some simple properties of string concatenation hold. For |
93 Some simple properties of string concatenation hold. For |
84 example the concatenation operation is \emph{associative}, |
94 example the concatenation operation is \emph{associative}, |
85 meaning |
95 meaning |
86 |
96 |
104 \end{lstlisting} |
119 \end{lstlisting} |
105 |
120 |
106 |
121 |
107 \subsubsection*{Sets and Languages} |
122 \subsubsection*{Sets and Languages} |
108 |
123 |
109 We will use the familiar operations $\cup$ and $\cap$ for |
124 We will use the familiar operations $\cup$, $\cap$, $\subset$ |
110 sets. For the empty set we will either write $\varnothing$ or |
125 and $\subseteq$ for sets. For the empty set we will either |
111 $\{\,\}$. The set containing, for example, the natural numbers |
126 write $\varnothing$ or $\{\,\}$. The set containing the |
112 $1$, $2$ and $3$ we will write as |
127 natural numbers $1$, $2$ and $3$, for example, we will write |
|
128 with curly braces as |
113 |
129 |
114 \[ |
130 \[ |
115 \{1, 2, 3\} |
131 \{1, 2, 3\} |
116 \] |
132 \] |
117 |
133 |
118 \noindent The notation $\in$ means \emph{element of}, so $1 |
134 \noindent The notation $\in$ means \emph{element of}, so $1 |
119 \in \{1, 2, 3\}$ is true and $3 \in \{1, 2, 3\}$ is false. |
135 \in \{1, 2, 3\}$ is true and $3 \in \{1, 2, 3\}$ is false. |
120 Sets can potentially have infinitely many elements. For |
136 Sets can potentially have infinitely many elements. For |
121 example the set of all natural numbers $\{0, 1, 2, \ldots\}$ |
137 example the set of all natural numbers $\{0, 1, 2, \ldots\}$ |
122 is infinite. This set is often also abbreviated as |
138 is infinite. This set is often also abbreviated as |
123 $\mathbb{N}$. We can define sets by giving all elements, like |
139 $\mathbb{N}$. We can define sets by giving all elements, for |
124 $\{0, 1\}$, but also by \defn{set comprehensions}. For example |
140 example $\{0, 1\}$, but also by \defn{set comprehensions}. For |
125 the set of all even natural numbers can be defined as |
141 example the set of all even natural numbers can be defined as |
126 |
142 |
127 \[ |
143 \[ |
128 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\} |
144 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\} |
129 \] |
145 \] |
130 |
146 |
167 \ldots |
187 \ldots |
168 \] |
188 \] |
169 |
189 |
170 \noindent but using the big union notation is more concise. |
190 \noindent but using the big union notation is more concise. |
171 |
191 |
172 An important notion in this module are \defn{Languages}, which |
192 An important notion in this module are \defn{languages}, which |
173 are sets of strings. The main goal for us will be how to |
193 are sets of strings. The main goal for us will be how to |
174 (formally) specify languages and to find out whether a string |
194 (formally) specify languages and to find out whether a string |
175 is in a language or not. Note that the language containing the |
195 is in a language or not.\footnote{You might wish to ponder |
176 empty string $\{\dq{}\}$ is not equal to the empty language |
196 whether this is in general a hard or easy problem, where |
177 (or empty set): The former contains one element, namely \dq{} |
197 hardness is meant in terms of Turing decidable, for example.} |
178 (also written $[\,]$), but the latter does not contain any. |
198 Note that the language containing the empty string $\{\dq{}\}$ |
179 |
199 is not equal to $\varnothing$, the empty language (or empty |
|
200 set): The former contains one element, namely \dq{} (also |
|
201 written $[\,]$), but the latter does not contain any |
|
202 element. |
180 |
203 |
181 For languages we define the operation of \defn{language |
204 For languages we define the operation of \defn{language |
182 concatenation}, written $A @ B$: |
205 concatenation}, written like in the string case as $A @ B$: |
183 |
206 |
184 \begin{equation}\label{langconc} |
207 \begin{equation}\label{langconc} |
185 A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\} |
208 A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\} |
186 \end{equation} |
209 \end{equation} |
187 |
210 |
188 \noindent Be careful to understand the difference: the $@$ |
211 \noindent Be careful to understand the difference: the $@$ |
189 in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers |
212 in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers |
190 to the concatenation of two languages (or sets of strings). |
213 to the concatenation of two languages (or sets of strings). |
191 As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$, |
214 As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$, |
192 then $A \,@\, B$ is |
215 then $A \,@\, B$ is the language |
193 |
|
194 |
216 |
195 \[ |
217 \[ |
196 \{abzzz, abqq, abr, aczzz, acqq, acr\} |
218 \{abzzz, abqq, abr, aczzz, acqq, acr\} |
197 \] |
219 \] |
198 |
220 |
206 zero element: & $A \,@\, \varnothing = \varnothing \,@\, A = |
228 zero element: & $A \,@\, \varnothing = \varnothing \,@\, A = |
207 \varnothing$ |
229 \varnothing$ |
208 \end{tabular} |
230 \end{tabular} |
209 \end{center} |
231 \end{center} |
210 |
232 |
211 \noindent Note the difference: the empty set behaves like $0$ |
233 \noindent Note the difference in the last two lines: the empty |
212 for multiplication and the set $\{[]\}$ like $1$ for |
234 set behaves like $0$ for multiplication and the set $\{[]\}$ |
213 multiplication. |
235 like $1$ for multiplication. |
214 |
236 |
215 Following the language concatenation, we can define a |
237 Following the language concatenation, we can define a |
216 \defn{language power} operation as follows: |
238 \defn{language power} operation as follows: |
217 |
239 |
218 \begin{eqnarray*} |
240 \begin{eqnarray*} |
219 A^0 & \dn & \{[]\}\\ |
241 A^0 & \dn & \{[]\}\\ |
220 A^{n+1} & \dn & A \,@\, A^n |
242 A^{n+1} & \dn & A \,@\, A^n |
221 \end{eqnarray*} |
243 \end{eqnarray*} |
222 |
244 |
223 \noindent This definition is by induction on natural numbers. |
245 \noindent This definition is by recursion on natural numbers. |
224 Note carefully that the zero-case is not defined as the empty |
246 Note carefully that the zero-case is not defined as the empty |
225 set, but the set containing the empty string. So no matter |
247 set, but the set containing the empty string. So no matter |
226 what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is |
248 what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is |
227 another hint about a connection between the $@$-operation and |
249 another hint about a connection between the $@$-operation and |
228 multiplication: How is $x^n$ defined and what is $x^0$?) |
250 multiplication: How is $x^n$ defined and what is $x^0$?) |
229 |
251 |
230 Next we can define the \defn{star operation} for languages: |
252 Next we can define the \defn{star operation} for languages: |
231 $A^*$ is the union of all powers of $A$, or short |
253 $A^*$ is the union of all powers of $A$, or short |
232 |
254 |
233 \[ |
255 \begin{equation}\label{star} |
234 A^* \dn \bigcup_{0\le n}\; A^n |
256 A^* \dn \bigcup_{0\le n}\; A^n |
235 \] |
257 \end{equation} |
236 |
258 |
237 \noindent |
259 \noindent This star operation is often also called |
238 Unfolding this definition |
260 \emph{Kleene-star}. Unfolding the definition in \eqref{star} |
|
261 gives |
239 |
262 |
240 \[ |
263 \[ |
241 A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots |
264 A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots |
242 \] |
265 \] |
243 |
266 |
246 |
269 |
247 \[ |
270 \[ |
248 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots |
271 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots |
249 \] |
272 \] |
250 |
273 |
251 \noindent we can see that the empty string is always in $A^*$, |
274 \noindent We can see that the empty string is always in $A^*$, |
252 no matter what $A$ is. This is because $[] \in A^0$. To make |
275 no matter what $A$ is. This is because $[] \in A^0$. To make |
253 sure you understand these definition, I leave you to answer |
276 sure you understand these definitions, I leave you to answer |
254 what $\{[]\}^*$ and $\varnothing^*$ are. |
277 what $\{[]\}^*$ and $\varnothing^*$ are. |
255 |
278 |
256 Recall that an alphabet is often referred to by the letter |
279 Recall that an alphabet is often referred to by the letter |
257 $\Sigma$. We can now write for the set of all strings over |
280 $\Sigma$. We can now write for the set of all strings over |
258 this alphabet $\Sigma^*$. In doing so we also include the |
281 this alphabet $\Sigma^*$. In doing so we also include the |
259 empty string as a possible string over $\Sigma$. So if |
282 empty string as a possible string over $\Sigma$. So if |
260 $\Sigma = \{a, b\}$ then $\Sigma^*$ is |
283 $\Sigma = \{a, b\}$, then $\Sigma^*$ is |
261 |
284 |
262 \[ |
285 \[ |
263 \{[], a, b, ab, ba, aaa, aab, aba, abb, baa, bab, \ldots\} |
286 \{[], a, b, ab, ba, aaa, aab, aba, abb, baa, bab, \ldots\} |
264 \] |
287 \] |
265 |
288 |