|
1 \documentclass{article} |
|
2 \usepackage{../style} |
|
3 \usepackage{../langs} |
|
4 |
|
5 |
|
6 |
|
7 \begin{document} |
|
8 |
|
9 \section*{A Crash-Course on Notation} |
|
10 |
|
11 \subsubsection*{Characters and Strings} |
|
12 |
|
13 In this module we will often use \defn{characters}. While they |
|
14 are surely familiar, we will make one subtle distinction. If |
|
15 we want to refer to concrete characters, like \code{a}, |
|
16 \code{b} and so on, we use a typewriter font. So if we want to |
|
17 refer to the concrete characters of my email address we shall |
|
18 write |
|
19 |
|
20 \begin{center} |
|
21 \pcode{christian.urban@kcl.ac.uk} |
|
22 \end{center} |
|
23 |
|
24 \noindent If we need to explicitly indicate the ``space'' |
|
25 character, we write \VS{}\hspace{1mm}. For example |
|
26 |
|
27 \begin{center} |
|
28 \tt{}hello\VS\hspace{0.5mm}world |
|
29 \end{center} |
|
30 |
|
31 |
|
32 \noindent But often we do not care |
|
33 about which characters we use. In such cases we us the italic |
|
34 font and write $a$, $b$ and so on. So if we need a |
|
35 representative string, we might write |
|
36 |
|
37 \begin{equation}\label{abracadabra} |
|
38 abracadabra |
|
39 \end{equation} |
|
40 |
|
41 \noindent We do not really care what the characters stand for, |
|
42 except we do care about is that for example the character $a$ |
|
43 is not equal to $b$. |
|
44 |
|
45 An \defn{alphabet} is a finite set of characters. Often the |
|
46 letter $\Sigma$ is used to refer to an alphabet. For example |
|
47 the ASCII characters \pcode{a} to \pcode{z} form an alphabet. |
|
48 The digits $0$ to $9$ are another alphabet. If nothing else is |
|
49 specified, we usually assume the alphabet consists of just the |
|
50 lower-case letters $a$, $b$, \ldots, $z$. Sometimes, however, |
|
51 we explicitly restrict strings to contain, for example, only |
|
52 the letters $a$ and $b$. In this case we say the alphabet is |
|
53 the set $\{a, b\}$. |
|
54 |
|
55 \defn{Strings} are lists of characters. Unfortunately, there |
|
56 are many ways how we can write down strings. In programming |
|
57 languages, they are usually written as \dq{$hello$} where the |
|
58 double quotes indicate that we dealing with a string. But |
|
59 since, strings are lists of characters we could also write |
|
60 this string as |
|
61 |
|
62 \[ |
|
63 [\text{\it h, e, l, l, o}] |
|
64 \] |
|
65 |
|
66 \noindent The important point is that we can always decompose |
|
67 such strings. For example, we will often consider the first |
|
68 character of a string, say $h$, and the ``rest'' of a string |
|
69 say \dq{\textit{ello}} when making definitions about strings. |
|
70 There are some subtleties with the empty string, sometimes |
|
71 written as \dq{} but also as the empty list of characters |
|
72 $[\,]$. Two strings, for example $s_1$ and $s_2$, can be |
|
73 \emph{concatenated}, which we write as $s_1 @ s_2$. Suppose we |
|
74 are given two strings \dq{\textit{foo}} and \dq{\textit{bar}}, |
|
75 then their concatenation, writen |
|
76 \dq{\textit{foo}} $@$ \dq{\textit{bar}}, gives |
|
77 \dq{\textit{foobar}}. Often we will simplify our life and just |
|
78 drop the double quotes whenever it is clear we are talking |
|
79 about strings, writing as already in \eqref{abracadabra} just |
|
80 \textit{foo}, \textit{bar}, \textit{foobar} or \textit{foo $@$ |
|
81 bar}. |
|
82 |
|
83 Some simple properties of string concatenation hold. For |
|
84 example the concatenation operation is \emph{associative}, |
|
85 meaning |
|
86 |
|
87 \[(s_1 @ s_2) @ s_3 = s_1 @ (s_2 @ s_3)\] |
|
88 |
|
89 \noindent are always equal strings. The empty string behaves |
|
90 like a unit element, therefore |
|
91 |
|
92 \[s \,@\, [] = [] \,@\, s = s\] |
|
93 |
|
94 While for us strings are just lists of characters, programming |
|
95 languages often differentiate between the two concepts. In |
|
96 Scala, for example, there is the type of \code{String} and the |
|
97 type of lists of characters, \code{List[Char]}. They are not |
|
98 the same and we need to explicitly coerce elements between the |
|
99 two types, for example |
|
100 |
|
101 \begin{lstlisting}[numbers=none] |
|
102 scala> "abc".toList |
|
103 res01: List[Char] = List(a, b, c) |
|
104 \end{lstlisting} |
|
105 |
|
106 |
|
107 \subsubsection*{Sets and Languages} |
|
108 |
|
109 We will use the familiar operations $\cup$ and $\cap$ for |
|
110 sets. For the empty set we will either write $\varnothing$ or |
|
111 $\{\,\}$. The set containing, for example, the natural numbers |
|
112 $1$, $2$ and $3$ we will write as |
|
113 |
|
114 \[ |
|
115 \{1, 2, 3\} |
|
116 \] |
|
117 |
|
118 \noindent The notation $\in$ means \emph{element of}, so $1 |
|
119 \in \{1, 2, 3\}$ is true and $3 \in \{1, 2, 3\}$ is false. |
|
120 Sets can potentially have infinitely many elements. For |
|
121 example the set of all natural numbers $\{0, 1, 2, \ldots\}$ |
|
122 is infinite. This set is often also abbreviated as |
|
123 $\mathbb{N}$. We can define sets by giving all elements, like |
|
124 $\{0, 1\}$, but also by \defn{set comprehensions}. For example |
|
125 the set of all even natural numbers can be defined as |
|
126 |
|
127 \[ |
|
128 \{n\;|\;n\in\mathbb{N} \wedge n\;\text{is even}\} |
|
129 \] |
|
130 |
|
131 \noindent Though silly, but the set $\{0, 1, 2\}$ could also be |
|
132 defined by the following set comprehension |
|
133 |
|
134 \[ |
|
135 \{n\;|\; n^2 < 9 \wedge n \in \mathbb{N}\} |
|
136 \] |
|
137 |
|
138 \noindent Notice that set comprehensions could be used |
|
139 to define set union, intersection and difference: |
|
140 |
|
141 \begin{eqnarray*} |
|
142 A \cup B & \dn & \{x\;|\; x \in A \vee x \in B\}\\ |
|
143 A \cap B & \dn & \{x\;|\; x \in A \wedge x \in B\}\\ |
|
144 A \backslash B & \dn & \{x\;|\; x \in A \wedge x \not\in B\} |
|
145 \end{eqnarray*} |
|
146 |
|
147 \noindent For defining sets, we will also often use the notion |
|
148 of the ``big union''. An example is as follows: |
|
149 |
|
150 \begin{equation}\label{bigunion} |
|
151 \bigcup_{0\le n}\; \{n^2, n^2 + 1\} |
|
152 \end{equation} |
|
153 |
|
154 \noindent which is the set of all squares and their immediate |
|
155 successors, so |
|
156 |
|
157 \[ |
|
158 \{0, 1, 2, 4, 5, 9, 10, 16, 17, \ldots\} |
|
159 \] |
|
160 |
|
161 \noindent A big union is a sequence of unions which are |
|
162 indexed typically by a natural number. So the big union in |
|
163 \eqref{bigunion} could equally be written as |
|
164 |
|
165 \[ |
|
166 \{0, 1\} \cup \{1, 2\} \cup \{4, 5\} \cup \{9, 10\} \cup |
|
167 \ldots |
|
168 \] |
|
169 |
|
170 \noindent but using the big union notation is more concise. |
|
171 |
|
172 An important notion in this module are \defn{Languages}, which |
|
173 are sets of strings. The main goal for us will be how to |
|
174 (formally) specify languages and to find out whether a string |
|
175 is in a language or not. Note that the language containing the |
|
176 empty string $\{\dq{}\}$ is not equal to the empty language |
|
177 (or empty set): The former contains one element, namely \dq{} |
|
178 (also written $[\,]$), but the latter does not contain any. |
|
179 |
|
180 |
|
181 For languages we define the operation of \defn{language |
|
182 concatenation}, written $A @ B$: |
|
183 |
|
184 \begin{equation}\label{langconc} |
|
185 A @ B \dn \{s_1 @ s_2\;|\; s_1\in A \wedge s_2\in B\} |
|
186 \end{equation} |
|
187 |
|
188 \noindent Be careful to understand the difference: the $@$ |
|
189 in $s_1 @ s_2$ is string concatenation, while $A @ B$ refers |
|
190 to the concatenation of two languages (or sets of strings). |
|
191 As an example suppose $A=\{ab, ac\}$ and $B=\{zzz, qq, r\}$, |
|
192 then $A \,@\, B$ is |
|
193 |
|
194 |
|
195 \[ |
|
196 \{abzzz, abqq, abr, aczzz, acqq, acr\} |
|
197 \] |
|
198 |
|
199 \noindent Recall the properties for string concatenation. For |
|
200 language concatenation we have the following properties |
|
201 |
|
202 \begin{center} |
|
203 \begin{tabular}{ll} |
|
204 associativity: & $(A @ B) @ C = A @ (B @ C)$\\ |
|
205 unit element: & $A \,@\, \{[]\} = \{[]\} \,@\, A = A$\\ |
|
206 zero element: & $A \,@\, \varnothing = \varnothing \,@\, A = |
|
207 \varnothing$ |
|
208 \end{tabular} |
|
209 \end{center} |
|
210 |
|
211 \noindent Note the difference: the empty set behaves like $0$ |
|
212 for multiplication and the set $\{[]\}$ like $1$ for |
|
213 multiplication. |
|
214 |
|
215 Following the language concatenation, we can define a |
|
216 \defn{language power} operation as follows: |
|
217 |
|
218 \begin{eqnarray*} |
|
219 A^0 & \dn & \{[]\}\\ |
|
220 A^{n+1} & \dn & A \,@\, A^n |
|
221 \end{eqnarray*} |
|
222 |
|
223 \noindent This definition is by induction on natural numbers. |
|
224 Note carefully that the zero-case is not defined as the empty |
|
225 set, but the set containing the empty string. So no matter |
|
226 what the set $A$ is, $A^0$ will always be $\{[]\}$. (There is |
|
227 another hint about a connection between the $@$-operation and |
|
228 multiplication: How is $x^n$ defined and what is $x^0$?) |
|
229 |
|
230 Next we can define the \defn{star operation} for languages: |
|
231 $A^*$ is the union of all powers of $A$, or short |
|
232 |
|
233 \[ |
|
234 A^* \dn \bigcup_{0\le n}\; A^n |
|
235 \] |
|
236 |
|
237 \noindent |
|
238 Unfolding this definition |
|
239 |
|
240 \[ |
|
241 A^0 \cup A^1 \cup A^2 \cup A^3 \cup \ldots |
|
242 \] |
|
243 |
|
244 \noindent |
|
245 which is equal to |
|
246 |
|
247 \[ |
|
248 \{[]\} \,\cup\, A \,\cup\, A @ A \,\cup\, A @ A @ A \,\cup\, \ldots |
|
249 \] |
|
250 |
|
251 \noindent we can see that the empty string is always in $A^*$, |
|
252 no matter what $A$ is. This is because $[] \in A^0$. To make |
|
253 sure you understand these definition, I leave you to answer |
|
254 what $\{[]\}^*$ and $\varnothing^*$ are. |
|
255 |
|
256 Recall that an alphabet is often referred to by the letter |
|
257 $\Sigma$. We can now write for the set of all strings over |
|
258 this alphabet $\Sigma^*$. In doing so we also include the |
|
259 empty string as a possible string over $\Sigma$. So if |
|
260 $\Sigma = \{a, b\}$ then $\Sigma^*$ is |
|
261 |
|
262 \[ |
|
263 \{[], a, b, ab, ba, aaa, aab, aba, abb, baa, bab, \ldots\} |
|
264 \] |
|
265 |
|
266 \noindent or in other words all strings containing $a$s and |
|
267 $b$s only. |
|
268 |
|
269 \end{document} |
|
270 |
|
271 %%% Local Variables: |
|
272 %%% mode: latex |
|
273 %%% TeX-master: t |
|
274 %%% End: |