22
+ − 1
\documentclass{article}
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 2
\usepackage{../style}
889
+ − 3
\usepackage{../graphicss}
+ − 4
22
+ − 5
\begin{document}
+ − 6
+ − 7
\section*{Homework 2}
+ − 8
916
+ − 9
%\HEADER
347
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 10
22
+ − 11
\begin{enumerate}
499
+ − 12
\item What is the difference between \emph{basic} regular expressions
885
+ − 13
and \emph{extended} regular expressions?
+ − 14
+ − 15
\solution{Basic regular expressions are $\ZERO$, $\ONE$, $c$, $r_1 + r_2$,
+ − 16
$r_1 \cdot r_2$, $r^*$. The extended ones are the bounded
+ − 17
repetitions, not, etc.}
499
+ − 18
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 19
\item What is the language recognised by the regular
885
+ − 20
expressions $(\ZERO^*)^*$.
+ − 21
+ − 22
\solution{$L(\ZERO^*{}^*) = \{[]\}$,
+ − 23
remember * always includes the empty string}
258
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 24
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 25
\item Review the first handout about sets of strings and read
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 26
the second handout. Assuming the alphabet is the set
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 27
$\{a, b\}$, decide which of the following equations are
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 28
true in general for arbitrary languages $A$, $B$ and
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 29
$C$:
115
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 30
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 31
\begin{eqnarray}
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 32
(A \cup B) @ C & =^? & A @ C \cup B @ C\nonumber\\
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 33
A^* \cup B^* & =^? & (A \cup B)^*\nonumber\\
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 34
A^* @ A^* & =^? & A^*\nonumber\\
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 35
(A \cap B)@ C & =^? & (A@C) \cap (B@C)\nonumber
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 36
\end{eqnarray}
104
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 37
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 38
\noindent In case an equation is true, give an
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 39
explanation; otherwise give a counter-example.
104
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 40
885
+ − 41
\solution{1 + 3 are equal; 2 + 4 are not. Interesting is 4 where
928
+ − 42
$A = \{[a]\}$, $B = \{[]\}$ and $C = \{[a], []\}$ is an
+ − 43
counter-example.\medskip
889
+ − 44
928
+ − 45
For equations like 3 it is always a good idea to prove the
889
+ − 46
two inclusions
+ − 47
+ − 48
\[
+ − 49
A^* \subseteq A^* @ A^* \qquad
+ − 50
A^* @ A^* \subseteq A^*
+ − 51
\]
+ − 52
+ − 53
This means for every string $s$ we have to show
+ − 54
+ − 55
\[
+ − 56
s \in A^* \;\textit{implies}\; s \in A^* @ A^* \qquad
+ − 57
s \in A^* @ A^* \;\textit{implies}\; s \in A^*
+ − 58
\]
+ − 59
+ − 60
The first one is easy because $[] \in A^*$ and therefore
+ − 61
$s @ [] \in A^* @ A^*$.
+ − 62
+ − 63
The second one says that $s$ must be of the form $s = s_1 @ s_2$ with
+ − 64
$s_1 \in A^*$ and $s_2 \in A^*$. We have to show that
+ − 65
$s_1 @ s_2 \in A^*$.
+ − 66
+ − 67
If $s_1 \in A^*$ then there exists an $n$ such that $s_1 \in A^n$, and
+ − 68
if $s_2 \in A^*$ then there exists an $m$ such that $s_2 \in A^m$.\bigskip
+ − 69
+ − 70
928
+ − 71
Aside: We are going to show the following power law
889
+ − 72
+ − 73
\[
+ − 74
A^n \,@\, A^m = A^{n+m}
+ − 75
\]
+ − 76
+ − 77
We prove that by induction on $n$.
+ − 78
+ − 79
Case $n = 0$: $A^0 \,@\, A^m = A^{0+m}$ holds because $A^0 = \{[]\}$
+ − 80
and $\{[]\} \,@\, A^m = A ^ m$ and $0 + m = m$.\medskip
+ − 81
+ − 82
Case $n + 1$: The induction hypothesis is
+ − 83
+ − 84
\[ A^n \,@\, A^m = A^{n+m}
+ − 85
\]
+ − 86
+ − 87
We need to prove
+ − 88
+ − 89
\[
+ − 90
A^{n+1} \,@\, A^m = A^{(n+1)+m}
+ − 91
\]
+ − 92
+ − 93
The left-hand side is $(A \,@\, A^n) \,@\, A^m$ by the definition of
+ − 94
the power operation. We can rearrange that
+ − 95
to $A \,@\, (A^n \,@\, A^m)$. \footnote{Because for all languages $A$, $B$, $C$ we have $(A @ B) @ C = A @ (B @ C)$.}
+ − 96
+ − 97
By the induction hypothesis we know that $A^n \,@\, A^m = A^{n+m}$.
+ − 98
+ − 99
So we have $A \,@\, (A^{n+m})$. But this is $A^{(n+m)+1}$ again if we
+ − 100
apply the definition of the power operator. If we
+ − 101
rearrange that we get $A^{(n+1)+m}$ and are done with
+ − 102
what we need to prove for the power law.\bigskip
+ − 103
+ − 104
Picking up where we left, we know that $s_1 \in A^n$ and $s_2 \in A^m$. This now implies that $s_1 @ s_2\in A^n @ A^m$. By the power law this means
+ − 105
$s_1 @ s_2\in A^{n+m}$. But this also means $s_1 @ s_2\in A^*$.
+ − 106
}
885
+ − 107
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 108
\item Given the regular expressions $r_1 = \ONE$ and $r_2 =
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 109
\ZERO$ and $r_3 = a$. How many strings can the regular
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 110
expressions $r_1^*$, $r_2^*$ and $r_3^*$ each match?
104
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 111
885
+ − 112
\solution{$r_1$ and $r_2$ can match the empty string only, $r_3$ can
+ − 113
match $[]$, $a$, $aa$, ....}
+ − 114
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 115
\item Give regular expressions for (a) decimal numbers and for
617
+ − 116
(b) binary numbers. Hint: Observe that the empty string
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 117
is not a number. Also observe that leading 0s are
617
+ − 118
normally not written---for example the JSON format for numbers
891
+ − 119
explicitly forbids this. So 007 is not a number according to JSON.
22
+ − 120
885
+ − 121
\solution{Just numbers without leading 0s: $0 + (1..9)\cdot(0..1)^*$;
+ − 122
can be extended to decimal; similar for binary numbers
+ − 123
}
+ − 124
258
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 125
\item Decide whether the following two regular expressions are
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 126
equivalent $(\ONE + a)^* \equiv^? a^*$ and $(a \cdot
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 127
b)^* \cdot a \equiv^? a \cdot (b \cdot a)^*$.
22
+ − 128
885
+ − 129
\solution{Both are equivalent, but why the second? Essentially you have to show that each string in one set is in the other. For 2 this means you can do an induction proof that $(ab)^na$ is the same string as $a(ba)^n$, where the former is in the first set and the latter in the second.}
+ − 130
22
+ − 131
881
+ − 132
\item Give an argument for why the following holds:
885
+ − 133
if $r$ is nullable then $r^{\{n\}} \equiv r^{\{..n\}}$.
+ − 134
928
+ − 135
\solution{ This requires an inductive proof. There are a number of
+ − 136
ways to prove this. It is clear that if $s \in L (r^{\{n\}})$ then
+ − 137
also $s \in L (r^{\{..n\}})$.
+ − 138
+ − 139
So one way to prove this is to show
+ − 140
that if $s \in L (r^{\{..n\}})$ then also $s \in L (r^{\{n\}})$
+ − 141
(under the assumption that $r$ is nullable, otherwise it would not
+ − 142
be true). The assumption $s \in L (r^{\{..n\}})$ means
+ − 143
that $s \in L(r^{\{i\}})$ with $i \leq n$ holds
+ − 144
and we have to show that $s \in L(r^{\{n\}})$ holds.
+ − 145
+ − 146
One can do this by induction for languages as follows:
+ − 147
+ − 148
\[
+ − 149
\textit{if}\; [] \in A \;\textit{and}\; s \in A^n
+ − 150
\;\textit{then}\; s \in A^{n+m}
+ − 151
\]
+ − 152
+ − 153
The proof is by induction on $m$. The base case $m=0$ is trivial.
+ − 154
For the $m + 1$ case we have the induction hypothesis:
+ − 155
+ − 156
\[
+ − 157
\textit{if}\; [] \in A \;\textit{and}\; s \in A^n
+ − 158
\;\textit{then}\; s \in A^{n+m}
+ − 159
\]
+ − 160
+ − 161
and we have to show
+ − 162
+ − 163
\[
+ − 164
s \in A^{n+m+1}
+ − 165
\]
+ − 166
+ − 167
under the assumption $[] \in A$ and $s \in A^n$. From the
+ − 168
assumptions and the IH we can infer that $s\in A^{n + m}$.
+ − 169
Then using the assumption $[] \in A$ we can infer that also
+ − 170
+ − 171
\[
+ − 172
s\in A \,@\, A^{n + m}
+ − 173
\]
+ − 174
+ − 175
which is equivalent to what we need to show $s \in A^{n+m+1}$.
+ − 176
+ − 177
Now we know $s \in L(r^{\{i\}})$ with $i \leq n$. Since $i + m = n$
+ − 178
for some $m$ we can conclude that $s \in L(r^{\{n\}})$. Done.
+ − 179
+ − 180
}
+ − 181
+ − 182
\item Given the regular expression $r = (a \cdot b + b)^*$.
+ − 183
Compute what the derivative of $r$ is with respect to
+ − 184
$a$, $b$ and $c$. Is $r$ nullable?
258
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 185
355
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 186
\item Define what is meant by the derivative of a regular
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 187
expressions with respect to a character. (Hint: The
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 188
derivative is defined recursively.)
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 189
928
+ − 190
\solution{The recursive function for $der$.}
885
+ − 191
768
+ − 192
\item Assume the set $Der$ is defined as
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 193
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 194
\begin{center}
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 195
$Der\,c\,A \dn \{ s \;|\; c\!::\!s \in A\}$
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 196
\end{center}
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 197
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 198
What is the relation between $Der$ and the notion of
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 199
derivative of regular expressions?
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 200
885
+ − 201
\solution{Main property is $L(der\,c\,r) = Der\,c\,(L(r))$.}
+ − 202
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 203
\item Give a regular expression over the alphabet $\{a,b\}$
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 204
recognising all strings that do not contain any
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 205
substring $bb$ and end in $a$.
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 206
931
+ − 207
\solution{$((ba)^* \cdot (a)^*)^*\,\cdot\,a + ba$ \quad Not sure whether this can be simplified.}
885
+ − 208
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 209
\item Do $(a + b)^* \cdot b^+$ and $(a^* \cdot b^+) +
885
+ − 210
(b^*\cdot b^+)$ define the same language?
+ − 211
889
+ − 212
\solution{No, the first one can match for example abababababbbbb
+ − 213
while the second can only match for example aaaaaabbbbb or bbbbbbb}
294
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 214
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 215
\item Define the function $zeroable$ by recursion over regular
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 216
expressions. This function should satisfy the property
294
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 217
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 218
\[
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 219
zeroable(r) \;\;\text{if and only if}\;\;L(r) = \{\}\qquad(*)
294
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 220
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 221
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 222
The function $nullable$ for the not-regular expressions
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 223
can be defined by
294
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 224
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 225
\[
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 226
nullable(\sim r) \dn \neg(nullable(r))
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 227
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 228
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 229
Unfortunately, a similar definition for $zeroable$ does
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 230
not satisfy the property in $(*)$:
294
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 231
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 232
\[
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 233
zeroable(\sim r) \dn \neg(zeroable(r))
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 234
\]
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 235
889
+ − 236
Find a counter example?
+ − 237
+ − 238
+ − 239
\solution{
+ − 240
Here the idea is that nullable for NOT can be defined as
+ − 241
+ − 242
\[nullable(\sim r) \dn \neg(nullable(r))\]
+ − 243
+ − 244
This will satisfy the property
+ − 245
$nullable(r) \;\;\text{if and only if}\;\;[] \in L(r)$. (Remember how
+ − 246
$L(\sim r)$ is defined).\bigskip
+ − 247
+ − 248
But you cannot define
+ − 249
+ − 250
\[zeroable(\sim r) \dn \neg(zeroable(r))\]
+ − 251
+ − 252
because if $r$ for example is $\ONE$ then $\sim \ONE$ can match
+ − 253
some strings (all non-empty strings). So $zeroable$ should be false. But if we follow
+ − 254
the above definition we would obtain $\neg(zeroable(\ONE))$. According
+ − 255
to the definition of $zeroable$ for $\ONE$ this would be false,
+ − 256
but if we now negate false, we get actually true. So the above
+ − 257
definition would not satisfy the property
+ − 258
+ − 259
\[
+ − 260
zeroable(r) \;\;\text{if and only if}\;\;L(r) = \{\}
+ − 261
\]
+ − 262
}
294
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 263
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 264
\item Give a regular expressions that can recognise all
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 265
strings from the language $\{a^n\;|\;\exists k.\; n = 3 k
885
+ − 266
+ 1 \}$.
+ − 267
+ − 268
\solution{$a(aaa)^*$}
404
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 269
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 270
\item Give a regular expression that can recognise an odd
889
+ − 271
number of $a$s or an even number of $b$s.
+ − 272
+ − 273
\solution{
+ − 274
If the a's and b's are meant to be separate, then this is easy
+ − 275
+ − 276
\[a(aa)^* + (bb)^*\]
+ − 277
+ − 278
If the letters are mixed, then this is difficult
+ − 279
+ − 280
\[(aa|bb|(ab|ba)\cdot (aa|bb)^* \cdot (ba|ab))^* \cdot (b|(ab|ba)(bb|aa)^* \cdot a)
+ − 281
\]
+ − 282
+ − 283
(copied from somewhere ;o)
+ − 284
928
+ − 285
The idea behind this monstrous regex is essentially the DFA
889
+ − 286
+ − 287
\begin{center}
+ − 288
\begin{tikzpicture}[scale=1,>=stealth',very thick,
+ − 289
every state/.style={minimum size=0pt,
+ − 290
draw=blue!50,very thick,fill=blue!20}]
+ − 291
\node[state,initial] (q0) at (0,2) {$q_0$};
+ − 292
\node[state,accepting] (q1) at (2,2) {$q_1$};
+ − 293
\node[state] (q2) at (0,0) {$q_2$};
+ − 294
\node[state] (q3) at (2,0) {$q_3$};
+ − 295
+ − 296
\path[->] (q0) edge[bend left] node[above] {$a$} (q1)
+ − 297
(q1) edge[bend left] node[above] {$a$} (q0)
+ − 298
(q2) edge[bend left] node[above] {$a$} (q3)
+ − 299
(q3) edge[bend left] node[above] {$a$} (q2)
+ − 300
(q0) edge[bend left] node[right] {$b$} (q2)
+ − 301
(q2) edge[bend left] node[left] {$b$} (q0)
+ − 302
(q1) edge[bend left] node[right] {$b$} (q3)
+ − 303
(q3) edge[bend left] node[left] {$b$} (q1);
+ − 304
\end{tikzpicture}
+ − 305
\end{center}
928
+ − 306
+ − 307
Maybe a good idea to reconsider this example in Lecture 3
+ − 308
where the Brzozowski algorithm for DFA $\rightarrow$ Regex
+ − 309
can be used.
889
+ − 310
}
444
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 311
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 312
\item \POSTSCRIPT
404
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
+ − 313
\end{enumerate}
22
+ − 314
+ − 315
\end{document}
+ − 316
+ − 317
%%% Local Variables:
+ − 318
%%% mode: latex
+ − 319
%%% TeX-master: t
+ − 320
%%% End: