31
|
1 |
\documentclass{article}
|
264
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
2 |
\usepackage{../style}
|
292
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
3 |
\usepackage{../graphics}
|
146
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
4 |
|
893
|
5 |
\newcommand{\solution}[1]{%
|
|
6 |
\begin{quote}\sf%
|
|
7 |
#1%
|
|
8 |
\end{quote}}
|
|
9 |
\renewcommand{\solution}[1]{}
|
|
10 |
|
|
11 |
|
|
12 |
|
|
13 |
|
|
14 |
|
31
|
15 |
\begin{document}
|
|
16 |
|
|
17 |
\section*{Homework 4}
|
|
18 |
|
347
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
19 |
\HEADER
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
20 |
|
31
|
21 |
\begin{enumerate}
|
34
|
22 |
|
726
|
23 |
\item Given the regular expressions
|
|
24 |
|
|
25 |
\begin{center}
|
|
26 |
\begin{tabular}{ll}
|
|
27 |
1) & $(ab + a)\cdot (\ONE + b)$\\
|
|
28 |
2) & $(aa + a)^*$\\
|
|
29 |
\end{tabular}
|
|
30 |
\end{center}
|
|
31 |
|
|
32 |
there are several values for how these regular expressions can
|
|
33 |
recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case
|
|
34 |
\emph{all} the values and indicate which one is the POSIX value.
|
|
35 |
|
|
36 |
|
444
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
37 |
\item If a regular expression $r$ does not contain any occurrence of $\ZERO$,
|
893
|
38 |
is it possible for $L(r)$ to be empty? Explain why, or give a proof.
|
|
39 |
|
|
40 |
\solution{
|
|
41 |
The property to prove is
|
|
42 |
|
|
43 |
\begin{center}
|
|
44 |
$P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$.
|
|
45 |
\end{center}
|
|
46 |
|
|
47 |
For this you have to now go through all cases.
|
|
48 |
|
|
49 |
Case $r = 0$: $P(\ZERO)$ says: If $\ZERO$ does not contain $\ZERO$
|
|
50 |
then \ldots. The premise is obviously false, so everything follows,
|
|
51 |
in particular $L(r) \not= \emptyset$.\medskip
|
|
52 |
|
|
53 |
Case $r = \ONE$ and $r = c$ are similar, just that the premise is
|
|
54 |
true, but also $L(\ONE)$ and $L(c)$ are not empty. So we shown
|
|
55 |
$L(r) \not= \emptyset$.\medskip
|
|
56 |
|
|
57 |
Case $r = r_1 + r_2$: We know $P(r_1)$ and $P(r_2)$ as IHs. We need to show
|
|
58 |
$P(r_1 + r_2)$: If $r_1 + r_2$ does not contain $\ZERO$, then $L(r_1 + r_2) \not= \emptyset$.
|
|
59 |
|
|
60 |
If $r_1 + r_2$ does not contain $\ZERO$, then also $r_1$ does not contain $\ZERO$
|
|
61 |
and $r_2$ does not contain $\ZERO$. So we can apply the two IHs $P(r_1)$ and $P(r_2)$,
|
|
62 |
which allow us to infer that $L(r_1) \not= \emptyset$ and $L(r_2) \not= \emptyset$.
|
|
63 |
But if this is the case, then also $L(r_1 + r_2) \not= \emptyset$, which is what we needed
|
|
64 |
to show in this case.\medskip
|
|
65 |
|
|
66 |
The other cases are similar.\bigskip
|
|
67 |
|
|
68 |
|
|
69 |
This lemma essentially says that for basic regular expressions, if
|
|
70 |
they do not match anything at all, they must contain $\ZERO$(s)
|
|
71 |
somewhere.
|
|
72 |
|
|
73 |
}
|
32
|
74 |
|
264
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
75 |
\item Define the tokens and regular expressions for a language
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
76 |
consisting of numbers, left-parenthesis $($, right-parenthesis $)$,
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
77 |
identifiers and the operations $+$, $-$ and $*$. Can the following
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
78 |
strings in this language be lexed?
|
264
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
79 |
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
80 |
\begin{itemize}
|
264
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
81 |
\item $(a + 3) * b$
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
82 |
\item $)()++ -33$
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
83 |
\item $(a / 3) * 3$
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
84 |
\end{itemize}
|
264
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
85 |
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
86 |
In case they can, can you give the corresponding token
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
87 |
sequences.
|
264
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
88 |
|
768
|
89 |
\item Assume $r$ is nullable. Show that
|
|
90 |
\[ 1 + r + r\cdot r \;\equiv\; r\cdot r
|
|
91 |
\]
|
|
92 |
|
|
93 |
holds.
|
|
94 |
|
893
|
95 |
\solution{
|
|
96 |
If $r$ is nullable, then $1 + r \equiv r$. With this you can replace
|
|
97 |
|
|
98 |
\begin{align}
|
|
99 |
(1 + r) + r\cdot r & \equiv r + r\cdot r\\
|
|
100 |
& \equiv r \cdot (1 + r)\\
|
|
101 |
& \equiv r \cdot r
|
|
102 |
\end{align}
|
|
103 |
|
|
104 |
where in (2) you pull out the ``factor'' $r$ (because $r_1 \cdot (r_2 + r_3) \equiv r_1 \cdot r_2 + r_1 \cdot r_3$).
|
|
105 |
}
|
|
106 |
|
|
107 |
|
768
|
108 |
\item \textbf{(Deleted)} Assume that $s^{-1}$ stands for the operation of reversing a
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
109 |
string $s$. Given the following \emph{reversing} function on regular
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
110 |
expressions
|
32
|
111 |
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
112 |
\begin{center}
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
113 |
\begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l}
|
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
114 |
$rev(\ZERO)$ & $\dn$ & $\ZERO$\\
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
115 |
$rev(\ONE)$ & $\dn$ & $\ONE$\\
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
116 |
$rev(c)$ & $\dn$ & $c$\\
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
117 |
$rev(r_1 + r_2)$ & $\dn$ & $rev(r_1) + rev(r_2)$\\
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
118 |
$rev(r_1 \cdot r_2)$ & $\dn$ & $rev(r_2) \cdot rev(r_1)$\\
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
119 |
$rev(r^*)$ & $\dn$ & $rev(r)^*$\\
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
120 |
\end{tabular}
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
121 |
\end{center}
|
34
|
122 |
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
123 |
and the set
|
32
|
124 |
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
125 |
\begin{center}
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
126 |
$Rev\,A \dn \{s^{-1} \;|\; s \in A\}$
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
127 |
\end{center}
|
31
|
128 |
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
129 |
prove whether
|
32
|
130 |
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
131 |
\begin{center}
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
132 |
$L(rev(r)) = Rev (L(r))$
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
133 |
\end{center}
|
31
|
134 |
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
135 |
holds.
|
42
|
136 |
|
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
137 |
\item Assume the delimiters for comments are
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
138 |
\texttt{$\slash$*} and \texttt{*$\slash$}. Give a
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
139 |
regular expression that can recognise comments of the
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
140 |
form
|
42
|
141 |
|
267
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
142 |
\begin{center}
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
143 |
\texttt{$\slash$*~\ldots{}~*$\slash$}
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
144 |
\end{center}
|
42
|
145 |
|
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
146 |
where the three dots stand for arbitrary characters, but
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
147 |
not comment delimiters. (Hint: You can assume you are
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
148 |
already given a regular expression written \texttt{ALL},
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
149 |
that can recognise any character, and a regular
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
150 |
expression \texttt{NOT} that recognises the complement
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
151 |
of a regular expression.)
|
355
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
152 |
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
153 |
\item Simplify the regular expression
|
42
|
154 |
|
355
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
155 |
\[
|
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
156 |
(\ZERO \cdot (b \cdot c)) +
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
157 |
((\ZERO \cdot c) + \ONE)
|
355
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
158 |
\]
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
159 |
|
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
160 |
Does simplification always preserve the meaning of a
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
161 |
regular expression?
|
355
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
162 |
|
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
163 |
\item The Sulzmann \& Lu algorithm contains the function
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
164 |
$mkeps$ which answers how a regular expression can match
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
165 |
the empty string. What is the answer of $mkeps$ for the
|
355
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
166 |
regular expressions:
|
146
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
167 |
|
355
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
168 |
\[
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
169 |
\begin{array}{l}
|
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
170 |
(\ZERO \cdot (b \cdot c)) +
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
171 |
((\ZERO \cdot c) + \ONE)\\
|
577
|
172 |
(a + \ONE) \cdot (\ONE + \ONE)\\
|
|
173 |
a^*
|
355
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
174 |
\end{array}
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
175 |
\]
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
176 |
|
401
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
177 |
\item What is the purpose of the record regular expression in
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
178 |
the Sulzmann \& Lu algorithm?
|
498
|
179 |
|
843
|
180 |
\item Recall the functions \textit{nullable} and
|
|
181 |
\textit{zeroable}. Define recursive functions
|
|
182 |
\textit{atmostempty} (for regular expressions that match no
|
|
183 |
string or only the empty string), \textit{somechars} (for
|
|
184 |
regular expressions that match some non-empty string),
|
|
185 |
\textit{infinitestrings} (for regular expressions that can match
|
|
186 |
infinitely many strings).
|
893
|
187 |
|
|
188 |
\solution{
|
|
189 |
\textbf{zeroable}: The property is $z(r) \;\text{iff}\; L(r) = \emptyset$:
|
|
190 |
|
|
191 |
\begin{align}
|
|
192 |
z(\ZERO) &\dn true\\
|
|
193 |
z(\ONE) &\dn false\\
|
|
194 |
z(c) &\dn false\\
|
|
195 |
z(r_1 + r_2) &\dn z(r_1) \wedge z(r_2)\\
|
|
196 |
z(r_1 \cdot r_2) &\dn z(r_1) \vee z(r_2)\\
|
|
197 |
z(r^*) &\dn false
|
|
198 |
\end{align}\bigskip
|
|
199 |
|
|
200 |
\textbf{atmostempty}: The property is ``either $L(r) = \emptyset$ or $L(r) = \{[]\}$'', which
|
|
201 |
is more formally $a(r) \;\text{iff}\; L(r) \subseteq \{[]\}$:
|
|
202 |
|
|
203 |
\begin{align}
|
|
204 |
a(\ZERO) &\dn true\\
|
|
205 |
a(\ONE) &\dn true\\
|
|
206 |
a(c) &\dn false\\
|
|
207 |
a(r_1 + r_2) &\dn a(r_1) \wedge a(r_2)\\
|
|
208 |
a(r_1 \cdot r_2) &\dn z(r_1) \vee z(r_2) \vee (a(r_1) \wedge a(r_2))\\
|
|
209 |
a(r^*) &\dn a(r)
|
|
210 |
\end{align}
|
|
211 |
|
|
212 |
For this it is good to remember the regex should either not
|
|
213 |
match anything at all, or just the empty string.\bigskip
|
|
214 |
|
|
215 |
\textbf{somechars}: The property is ``$L(r)$ must contain a string which is not the empty string'', which
|
|
216 |
is more formally $s(r) \;\text{iff}\; \exists\,s. s \not= [] \wedge s \in L(r)$:
|
|
217 |
|
|
218 |
\begin{align}
|
|
219 |
s(\ZERO) &\dn false\\
|
|
220 |
s(\ONE) &\dn false\\
|
|
221 |
s(c) &\dn true\\
|
|
222 |
s(r_1 + r_2) &\dn s(r_1) \vee s(r_2)\\
|
|
223 |
s(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge s(r_2)) \;\vee\; (\neg z(r_2) \wedge s(r_1))\\
|
|
224 |
s(r^*) &\dn s(r)
|
|
225 |
\end{align}
|
|
226 |
|
|
227 |
Here the interesting case is $r_1 \cdot r_2$ where one essentially has to make sure
|
|
228 |
that one of the regexes is not zeroable, because then the resulting regex cannot match any
|
|
229 |
string.\bigskip
|
|
230 |
|
|
231 |
\textbf{infinitestrings}: The property is
|
|
232 |
$i(r) \;\text{iff}\; L(r)\;\text{is infinite}$:
|
|
233 |
|
|
234 |
\begin{align}
|
|
235 |
i(\ZERO) &\dn false\\
|
|
236 |
i(\ONE) &\dn false\\
|
|
237 |
i(c) &\dn false\\
|
|
238 |
i(r_1 + r_2) &\dn i(r_1) \vee i(r_2)\\
|
|
239 |
i(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge i(r_2)) \;\vee\; (\neg z(r_2) \wedge i(r_1))\\
|
|
240 |
i(r^*) &\dn \neg a(r)
|
|
241 |
\end{align}
|
|
242 |
|
|
243 |
Here the interesting bit is that as soon $r$ can match at least a single string, then $r^*$
|
|
244 |
will match infinitely many strings.
|
|
245 |
}
|
|
246 |
|
498
|
247 |
|
892
|
248 |
\item There are two kinds of automata that are generated for
|
843
|
249 |
regular expression matching---DFAs and NFAs. (1) Regular expression engines like
|
|
250 |
the one in Python generate NFAs. Explain what is the problem with such
|
|
251 |
NFAs and what is the reason why they use NFAs. (2) Regular expression
|
|
252 |
engines like the one in Rust generate DFAs. Explain what is the
|
|
253 |
problem with these regex engines and also what is the problem with $a^{\{1000\}}$
|
|
254 |
in these engines.
|
|
255 |
|
146
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
256 |
%\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
257 |
%argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
258 |
%that it filters out all comments and whitespace from the result.
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
259 |
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
260 |
%\item (Optional) Modify the tokenizer in \texttt{regexp2.scala} so that it
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
261 |
%implements the \texttt{findAll} function. This function takes a regular
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
262 |
%expressions and a string, and returns all substrings in this string that
|
444
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
263 |
%match the regular expression.
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
264 |
|
Christian Urban <christian dot urban at kcl dot ac dot uk>
diff
changeset
|
265 |
\item \POSTSCRIPT
|
31
|
266 |
\end{enumerate}
|
|
267 |
|
|
268 |
|
|
269 |
\end{document}
|
|
270 |
|
|
271 |
%%% Local Variables:
|
|
272 |
%%% mode: latex
|
|
273 |
%%% TeX-master: t
|
|
274 |
%%% End:
|