author | Christian Urban <christian.urban@kcl.ac.uk> |
Fri, 29 Nov 2024 18:59:32 +0000 | |
changeset 976 | e9eac62928f5 |
parent 943 | 5365ef60707e |
permissions | -rw-r--r-- |
31 | 1 |
\documentclass{article} |
264
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
2 |
\usepackage{../style} |
292
7ed2a25dd115
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
267
diff
changeset
|
3 |
\usepackage{../graphics} |
146
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
4 |
|
31 | 5 |
\begin{document} |
6 |
||
7 |
\section*{Homework 4} |
|
8 |
||
916 | 9 |
%%\HEADER |
347
22b5294daa2a
updated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
294
diff
changeset
|
10 |
|
31 | 11 |
\begin{enumerate} |
34 | 12 |
|
726 | 13 |
\item Given the regular expressions |
14 |
||
15 |
\begin{center} |
|
16 |
\begin{tabular}{ll} |
|
17 |
1) & $(ab + a)\cdot (\ONE + b)$\\ |
|
18 |
2) & $(aa + a)^*$\\ |
|
19 |
\end{tabular} |
|
20 |
\end{center} |
|
21 |
||
22 |
there are several values for how these regular expressions can |
|
23 |
recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case |
|
24 |
\emph{all} the values and indicate which one is the POSIX value. |
|
943 | 25 |
|
26 |
\solution{ |
|
27 |
1) There are only 2 values (writing $a$ for $Char(a)$ and so on) |
|
28 |
||
29 |
\begin{center} |
|
30 |
\begin{tabular}{l} |
|
31 |
$Sequ(Left(Sequ(a,b)),Left(Empty))$\\ |
|
32 |
$Sequ(Right(a),Left(b))$\\ |
|
33 |
\end{tabular} |
|
34 |
\end{center} |
|
726 | 35 |
|
943 | 36 |
The first is the POSIX value because of the preference for $Left$. |
37 |
||
38 |
2) There are three ``main'' values, namely |
|
39 |
||
40 |
\begin{center} |
|
41 |
\begin{tabular}{l} |
|
42 |
$Stars\,[Left(Sequ(a,a)),Right(a)]$\\ |
|
43 |
$Stars\,[Right(a), Left(Sequ(a,a))]$\\ |
|
44 |
$Stars\,[Right(a), Right(a), Right(a)]$\\ |
|
45 |
\end{tabular} |
|
46 |
\end{center} |
|
47 |
||
48 |
Again the first one is the POSIX value, but if it just about all |
|
49 |
possible values, then there are in fact infinitely many values because |
|
50 |
the following |
|
51 |
||
52 |
\begin{center} |
|
53 |
\begin{tabular}{l} |
|
54 |
$Stars\,[Left(Sequ(a,a)),Empty,Right(a)]$\\ |
|
55 |
$Stars\,[Left(Sequ(a,a)),Empty,Empty,Right(a)]$\\ |
|
56 |
$Stars\,[Left(Sequ(a,a)),Empty,Right(a), Empty]$, \ldots\\ |
|
57 |
\end{tabular} |
|
58 |
\end{center} |
|
59 |
||
60 |
are also values for this regex and the string $aaa$. |
|
61 |
} |
|
726 | 62 |
|
444
3056a4c071b0
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
401
diff
changeset
|
63 |
\item If a regular expression $r$ does not contain any occurrence of $\ZERO$, |
893 | 64 |
is it possible for $L(r)$ to be empty? Explain why, or give a proof. |
65 |
||
66 |
\solution{ |
|
943 | 67 |
No. The property to prove by induction is |
893 | 68 |
|
69 |
\begin{center} |
|
70 |
$P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$. |
|
71 |
\end{center} |
|
72 |
||
73 |
For this you have to now go through all cases. |
|
74 |
||
75 |
Case $r = 0$: $P(\ZERO)$ says: If $\ZERO$ does not contain $\ZERO$ |
|
76 |
then \ldots. The premise is obviously false, so everything follows, |
|
77 |
in particular $L(r) \not= \emptyset$.\medskip |
|
78 |
||
79 |
Case $r = \ONE$ and $r = c$ are similar, just that the premise is |
|
80 |
true, but also $L(\ONE)$ and $L(c)$ are not empty. So we shown |
|
81 |
$L(r) \not= \emptyset$.\medskip |
|
82 |
||
83 |
Case $r = r_1 + r_2$: We know $P(r_1)$ and $P(r_2)$ as IHs. We need to show |
|
84 |
$P(r_1 + r_2)$: If $r_1 + r_2$ does not contain $\ZERO$, then $L(r_1 + r_2) \not= \emptyset$. |
|
85 |
||
86 |
If $r_1 + r_2$ does not contain $\ZERO$, then also $r_1$ does not contain $\ZERO$ |
|
87 |
and $r_2$ does not contain $\ZERO$. So we can apply the two IHs $P(r_1)$ and $P(r_2)$, |
|
88 |
which allow us to infer that $L(r_1) \not= \emptyset$ and $L(r_2) \not= \emptyset$. |
|
89 |
But if this is the case, then also $L(r_1 + r_2) \not= \emptyset$, which is what we needed |
|
90 |
to show in this case.\medskip |
|
91 |
||
92 |
The other cases are similar.\bigskip |
|
93 |
||
94 |
||
95 |
This lemma essentially says that for basic regular expressions, if |
|
96 |
they do not match anything at all, they must contain $\ZERO$(s) |
|
97 |
somewhere. |
|
98 |
||
99 |
} |
|
32 | 100 |
|
264
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
101 |
\item Define the tokens and regular expressions for a language |
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
102 |
consisting of numbers, left-parenthesis $($, right-parenthesis $)$, |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
103 |
identifiers and the operations $+$, $-$ and $*$. Can the following |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
104 |
strings in this language be lexed? |
264
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
105 |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
106 |
\begin{itemize} |
264
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
107 |
\item $(a + 3) * b$ |
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
108 |
\item $)()++ -33$ |
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
109 |
\item $(a / 3) * 3$ |
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
110 |
\end{itemize} |
264
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
111 |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
112 |
In case they can, can you give the corresponding token |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
113 |
sequences. |
264
4deef8ac5d72
uodated hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
166
diff
changeset
|
114 |
|
943 | 115 |
\solution{ |
116 |
The first 2 are lexibile. The 3 one contains $/$ which is not an operator. |
|
117 |
} |
|
118 |
||
768 | 119 |
\item Assume $r$ is nullable. Show that |
120 |
\[ 1 + r + r\cdot r \;\equiv\; r\cdot r |
|
121 |
\] |
|
122 |
||
123 |
holds. |
|
124 |
||
893 | 125 |
\solution{ |
126 |
If $r$ is nullable, then $1 + r \equiv r$. With this you can replace |
|
127 |
||
128 |
\begin{align} |
|
129 |
(1 + r) + r\cdot r & \equiv r + r\cdot r\\ |
|
130 |
& \equiv r \cdot (1 + r)\\ |
|
131 |
& \equiv r \cdot r |
|
132 |
\end{align} |
|
133 |
||
134 |
where in (2) you pull out the ``factor'' $r$ (because $r_1 \cdot (r_2 + r_3) \equiv r_1 \cdot r_2 + r_1 \cdot r_3$). |
|
135 |
} |
|
136 |
||
137 |
||
939 | 138 |
%\item \textbf{(Deleted)} Assume that $s^{-1}$ stands for the operation of reversing a |
139 |
% string $s$. Given the following \emph{reversing} function on regular |
|
140 |
% expressions |
|
141 |
% |
|
142 |
% \begin{center} |
|
143 |
% \begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l} |
|
144 |
% $rev(\ZERO)$ & $\dn$ & $\ZERO$\\ |
|
145 |
% $rev(\ONE)$ & $\dn$ & $\ONE$\\ |
|
146 |
% $rev(c)$ & $\dn$ & $c$\\ |
|
147 |
% $rev(r_1 + r_2)$ & $\dn$ & $rev(r_1) + rev(r_2)$\\ |
|
148 |
% $rev(r_1 \cdot r_2)$ & $\dn$ & $rev(r_2) \cdot rev(r_1)$\\ |
|
149 |
% $rev(r^*)$ & $\dn$ & $rev(r)^*$\\ |
|
150 |
% \end{tabular} |
|
151 |
% \end{center} |
|
152 |
% |
|
153 |
% and the set |
|
154 |
% |
|
155 |
% \begin{center} |
|
156 |
% $Rev\,A \dn \{s^{-1} \;|\; s \in A\}$ |
|
157 |
% \end{center} |
|
158 |
% |
|
159 |
% prove whether |
|
160 |
% |
|
161 |
% \begin{center} |
|
162 |
% $L(rev(r)) = Rev (L(r))$ |
|
163 |
% \end{center} |
|
164 |
% |
|
165 |
% holds. |
|
34 | 166 |
|
939 | 167 |
\item Construct a regular expression that can validate passwords. A |
168 |
password should be at least 8 characters long and consist of upper- |
|
169 |
and lower-case letters and digits. It should contain at least a |
|
170 |
single lower-case letter, at least a single upper-case letter and at |
|
943 | 171 |
least a single digit. If possible use the intersection regular |
172 |
expression from CW1, written $\_\&\_$, and the bounded regular |
|
939 | 173 |
expressions; you can also assume a regular expression written |
174 |
\texttt{ALL} that can match any character. |
|
31 | 175 |
|
939 | 176 |
\solution{ |
177 |
You can build-up the different constraints separately and then |
|
178 |
use the intersection operator: |
|
32 | 179 |
|
939 | 180 |
\begin{center} |
181 |
\begin{tabular}{lll} |
|
182 |
$ALL^{\{8..\}}$ & \;\&\; & $(ALL^*\cdot [a-z]\cdot ALL^*)$\\ |
|
183 |
& \;\&\; & $(ALL^*\cdot [A-Z]\cdot ALL^*)$\\ |
|
184 |
& \;\&\; & $(ALL^*\cdot [0-9]\cdot ALL^*)$\\ |
|
185 |
\end{tabular} |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
186 |
\end{center} |
943 | 187 |
|
188 |
$ALL$ could be represented as $\sim \ZERO$. |
|
939 | 189 |
} |
190 |
||
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
191 |
\item Assume the delimiters for comments are |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
192 |
\texttt{$\slash$*} and \texttt{*$\slash$}. Give a |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
193 |
regular expression that can recognise comments of the |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
194 |
form |
42 | 195 |
|
267
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
196 |
\begin{center} |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
197 |
\texttt{$\slash$*~\ldots{}~*$\slash$} |
a1544b804d1e
updated homeworks
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
264
diff
changeset
|
198 |
\end{center} |
42 | 199 |
|
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
200 |
where the three dots stand for arbitrary characters, but |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
201 |
not comment delimiters. (Hint: You can assume you are |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
202 |
already given a regular expression written \texttt{ALL}, |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
203 |
that can recognise any character, and a regular |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
204 |
expression \texttt{NOT} that recognises the complement |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
205 |
of a regular expression.) |
943 | 206 |
|
207 |
\solution{ |
|
208 |
\[/ * \sim (ALL^* * / ALL^*) * /\] |
|
209 |
||
210 |
The idea to make sure in between $/ *$ and $* /$ ar no strings that contain $* /$. |
|
211 |
} |
|
212 |
||
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
213 |
\item Simplify the regular expression |
42 | 214 |
|
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
215 |
\[ |
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
216 |
(\ZERO \cdot (b \cdot c)) + |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
217 |
((\ZERO \cdot c) + \ONE) |
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
218 |
\] |
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
219 |
|
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
220 |
Does simplification always preserve the meaning of a |
943 | 221 |
regular expression? |
222 |
||
223 |
\solution{ Yes, simplification preserves the language. It |
|
224 |
simplifies to just $\ONE$. It should be remembered that the |
|
225 |
Brzozowski does not simplify under stars. This does not apply |
|
226 |
in this example, though. } |
|
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
227 |
|
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
228 |
\item The Sulzmann \& Lu algorithm contains the function |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
229 |
$mkeps$ which answers how a regular expression can match |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
230 |
the empty string. What is the answer of $mkeps$ for the |
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
231 |
regular expressions: |
146
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
232 |
|
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
233 |
\[ |
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
234 |
\begin{array}{l} |
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
235 |
(\ZERO \cdot (b \cdot c)) + |
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
236 |
((\ZERO \cdot c) + \ONE)\\ |
577 | 237 |
(a + \ONE) \cdot (\ONE + \ONE)\\ |
238 |
a^* |
|
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
239 |
\end{array} |
943 | 240 |
\] |
241 |
||
242 |
\solution{ |
|
243 |
The values are |
|
244 |
\begin{center} |
|
245 |
\begin{tabular}{l} |
|
246 |
$Right(Right(Empty))$\\ |
|
247 |
$Sequ(Right(\ONE),Left(\ONE))$\\ |
|
248 |
$Stars\,[]$ |
|
249 |
\end{tabular} |
|
250 |
\end{center} |
|
251 |
||
252 |
The last one uses the rule that $mkeps$ for the star returns always $Star\,[]$. |
|
253 |
} |
|
355
a259eec25156
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
347
diff
changeset
|
254 |
|
401
5d85dc9779b1
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
359
diff
changeset
|
255 |
\item What is the purpose of the record regular expression in |
943 | 256 |
the Sulzmann \& Lu algorithm? |
257 |
||
258 |
\solution{ |
|
259 |
It marks a part of a regular expression and can be used to extract the part of the |
|
260 |
string that is matched by this marked part of the regular expression. |
|
261 |
} |
|
498 | 262 |
|
843 | 263 |
\item Recall the functions \textit{nullable} and |
264 |
\textit{zeroable}. Define recursive functions |
|
265 |
\textit{atmostempty} (for regular expressions that match no |
|
266 |
string or only the empty string), \textit{somechars} (for |
|
267 |
regular expressions that match some non-empty string), |
|
268 |
\textit{infinitestrings} (for regular expressions that can match |
|
269 |
infinitely many strings). |
|
893 | 270 |
|
271 |
\solution{ |
|
272 |
\textbf{zeroable}: The property is $z(r) \;\text{iff}\; L(r) = \emptyset$: |
|
273 |
||
274 |
\begin{align} |
|
275 |
z(\ZERO) &\dn true\\ |
|
276 |
z(\ONE) &\dn false\\ |
|
277 |
z(c) &\dn false\\ |
|
278 |
z(r_1 + r_2) &\dn z(r_1) \wedge z(r_2)\\ |
|
279 |
z(r_1 \cdot r_2) &\dn z(r_1) \vee z(r_2)\\ |
|
280 |
z(r^*) &\dn false |
|
281 |
\end{align}\bigskip |
|
282 |
||
283 |
\textbf{atmostempty}: The property is ``either $L(r) = \emptyset$ or $L(r) = \{[]\}$'', which |
|
284 |
is more formally $a(r) \;\text{iff}\; L(r) \subseteq \{[]\}$: |
|
285 |
||
286 |
\begin{align} |
|
287 |
a(\ZERO) &\dn true\\ |
|
288 |
a(\ONE) &\dn true\\ |
|
289 |
a(c) &\dn false\\ |
|
290 |
a(r_1 + r_2) &\dn a(r_1) \wedge a(r_2)\\ |
|
291 |
a(r_1 \cdot r_2) &\dn z(r_1) \vee z(r_2) \vee (a(r_1) \wedge a(r_2))\\ |
|
292 |
a(r^*) &\dn a(r) |
|
293 |
\end{align} |
|
294 |
||
295 |
For this it is good to remember the regex should either not |
|
296 |
match anything at all, or just the empty string.\bigskip |
|
297 |
||
298 |
\textbf{somechars}: The property is ``$L(r)$ must contain a string which is not the empty string'', which |
|
299 |
is more formally $s(r) \;\text{iff}\; \exists\,s. s \not= [] \wedge s \in L(r)$: |
|
300 |
||
301 |
\begin{align} |
|
302 |
s(\ZERO) &\dn false\\ |
|
303 |
s(\ONE) &\dn false\\ |
|
304 |
s(c) &\dn true\\ |
|
305 |
s(r_1 + r_2) &\dn s(r_1) \vee s(r_2)\\ |
|
306 |
s(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge s(r_2)) \;\vee\; (\neg z(r_2) \wedge s(r_1))\\ |
|
307 |
s(r^*) &\dn s(r) |
|
308 |
\end{align} |
|
309 |
||
310 |
Here the interesting case is $r_1 \cdot r_2$ where one essentially has to make sure |
|
311 |
that one of the regexes is not zeroable, because then the resulting regex cannot match any |
|
312 |
string.\bigskip |
|
313 |
||
314 |
\textbf{infinitestrings}: The property is |
|
315 |
$i(r) \;\text{iff}\; L(r)\;\text{is infinite}$: |
|
316 |
||
317 |
\begin{align} |
|
318 |
i(\ZERO) &\dn false\\ |
|
319 |
i(\ONE) &\dn false\\ |
|
320 |
i(c) &\dn false\\ |
|
321 |
i(r_1 + r_2) &\dn i(r_1) \vee i(r_2)\\ |
|
322 |
i(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge i(r_2)) \;\vee\; (\neg z(r_2) \wedge i(r_1))\\ |
|
323 |
i(r^*) &\dn \neg a(r) |
|
324 |
\end{align} |
|
325 |
||
943 | 326 |
Here the interesting bit is that as soon $r$ can match at least a single non-empty string, then $r^*$ |
893 | 327 |
will match infinitely many strings. |
328 |
} |
|
329 |
||
498 | 330 |
|
892 | 331 |
\item There are two kinds of automata that are generated for |
843 | 332 |
regular expression matching---DFAs and NFAs. (1) Regular expression engines like |
333 |
the one in Python generate NFAs. Explain what is the problem with such |
|
334 |
NFAs and what is the reason why they use NFAs. (2) Regular expression |
|
335 |
engines like the one in Rust generate DFAs. Explain what is the |
|
336 |
problem with these regex engines and also what is the problem with $a^{\{1000\}}$ |
|
337 |
in these engines. |
|
943 | 338 |
|
339 |
\solution{ |
|
340 |
Why they use NFAs? NFAs are of similar size as the regular expression (they do not explode |
|
341 |
for the basic regular expressions. Python regex library supports constructions like |
|
342 |
back-refernces which cannot be represented by DFAs (string matching with back-references |
|
343 |
can be NP. What is the problem with $a^{\{1000\}}$. When generating DFAs (and NFAs) for the |
|
344 |
bounded regular expressions, one has to make $n$ copies, which means their size can grow |
|
345 |
drastically for large counters. |
|
346 |
} |
|
843 | 347 |
|
146
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
348 |
%\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as |
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
349 |
%argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so |
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
350 |
%that it filters out all comments and whitespace from the result. |
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
351 |
|
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
352 |
%\item (Optional) Modify the tokenizer in \texttt{regexp2.scala} so that it |
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
353 |
%implements the \texttt{findAll} function. This function takes a regular |
9da175d5eb63
added new hws
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
102
diff
changeset
|
354 |
%expressions and a string, and returns all substrings in this string that |
444
3056a4c071b0
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
401
diff
changeset
|
355 |
%match the regular expression. |
3056a4c071b0
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
401
diff
changeset
|
356 |
|
3056a4c071b0
updated
Christian Urban <christian dot urban at kcl dot ac dot uk>
parents:
401
diff
changeset
|
357 |
\item \POSTSCRIPT |
31 | 358 |
\end{enumerate} |
359 |
||
360 |
||
361 |
\end{document} |
|
362 |
||
363 |
%%% Local Variables: |
|
364 |
%%% mode: latex |
|
365 |
%%% TeX-master: t |
|
366 |
%%% End: |