20 \end{center} |
20 \end{center} |
21 |
21 |
22 there are several values for how these regular expressions can |
22 there are several values for how these regular expressions can |
23 recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case |
23 recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case |
24 \emph{all} the values and indicate which one is the POSIX value. |
24 \emph{all} the values and indicate which one is the POSIX value. |
|
25 |
|
26 \solution{ |
|
27 1) There are only 2 values (writing $a$ for $Char(a)$ and so on) |
|
28 |
|
29 \begin{center} |
|
30 \begin{tabular}{l} |
|
31 $Sequ(Left(Sequ(a,b)),Left(Empty))$\\ |
|
32 $Sequ(Right(a),Left(b))$\\ |
|
33 \end{tabular} |
|
34 \end{center} |
25 |
35 |
|
36 The first is the POSIX value because of the preference for $Left$. |
|
37 |
|
38 2) There are three ``main'' values, namely |
|
39 |
|
40 \begin{center} |
|
41 \begin{tabular}{l} |
|
42 $Stars\,[Left(Sequ(a,a)),Right(a)]$\\ |
|
43 $Stars\,[Right(a), Left(Sequ(a,a))]$\\ |
|
44 $Stars\,[Right(a), Right(a), Right(a)]$\\ |
|
45 \end{tabular} |
|
46 \end{center} |
|
47 |
|
48 Again the first one is the POSIX value, but if it just about all |
|
49 possible values, then there are in fact infinitely many values because |
|
50 the following |
|
51 |
|
52 \begin{center} |
|
53 \begin{tabular}{l} |
|
54 $Stars\,[Left(Sequ(a,a)),Empty,Right(a)]$\\ |
|
55 $Stars\,[Left(Sequ(a,a)),Empty,Empty,Right(a)]$\\ |
|
56 $Stars\,[Left(Sequ(a,a)),Empty,Right(a), Empty]$, \ldots\\ |
|
57 \end{tabular} |
|
58 \end{center} |
|
59 |
|
60 are also values for this regex and the string $aaa$. |
|
61 } |
26 |
62 |
27 \item If a regular expression $r$ does not contain any occurrence of $\ZERO$, |
63 \item If a regular expression $r$ does not contain any occurrence of $\ZERO$, |
28 is it possible for $L(r)$ to be empty? Explain why, or give a proof. |
64 is it possible for $L(r)$ to be empty? Explain why, or give a proof. |
29 |
65 |
30 \solution{ |
66 \solution{ |
31 The property to prove is |
67 No. The property to prove by induction is |
32 |
68 |
33 \begin{center} |
69 \begin{center} |
34 $P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$. |
70 $P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$. |
35 \end{center} |
71 \end{center} |
36 |
72 |
126 |
166 |
127 \item Construct a regular expression that can validate passwords. A |
167 \item Construct a regular expression that can validate passwords. A |
128 password should be at least 8 characters long and consist of upper- |
168 password should be at least 8 characters long and consist of upper- |
129 and lower-case letters and digits. It should contain at least a |
169 and lower-case letters and digits. It should contain at least a |
130 single lower-case letter, at least a single upper-case letter and at |
170 single lower-case letter, at least a single upper-case letter and at |
131 least a single digit. If possible ise the intersection regular |
171 least a single digit. If possible use the intersection regular |
132 expression from CW1, written $\_\&\_$, the bounded regular |
172 expression from CW1, written $\_\&\_$, and the bounded regular |
133 expressions; you can also assume a regular expression written |
173 expressions; you can also assume a regular expression written |
134 \texttt{ALL} that can match any character. |
174 \texttt{ALL} that can match any character. |
135 |
175 |
136 \solution{ |
176 \solution{ |
137 You can build-up the different constraints separately and then |
177 You can build-up the different constraints separately and then |
142 $ALL^{\{8..\}}$ & \;\&\; & $(ALL^*\cdot [a-z]\cdot ALL^*)$\\ |
182 $ALL^{\{8..\}}$ & \;\&\; & $(ALL^*\cdot [a-z]\cdot ALL^*)$\\ |
143 & \;\&\; & $(ALL^*\cdot [A-Z]\cdot ALL^*)$\\ |
183 & \;\&\; & $(ALL^*\cdot [A-Z]\cdot ALL^*)$\\ |
144 & \;\&\; & $(ALL^*\cdot [0-9]\cdot ALL^*)$\\ |
184 & \;\&\; & $(ALL^*\cdot [0-9]\cdot ALL^*)$\\ |
145 \end{tabular} |
185 \end{tabular} |
146 \end{center} |
186 \end{center} |
|
187 |
|
188 $ALL$ could be represented as $\sim \ZERO$. |
147 } |
189 } |
148 |
190 |
149 \item Assume the delimiters for comments are |
191 \item Assume the delimiters for comments are |
150 \texttt{$\slash$*} and \texttt{*$\slash$}. Give a |
192 \texttt{$\slash$*} and \texttt{*$\slash$}. Give a |
151 regular expression that can recognise comments of the |
193 regular expression that can recognise comments of the |
159 not comment delimiters. (Hint: You can assume you are |
201 not comment delimiters. (Hint: You can assume you are |
160 already given a regular expression written \texttt{ALL}, |
202 already given a regular expression written \texttt{ALL}, |
161 that can recognise any character, and a regular |
203 that can recognise any character, and a regular |
162 expression \texttt{NOT} that recognises the complement |
204 expression \texttt{NOT} that recognises the complement |
163 of a regular expression.) |
205 of a regular expression.) |
164 |
206 |
|
207 \solution{ |
|
208 \[/ * \sim (ALL^* * / ALL^*) * /\] |
|
209 |
|
210 The idea to make sure in between $/ *$ and $* /$ ar no strings that contain $* /$. |
|
211 } |
|
212 |
165 \item Simplify the regular expression |
213 \item Simplify the regular expression |
166 |
214 |
167 \[ |
215 \[ |
168 (\ZERO \cdot (b \cdot c)) + |
216 (\ZERO \cdot (b \cdot c)) + |
169 ((\ZERO \cdot c) + \ONE) |
217 ((\ZERO \cdot c) + \ONE) |
170 \] |
218 \] |
171 |
219 |
172 Does simplification always preserve the meaning of a |
220 Does simplification always preserve the meaning of a |
173 regular expression? |
221 regular expression? |
|
222 |
|
223 \solution{ Yes, simplification preserves the language. It |
|
224 simplifies to just $\ONE$. It should be remembered that the |
|
225 Brzozowski does not simplify under stars. This does not apply |
|
226 in this example, though. } |
174 |
227 |
175 \item The Sulzmann \& Lu algorithm contains the function |
228 \item The Sulzmann \& Lu algorithm contains the function |
176 $mkeps$ which answers how a regular expression can match |
229 $mkeps$ which answers how a regular expression can match |
177 the empty string. What is the answer of $mkeps$ for the |
230 the empty string. What is the answer of $mkeps$ for the |
178 regular expressions: |
231 regular expressions: |
182 (\ZERO \cdot (b \cdot c)) + |
235 (\ZERO \cdot (b \cdot c)) + |
183 ((\ZERO \cdot c) + \ONE)\\ |
236 ((\ZERO \cdot c) + \ONE)\\ |
184 (a + \ONE) \cdot (\ONE + \ONE)\\ |
237 (a + \ONE) \cdot (\ONE + \ONE)\\ |
185 a^* |
238 a^* |
186 \end{array} |
239 \end{array} |
187 \] |
240 \] |
|
241 |
|
242 \solution{ |
|
243 The values are |
|
244 \begin{center} |
|
245 \begin{tabular}{l} |
|
246 $Right(Right(Empty))$\\ |
|
247 $Sequ(Right(\ONE),Left(\ONE))$\\ |
|
248 $Stars\,[]$ |
|
249 \end{tabular} |
|
250 \end{center} |
|
251 |
|
252 The last one uses the rule that $mkeps$ for the star returns always $Star\,[]$. |
|
253 } |
188 |
254 |
189 \item What is the purpose of the record regular expression in |
255 \item What is the purpose of the record regular expression in |
190 the Sulzmann \& Lu algorithm? |
256 the Sulzmann \& Lu algorithm? |
|
257 |
|
258 \solution{ |
|
259 It marks a part of a regular expression and can be used to extract the part of the |
|
260 string that is matched by this marked part of the regular expression. |
|
261 } |
191 |
262 |
192 \item Recall the functions \textit{nullable} and |
263 \item Recall the functions \textit{nullable} and |
193 \textit{zeroable}. Define recursive functions |
264 \textit{zeroable}. Define recursive functions |
194 \textit{atmostempty} (for regular expressions that match no |
265 \textit{atmostempty} (for regular expressions that match no |
195 string or only the empty string), \textit{somechars} (for |
266 string or only the empty string), \textit{somechars} (for |
250 i(r_1 + r_2) &\dn i(r_1) \vee i(r_2)\\ |
321 i(r_1 + r_2) &\dn i(r_1) \vee i(r_2)\\ |
251 i(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge i(r_2)) \;\vee\; (\neg z(r_2) \wedge i(r_1))\\ |
322 i(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge i(r_2)) \;\vee\; (\neg z(r_2) \wedge i(r_1))\\ |
252 i(r^*) &\dn \neg a(r) |
323 i(r^*) &\dn \neg a(r) |
253 \end{align} |
324 \end{align} |
254 |
325 |
255 Here the interesting bit is that as soon $r$ can match at least a single string, then $r^*$ |
326 Here the interesting bit is that as soon $r$ can match at least a single non-empty string, then $r^*$ |
256 will match infinitely many strings. |
327 will match infinitely many strings. |
257 } |
328 } |
258 |
329 |
259 |
330 |
260 \item There are two kinds of automata that are generated for |
331 \item There are two kinds of automata that are generated for |
262 the one in Python generate NFAs. Explain what is the problem with such |
333 the one in Python generate NFAs. Explain what is the problem with such |
263 NFAs and what is the reason why they use NFAs. (2) Regular expression |
334 NFAs and what is the reason why they use NFAs. (2) Regular expression |
264 engines like the one in Rust generate DFAs. Explain what is the |
335 engines like the one in Rust generate DFAs. Explain what is the |
265 problem with these regex engines and also what is the problem with $a^{\{1000\}}$ |
336 problem with these regex engines and also what is the problem with $a^{\{1000\}}$ |
266 in these engines. |
337 in these engines. |
|
338 |
|
339 \solution{ |
|
340 Why they use NFAs? NFAs are of similar size as the regular expression (they do not explode |
|
341 for the basic regular expressions. Python regex library supports constructions like |
|
342 back-refernces which cannot be represented by DFAs (string matching with back-references |
|
343 can be NP. What is the problem with $a^{\{1000\}}$. When generating DFAs (and NFAs) for the |
|
344 bounded regular expressions, one has to make $n$ copies, which means their size can grow |
|
345 drastically for large counters. |
|
346 } |
267 |
347 |
268 %\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as |
348 %\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as |
269 %argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so |
349 %argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so |
270 %that it filters out all comments and whitespace from the result. |
350 %that it filters out all comments and whitespace from the result. |
271 |
351 |