| author | Christian Urban <christian.urban@kcl.ac.uk> | 
| Thu, 05 Oct 2023 10:31:05 +0100 | |
| changeset 938 | 0eb340948fdb | 
| parent 916 | 2ab96407f350 | 
| child 942 | 7f52427568ff | 
| permissions | -rw-r--r-- | 
| 31 | 1  | 
\documentclass{article}
 | 
| 
264
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
2  | 
\usepackage{../style}
 | 
| 
292
 
7ed2a25dd115
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
267 
diff
changeset
 | 
3  | 
\usepackage{../graphics}
 | 
| 
146
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
4  | 
|
| 31 | 5  | 
\begin{document}
 | 
6  | 
||
7  | 
\section*{Homework 4}
 | 
|
8  | 
||
| 916 | 9  | 
%%\HEADER  | 
| 
347
 
22b5294daa2a
updated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
294 
diff
changeset
 | 
10  | 
|
| 31 | 11  | 
\begin{enumerate}
 | 
| 34 | 12  | 
|
| 726 | 13  | 
\item Given the regular expressions  | 
14  | 
||
15  | 
\begin{center}
 | 
|
16  | 
\begin{tabular}{ll}    
 | 
|
17  | 
1) & $(ab + a)\cdot (\ONE + b)$\\  | 
|
18  | 
2) & $(aa + a)^*$\\  | 
|
19  | 
\end{tabular}
 | 
|
20  | 
\end{center}
 | 
|
21  | 
||
22  | 
there are several values for how these regular expressions can  | 
|
23  | 
recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case  | 
|
24  | 
\emph{all} the values and indicate which one is the POSIX value.
 | 
|
25  | 
||
26  | 
||
| 
444
 
3056a4c071b0
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
401 
diff
changeset
 | 
27  | 
\item If a regular expression $r$ does not contain any occurrence of $\ZERO$,  | 
| 893 | 28  | 
is it possible for $L(r)$ to be empty? Explain why, or give a proof.  | 
29  | 
||
30  | 
  \solution{
 | 
|
31  | 
The property to prove is  | 
|
32  | 
||
33  | 
    \begin{center}
 | 
|
34  | 
$P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$.  | 
|
35  | 
  \end{center}
 | 
|
36  | 
||
37  | 
For this you have to now go through all cases.  | 
|
38  | 
||
39  | 
Case $r = 0$: $P(\ZERO)$ says: If $\ZERO$ does not contain $\ZERO$  | 
|
40  | 
then \ldots. The premise is obviously false, so everything follows,  | 
|
41  | 
in particular $L(r) \not= \emptyset$.\medskip  | 
|
42  | 
||
43  | 
Case $r = \ONE$ and $r = c$ are similar, just that the premise is  | 
|
44  | 
true, but also $L(\ONE)$ and $L(c)$ are not empty. So we shown  | 
|
45  | 
$L(r) \not= \emptyset$.\medskip  | 
|
46  | 
||
47  | 
Case $r = r_1 + r_2$: We know $P(r_1)$ and $P(r_2)$ as IHs. We need to show  | 
|
48  | 
$P(r_1 + r_2)$: If $r_1 + r_2$ does not contain $\ZERO$, then $L(r_1 + r_2) \not= \emptyset$.  | 
|
49  | 
||
50  | 
If $r_1 + r_2$ does not contain $\ZERO$, then also $r_1$ does not contain $\ZERO$  | 
|
51  | 
and $r_2$ does not contain $\ZERO$. So we can apply the two IHs $P(r_1)$ and $P(r_2)$,  | 
|
52  | 
which allow us to infer that $L(r_1) \not= \emptyset$ and $L(r_2) \not= \emptyset$.  | 
|
53  | 
But if this is the case, then also $L(r_1 + r_2) \not= \emptyset$, which is what we needed  | 
|
54  | 
to show in this case.\medskip  | 
|
55  | 
||
56  | 
The other cases are similar.\bigskip  | 
|
57  | 
||
58  | 
||
59  | 
This lemma essentially says that for basic regular expressions, if  | 
|
60  | 
they do not match anything at all, they must contain $\ZERO$(s)  | 
|
61  | 
somewhere.  | 
|
62  | 
||
63  | 
}  | 
|
| 32 | 64  | 
|
| 
264
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
65  | 
\item Define the tokens and regular expressions for a language  | 
| 
267
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
66  | 
consisting of numbers, left-parenthesis $($, right-parenthesis $)$,  | 
| 
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
67  | 
identifiers and the operations $+$, $-$ and $*$. Can the following  | 
| 
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
68  | 
strings in this language be lexed?  | 
| 
264
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
69  | 
|
| 
267
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
70  | 
  \begin{itemize}
 | 
| 
264
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
71  | 
\item $(a + 3) * b$  | 
| 
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
72  | 
\item $)()++ -33$  | 
| 
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
73  | 
\item $(a / 3) * 3$  | 
| 
267
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
74  | 
  \end{itemize}
 | 
| 
264
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
75  | 
|
| 
267
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
76  | 
In case they can, can you give the corresponding token  | 
| 
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
77  | 
sequences.  | 
| 
264
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
78  | 
|
| 768 | 79  | 
\item Assume $r$ is nullable. Show that  | 
80  | 
\[ 1 + r + r\cdot r \;\equiv\; r\cdot r  | 
|
81  | 
\]  | 
|
82  | 
||
83  | 
holds.  | 
|
84  | 
||
| 893 | 85  | 
  \solution{
 | 
86  | 
If $r$ is nullable, then $1 + r \equiv r$. With this you can replace  | 
|
87  | 
||
88  | 
    \begin{align}
 | 
|
89  | 
(1 + r) + r\cdot r & \equiv r + r\cdot r\\  | 
|
90  | 
& \equiv r \cdot (1 + r)\\  | 
|
91  | 
& \equiv r \cdot r  | 
|
92  | 
    \end{align}  
 | 
|
93  | 
||
94  | 
where in (2) you pull out the ``factor'' $r$ (because $r_1 \cdot (r_2 + r_3) \equiv r_1 \cdot r_2 + r_1 \cdot r_3$).  | 
|
95  | 
}  | 
|
96  | 
||
97  | 
||
| 938 | 98  | 
%\item \textbf{(Deleted)} Assume that $s^{-1}$ stands for the operation of reversing a
 | 
99  | 
%  string $s$. Given the following \emph{reversing} function on regular
 | 
|
100  | 
% expressions  | 
|
101  | 
%  | 
|
102  | 
%  \begin{center}
 | 
|
103  | 
%    \begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l}
 | 
|
104  | 
% $rev(\ZERO)$ & $\dn$ & $\ZERO$\\  | 
|
105  | 
% $rev(\ONE)$ & $\dn$ & $\ONE$\\  | 
|
106  | 
% $rev(c)$ & $\dn$ & $c$\\  | 
|
107  | 
% $rev(r_1 + r_2)$ & $\dn$ & $rev(r_1) + rev(r_2)$\\  | 
|
108  | 
% $rev(r_1 \cdot r_2)$ & $\dn$ & $rev(r_2) \cdot rev(r_1)$\\  | 
|
109  | 
% $rev(r^*)$ & $\dn$ & $rev(r)^*$\\  | 
|
110  | 
%    \end{tabular}
 | 
|
111  | 
%  \end{center}
 | 
|
112  | 
%  | 
|
113  | 
% and the set  | 
|
114  | 
%  | 
|
115  | 
%  \begin{center}
 | 
|
116  | 
%    $Rev\,A \dn \{s^{-1} \;|\; s \in A\}$
 | 
|
117  | 
%  \end{center}
 | 
|
118  | 
%  | 
|
119  | 
% prove whether  | 
|
120  | 
%  | 
|
121  | 
%  \begin{center}
 | 
|
122  | 
% $L(rev(r)) = Rev (L(r))$  | 
|
123  | 
%  \end{center}
 | 
|
124  | 
%  | 
|
125  | 
% holds.  | 
|
| 34 | 126  | 
|
| 938 | 127  | 
\item Construct a regular expression that can validate passwords. A  | 
128  | 
password should be at least 8 characters long and consist of upper-  | 
|
129  | 
and lower-case letters and digits. It should contain at least a  | 
|
130  | 
single lower-case letter, at least a single upper-case letter and at  | 
|
131  | 
least a single digit. If possible ise the intersection regular  | 
|
132  | 
expression from CW1, written $\_\&\_$, the bounded regular  | 
|
133  | 
expressions; you can also assume a regular expression written  | 
|
134  | 
  \texttt{ALL} that can match any character.
 | 
|
| 31 | 135  | 
|
| 938 | 136  | 
  \solution{
 | 
137  | 
You can build-up the different constraints separately and then  | 
|
138  | 
use the intersection operator:  | 
|
| 32 | 139  | 
|
| 938 | 140  | 
  \begin{center}  
 | 
141  | 
  \begin{tabular}{lll}  
 | 
|
142  | 
    $ALL^{\{8..\}}$ & \;\&\; & $(ALL^*\cdot [a-z]\cdot ALL^*)$\\
 | 
|
143  | 
& \;\&\; & $(ALL^*\cdot [A-Z]\cdot ALL^*)$\\  | 
|
144  | 
& \;\&\; & $(ALL^*\cdot [0-9]\cdot ALL^*)$\\  | 
|
145  | 
  \end{tabular}
 | 
|
| 
267
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
146  | 
  \end{center}
 | 
| 938 | 147  | 
}  | 
148  | 
||
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
149  | 
\item Assume the delimiters for comments are  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
150  | 
      \texttt{$\slash$*} and \texttt{*$\slash$}. Give a
 | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
151  | 
regular expression that can recognise comments of the  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
152  | 
form  | 
| 42 | 153  | 
|
| 
267
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
154  | 
  \begin{center}
 | 
| 
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
155  | 
    \texttt{$\slash$*~\ldots{}~*$\slash$} 
 | 
| 
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
156  | 
  \end{center}
 | 
| 42 | 157  | 
|
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
158  | 
where the three dots stand for arbitrary characters, but  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
159  | 
not comment delimiters. (Hint: You can assume you are  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
160  | 
      already given a regular expression written \texttt{ALL},
 | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
161  | 
that can recognise any character, and a regular  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
162  | 
      expression \texttt{NOT} that recognises the complement
 | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
163  | 
of a regular expression.)  | 
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
164  | 
|
| 
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
165  | 
\item Simplify the regular expression  | 
| 42 | 166  | 
|
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
167  | 
\[  | 
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
168  | 
(\ZERO \cdot (b \cdot c)) +  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
169  | 
((\ZERO \cdot c) + \ONE)  | 
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
170  | 
\]  | 
| 
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
171  | 
|
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
172  | 
Does simplification always preserve the meaning of a  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
173  | 
regular expression?  | 
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
174  | 
|
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
175  | 
\item The Sulzmann \& Lu algorithm contains the function  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
176  | 
$mkeps$ which answers how a regular expression can match  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
177  | 
the empty string. What is the answer of $mkeps$ for the  | 
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
178  | 
regular expressions:  | 
| 
146
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
179  | 
|
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
180  | 
\[  | 
| 
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
181  | 
  \begin{array}{l}
 | 
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
182  | 
(\ZERO \cdot (b \cdot c)) +  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
183  | 
((\ZERO \cdot c) + \ONE)\\  | 
| 577 | 184  | 
(a + \ONE) \cdot (\ONE + \ONE)\\  | 
185  | 
a^*  | 
|
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
186  | 
  \end{array}
 | 
| 
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
187  | 
\]  | 
| 
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
188  | 
|
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
189  | 
\item What is the purpose of the record regular expression in  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
190  | 
the Sulzmann \& Lu algorithm?  | 
| 498 | 191  | 
|
| 843 | 192  | 
\item Recall the functions \textit{nullable} and
 | 
193  | 
      \textit{zeroable}.  Define recursive functions
 | 
|
194  | 
      \textit{atmostempty} (for regular expressions that match no
 | 
|
195  | 
      string or only the empty string), \textit{somechars} (for
 | 
|
196  | 
regular expressions that match some non-empty string),  | 
|
197  | 
      \textit{infinitestrings} (for regular expressions that can match
 | 
|
198  | 
infinitely many strings).  | 
|
| 893 | 199  | 
|
200  | 
      \solution{
 | 
|
201  | 
        \textbf{zeroable}: The property is $z(r) \;\text{iff}\; L(r) = \emptyset$:
 | 
|
202  | 
||
203  | 
        \begin{align}
 | 
|
204  | 
z(\ZERO) &\dn true\\  | 
|
205  | 
z(\ONE) &\dn false\\  | 
|
206  | 
z(c) &\dn false\\  | 
|
207  | 
z(r_1 + r_2) &\dn z(r_1) \wedge z(r_2)\\  | 
|
208  | 
z(r_1 \cdot r_2) &\dn z(r_1) \vee z(r_2)\\  | 
|
209  | 
z(r^*) &\dn false  | 
|
210  | 
        \end{align}\bigskip
 | 
|
211  | 
||
212  | 
        \textbf{atmostempty}: The property is ``either $L(r) = \emptyset$ or $L(r) = \{[]\}$'', which
 | 
|
213  | 
        is more formally $a(r) \;\text{iff}\; L(r) \subseteq \{[]\}$:
 | 
|
214  | 
||
215  | 
        \begin{align}
 | 
|
216  | 
a(\ZERO) &\dn true\\  | 
|
217  | 
a(\ONE) &\dn true\\  | 
|
218  | 
a(c) &\dn false\\  | 
|
219  | 
a(r_1 + r_2) &\dn a(r_1) \wedge a(r_2)\\  | 
|
220  | 
a(r_1 \cdot r_2) &\dn z(r_1) \vee z(r_2) \vee (a(r_1) \wedge a(r_2))\\  | 
|
221  | 
a(r^*) &\dn a(r)  | 
|
222  | 
        \end{align}
 | 
|
223  | 
||
224  | 
For this it is good to remember the regex should either not  | 
|
225  | 
match anything at all, or just the empty string.\bigskip  | 
|
226  | 
||
227  | 
        \textbf{somechars}: The property is ``$L(r)$ must contain a string which is not the empty string'', which
 | 
|
228  | 
        is more formally $s(r) \;\text{iff}\; \exists\,s. s \not= [] \wedge s \in L(r)$:
 | 
|
229  | 
||
230  | 
        \begin{align}
 | 
|
231  | 
s(\ZERO) &\dn false\\  | 
|
232  | 
s(\ONE) &\dn false\\  | 
|
233  | 
s(c) &\dn true\\  | 
|
234  | 
s(r_1 + r_2) &\dn s(r_1) \vee s(r_2)\\  | 
|
235  | 
s(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge s(r_2)) \;\vee\; (\neg z(r_2) \wedge s(r_1))\\  | 
|
236  | 
s(r^*) &\dn s(r)  | 
|
237  | 
        \end{align}
 | 
|
238  | 
||
239  | 
Here the interesting case is $r_1 \cdot r_2$ where one essentially has to make sure  | 
|
240  | 
that one of the regexes is not zeroable, because then the resulting regex cannot match any  | 
|
241  | 
string.\bigskip  | 
|
242  | 
||
243  | 
        \textbf{infinitestrings}: The property is
 | 
|
244  | 
        $i(r) \;\text{iff}\; L(r)\;\text{is infinite}$:
 | 
|
245  | 
||
246  | 
        \begin{align}
 | 
|
247  | 
i(\ZERO) &\dn false\\  | 
|
248  | 
i(\ONE) &\dn false\\  | 
|
249  | 
i(c) &\dn false\\  | 
|
250  | 
i(r_1 + r_2) &\dn i(r_1) \vee i(r_2)\\  | 
|
251  | 
i(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge i(r_2)) \;\vee\; (\neg z(r_2) \wedge i(r_1))\\  | 
|
252  | 
i(r^*) &\dn \neg a(r)  | 
|
253  | 
        \end{align}
 | 
|
254  | 
||
255  | 
Here the interesting bit is that as soon $r$ can match at least a single string, then $r^*$  | 
|
256  | 
will match infinitely many strings.  | 
|
257  | 
}  | 
|
258  | 
||
| 498 | 259  | 
|
| 892 | 260  | 
\item There are two kinds of automata that are generated for  | 
| 843 | 261  | 
regular expression matching---DFAs and NFAs. (1) Regular expression engines like  | 
262  | 
the one in Python generate NFAs. Explain what is the problem with such  | 
|
263  | 
NFAs and what is the reason why they use NFAs. (2) Regular expression  | 
|
264  | 
engines like the one in Rust generate DFAs. Explain what is the  | 
|
265  | 
  problem with these regex engines and also what is the problem with $a^{\{1000\}}$
 | 
|
266  | 
in these engines.  | 
|
267  | 
||
| 
146
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
268  | 
%\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as
 | 
| 
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
269  | 
%argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so  | 
| 
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
270  | 
%that it filters out all comments and whitespace from the result.  | 
| 
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
271  | 
|
| 
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
272  | 
%\item (Optional) Modify the tokenizer in \texttt{regexp2.scala} so that it
 | 
| 
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
273  | 
%implements the \texttt{findAll} function. This function takes a regular
 | 
| 
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
274  | 
%expressions and a string, and returns all substrings in this string that  | 
| 
444
 
3056a4c071b0
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
401 
diff
changeset
 | 
275  | 
%match the regular expression.  | 
| 
 
3056a4c071b0
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
401 
diff
changeset
 | 
276  | 
|
| 
 
3056a4c071b0
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
401 
diff
changeset
 | 
277  | 
\item \POSTSCRIPT  | 
| 31 | 278  | 
\end{enumerate}
 | 
279  | 
||
280  | 
||
281  | 
\end{document}
 | 
|
282  | 
||
283  | 
%%% Local Variables:  | 
|
284  | 
%%% mode: latex  | 
|
285  | 
%%% TeX-master: t  | 
|
286  | 
%%% End:  |