| author | Christian Urban <christian.urban@kcl.ac.uk> | 
| Fri, 31 Oct 2025 11:25:14 +0000 | |
| changeset 1019 | f71399fe3fdc | 
| parent 1009 | 7fd1997bd14c | 
| permissions | -rw-r--r-- | 
| 31 | 1  | 
\documentclass{article}
 | 
| 
264
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
2  | 
\usepackage{../style}
 | 
| 
292
 
7ed2a25dd115
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
267 
diff
changeset
 | 
3  | 
\usepackage{../graphics}
 | 
| 
146
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
4  | 
|
| 31 | 5  | 
\begin{document}
 | 
6  | 
||
7  | 
\section*{Homework 4}
 | 
|
8  | 
||
| 916 | 9  | 
%%\HEADER  | 
| 
347
 
22b5294daa2a
updated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
294 
diff
changeset
 | 
10  | 
|
| 31 | 11  | 
\begin{enumerate}
 | 
| 34 | 12  | 
|
| 726 | 13  | 
\item Given the regular expressions  | 
14  | 
||
15  | 
\begin{center}
 | 
|
16  | 
\begin{tabular}{ll}    
 | 
|
17  | 
1) & $(ab + a)\cdot (\ONE + b)$\\  | 
|
18  | 
2) & $(aa + a)^*$\\  | 
|
19  | 
\end{tabular}
 | 
|
20  | 
\end{center}
 | 
|
21  | 
||
22  | 
there are several values for how these regular expressions can  | 
|
23  | 
recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case  | 
|
24  | 
\emph{all} the values and indicate which one is the POSIX value.
 | 
|
| 942 | 25  | 
|
26  | 
\solution{
 | 
|
27  | 
1) There are only 2 values (writing $a$ for $Char(a)$ and so on)  | 
|
28  | 
||
29  | 
  \begin{center}
 | 
|
30  | 
  \begin{tabular}{l}
 | 
|
31  | 
$Sequ(Left(Sequ(a,b)),Left(Empty))$\\  | 
|
32  | 
$Sequ(Right(a),Left(b))$\\  | 
|
33  | 
  \end{tabular}    
 | 
|
34  | 
  \end{center}
 | 
|
| 726 | 35  | 
|
| 942 | 36  | 
The first is the POSIX value because of the preference for $Left$.  | 
37  | 
||
38  | 
2) There are three ``main'' values, namely  | 
|
39  | 
||
40  | 
  \begin{center}
 | 
|
41  | 
  \begin{tabular}{l}
 | 
|
42  | 
$Stars\,[Left(Sequ(a,a)),Right(a)]$\\  | 
|
43  | 
$Stars\,[Right(a), Left(Sequ(a,a))]$\\  | 
|
44  | 
$Stars\,[Right(a), Right(a), Right(a)]$\\  | 
|
45  | 
  \end{tabular}    
 | 
|
46  | 
  \end{center}
 | 
|
47  | 
||
48  | 
Again the first one is the POSIX value, but if it just about all  | 
|
49  | 
possible values, then there are in fact infinitely many values because  | 
|
50  | 
the following  | 
|
51  | 
||
52  | 
  \begin{center}
 | 
|
53  | 
  \begin{tabular}{l}
 | 
|
54  | 
$Stars\,[Left(Sequ(a,a)),Empty,Right(a)]$\\  | 
|
55  | 
$Stars\,[Left(Sequ(a,a)),Empty,Empty,Right(a)]$\\  | 
|
56  | 
$Stars\,[Left(Sequ(a,a)),Empty,Right(a), Empty]$, \ldots\\  | 
|
57  | 
  \end{tabular}    
 | 
|
58  | 
  \end{center}
 | 
|
59  | 
||
60  | 
are also values for this regex and the string $aaa$.  | 
|
61  | 
}  | 
|
| 726 | 62  | 
|
| 
444
 
3056a4c071b0
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
401 
diff
changeset
 | 
63  | 
\item If a regular expression $r$ does not contain any occurrence of $\ZERO$,  | 
| 893 | 64  | 
is it possible for $L(r)$ to be empty? Explain why, or give a proof.  | 
65  | 
||
66  | 
  \solution{
 | 
|
| 942 | 67  | 
No. The property to prove by induction is  | 
| 893 | 68  | 
|
69  | 
    \begin{center}
 | 
|
70  | 
$P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$.  | 
|
71  | 
  \end{center}
 | 
|
72  | 
||
73  | 
For this you have to now go through all cases.  | 
|
74  | 
||
75  | 
Case $r = 0$: $P(\ZERO)$ says: If $\ZERO$ does not contain $\ZERO$  | 
|
76  | 
then \ldots. The premise is obviously false, so everything follows,  | 
|
77  | 
in particular $L(r) \not= \emptyset$.\medskip  | 
|
78  | 
||
79  | 
Case $r = \ONE$ and $r = c$ are similar, just that the premise is  | 
|
80  | 
true, but also $L(\ONE)$ and $L(c)$ are not empty. So we shown  | 
|
81  | 
$L(r) \not= \emptyset$.\medskip  | 
|
82  | 
||
83  | 
Case $r = r_1 + r_2$: We know $P(r_1)$ and $P(r_2)$ as IHs. We need to show  | 
|
84  | 
$P(r_1 + r_2)$: If $r_1 + r_2$ does not contain $\ZERO$, then $L(r_1 + r_2) \not= \emptyset$.  | 
|
85  | 
||
86  | 
If $r_1 + r_2$ does not contain $\ZERO$, then also $r_1$ does not contain $\ZERO$  | 
|
87  | 
and $r_2$ does not contain $\ZERO$. So we can apply the two IHs $P(r_1)$ and $P(r_2)$,  | 
|
88  | 
which allow us to infer that $L(r_1) \not= \emptyset$ and $L(r_2) \not= \emptyset$.  | 
|
89  | 
But if this is the case, then also $L(r_1 + r_2) \not= \emptyset$, which is what we needed  | 
|
90  | 
to show in this case.\medskip  | 
|
91  | 
||
92  | 
The other cases are similar.\bigskip  | 
|
93  | 
||
94  | 
||
95  | 
This lemma essentially says that for basic regular expressions, if  | 
|
96  | 
they do not match anything at all, they must contain $\ZERO$(s)  | 
|
97  | 
somewhere.  | 
|
98  | 
||
99  | 
}  | 
|
| 32 | 100  | 
|
| 
264
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
101  | 
\item Define the tokens and regular expressions for a language  | 
| 
267
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
102  | 
consisting of numbers, left-parenthesis $($, right-parenthesis $)$,  | 
| 
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
103  | 
identifiers and the operations $+$, $-$ and $*$. Can the following  | 
| 
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
104  | 
strings in this language be lexed?  | 
| 
264
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
105  | 
|
| 
267
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
106  | 
  \begin{itemize}
 | 
| 
264
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
107  | 
\item $(a + 3) * b$  | 
| 
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
108  | 
\item $)()++ -33$  | 
| 
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
109  | 
\item $(a / 3) * 3$  | 
| 
267
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
110  | 
  \end{itemize}
 | 
| 
264
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
111  | 
|
| 
267
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
112  | 
In case they can, can you give the corresponding token  | 
| 
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
113  | 
sequences.  | 
| 
264
 
4deef8ac5d72
uodated hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
166 
diff
changeset
 | 
114  | 
|
| 942 | 115  | 
  \solution{
 | 
116  | 
The first 2 are lexibile. The 3 one contains $/$ which is not an operator.  | 
|
117  | 
}  | 
|
118  | 
||
| 768 | 119  | 
\item Assume $r$ is nullable. Show that  | 
120  | 
\[ 1 + r + r\cdot r \;\equiv\; r\cdot r  | 
|
121  | 
\]  | 
|
122  | 
||
123  | 
holds.  | 
|
124  | 
||
| 893 | 125  | 
  \solution{
 | 
126  | 
If $r$ is nullable, then $1 + r \equiv r$. With this you can replace  | 
|
127  | 
||
128  | 
    \begin{align}
 | 
|
129  | 
(1 + r) + r\cdot r & \equiv r + r\cdot r\\  | 
|
130  | 
& \equiv r \cdot (1 + r)\\  | 
|
131  | 
& \equiv r \cdot r  | 
|
132  | 
    \end{align}  
 | 
|
133  | 
||
134  | 
where in (2) you pull out the ``factor'' $r$ (because $r_1 \cdot (r_2 + r_3) \equiv r_1 \cdot r_2 + r_1 \cdot r_3$).  | 
|
135  | 
}  | 
|
136  | 
||
137  | 
||
| 938 | 138  | 
%\item \textbf{(Deleted)} Assume that $s^{-1}$ stands for the operation of reversing a
 | 
139  | 
%  string $s$. Given the following \emph{reversing} function on regular
 | 
|
140  | 
% expressions  | 
|
141  | 
%  | 
|
142  | 
%  \begin{center}
 | 
|
143  | 
%    \begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l}
 | 
|
144  | 
% $rev(\ZERO)$ & $\dn$ & $\ZERO$\\  | 
|
145  | 
% $rev(\ONE)$ & $\dn$ & $\ONE$\\  | 
|
146  | 
% $rev(c)$ & $\dn$ & $c$\\  | 
|
147  | 
% $rev(r_1 + r_2)$ & $\dn$ & $rev(r_1) + rev(r_2)$\\  | 
|
148  | 
% $rev(r_1 \cdot r_2)$ & $\dn$ & $rev(r_2) \cdot rev(r_1)$\\  | 
|
149  | 
% $rev(r^*)$ & $\dn$ & $rev(r)^*$\\  | 
|
150  | 
%    \end{tabular}
 | 
|
151  | 
%  \end{center}
 | 
|
152  | 
%  | 
|
153  | 
% and the set  | 
|
154  | 
%  | 
|
155  | 
%  \begin{center}
 | 
|
156  | 
%    $Rev\,A \dn \{s^{-1} \;|\; s \in A\}$
 | 
|
157  | 
%  \end{center}
 | 
|
158  | 
%  | 
|
159  | 
% prove whether  | 
|
160  | 
%  | 
|
161  | 
%  \begin{center}
 | 
|
162  | 
% $L(rev(r)) = Rev (L(r))$  | 
|
163  | 
%  \end{center}
 | 
|
164  | 
%  | 
|
165  | 
% holds.  | 
|
| 34 | 166  | 
|
| 938 | 167  | 
\item Construct a regular expression that can validate passwords. A  | 
168  | 
password should be at least 8 characters long and consist of upper-  | 
|
169  | 
and lower-case letters and digits. It should contain at least a  | 
|
170  | 
single lower-case letter, at least a single upper-case letter and at  | 
|
| 942 | 171  | 
least a single digit. If possible use the intersection regular  | 
172  | 
expression from CW1, written $\_\&\_$, and the bounded regular  | 
|
| 938 | 173  | 
expressions; you can also assume a regular expression written  | 
174  | 
  \texttt{ALL} that can match any character.
 | 
|
| 31 | 175  | 
|
| 938 | 176  | 
  \solution{
 | 
177  | 
You can build-up the different constraints separately and then  | 
|
178  | 
use the intersection operator:  | 
|
| 32 | 179  | 
|
| 938 | 180  | 
  \begin{center}  
 | 
181  | 
  \begin{tabular}{lll}  
 | 
|
182  | 
    $ALL^{\{8..\}}$ & \;\&\; & $(ALL^*\cdot [a-z]\cdot ALL^*)$\\
 | 
|
183  | 
& \;\&\; & $(ALL^*\cdot [A-Z]\cdot ALL^*)$\\  | 
|
184  | 
& \;\&\; & $(ALL^*\cdot [0-9]\cdot ALL^*)$\\  | 
|
185  | 
  \end{tabular}
 | 
|
| 
267
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
186  | 
  \end{center}
 | 
| 942 | 187  | 
|
188  | 
$ALL$ could be represented as $\sim \ZERO$.  | 
|
| 938 | 189  | 
}  | 
190  | 
||
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
191  | 
\item Assume the delimiters for comments are  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
192  | 
      \texttt{$\slash$*} and \texttt{*$\slash$}. Give a
 | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
193  | 
regular expression that can recognise comments of the  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
194  | 
form  | 
| 42 | 195  | 
|
| 
267
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
196  | 
  \begin{center}
 | 
| 
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
197  | 
    \texttt{$\slash$*~\ldots{}~*$\slash$} 
 | 
| 
 
a1544b804d1e
updated homeworks
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
264 
diff
changeset
 | 
198  | 
  \end{center}
 | 
| 42 | 199  | 
|
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
200  | 
where the three dots stand for arbitrary characters, but  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
201  | 
not comment delimiters. (Hint: You can assume you are  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
202  | 
      already given a regular expression written \texttt{ALL},
 | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
203  | 
that can recognise any character, and a regular  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
204  | 
      expression \texttt{NOT} that recognises the complement
 | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
205  | 
of a regular expression.)  | 
| 942 | 206  | 
|
207  | 
      \solution{
 | 
|
208  | 
\[/ * \sim (ALL^* * / ALL^*) * /\]  | 
|
209  | 
||
210  | 
The idea to make sure in between $/ *$ and $* /$ ar no strings that contain $* /$.  | 
|
211  | 
}  | 
|
212  | 
||
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
213  | 
\item Simplify the regular expression  | 
| 42 | 214  | 
|
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
215  | 
\[  | 
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
216  | 
(\ZERO \cdot (b \cdot c)) +  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
217  | 
((\ZERO \cdot c) + \ONE)  | 
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
218  | 
\]  | 
| 
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
219  | 
|
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
220  | 
Does simplification always preserve the meaning of a  | 
| 942 | 221  | 
regular expression?  | 
222  | 
||
223  | 
      \solution{ Yes, simplification preserves the language. It
 | 
|
224  | 
simplifies to just $\ONE$. It should be remembered that the  | 
|
225  | 
Brzozowski does not simplify under stars. This does not apply  | 
|
226  | 
in this example, though. }  | 
|
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
227  | 
|
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
228  | 
\item The Sulzmann \& Lu algorithm contains the function  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
229  | 
$mkeps$ which answers how a regular expression can match  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
230  | 
the empty string. What is the answer of $mkeps$ for the  | 
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
231  | 
regular expressions:  | 
| 
146
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
232  | 
|
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
233  | 
\[  | 
| 
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
234  | 
  \begin{array}{l}
 | 
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
235  | 
(\ZERO \cdot (b \cdot c)) +  | 
| 
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
236  | 
((\ZERO \cdot c) + \ONE)\\  | 
| 577 | 237  | 
(a + \ONE) \cdot (\ONE + \ONE)\\  | 
238  | 
a^*  | 
|
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
239  | 
  \end{array}
 | 
| 942 | 240  | 
\]  | 
241  | 
||
242  | 
\solution{
 | 
|
243  | 
The values are  | 
|
244  | 
  \begin{center}
 | 
|
245  | 
  \begin{tabular}{l}
 | 
|
246  | 
$Right(Right(Empty))$\\  | 
|
247  | 
$Sequ(Right(\ONE),Left(\ONE))$\\  | 
|
248  | 
$Stars\,[]$  | 
|
249  | 
  \end{tabular}  
 | 
|
250  | 
  \end{center}
 | 
|
251  | 
||
252  | 
The last one uses the rule that $mkeps$ for the star returns always $Star\,[]$.  | 
|
253  | 
}  | 
|
| 
355
 
a259eec25156
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
347 
diff
changeset
 | 
254  | 
|
| 
401
 
5d85dc9779b1
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
359 
diff
changeset
 | 
255  | 
\item What is the purpose of the record regular expression in  | 
| 942 | 256  | 
the Sulzmann \& Lu algorithm?  | 
257  | 
||
258  | 
  \solution{
 | 
|
259  | 
It marks a part of a regular expression and can be used to extract the part of the  | 
|
260  | 
string that is matched by this marked part of the regular expression.  | 
|
261  | 
}  | 
|
| 498 | 262  | 
|
| 843 | 263  | 
\item Recall the functions \textit{nullable} and
 | 
| 1009 | 264  | 
      \textit{zeroable} from HW2. Define the cases of these
 | 
265  | 
      functions for the $r^{\{n\}}$ regular expression. Recall that the first 
 | 
|
266  | 
      function needs to satisfy the property $\textit{nullable}(r)$ iff $[]\in L(r)$ and
 | 
|
267  | 
      the second $\textit{zeroable}(r)$ iff $L(r) = \emptyset$
 | 
|
268  | 
||
269  | 
  \solution{
 | 
|
270  | 
    $\textit{nullable}(r^{\{n\}}) \dn if n = 0 then true else \textit{nullable}(r)$
 | 
|
271  | 
||
272  | 
    $\textit{zeroable}(r^{\{n\}}) \dn if n = 0 then false else \textit{zeroable}(r)$    
 | 
|
273  | 
}  | 
|
274  | 
||
275  | 
\item Define recursive functions  | 
|
| 843 | 276  | 
      \textit{atmostempty} (for regular expressions that match no
 | 
277  | 
      string or only the empty string), \textit{somechars} (for
 | 
|
278  | 
regular expressions that match some non-empty string),  | 
|
279  | 
      \textit{infinitestrings} (for regular expressions that can match
 | 
|
280  | 
infinitely many strings).  | 
|
| 893 | 281  | 
|
282  | 
      \solution{
 | 
|
283  | 
        \textbf{zeroable}: The property is $z(r) \;\text{iff}\; L(r) = \emptyset$:
 | 
|
284  | 
||
285  | 
        \begin{align}
 | 
|
286  | 
z(\ZERO) &\dn true\\  | 
|
287  | 
z(\ONE) &\dn false\\  | 
|
288  | 
z(c) &\dn false\\  | 
|
289  | 
z(r_1 + r_2) &\dn z(r_1) \wedge z(r_2)\\  | 
|
290  | 
z(r_1 \cdot r_2) &\dn z(r_1) \vee z(r_2)\\  | 
|
291  | 
z(r^*) &\dn false  | 
|
292  | 
        \end{align}\bigskip
 | 
|
293  | 
||
294  | 
        \textbf{atmostempty}: The property is ``either $L(r) = \emptyset$ or $L(r) = \{[]\}$'', which
 | 
|
295  | 
        is more formally $a(r) \;\text{iff}\; L(r) \subseteq \{[]\}$:
 | 
|
296  | 
||
297  | 
        \begin{align}
 | 
|
298  | 
a(\ZERO) &\dn true\\  | 
|
299  | 
a(\ONE) &\dn true\\  | 
|
300  | 
a(c) &\dn false\\  | 
|
301  | 
a(r_1 + r_2) &\dn a(r_1) \wedge a(r_2)\\  | 
|
302  | 
a(r_1 \cdot r_2) &\dn z(r_1) \vee z(r_2) \vee (a(r_1) \wedge a(r_2))\\  | 
|
303  | 
a(r^*) &\dn a(r)  | 
|
304  | 
        \end{align}
 | 
|
305  | 
||
306  | 
For this it is good to remember the regex should either not  | 
|
307  | 
match anything at all, or just the empty string.\bigskip  | 
|
308  | 
||
309  | 
        \textbf{somechars}: The property is ``$L(r)$ must contain a string which is not the empty string'', which
 | 
|
310  | 
        is more formally $s(r) \;\text{iff}\; \exists\,s. s \not= [] \wedge s \in L(r)$:
 | 
|
311  | 
||
312  | 
        \begin{align}
 | 
|
313  | 
s(\ZERO) &\dn false\\  | 
|
314  | 
s(\ONE) &\dn false\\  | 
|
315  | 
s(c) &\dn true\\  | 
|
316  | 
s(r_1 + r_2) &\dn s(r_1) \vee s(r_2)\\  | 
|
317  | 
s(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge s(r_2)) \;\vee\; (\neg z(r_2) \wedge s(r_1))\\  | 
|
318  | 
s(r^*) &\dn s(r)  | 
|
319  | 
        \end{align}
 | 
|
320  | 
||
321  | 
Here the interesting case is $r_1 \cdot r_2$ where one essentially has to make sure  | 
|
322  | 
that one of the regexes is not zeroable, because then the resulting regex cannot match any  | 
|
323  | 
string.\bigskip  | 
|
324  | 
||
325  | 
        \textbf{infinitestrings}: The property is
 | 
|
326  | 
        $i(r) \;\text{iff}\; L(r)\;\text{is infinite}$:
 | 
|
327  | 
||
328  | 
        \begin{align}
 | 
|
329  | 
i(\ZERO) &\dn false\\  | 
|
330  | 
i(\ONE) &\dn false\\  | 
|
331  | 
i(c) &\dn false\\  | 
|
332  | 
i(r_1 + r_2) &\dn i(r_1) \vee i(r_2)\\  | 
|
333  | 
i(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge i(r_2)) \;\vee\; (\neg z(r_2) \wedge i(r_1))\\  | 
|
334  | 
i(r^*) &\dn \neg a(r)  | 
|
335  | 
        \end{align}
 | 
|
336  | 
||
| 942 | 337  | 
Here the interesting bit is that as soon $r$ can match at least a single non-empty string, then $r^*$  | 
| 893 | 338  | 
will match infinitely many strings.  | 
339  | 
}  | 
|
340  | 
||
| 498 | 341  | 
|
| 892 | 342  | 
\item There are two kinds of automata that are generated for  | 
| 843 | 343  | 
regular expression matching---DFAs and NFAs. (1) Regular expression engines like  | 
344  | 
the one in Python generate NFAs. Explain what is the problem with such  | 
|
345  | 
NFAs and what is the reason why they use NFAs. (2) Regular expression  | 
|
346  | 
engines like the one in Rust generate DFAs. Explain what is the  | 
|
347  | 
  problem with these regex engines and also what is the problem with $a^{\{1000\}}$
 | 
|
348  | 
in these engines.  | 
|
| 942 | 349  | 
|
350  | 
  \solution{
 | 
|
351  | 
Why they use NFAs? NFAs are of similar size as the regular expression (they do not explode  | 
|
352  | 
for the basic regular expressions. Python regex library supports constructions like  | 
|
353  | 
back-refernces which cannot be represented by DFAs (string matching with back-references  | 
|
354  | 
    can be NP. What is the problem with $a^{\{1000\}}$. When generating DFAs (and NFAs) for the
 | 
|
355  | 
bounded regular expressions, one has to make $n$ copies, which means their size can grow  | 
|
356  | 
drastically for large counters.  | 
|
357  | 
}  | 
|
| 843 | 358  | 
|
| 
146
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
359  | 
%\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as
 | 
| 
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
360  | 
%argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so  | 
| 
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
361  | 
%that it filters out all comments and whitespace from the result.  | 
| 
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
362  | 
|
| 
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
363  | 
%\item (Optional) Modify the tokenizer in \texttt{regexp2.scala} so that it
 | 
| 
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
364  | 
%implements the \texttt{findAll} function. This function takes a regular
 | 
| 
 
9da175d5eb63
added new hws
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
102 
diff
changeset
 | 
365  | 
%expressions and a string, and returns all substrings in this string that  | 
| 
444
 
3056a4c071b0
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
401 
diff
changeset
 | 
366  | 
%match the regular expression.  | 
| 
 
3056a4c071b0
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
401 
diff
changeset
 | 
367  | 
|
| 
 
3056a4c071b0
updated
 
Christian Urban <christian dot urban at kcl dot ac dot uk> 
parents: 
401 
diff
changeset
 | 
368  | 
\item \POSTSCRIPT  | 
| 31 | 369  | 
\end{enumerate}
 | 
370  | 
||
371  | 
||
372  | 
\end{document}
 | 
|
373  | 
||
374  | 
%%% Local Variables:  | 
|
375  | 
%%% mode: latex  | 
|
376  | 
%%% TeX-master: t  | 
|
377  | 
%%% End:  |