| author | Christian Urban <christian.urban@kcl.ac.uk> | 
| Mon, 20 Oct 2025 22:18:21 +0200 | |
| changeset 1013 | 1a23d87d1700 | 
| parent 1010 | ae9ffbf979ff | 
| permissions | -rw-r--r-- | 
| 31 | 1 | \documentclass{article}
 | 
| 264 
4deef8ac5d72
uodated hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
166diff
changeset | 2 | \usepackage{../style}
 | 
| 292 
7ed2a25dd115
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
267diff
changeset | 3 | \usepackage{../graphics}
 | 
| 146 
9da175d5eb63
added new hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
102diff
changeset | 4 | |
| 31 | 5 | \begin{document}
 | 
| 6 | ||
| 7 | \section*{Homework 4}
 | |
| 8 | ||
| 916 | 9 | %%\HEADER | 
| 347 
22b5294daa2a
updated hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
294diff
changeset | 10 | |
| 31 | 11 | \begin{enumerate}
 | 
| 34 | 12 | |
| 726 | 13 | \item Given the regular expressions | 
| 14 | ||
| 15 | \begin{center}
 | |
| 16 | \begin{tabular}{ll}    
 | |
| 17 | 1) & $(ab + a)\cdot (\ONE + b)$\\ | |
| 18 | 2) & $(aa + a)^*$\\ | |
| 19 | \end{tabular}
 | |
| 20 | \end{center}
 | |
| 21 | ||
| 22 | there are several values for how these regular expressions can | |
| 23 | recognise the strings (for 1) $ab$ and (for 2) $aaa$. Give in each case | |
| 24 | \emph{all} the values and indicate which one is the POSIX value.
 | |
| 943 | 25 | |
| 26 | \solution{
 | |
| 27 | 1) There are only 2 values (writing $a$ for $Char(a)$ and so on) | |
| 28 | ||
| 29 |   \begin{center}
 | |
| 30 |   \begin{tabular}{l}
 | |
| 31 | $Sequ(Left(Sequ(a,b)),Left(Empty))$\\ | |
| 32 | $Sequ(Right(a),Left(b))$\\ | |
| 33 |   \end{tabular}    
 | |
| 34 |   \end{center}
 | |
| 726 | 35 | |
| 943 | 36 | The first is the POSIX value because of the preference for $Left$. | 
| 37 | ||
| 38 | 2) There are three ``main'' values, namely | |
| 39 | ||
| 40 |   \begin{center}
 | |
| 41 |   \begin{tabular}{l}
 | |
| 42 | $Stars\,[Left(Sequ(a,a)),Right(a)]$\\ | |
| 43 | $Stars\,[Right(a), Left(Sequ(a,a))]$\\ | |
| 44 | $Stars\,[Right(a), Right(a), Right(a)]$\\ | |
| 45 |   \end{tabular}    
 | |
| 46 |   \end{center}
 | |
| 47 | ||
| 48 | Again the first one is the POSIX value, but if it just about all | |
| 49 | possible values, then there are in fact infinitely many values because | |
| 50 | the following | |
| 51 | ||
| 52 |   \begin{center}
 | |
| 53 |   \begin{tabular}{l}
 | |
| 54 | $Stars\,[Left(Sequ(a,a)),Empty,Right(a)]$\\ | |
| 55 | $Stars\,[Left(Sequ(a,a)),Empty,Empty,Right(a)]$\\ | |
| 56 | $Stars\,[Left(Sequ(a,a)),Empty,Right(a), Empty]$, \ldots\\ | |
| 57 |   \end{tabular}    
 | |
| 58 |   \end{center}
 | |
| 59 | ||
| 60 | are also values for this regex and the string $aaa$. | |
| 61 | } | |
| 726 | 62 | |
| 444 
3056a4c071b0
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
401diff
changeset | 63 | \item If a regular expression $r$ does not contain any occurrence of $\ZERO$, | 
| 893 | 64 | is it possible for $L(r)$ to be empty? Explain why, or give a proof. | 
| 65 | ||
| 66 |   \solution{
 | |
| 943 | 67 | No. The property to prove by induction is | 
| 893 | 68 | |
| 69 |     \begin{center}
 | |
| 70 | $P(r)$: If $r$ does not contain $\ZERO$, then $L(r) \not= \emptyset$. | |
| 71 |   \end{center}
 | |
| 72 | ||
| 73 | For this you have to now go through all cases. | |
| 74 | ||
| 75 | Case $r = 0$: $P(\ZERO)$ says: If $\ZERO$ does not contain $\ZERO$ | |
| 76 | then \ldots. The premise is obviously false, so everything follows, | |
| 77 | in particular $L(r) \not= \emptyset$.\medskip | |
| 78 | ||
| 79 | Case $r = \ONE$ and $r = c$ are similar, just that the premise is | |
| 80 | true, but also $L(\ONE)$ and $L(c)$ are not empty. So we shown | |
| 81 | $L(r) \not= \emptyset$.\medskip | |
| 82 | ||
| 83 | Case $r = r_1 + r_2$: We know $P(r_1)$ and $P(r_2)$ as IHs. We need to show | |
| 84 | $P(r_1 + r_2)$: If $r_1 + r_2$ does not contain $\ZERO$, then $L(r_1 + r_2) \not= \emptyset$. | |
| 85 | ||
| 86 | If $r_1 + r_2$ does not contain $\ZERO$, then also $r_1$ does not contain $\ZERO$ | |
| 87 | and $r_2$ does not contain $\ZERO$. So we can apply the two IHs $P(r_1)$ and $P(r_2)$, | |
| 88 | which allow us to infer that $L(r_1) \not= \emptyset$ and $L(r_2) \not= \emptyset$. | |
| 89 | But if this is the case, then also $L(r_1 + r_2) \not= \emptyset$, which is what we needed | |
| 90 | to show in this case.\medskip | |
| 91 | ||
| 92 | The other cases are similar.\bigskip | |
| 93 | ||
| 94 | ||
| 95 | This lemma essentially says that for basic regular expressions, if | |
| 96 | they do not match anything at all, they must contain $\ZERO$(s) | |
| 97 | somewhere. | |
| 98 | ||
| 99 | } | |
| 32 | 100 | |
| 264 
4deef8ac5d72
uodated hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
166diff
changeset | 101 | \item Define the tokens and regular expressions for a language | 
| 267 
a1544b804d1e
updated homeworks
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
264diff
changeset | 102 | consisting of numbers, left-parenthesis $($, right-parenthesis $)$, | 
| 
a1544b804d1e
updated homeworks
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
264diff
changeset | 103 | identifiers and the operations $+$, $-$ and $*$. Can the following | 
| 
a1544b804d1e
updated homeworks
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
264diff
changeset | 104 | strings in this language be lexed? | 
| 264 
4deef8ac5d72
uodated hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
166diff
changeset | 105 | |
| 267 
a1544b804d1e
updated homeworks
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
264diff
changeset | 106 |   \begin{itemize}
 | 
| 264 
4deef8ac5d72
uodated hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
166diff
changeset | 107 | \item $(a + 3) * b$ | 
| 
4deef8ac5d72
uodated hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
166diff
changeset | 108 | \item $)()++ -33$ | 
| 
4deef8ac5d72
uodated hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
166diff
changeset | 109 | \item $(a / 3) * 3$ | 
| 267 
a1544b804d1e
updated homeworks
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
264diff
changeset | 110 |   \end{itemize}
 | 
| 264 
4deef8ac5d72
uodated hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
166diff
changeset | 111 | |
| 267 
a1544b804d1e
updated homeworks
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
264diff
changeset | 112 | In case they can, can you give the corresponding token | 
| 
a1544b804d1e
updated homeworks
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
264diff
changeset | 113 | sequences. | 
| 264 
4deef8ac5d72
uodated hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
166diff
changeset | 114 | |
| 943 | 115 |   \solution{
 | 
| 116 | The first 2 are lexibile. The 3 one contains $/$ which is not an operator. | |
| 117 | } | |
| 118 | ||
| 768 | 119 | \item Assume $r$ is nullable. Show that | 
| 120 | \[ 1 + r + r\cdot r \;\equiv\; r\cdot r | |
| 121 | \] | |
| 122 | ||
| 123 | holds. | |
| 124 | ||
| 893 | 125 |   \solution{
 | 
| 126 | If $r$ is nullable, then $1 + r \equiv r$. With this you can replace | |
| 127 | ||
| 128 |     \begin{align}
 | |
| 129 | (1 + r) + r\cdot r & \equiv r + r\cdot r\\ | |
| 130 | & \equiv r \cdot (1 + r)\\ | |
| 131 | & \equiv r \cdot r | |
| 132 |     \end{align}  
 | |
| 133 | ||
| 134 | where in (2) you pull out the ``factor'' $r$ (because $r_1 \cdot (r_2 + r_3) \equiv r_1 \cdot r_2 + r_1 \cdot r_3$). | |
| 135 | } | |
| 136 | ||
| 137 | ||
| 939 | 138 | %\item \textbf{(Deleted)} Assume that $s^{-1}$ stands for the operation of reversing a
 | 
| 139 | %  string $s$. Given the following \emph{reversing} function on regular
 | |
| 140 | % expressions | |
| 141 | % | |
| 142 | %  \begin{center}
 | |
| 143 | %    \begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l}
 | |
| 144 | % $rev(\ZERO)$ & $\dn$ & $\ZERO$\\ | |
| 145 | % $rev(\ONE)$ & $\dn$ & $\ONE$\\ | |
| 146 | % $rev(c)$ & $\dn$ & $c$\\ | |
| 147 | % $rev(r_1 + r_2)$ & $\dn$ & $rev(r_1) + rev(r_2)$\\ | |
| 148 | % $rev(r_1 \cdot r_2)$ & $\dn$ & $rev(r_2) \cdot rev(r_1)$\\ | |
| 149 | % $rev(r^*)$ & $\dn$ & $rev(r)^*$\\ | |
| 150 | %    \end{tabular}
 | |
| 151 | %  \end{center}
 | |
| 152 | % | |
| 153 | % and the set | |
| 154 | % | |
| 155 | %  \begin{center}
 | |
| 156 | %    $Rev\,A \dn \{s^{-1} \;|\; s \in A\}$
 | |
| 157 | %  \end{center}
 | |
| 158 | % | |
| 159 | % prove whether | |
| 160 | % | |
| 161 | %  \begin{center}
 | |
| 162 | % $L(rev(r)) = Rev (L(r))$ | |
| 163 | %  \end{center}
 | |
| 164 | % | |
| 165 | % holds. | |
| 34 | 166 | |
| 939 | 167 | \item Construct a regular expression that can validate passwords. A | 
| 168 | password should be at least 8 characters long and consist of upper- | |
| 169 | and lower-case letters and digits. It should contain at least a | |
| 170 | single lower-case letter, at least a single upper-case letter and at | |
| 943 | 171 | least a single digit. If possible use the intersection regular | 
| 172 | expression from CW1, written $\_\&\_$, and the bounded regular | |
| 939 | 173 | expressions; you can also assume a regular expression written | 
| 174 |   \texttt{ALL} that can match any character.
 | |
| 31 | 175 | |
| 939 | 176 |   \solution{
 | 
| 177 | You can build-up the different constraints separately and then | |
| 178 | use the intersection operator: | |
| 32 | 179 | |
| 939 | 180 |   \begin{center}  
 | 
| 181 |   \begin{tabular}{lll}  
 | |
| 182 |     $ALL^{\{8..\}}$ & \;\&\; & $(ALL^*\cdot [a-z]\cdot ALL^*)$\\
 | |
| 183 | & \;\&\; & $(ALL^*\cdot [A-Z]\cdot ALL^*)$\\ | |
| 184 | & \;\&\; & $(ALL^*\cdot [0-9]\cdot ALL^*)$\\ | |
| 185 |   \end{tabular}
 | |
| 267 
a1544b804d1e
updated homeworks
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
264diff
changeset | 186 |   \end{center}
 | 
| 943 | 187 | |
| 188 | $ALL$ could be represented as $\sim \ZERO$. | |
| 939 | 189 | } | 
| 190 | ||
| 401 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 191 | \item Assume the delimiters for comments are | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 192 |       \texttt{$\slash$*} and \texttt{*$\slash$}. Give a
 | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 193 | regular expression that can recognise comments of the | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 194 | form | 
| 42 | 195 | |
| 267 
a1544b804d1e
updated homeworks
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
264diff
changeset | 196 |   \begin{center}
 | 
| 
a1544b804d1e
updated homeworks
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
264diff
changeset | 197 |     \texttt{$\slash$*~\ldots{}~*$\slash$} 
 | 
| 
a1544b804d1e
updated homeworks
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
264diff
changeset | 198 |   \end{center}
 | 
| 42 | 199 | |
| 401 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 200 | where the three dots stand for arbitrary characters, but | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 201 | not comment delimiters. (Hint: You can assume you are | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 202 |       already given a regular expression written \texttt{ALL},
 | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 203 | that can recognise any character, and a regular | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 204 |       expression \texttt{NOT} that recognises the complement
 | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 205 | of a regular expression.) | 
| 943 | 206 | |
| 207 |       \solution{
 | |
| 208 | \[/ * \sim (ALL^* * / ALL^*) * /\] | |
| 209 | ||
| 210 | The idea to make sure in between $/ *$ and $* /$ ar no strings that contain $* /$. | |
| 211 | } | |
| 212 | ||
| 355 
a259eec25156
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
347diff
changeset | 213 | \item Simplify the regular expression | 
| 42 | 214 | |
| 355 
a259eec25156
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
347diff
changeset | 215 | \[ | 
| 401 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 216 | (\ZERO \cdot (b \cdot c)) + | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 217 | ((\ZERO \cdot c) + \ONE) | 
| 355 
a259eec25156
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
347diff
changeset | 218 | \] | 
| 
a259eec25156
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
347diff
changeset | 219 | |
| 401 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 220 | Does simplification always preserve the meaning of a | 
| 943 | 221 | regular expression? | 
| 222 | ||
| 223 |       \solution{ Yes, simplification preserves the language. It
 | |
| 224 | simplifies to just $\ONE$. It should be remembered that the | |
| 225 | Brzozowski does not simplify under stars. This does not apply | |
| 226 | in this example, though. } | |
| 355 
a259eec25156
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
347diff
changeset | 227 | |
| 401 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 228 | \item The Sulzmann \& Lu algorithm contains the function | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 229 | $mkeps$ which answers how a regular expression can match | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 230 | the empty string. What is the answer of $mkeps$ for the | 
| 355 
a259eec25156
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
347diff
changeset | 231 | regular expressions: | 
| 146 
9da175d5eb63
added new hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
102diff
changeset | 232 | |
| 355 
a259eec25156
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
347diff
changeset | 233 | \[ | 
| 
a259eec25156
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
347diff
changeset | 234 |   \begin{array}{l}
 | 
| 401 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 235 | (\ZERO \cdot (b \cdot c)) + | 
| 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 236 | ((\ZERO \cdot c) + \ONE)\\ | 
| 577 | 237 | (a + \ONE) \cdot (\ONE + \ONE)\\ | 
| 238 | a^* | |
| 355 
a259eec25156
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
347diff
changeset | 239 |   \end{array}
 | 
| 943 | 240 | \] | 
| 241 | ||
| 242 | \solution{
 | |
| 243 | The values are | |
| 244 |   \begin{center}
 | |
| 245 |   \begin{tabular}{l}
 | |
| 246 | $Right(Right(Empty))$\\ | |
| 247 | $Sequ(Right(\ONE),Left(\ONE))$\\ | |
| 248 | $Stars\,[]$ | |
| 249 |   \end{tabular}  
 | |
| 250 |   \end{center}
 | |
| 251 | ||
| 252 | The last one uses the rule that $mkeps$ for the star returns always $Star\,[]$. | |
| 253 | } | |
| 355 
a259eec25156
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
347diff
changeset | 254 | |
| 401 
5d85dc9779b1
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
359diff
changeset | 255 | \item What is the purpose of the record regular expression in | 
| 943 | 256 | the Sulzmann \& Lu algorithm? | 
| 257 | ||
| 258 |   \solution{
 | |
| 259 | It marks a part of a regular expression and can be used to extract the part of the | |
| 260 | string that is matched by this marked part of the regular expression. | |
| 261 | } | |
| 498 | 262 | |
| 843 | 263 | \item Recall the functions \textit{nullable} and
 | 
| 1010 | 264 |       \textit{zeroable} from HW2. Define the cases of these
 | 
| 265 |       functions for the $r^{\{n\}}$ regular expression. Recall that the first 
 | |
| 266 |       function needs to satisfy the property $\textit{nullable}(r)$ iff $[]\in L(r)$ and
 | |
| 267 |       the second $\textit{zeroable}(r)$ iff $L(r) = \emptyset$
 | |
| 268 | ||
| 269 |   \solution{
 | |
| 270 |     $\textit{nullable}(r^{\{n\}}) \dn if n = 0 then true else \textit{nullable}(r)$
 | |
| 271 | ||
| 272 |     $\textit{zeroable}(r^{\{n\}}) \dn if n = 0 then false else \textit{zeroable}(r)$    
 | |
| 273 | } | |
| 274 | ||
| 275 | \item Define recursive functions | |
| 843 | 276 |       \textit{atmostempty} (for regular expressions that match no
 | 
| 277 |       string or only the empty string), \textit{somechars} (for
 | |
| 278 | regular expressions that match some non-empty string), | |
| 279 |       \textit{infinitestrings} (for regular expressions that can match
 | |
| 280 | infinitely many strings). | |
| 893 | 281 | |
| 282 |       \solution{
 | |
| 283 |         \textbf{zeroable}: The property is $z(r) \;\text{iff}\; L(r) = \emptyset$:
 | |
| 284 | ||
| 285 |         \begin{align}
 | |
| 286 | z(\ZERO) &\dn true\\ | |
| 287 | z(\ONE) &\dn false\\ | |
| 288 | z(c) &\dn false\\ | |
| 289 | z(r_1 + r_2) &\dn z(r_1) \wedge z(r_2)\\ | |
| 290 | z(r_1 \cdot r_2) &\dn z(r_1) \vee z(r_2)\\ | |
| 291 | z(r^*) &\dn false | |
| 292 |         \end{align}\bigskip
 | |
| 293 | ||
| 294 |         \textbf{atmostempty}: The property is ``either $L(r) = \emptyset$ or $L(r) = \{[]\}$'', which
 | |
| 295 |         is more formally $a(r) \;\text{iff}\; L(r) \subseteq \{[]\}$:
 | |
| 296 | ||
| 297 |         \begin{align}
 | |
| 298 | a(\ZERO) &\dn true\\ | |
| 299 | a(\ONE) &\dn true\\ | |
| 300 | a(c) &\dn false\\ | |
| 301 | a(r_1 + r_2) &\dn a(r_1) \wedge a(r_2)\\ | |
| 302 | a(r_1 \cdot r_2) &\dn z(r_1) \vee z(r_2) \vee (a(r_1) \wedge a(r_2))\\ | |
| 303 | a(r^*) &\dn a(r) | |
| 304 |         \end{align}
 | |
| 305 | ||
| 306 | For this it is good to remember the regex should either not | |
| 307 | match anything at all, or just the empty string.\bigskip | |
| 308 | ||
| 309 |         \textbf{somechars}: The property is ``$L(r)$ must contain a string which is not the empty string'', which
 | |
| 310 |         is more formally $s(r) \;\text{iff}\; \exists\,s. s \not= [] \wedge s \in L(r)$:
 | |
| 311 | ||
| 312 |         \begin{align}
 | |
| 313 | s(\ZERO) &\dn false\\ | |
| 314 | s(\ONE) &\dn false\\ | |
| 315 | s(c) &\dn true\\ | |
| 316 | s(r_1 + r_2) &\dn s(r_1) \vee s(r_2)\\ | |
| 317 | s(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge s(r_2)) \;\vee\; (\neg z(r_2) \wedge s(r_1))\\ | |
| 318 | s(r^*) &\dn s(r) | |
| 319 |         \end{align}
 | |
| 320 | ||
| 321 | Here the interesting case is $r_1 \cdot r_2$ where one essentially has to make sure | |
| 322 | that one of the regexes is not zeroable, because then the resulting regex cannot match any | |
| 323 | string.\bigskip | |
| 324 | ||
| 325 |         \textbf{infinitestrings}: The property is
 | |
| 326 |         $i(r) \;\text{iff}\; L(r)\;\text{is infinite}$:
 | |
| 327 | ||
| 328 |         \begin{align}
 | |
| 329 | i(\ZERO) &\dn false\\ | |
| 330 | i(\ONE) &\dn false\\ | |
| 331 | i(c) &\dn false\\ | |
| 332 | i(r_1 + r_2) &\dn i(r_1) \vee i(r_2)\\ | |
| 333 | i(r_1 \cdot r_2) &\dn (\neg z(r_1) \wedge i(r_2)) \;\vee\; (\neg z(r_2) \wedge i(r_1))\\ | |
| 334 | i(r^*) &\dn \neg a(r) | |
| 335 |         \end{align}
 | |
| 336 | ||
| 943 | 337 | Here the interesting bit is that as soon $r$ can match at least a single non-empty string, then $r^*$ | 
| 893 | 338 | will match infinitely many strings. | 
| 339 | } | |
| 340 | ||
| 498 | 341 | |
| 892 | 342 | \item There are two kinds of automata that are generated for | 
| 843 | 343 | regular expression matching---DFAs and NFAs. (1) Regular expression engines like | 
| 344 | the one in Python generate NFAs. Explain what is the problem with such | |
| 345 | NFAs and what is the reason why they use NFAs. (2) Regular expression | |
| 346 | engines like the one in Rust generate DFAs. Explain what is the | |
| 347 |   problem with these regex engines and also what is the problem with $a^{\{1000\}}$
 | |
| 348 | in these engines. | |
| 943 | 349 | |
| 350 |   \solution{
 | |
| 351 | Why they use NFAs? NFAs are of similar size as the regular expression (they do not explode | |
| 352 | for the basic regular expressions. Python regex library supports constructions like | |
| 353 | back-refernces which cannot be represented by DFAs (string matching with back-references | |
| 354 |     can be NP. What is the problem with $a^{\{1000\}}$. When generating DFAs (and NFAs) for the
 | |
| 355 | bounded regular expressions, one has to make $n$ copies, which means their size can grow | |
| 356 | drastically for large counters. | |
| 357 | } | |
| 843 | 358 | |
| 146 
9da175d5eb63
added new hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
102diff
changeset | 359 | %\item (Optional) The tokenizer in \texttt{regexp3.scala} takes as
 | 
| 
9da175d5eb63
added new hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
102diff
changeset | 360 | %argument a string and a list of rules. The result is a list of tokens. Improve this tokenizer so | 
| 
9da175d5eb63
added new hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
102diff
changeset | 361 | %that it filters out all comments and whitespace from the result. | 
| 
9da175d5eb63
added new hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
102diff
changeset | 362 | |
| 
9da175d5eb63
added new hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
102diff
changeset | 363 | %\item (Optional) Modify the tokenizer in \texttt{regexp2.scala} so that it
 | 
| 
9da175d5eb63
added new hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
102diff
changeset | 364 | %implements the \texttt{findAll} function. This function takes a regular
 | 
| 
9da175d5eb63
added new hws
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
102diff
changeset | 365 | %expressions and a string, and returns all substrings in this string that | 
| 444 
3056a4c071b0
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
401diff
changeset | 366 | %match the regular expression. | 
| 
3056a4c071b0
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
401diff
changeset | 367 | |
| 
3056a4c071b0
updated
 Christian Urban <christian dot urban at kcl dot ac dot uk> parents: 
401diff
changeset | 368 | \item \POSTSCRIPT | 
| 31 | 369 | \end{enumerate}
 | 
| 370 | ||
| 371 | ||
| 372 | \end{document}
 | |
| 373 | ||
| 374 | %%% Local Variables: | |
| 375 | %%% mode: latex | |
| 376 | %%% TeX-master: t | |
| 377 | %%% End: |