192 are replaced by $s_2$. For example given the regular expression |
194 are replaced by $s_2$. For example given the regular expression |
193 |
195 |
194 \[(a \cdot a)^* + (b \cdot b)\] |
196 \[(a \cdot a)^* + (b \cdot b)\] |
195 |
197 |
196 \noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and |
198 \noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and |
197 replacement string $s_2 = c$ yields the string |
199 replacement the string $s_2 = c$ yields the string |
198 |
200 |
199 \[ |
201 \[ |
200 ccbcabcaccc |
202 ccbcabcaccc |
201 \] |
203 \] |
202 |
204 |
203 \hfill[2 Mark] |
205 \hfill[2 Marks] |
204 \end{itemize} |
206 \end{itemize} |
205 |
207 |
|
208 |
|
209 |
|
210 |
|
211 \subsection*{Part 2 (4 Marks)} |
|
212 |
|
213 You need to copy all the code from \texttt{re.scala} into |
|
214 \texttt{re2.scala} in order to complete Part 2. Parts (2a) and (2b) |
|
215 give you another method and datapoints for testing the \textit{der} |
|
216 and \textit{simp} functions from Part~1. |
|
217 |
|
218 \subsection*{Tasks (file re2.scala)} |
|
219 |
|
220 \begin{itemize} |
|
221 \item[(2a)] Write a \textbf{polymorphic} function, called |
|
222 \textit{iterT}, that is \textbf{tail-recursive}(!) and takes an |
|
223 integer $n$, a function $f$ and an $x$ as arguments. This function |
|
224 should iterate $f$ $n$-times starting with the argument $x$, like |
|
225 |
|
226 \[\underbrace{f(\ldots (f}_{n\text{-times}}(x))) |
|
227 \] |
|
228 |
|
229 More formally that means \textit{iterT} behaves as follows: |
|
230 |
|
231 \begin{center} |
|
232 \begin{tabular}{lcl} |
|
233 $\textit{iterT}(n, f, x)$ & $\dn$ & |
|
234 $\begin{cases} |
|
235 \;x & \textit{if}\;n = 0\\ |
|
236 \;f(\textit{iterT}(n - 1, f, x)) & \textit{otherwise} |
|
237 \end{cases}$ |
|
238 \end{tabular} |
|
239 \end{center} |
|
240 |
|
241 Make sure you write a \textbf{tail-recursive} version of |
|
242 \textit{iterT}. If you add the annotation \texttt{@tailrec} (see |
|
243 below) you should not get an error message. |
|
244 |
|
245 \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm] |
|
246 import scala.annotation.tailrec |
|
247 |
|
248 @tailrec |
|
249 def iterT[A](n: Int, f: A => A, x: A): A = ... |
|
250 \end{lstlisting} |
|
251 |
|
252 You can assume that \textit{iterT} will only be called for positive |
|
253 integers $0 \le n$. Given the type variable \texttt{A}, the type of |
|
254 $f$ is \texttt{A => A} and the type of $x$ is \texttt{A}. This means |
|
255 \textit{iterT} can be used, for example, for functions from integers |
|
256 to integers, or strings to strings. \\ \mbox{}\hfill[2 Marks] |
|
257 |
|
258 \item[(2b)] Implement a function, called \textit{size}, by recursion |
|
259 over regular expressions. If a regular expression is seen as a tree, |
|
260 then \textit{size} should return the number of nodes in such a |
|
261 tree. Therefore this function is defined as follows: |
|
262 |
|
263 \begin{center} |
|
264 \begin{tabular}{lcl} |
|
265 $\textit{size}(\ZERO)$ & $\dn$ & $1$\\ |
|
266 $\textit{size}(\ONE)$ & $\dn$ & $1$\\ |
|
267 $\textit{size}(c)$ & $\dn$ & $1$\\ |
|
268 $\textit{size}(r_1 + r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\ |
|
269 $\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\ |
|
270 $\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\ |
|
271 \end{tabular} |
|
272 \end{center} |
|
273 |
|
274 You can use \textit{size} and \textit{iterT} in order to test how much |
|
275 the 'evil' regular expression $(a^*)^* \cdot b$ grows when taking |
|
276 successive derivatives according the letter $a$ and compare it to |
|
277 taking the derivative, but simlifying the derivative after each step. |
|
278 For example, the calls |
|
279 |
|
280 \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm] |
|
281 size(iterT(20, (r: Rexp) => der('a', r), EVIL)) |
|
282 size(iterT(20, (r: Rexp) => simp(der('a', r)), EVIL)) |
|
283 \end{lstlisting} |
|
284 |
|
285 produce without simplification a regular expression of size of |
|
286 7340068 for the derivative after 20 iterations, while the latter is |
|
287 just 8.\\ \mbox{}\hfill[1 Mark] |
|
288 |
|
289 |
|
290 \item[(2c)] Write a \textbf{polymorphic} function, called |
|
291 \textit{fixpT}, that takes |
|
292 a function $f$ and an $x$ as arguments. The purpose |
|
293 of \textit{fixpT} is to calculate a fixpoint of the function $f$ |
|
294 starting from the argument $x$. |
|
295 A fixpoint, say $y$, is when $f(y) = y$ holds. |
|
296 That means \textit{fixpT} behaves as follows: |
|
297 |
|
298 \begin{center} |
|
299 \begin{tabular}{lcl} |
|
300 $\textit{fixpT}(f, x)$ & $\dn$ & |
|
301 $\begin{cases} |
|
302 \;x & \textit{if}\;f(x) = x\\ |
|
303 \;\textit{fixpT}(f, f(x)) & \textit{otherwise} |
|
304 \end{cases}$ |
|
305 \end{tabular} |
|
306 \end{center} |
|
307 |
|
308 Make sure you calculate in the code of $\textit{fixpT}$ the result |
|
309 of $f(x)$ only once. Given the type variable \texttt{A} in |
|
310 $\textit{fixpT}$, the type of $f$ is \texttt{A => A} and the type of |
|
311 $x$ is \texttt{A}. The file \texttt{re2.scala} gives two example |
|
312 function where in one the fixpoint is 1 and in the other |
|
313 it is the string $a$.\\ \mbox{}\hfill[1 Mark] |
|
314 \end{itemize}\bigskip |
|
315 |
|
316 |
|
317 |
|
318 \noindent |
206 \textbf{Background} Although easily implementable in Scala, the idea |
319 \textbf{Background} Although easily implementable in Scala, the idea |
207 behind the derivative function might not so easy to be seen. To |
320 behind the derivative function might not so easy to be seen. To |
208 understand its purpose better, assume a regular expression $r$ can |
321 understand its purpose better, assume a regular expression $r$ can |
209 match strings of the form $c::cs$ (that means strings which start with |
322 match strings of the form $c::cs$ (that means strings which start with |
210 a character $c$ and have some rest $cs$). If you now take the |
323 a character $c$ and have some rest, or tail, $cs$). If you now take the |
211 derivative of $r$ with respect to the character $c$, then you obtain a |
324 derivative of $r$ with respect to the character $c$, then you obtain a |
212 regular expressions that can match all the strings $cs$. In other |
325 regular expressions that can match all the strings $cs$. In other |
213 words the regular expression $\textit{der}\;c\;r$ can match the same |
326 words, the regular expression $\textit{der}\;c\;r$ can match the same |
214 strings $c::cs$ that can be matched by $r$, except that the $c$ is |
327 strings $c::cs$ that can be matched by $r$, except that the $c$ is |
215 chopped off. |
328 chopped off. |
216 |
329 |
217 Assume now $r$ can match the string $abc$. If you take the derivative |
330 Assume now $r$ can match the string $abc$. If you take the derivative |
218 according to $a$ then you obtain a regular expression that can match |
331 according to $a$ then you obtain a regular expression that can match |
219 $bc$ (it is $abc$ where the $a$ has been chopped off). If you now |
332 $bc$ (it is $abc$ where the $a$ has been chopped off). If you now |
220 build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you obtain a regular |
333 build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you |
221 expression that can match the string "c" (it is "bc" where 'b' is |
334 obtain a regular expression that can match the string $c$ (it is $bc$ |
222 chopped off). If you finally build the derivative of this according |
335 where $b$ is chopped off). If you finally build the derivative of this |
223 'c', that is der('c', der('b', der('a', r))), you obtain a regular |
336 according $c$, that is |
224 expression that can match the empty string. You can test this using |
337 $\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r)))$, you |
225 the function nullable, which is what your matcher is doing. |
338 obtain a regular expression that can match the empty string. You can |
226 |
339 test this using the function nullable, which is what your matcher is |
227 The purpose of the simp function is to keep the regular expression small. Normally the derivative function makes the regular expression bigger (see the SEQ case) and the algorithm would be slower and slower over time. The simp function counters this increase in size and the result is that the algorithm is fast throughout. |
340 doing. |
228 By the way this whole idea is by Janusz Brzozowski who came up with this in 1964 in his PhD thesis. |
341 |
229 https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist) |
342 The purpose of the simp function is to keep the regular expression |
230 |
343 small. Normally the derivative function makes the regular expression |
231 |
344 bigger (see the SEQ case and the example in (2b)) and the algorithm |
232 |
345 would be slower and slower over time. The simp function counters this |
233 \subsection*{Part 2 (4 Marks)} |
346 increase in size and the result is that the algorithm is fast |
234 |
347 throughout. By the way, this algorithm is by Janusz Brzozowski who |
235 Coming soon. |
348 came up with the idea of derivatives in 1964 in his PhD thesis. |
|
349 |
|
350 \begin{center}\small |
|
351 \url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)} |
|
352 \end{center} |
236 |
353 |
237 \end{document} |
354 \end{document} |
238 |
355 |
239 |
356 |
240 %%% Local Variables: |
357 %%% Local Variables: |