180 \end{center} |
224 \end{center} |
181 |
225 |
182 For example the regular expression |
226 For example the regular expression |
183 \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\] |
227 \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\] |
184 |
228 |
185 simplifies to just $r_1$. \textbf{Hints:} Regular expressions can be |
229 simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be |
186 seen as trees and there are several methods for traversing |
230 seen as trees and there are several methods for traversing |
187 trees. One of them corresponds to the inside-out traversal. Also |
231 trees. One of them corresponds to the inside-out traversal, which is |
188 remember numerical expressions from school: there you had expressions |
232 sometimes also called post-order traversal. Furthermore, |
|
233 remember numerical expressions from school times: there you had expressions |
189 like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$ |
234 like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$ |
190 and simplification rules that looked very similar to rules |
235 and simplification rules that looked very similar to rules |
191 above. You would simplify such numerical expressions by replacing |
236 above. You would simplify such numerical expressions by replacing |
192 for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then |
237 for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then |
193 look whether more rules are applicable. If you organise the |
238 look whether more rules are applicable. If you organise the |
194 simplification in an inside-out fashion, it is always clear which |
239 simplification in an inside-out fashion, it is always clear which |
195 rule should applied next.\\\mbox{}\hfill[1 Mark] |
240 rule should be applied next.\hfill[2 Marks] |
196 |
241 |
197 \item[(1d)] Implement two functions: The first, called \textit{ders}, |
242 \item[(1d)] Implement two functions: The first, called \textit{ders}, |
198 takes a list of characters and a regular expression as arguments, and |
243 takes a list of characters and a regular expression as arguments, and |
199 builds the derivative w.r.t.~the list as follows: |
244 builds the derivative w.r.t.~the list as follows: |
200 |
245 |
212 The second function, called \textit{matcher}, takes a string and a |
257 The second function, called \textit{matcher}, takes a string and a |
213 regular expression as arguments. It builds first the derivatives |
258 regular expression as arguments. It builds first the derivatives |
214 according to \textit{ders} and after that tests whether the resulting |
259 according to \textit{ders} and after that tests whether the resulting |
215 derivative regular expression can match the empty string (using |
260 derivative regular expression can match the empty string (using |
216 \textit{nullable}). For example the \textit{matcher} will produce |
261 \textit{nullable}). For example the \textit{matcher} will produce |
217 true given the regular expression $(a\cdot b)\cdot c$ and the string |
262 true for the regular expression $(a\cdot b)\cdot c$ and the string |
218 $abc$.\\ \mbox{}\hfill[1 Mark] |
263 $abc$, but false if you give it the string $ab$. \hfill[1 Mark] |
219 |
264 |
220 \item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches |
265 \item[(1e)] Implement a function, called \textit{size}, by recursion |
221 (from the left to |
|
222 right) in the string $s_1$ all the non-empty substrings that match the |
|
223 regular expression $r$---these substrings are assumed to be |
|
224 the longest substrings matched by the regular expression and |
|
225 assumed to be non-overlapping. All these substrings in $s_1$ matched by $r$ |
|
226 are replaced by $s_2$. For example given the regular expression |
|
227 |
|
228 \[(a \cdot a)^* + (b \cdot b)\] |
|
229 |
|
230 \noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and |
|
231 replacement the string $s_2 = c$ yields the string |
|
232 |
|
233 \[ |
|
234 ccbcabcaccc |
|
235 \] |
|
236 |
|
237 \hfill[2 Marks] |
|
238 \end{itemize} |
|
239 |
|
240 |
|
241 |
|
242 |
|
243 \subsection*{Part 2 (4 Marks)} |
|
244 |
|
245 You need to copy all the code from \texttt{re.scala} into |
|
246 \texttt{re2.scala} in order to complete Part 2. Parts (2a) and (2b) |
|
247 give you another method and datapoints for testing the \textit{der} |
|
248 and \textit{simp} functions from Part~1. |
|
249 |
|
250 \subsubsection*{Tasks (file re2.scala)} |
|
251 |
|
252 \begin{itemize} |
|
253 \item[(2a)] Write a \textbf{polymorphic} function, called |
|
254 \textit{iterT}, that is \textbf{tail-recursive}(!) and takes an |
|
255 integer $n$, a function $f$ and an $x$ as arguments. This function |
|
256 should iterate $f$ $n$-times starting with the argument $x$, like |
|
257 |
|
258 \[\underbrace{f(\ldots (f}_{n\text{-times}}(x))) |
|
259 \] |
|
260 |
|
261 More formally that means \textit{iterT} behaves as follows: |
|
262 |
|
263 \begin{center} |
|
264 \begin{tabular}{lcl} |
|
265 $\textit{iterT}(n, f, x)$ & $\dn$ & |
|
266 $\begin{cases} |
|
267 \;x & \textit{if}\;n = 0\\ |
|
268 \;f(\textit{iterT}(n - 1, f, x)) & \textit{otherwise} |
|
269 \end{cases}$ |
|
270 \end{tabular} |
|
271 \end{center} |
|
272 |
|
273 Make sure you write a \textbf{tail-recursive} version of |
|
274 \textit{iterT}. If you add the annotation \texttt{@tailrec} (see |
|
275 below) your code should not produce an error message. |
|
276 |
|
277 \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm] |
|
278 import scala.annotation.tailrec |
|
279 |
|
280 @tailrec |
|
281 def iterT[A](n: Int, f: A => A, x: A): A = ... |
|
282 \end{lstlisting} |
|
283 |
|
284 You can assume that \textit{iterT} will only be called for positive |
|
285 integers $0 \le n$. Given the type variable \texttt{A}, the type of |
|
286 $f$ is \texttt{A => A} and the type of $x$ is \texttt{A}. This means |
|
287 \textit{iterT} can be used, for example, for functions from integers |
|
288 to integers, or strings to strings, or regular expressions to |
|
289 regular expressions. \\ \mbox{}\hfill[2 Marks] |
|
290 |
|
291 \item[(2b)] Implement a function, called \textit{size}, by recursion |
|
292 over regular expressions. If a regular expression is seen as a tree, |
266 over regular expressions. If a regular expression is seen as a tree, |
293 then \textit{size} should return the number of nodes in such a |
267 then \textit{size} should return the number of nodes in such a |
294 tree. Therefore this function is defined as follows: |
268 tree. Therefore this function is defined as follows: |
295 |
269 |
296 \begin{center} |
270 \begin{center} |
302 $\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\ |
276 $\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\ |
303 $\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\ |
277 $\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\ |
304 \end{tabular} |
278 \end{tabular} |
305 \end{center} |
279 \end{center} |
306 |
280 |
307 You can use \textit{size} and \textit{iterT} in order to test how much |
281 You can use \textit{size} in order to test how much the `evil' regular |
308 the 'evil' regular expression $(a^*)^* \cdot b$ grows when taking |
282 expression $(a^*)^* \cdot b$ grows when taking successive derivatives |
309 successive derivatives according the letter $a$ and then compare it to |
283 according the letter $a$ without simplification and then compare it to |
310 taking the derivative, but simlifying the derivative after each step. |
284 taking the derivative, but simplify the result. The sizes |
311 For example, the calls |
285 are given in \texttt{re.scala}. \hfill[1 Mark] |
312 |
286 \end{itemize} |
313 \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm] |
|
314 size(iterT(20, (r: Rexp) => der('a', r), EVIL)) |
|
315 size(iterT(20, (r: Rexp) => simp(der('a', r)), EVIL)) |
|
316 \end{lstlisting} |
|
317 |
|
318 produce without simplification a regular expression of size of |
|
319 7340068 after 20 iterations, while the one with |
|
320 simplification gives |
|
321 just 8.\\ \mbox{}\hfill[1 Mark] |
|
322 |
|
323 |
|
324 \item[(2c)] Write a \textbf{polymorphic} function, called |
|
325 \textit{fixpT}, that takes |
|
326 a function $f$ and an $x$ as arguments. The purpose |
|
327 of \textit{fixpT} is to calculate a fixpoint of the function $f$ |
|
328 starting from the argument $x$. |
|
329 A fixpoint, say $y$, is when $f(y) = y$ holds. |
|
330 That means \textit{fixpT} behaves as follows: |
|
331 |
|
332 \begin{center} |
|
333 \begin{tabular}{lcl} |
|
334 $\textit{fixpT}(f, x)$ & $\dn$ & |
|
335 $\begin{cases} |
|
336 \;x & \textit{if}\;f(x) = x\\ |
|
337 \;\textit{fixpT}(f, f(x)) & \textit{otherwise} |
|
338 \end{cases}$ |
|
339 \end{tabular} |
|
340 \end{center} |
|
341 |
|
342 Make sure you calculate in the code of $\textit{fixpT}$ the result |
|
343 of $f(x)$ only once. Given the type variable \texttt{A} in |
|
344 $\textit{fixpT}$, the type of $f$ is \texttt{A => A} and the type of |
|
345 $x$ is \texttt{A}. The file \texttt{re2.scala} gives two example |
|
346 function where in one the fixpoint is 1 and in the other |
|
347 it is the string $a$.\\ \mbox{}\hfill[1 Mark] |
|
348 \end{itemize}\bigskip |
|
349 |
|
350 |
287 |
351 \subsection*{Background} |
288 \subsection*{Background} |
352 |
289 |
353 Although easily implementable in Scala, the idea behind the derivative |
290 Although easily implementable in Scala, the idea behind the derivative |
354 function might not so easy to be seen. To understand its purpose |
291 function might not so easy to be seen. To understand its purpose |
355 better, assume a regular expression $r$ can match strings of the form |
292 better, assume a regular expression $r$ can match strings of the form |
356 $c\!::\!cs$ (that means strings which start with a character $c$ and have |
293 $c\!::\!cs$ (that means strings which start with a character $c$ and have |
357 some rest, or tail, $cs$). If you now take the derivative of $r$ with |
294 some rest, or tail, $cs$). If you take the derivative of $r$ with |
358 respect to the character $c$, then you obtain a regular expressions |
295 respect to the character $c$, then you obtain a regular expression |
359 that can match all the strings $cs$. In other words, the regular |
296 that can match all the strings $cs$. In other words, the regular |
360 expression $\textit{der}\;c\;r$ can match the same strings $c::cs$ |
297 expression $\textit{der}\;c\;r$ can match the same strings $c\!::\!cs$ |
361 that can be matched by $r$, except that the $c$ is chopped off. |
298 that can be matched by $r$, except that the $c$ is chopped off. |
362 |
299 |
363 Assume now $r$ can match the string $abc$. If you take the derivative |
300 Assume now $r$ can match the string $abc$. If you take the derivative |
364 according to $a$ then you obtain a regular expression that can match |
301 according to $a$ then you obtain a regular expression that can match |
365 $bc$ (it is $abc$ where the $a$ has been chopped off). If you now |
302 $bc$ (it is $abc$ where the $a$ has been chopped off). If you now |
366 build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you |
303 build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r)$ you |
367 obtain a regular expression that can match the string $c$ (it is $bc$ |
304 obtain a regular expression that can match the string $c$ (it is $bc$ |
368 where $b$ is chopped off). If you finally build the derivative of this |
305 where $b$ is chopped off). If you finally build the derivative of this |
369 according $c$, that is |
306 according $c$, that is |
370 $\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r)))$, you |
307 $\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r))$, you obtain |
371 obtain a regular expression that can match the empty string. You can |
308 a regular expression that can match the empty string. You can test |
372 test this using the function nullable, which is what your matcher is |
309 whether this is indeed the case using the function nullable, which is |
373 doing. |
310 what your matcher is doing. |
374 |
311 |
375 The purpose of the simp function is to keep the regular expression |
312 The purpose of the $\textit{simp}$ function is to keep the regular |
376 small. Normally the derivative function makes the regular expression |
313 expression small. Normally the derivative function makes the regular |
377 bigger (see the SEQ case and the example in (2b)) and the algorithm |
314 expression bigger (see the SEQ case and the example in (1b)) and the |
378 would be slower and slower over time. The simp function counters this |
315 algorithm would be slower and slower over time. The $\textit{simp}$ |
379 increase in size and the result is that the algorithm is fast |
316 function counters this increase in size and the result is that the |
380 throughout. By the way, this algorithm is by Janusz Brzozowski who |
317 algorithm is fast throughout. By the way, this algorithm is by Janusz |
381 came up with the idea of derivatives in 1964 in his PhD thesis. |
318 Brzozowski who came up with the idea of derivatives in 1964 in his PhD |
|
319 thesis. |
382 |
320 |
383 \begin{center}\small |
321 \begin{center}\small |
384 \url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)} |
322 \url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)} |
385 \end{center} |
323 \end{center} |
|
324 |
|
325 |
|
326 If you want to see how badly the regular expression matchers do in |
|
327 Java and Python with the `evil' regular expression, then have a look |
|
328 at the graphs below (you can try it out for yourself: have a look at |
|
329 the file \texttt{catastrophic.java} on KEATS). Compare this with the |
|
330 matcher you have implemented. How long can the string of $a$'s be |
|
331 in your matcher and stay within the 30 seconds time limit? |
|
332 |
|
333 \begin{center} |
|
334 \begin{tikzpicture} |
|
335 \begin{axis}[ |
|
336 title={Graph: $(a^*)^*\cdot b$ and strings |
|
337 $\underbrace{a\ldots a}_{n}$}, |
|
338 xlabel={$n$}, |
|
339 x label style={at={(1.05,0.0)}}, |
|
340 ylabel={time in secs}, |
|
341 enlargelimits=false, |
|
342 xtick={0,5,...,30}, |
|
343 xmax=33, |
|
344 ymax=35, |
|
345 ytick={0,5,...,30}, |
|
346 scaled ticks=false, |
|
347 axis lines=left, |
|
348 width=6cm, |
|
349 height=5.0cm, |
|
350 legend entries={Python, Java}, |
|
351 legend pos=outer north east] |
|
352 \addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data}; |
|
353 \addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data}; |
|
354 \end{axis} |
|
355 \end{tikzpicture} |
|
356 \end{center} |
|
357 \newpage |
|
358 |
|
359 \subsection*{Part 2 (4 Marks)} |
|
360 |
|
361 Comming from Java or C++, you might think Scala is a quite |
|
362 esotheric programming language. But remember, some serious companies |
|
363 have built their business on Scala. And there are far more esotheric |
|
364 languages out there. One is called \emph{brainf***}. Urban M\"uller |
|
365 developed this language in 1993. A close relative was already |
|
366 introduced in ... by Corado B\"ohm, an Italian computer pionier, who |
|
367 unfortunately died a few months ago. One feature of brainf*** is its |
|
368 minimalistic set of instructions. It has just 8 instructions, all of |
|
369 which are single characters. Despite this minimalism, this language, |
|
370 given enough memory, has been shown to be Turing complete. In this |
|
371 part you will implement an interpreter for this language. |
|
372 |
|
373 |
|
374 |
|
375 \subsubsection*{Tasks (file bf.scala)} |
|
376 |
|
377 \begin{itemize} |
|
378 \item[(2a)] |
|
379 |
|
380 \item[(2b)] |
|
381 |
|
382 \item[(2c)] |
|
383 |
|
384 \end{itemize}\bigskip |
|
385 |
|
386 |
|
387 |
386 |
388 |
387 \end{document} |
389 \end{document} |
388 |
390 |
389 |
391 |
390 %%% Local Variables: |
392 %%% Local Variables: |