108 |
108 |
109 \begin{center} |
109 \begin{center} |
110 \begin{tabular}{lcll} |
110 \begin{tabular}{lcll} |
111 $r$ & $::=$ & $\ZERO$ & cannot match anything\\ |
111 $r$ & $::=$ & $\ZERO$ & cannot match anything\\ |
112 & $|$ & $\ONE$ & can only match the empty string\\ |
112 & $|$ & $\ONE$ & can only match the empty string\\ |
113 & $|$ & $c$ & can match a character (in this case $c$)\\ |
113 & $|$ & $c$ & can match a single character (in this case $c$)\\ |
114 & $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\ |
114 & $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\ |
115 & $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\ |
115 & $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\ |
116 & & & then the second part with $r_2$\\ |
116 & & & then the second part with $r_2$\\ |
117 & $|$ & $r^*$ & can match zero or more times $r$\\ |
117 & $|$ & $r^*$ & can match zero or more times $r$\\ |
118 \end{tabular} |
118 \end{tabular} |
134 \item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/} |
134 \item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/} |
135 \end{itemize}} |
135 \end{itemize}} |
136 |
136 |
137 \subsubsection*{Tasks (file re.scala)} |
137 \subsubsection*{Tasks (file re.scala)} |
138 |
138 |
|
139 The file \texttt{re.scala} has already a definition for regular |
|
140 expressions and also defines some handy shorthand notation for |
|
141 regular expressions. The notation in this document matches up |
|
142 with the code in the file as follows: |
|
143 |
|
144 \begin{center} |
|
145 \begin{tabular}{rcl@{\hspace{10mm}}l} |
|
146 & & code: & shorthand:\smallskip \\ |
|
147 $\ZERO$ & $\mapsto$ & \texttt{ZERO}\\ |
|
148 $\ONE$ & $\mapsto$ & \texttt{ONE}\\ |
|
149 $c$ & $\mapsto$ & \texttt{CHAR(c)}\\ |
|
150 $r_1 + r_2$ & $\mapsto$ & \texttt{ALT(r1, r2)} & \texttt{r1 | r2}\\ |
|
151 $r_1 \cdot r_2$ & $\mapsto$ & \texttt{SEQ(r1, r2)} & \texttt{r1 $\sim$ r2}\\ |
|
152 $r^*$ & $\mapsto$ & \texttt{STAR(r)} & \texttt{r.\%} |
|
153 \end{tabular} |
|
154 \end{center} |
|
155 |
|
156 |
139 \begin{itemize} |
157 \begin{itemize} |
140 \item[(1a)] Implement a function, called \textit{nullable}, by |
158 \item[(1a)] Implement a function, called \textit{nullable}, by |
141 recursion over regular expressions. This function tests whether a |
159 recursion over regular expressions. This function tests whether a |
142 regular expression can match the empty string. This means given a |
160 regular expression can match the empty string. This means given a |
143 regular expression it either returns true or false. |
161 regular expression it either returns true or false. The function |
|
162 \textit{nullable} |
|
163 is defined as follows: |
144 |
164 |
145 \begin{center} |
165 \begin{center} |
146 \begin{tabular}{lcl} |
166 \begin{tabular}{lcl} |
147 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\ |
167 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\ |
148 $\textit{nullable}(\ONE)$ & $\dn$ & $\textit{true}$\\ |
168 $\textit{nullable}(\ONE)$ & $\dn$ & $\textit{true}$\\ |
149 $\textit{nullable}(c)$ & $\dn$ & $\textit{false}$\\ |
169 $\textit{nullable}(c)$ & $\dn$ & $\textit{false}$\\ |
150 $\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\ |
170 $\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\ |
151 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\ |
171 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\ |
152 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\ |
172 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\ |
153 \end{tabular} |
173 \end{tabular} |
154 \end{center}\hfill[1 Mark] |
174 \end{center}~\hfill[1 Mark] |
155 |
175 |
156 \item[(1b)] Implement a function, called \textit{der}, by recursion over |
176 \item[(1b)] Implement a function, called \textit{der}, by recursion over |
157 regular expressions. It takes a character and a regular expression |
177 regular expressions. It takes a character and a regular expression |
158 as arguments and calculates the derivative regular expression according |
178 as arguments and calculates the derivative regular expression according |
159 to the rules: |
179 to the rules: |
312 a regular expression that can match the empty string. You can test |
332 a regular expression that can match the empty string. You can test |
313 whether this is indeed the case using the function nullable, which is |
333 whether this is indeed the case using the function nullable, which is |
314 what your matcher is doing. |
334 what your matcher is doing. |
315 |
335 |
316 The purpose of the $\textit{simp}$ function is to keep the regular |
336 The purpose of the $\textit{simp}$ function is to keep the regular |
317 expression small. Normally the derivative function makes the regular |
337 expressions small. Normally the derivative function makes the regular |
318 expression bigger (see the SEQ case and the example in (1b)) and the |
338 expression bigger (see the SEQ case and the example in (1b)) and the |
319 algorithm would be slower and slower over time. The $\textit{simp}$ |
339 algorithm would be slower and slower over time. The $\textit{simp}$ |
320 function counters this increase in size and the result is that the |
340 function counters this increase in size and the result is that the |
321 algorithm is fast throughout. By the way, this algorithm is by Janusz |
341 algorithm is fast throughout. By the way, this algorithm is by Janusz |
322 Brzozowski who came up with the idea of derivatives in 1964 in his PhD |
342 Brzozowski who came up with the idea of derivatives in 1964 in his PhD |
326 \url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)} |
346 \url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)} |
327 \end{center} |
347 \end{center} |
328 |
348 |
329 |
349 |
330 If you want to see how badly the regular expression matchers do in |
350 If you want to see how badly the regular expression matchers do in |
331 Java\footnote{Version 8 and below} and in Python with the `evil' regular |
351 Java\footnote{Version 8 and below; Version 9 does not seem to be as |
|
352 catastrophic, but still worse than the regular expression matcher |
|
353 based on derivatives.} and in Python with the `evil' regular |
332 expression $(a^*)^*\cdot b$, then have a look at the graphs below (you |
354 expression $(a^*)^*\cdot b$, then have a look at the graphs below (you |
333 can try it out for yourself: have a look at the file |
355 can try it out for yourself: have a look at the file |
334 \texttt{catastrophic.java} and \texttt{catastrophic.py} on |
356 \texttt{catastrophic.java} and \texttt{catastrophic.py} on |
335 KEATS). Compare this with the matcher you have implemented. How long |
357 KEATS). Compare this with the matcher you have implemented. How long |
336 can the string of $a$'s be in your matcher and still stay within the |
358 can the string of $a$'s be in your matcher and still stay within the |
350 ymax=35, |
372 ymax=35, |
351 ytick={0,5,...,30}, |
373 ytick={0,5,...,30}, |
352 scaled ticks=false, |
374 scaled ticks=false, |
353 axis lines=left, |
375 axis lines=left, |
354 width=6cm, |
376 width=6cm, |
355 height=5.0cm, |
377 height=5.5cm, |
356 legend entries={Python, Java 8}, |
378 legend entries={Python, Java 8}, |
357 legend pos=outer north east] |
379 legend pos=outer north east] |
358 \addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data}; |
380 \addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data}; |
359 \addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data}; |
381 \addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data}; |
360 \end{axis} |
382 \end{axis} |
377 computer pioneer, who unfortunately died a few months ago. The main |
399 computer pioneer, who unfortunately died a few months ago. The main |
378 feature of brainf*** is its minimalistic set of instructions---just 8 |
400 feature of brainf*** is its minimalistic set of instructions---just 8 |
379 instructions in total and all of which are single characters. Despite |
401 instructions in total and all of which are single characters. Despite |
380 the minimalism, this language has been shown to be Turing |
402 the minimalism, this language has been shown to be Turing |
381 complete\ldots{}if this doesn't ring any bell with you: it roughly |
403 complete\ldots{}if this doesn't ring any bell with you: it roughly |
382 that means every algorithm we know can, in principle, be implemented in |
404 means that every algorithm we know can, in principle, be implemented in |
383 brainf***. It just takes a lot of determination and quite a lot of |
405 brainf***. It just takes a lot of determination and quite a lot of |
384 memory resources. Some relatively sophisticated sample programs in |
406 memory resources. Some relatively sophisticated sample programs in |
385 brainf*** are given in the file \texttt{bf.scala}.\bigskip |
407 brainf*** are given in the file \texttt{bf.scala}.\bigskip |
386 |
408 |
387 \noindent |
409 \noindent |
418 |
440 |
419 \begin{itemize} |
441 \begin{itemize} |
420 \item[(2a)] Brainf*** memory is represented by a \texttt{Map} from |
442 \item[(2a)] Brainf*** memory is represented by a \texttt{Map} from |
421 integers to integers. The empty memory is represented by |
443 integers to integers. The empty memory is represented by |
422 \texttt{Map()}, that is nothing is stored in the |
444 \texttt{Map()}, that is nothing is stored in the |
423 memory. \texttt{Map(0 -> 1, 2 -> 3)} clearly has stored \texttt{1} |
445 memory. \texttt{Map(0 -> 1, 2 -> 3)} clearly stores \texttt{1} at |
424 at memory location \texttt{0}; at \texttt{2} it stores |
446 memory location \texttt{0}; at \texttt{2} it stores \texttt{3}. The |
425 \texttt{3}. The convention is that if we query the memory at a |
447 convention is that if we query the memory at a location that is |
426 location that is not defined in the \texttt{Map}, we return |
448 \emph{not} defined in the \texttt{Map}, we return \texttt{0}. Write |
427 \texttt{0}. Write a function, \texttt{sread}, that takes a memory (a |
449 a function, \texttt{sread}, that takes a memory (a \texttt{Map}) and |
428 \texttt{Map}) and a memory pointer (an \texttt{Int}) as argument, |
450 a memory pointer (an \texttt{Int}) as argument, and safely reads the |
429 and safely reads the corresponding memory location. If the map is not |
451 corresponding memory location. If the \texttt{Map} is not defined at |
430 defined at the memory pointer, \texttt{sread} returns \texttt{0}. |
452 the memory pointer, \texttt{sread} returns \texttt{0}. |
431 |
453 |
432 Write another function \texttt{write}, which takes a memory, a |
454 Write another function \texttt{write}, which takes a memory, a |
433 memory pointer and an integer value as argument and updates the map |
455 memory pointer and an integer value as argument and updates the |
434 with the value at the given memory location. As usual the map is not |
456 \texttt{Map} with the value at the given memory location. As usual |
435 updated `in-place' but a new map is created with the same data, |
457 the \texttt{Map} is not updated `in-place' but a new map is created |
436 except the value is stored at the given memory pointer.\hfill[1 Mark] |
458 with the same data, except the value is stored at the given memory |
|
459 pointer.\hfill[1 Mark] |
437 |
460 |
438 \item[(2b)] Write two functions, \texttt{jumpRight} and |
461 \item[(2b)] Write two functions, \texttt{jumpRight} and |
439 \texttt{jumpLeft} that are needed to implement the loop constructs |
462 \texttt{jumpLeft} that are needed to implement the loop constructs |
440 of brainf***. They take a program (a \texttt{String}) and a program |
463 of brainf***. They take a program (a \texttt{String}) and a program |
441 counter (an \texttt{Int}) as argument and move right (respectively |
464 counter (an \texttt{Int}) as argument and move right (respectively |
452 |
475 |
453 \begin{center} |
476 \begin{center} |
454 \texttt{--[..+>--]\barbelow{,}>,++} |
477 \texttt{--[..+>--]\barbelow{,}>,++} |
455 \end{center} |
478 \end{center} |
456 |
479 |
457 meaning it jumps to after the loop. Similarly, if you in 8th position |
480 meaning it jumps to after the loop. Similarly, if you are in 8th position |
458 then \texttt{jumpLeft} is supposed to jump to just after the opening |
481 then \texttt{jumpLeft} is supposed to jump to just after the opening |
459 bracket (that is jumping to the beginning of the loop): |
482 bracket (that is jumping to the beginning of the loop): |
460 |
483 |
461 \begin{center} |
484 \begin{center} |
462 \texttt{--[..+>-\barbelow{-}],>,++} |
485 \texttt{--[..+>-\barbelow{-}],>,++} |
550 $\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) - 1}\\ |
573 $\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) - 1}\\ |
551 \end{tabular}\\\hline |
574 \end{tabular}\\\hline |
552 \hfill\texttt{'.'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
575 \hfill\texttt{'.'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
553 $\bullet$ & $\texttt{pc} + 1$\\ |
576 $\bullet$ & $\texttt{pc} + 1$\\ |
554 $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\ |
577 $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\ |
555 $\bullet$ & print out\texttt{mem(mp)} as a character\\ |
578 $\bullet$ & print out \,\texttt{mem(mp)} as a character\\ |
556 \end{tabular}\\\hline |
579 \end{tabular}\\\hline |
557 \hfill\texttt{','} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
580 \hfill\texttt{','} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
558 $\bullet$ & $\texttt{pc} + 1$\\ |
581 $\bullet$ & $\texttt{pc} + 1$\\ |
559 $\bullet$ & $\texttt{mp}$ unchanged\\ |
582 $\bullet$ & $\texttt{mp}$ unchanged\\ |
560 $\bullet$ & \texttt{mem} updated with \texttt{mp -> \textrm{input}}\\ |
583 $\bullet$ & \texttt{mem} updated with \texttt{mp -> \textrm{input}}\\ |
561 \multicolumn{2}{@{}l}{input given by \texttt{Console.in.read().toByte}} |
584 \multicolumn{2}{@{}l}{the input is given by \texttt{Console.in.read().toByte}} |
562 \end{tabular}\\\hline |
585 \end{tabular}\\\hline |
563 \hfill\texttt{'['} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
586 \hfill\texttt{'['} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}} |
564 \multicolumn{2}{@{}l}{if \texttt{mem(mp) == 0} then}\\ |
587 \multicolumn{2}{@{}l}{if \texttt{mem(mp) == 0} then}\\ |
565 $\bullet$ & $\texttt{pc = jumpRight(prog, pc + 1, 0)}$\\ |
588 $\bullet$ & $\texttt{pc = jumpRight(prog, pc + 1, 0)}$\\ |
566 $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\ |
589 $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\ |