6 |
6 |
7 \section*{Coursework 8 (Scala, Regular Expressions)} |
7 \section*{Coursework 8 (Scala, Regular Expressions)} |
8 |
8 |
9 This coursework is worth 10\%. It is about regular expressions, |
9 This coursework is worth 10\%. It is about regular expressions, |
10 pattern matching and polymorphism. The first part is due on 30 |
10 pattern matching and polymorphism. The first part is due on 30 |
11 November at 11pm; the second, more advanced part, is due on 7 December |
11 November at 11pm; the second, more advanced part, is due on 21 |
12 at 11pm. You are asked to implement a regular expression matcher. Make |
12 December at 11pm. You are asked to implement a regular expression |
13 sure the files you submit can be processed by just calling |
13 matcher based on derivatives of regular expressions. The reason is |
14 \texttt{scala <<filename.scala>>}.\bigskip |
14 that regular expression matching in Java can be extremely slow |
|
15 sometimes.\bigskip |
15 |
16 |
16 \noindent |
17 \noindent |
17 \textbf{Important:} Do not use any mutable data structures in your |
18 \textbf{Important:} |
18 submission! They are not needed. This menas you cannot use |
19 |
19 \texttt{ListBuffer}s, for example. Do not use \texttt{return} in your |
20 \begin{itemize} |
20 code! It has a different meaning in Scala, than in Java. Do not use |
21 \item Make sure the files you submit can be processed by just calling\\ |
21 \texttt{var}! This declares a mutable variable. Make sure the |
22 \mbox{\texttt{scala <<filename.scala>>}} on the commandline. Use the |
22 functions you submit are defined on the ``top-level'' of Scala, not |
23 template files provided and do not make any changes to arguments of |
23 inside a class or object. Also note that the running time of |
24 functions or to any types. You are free to implement any auxiliary |
24 each part will be restricted to a maximum of 360 seconds on my |
25 function you might need. |
25 laptop. |
26 |
26 |
27 \item Do not use any mutable data structures in your |
|
28 submissions! They are not needed. This means you cannot create new |
|
29 \texttt{Array}s or \texttt{ListBuffer}s, for example. |
|
30 |
|
31 \item Do not use \texttt{return} in your code! It has a different |
|
32 meaning in Scala, than in Java. |
|
33 |
|
34 \item Do not use \texttt{var}! This declares a mutable variable. Only |
|
35 use \texttt{val}! |
|
36 |
|
37 \item Do not use any parallel collections! No \texttt{.par} therefore! |
|
38 Our testing and marking infrastructure is not set up for it. |
|
39 \end{itemize} |
|
40 |
|
41 \noindent |
|
42 Also note that the running time of each part will be restricted to a |
|
43 maximum of 360 seconds on my laptop |
27 |
44 |
28 |
45 |
29 \subsection*{Disclaimer} |
46 \subsection*{Disclaimer} |
30 |
47 |
31 It should be understood that the work you submit represents |
48 It should be understood that the work you submit represents |
52 & $|$ & $r^*$ & can match zero or more times $r$\\ |
69 & $|$ & $r^*$ & can match zero or more times $r$\\ |
53 \end{tabular} |
70 \end{tabular} |
54 \end{center} |
71 \end{center} |
55 |
72 |
56 \noindent |
73 \noindent |
57 Why? Knowing how to match regular expressions and strings will |
74 Why? Knowing how to match regular expressions and strings will let you |
58 let you solve a lot of problems that vex other humans. Regular |
75 solve a lot of problems that vex other humans. Regular expressions are |
59 expressions are one of the fastest and simplest ways to match patterns |
76 one of the fastest and simplest ways to match patterns in text, and |
60 in text, and are endlessly useful for searching, editing and |
77 are endlessly useful for searching, editing and analysing data in all |
61 analysing text in all sorts of places. However, you need to be |
78 sorts of places (for example analysing network traffic in order to |
62 fast, otherwise you will stumble over problems such as recently reported at |
79 detect security breaches). However, you need to be fast, otherwise you |
|
80 will stumble over problems such as recently reported at |
63 |
81 |
64 {\small |
82 {\small |
65 \begin{itemize} |
83 \begin{itemize} |
66 \item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016} |
84 \item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016} |
67 \item[$\bullet$] \url{https://vimeo.com/112065252} |
85 \item[$\bullet$] \url{https://vimeo.com/112065252} |
69 \end{itemize}} |
87 \end{itemize}} |
70 |
88 |
71 \subsubsection*{Tasks (file re.scala)} |
89 \subsubsection*{Tasks (file re.scala)} |
72 |
90 |
73 \begin{itemize} |
91 \begin{itemize} |
74 \item[(1a)] Implement a function, called \textit{nullable}, by recursion over |
92 \item[(1a)] Implement a function, called \textit{nullable}, by |
75 regular expressions. This function tests whether a regular expression can match |
93 recursion over regular expressions. This function tests whether a |
76 the empty string. |
94 regular expression can match the empty string, that is given a |
|
95 regular expression it either returns true or false. |
77 |
96 |
78 \begin{center} |
97 \begin{center} |
79 \begin{tabular}{lcl} |
98 \begin{tabular}{lcl} |
80 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\ |
99 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\ |
81 $\textit{nullable}(\ONE)$ & $\dn$ & $\textit{true}$\\ |
100 $\textit{nullable}(\ONE)$ & $\dn$ & $\textit{true}$\\ |
132 |
151 |
133 \begin{center} |
152 \begin{center} |
134 \begin{tabular}{lcll} |
153 \begin{tabular}{lcll} |
135 $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\ |
154 $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\ |
136 $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\ |
155 $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\ |
137 $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$ |
156 $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$ & |
|
157 (is $\textit{nullable}$) |
138 \end{tabular} |
158 \end{tabular} |
139 \end{center} |
159 \end{center} |
140 |
160 |
141 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\ |
161 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\ |
142 \mbox{}\hfill\mbox{[1 Mark]} |
162 \mbox{}\hfill\mbox{[1 Mark]} |
163 \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\] |
183 \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\] |
164 |
184 |
165 simplifies to just $r_1$. \textbf{Hints:} Regular expressions can be |
185 simplifies to just $r_1$. \textbf{Hints:} Regular expressions can be |
166 seen as trees and there are several methods for traversing |
186 seen as trees and there are several methods for traversing |
167 trees. One of them corresponds to the inside-out traversal. Also |
187 trees. One of them corresponds to the inside-out traversal. Also |
168 remember numerical expressions from school: there you had exprssions |
188 remember numerical expressions from school: there you had expressions |
169 like $u + \ldots + (1 \cdot x) * \ldots (z + (y \cdot 0)) \ldots$ |
189 like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$ |
170 and simplification rules that looked very similar to rules |
190 and simplification rules that looked very similar to rules |
171 above. You would simplify such numerical expressions by replacing |
191 above. You would simplify such numerical expressions by replacing |
172 for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then |
192 for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then |
173 look if more rules are applicable. If you organise this |
193 look whether more rules are applicable. If you organise the |
174 simplification in an inside-out fashion, it is always clear which |
194 simplification in an inside-out fashion, it is always clear which |
175 rule should applied next.\\\mbox{}\hfill[1 Mark] |
195 rule should applied next.\\\mbox{}\hfill[1 Mark] |
176 |
196 |
177 \item[(1d)] Implement two functions: The first, called \textit{ders}, |
197 \item[(1d)] Implement two functions: The first, called \textit{ders}, |
178 takes a list of characters and a regular expression as arguments, and |
198 takes a list of characters and a regular expression as arguments, and |
331 \subsection*{Background} |
351 \subsection*{Background} |
332 |
352 |
333 Although easily implementable in Scala, the idea behind the derivative |
353 Although easily implementable in Scala, the idea behind the derivative |
334 function might not so easy to be seen. To understand its purpose |
354 function might not so easy to be seen. To understand its purpose |
335 better, assume a regular expression $r$ can match strings of the form |
355 better, assume a regular expression $r$ can match strings of the form |
336 $c::cs$ (that means strings which start with a character $c$ and have |
356 $c\!::\!cs$ (that means strings which start with a character $c$ and have |
337 some rest, or tail, $cs$). If you now take the derivative of $r$ with |
357 some rest, or tail, $cs$). If you now take the derivative of $r$ with |
338 respect to the character $c$, then you obtain a regular expressions |
358 respect to the character $c$, then you obtain a regular expressions |
339 that can match all the strings $cs$. In other words, the regular |
359 that can match all the strings $cs$. In other words, the regular |
340 expression $\textit{der}\;c\;r$ can match the same strings $c::cs$ |
360 expression $\textit{der}\;c\;r$ can match the same strings $c::cs$ |
341 that can be matched by $r$, except that the $c$ is chopped off. |
361 that can be matched by $r$, except that the $c$ is chopped off. |