174 4000000: 0.99149 secs. |
174 4000000: 0.99149 secs. |
175 4500000: 1.15395 secs. |
175 4500000: 1.15395 secs. |
176 5000000: 1.29659 secs. |
176 5000000: 1.29659 secs. |
177 \end{lstlisting}%$ |
177 \end{lstlisting}%$ |
178 |
178 |
|
179 |
179 \subsection*{Preliminaries} |
180 \subsection*{Preliminaries} |
180 |
181 |
181 The task is to implement a regular expression matcher that is based on |
182 The task is to implement a regular expression matcher that is based on |
182 derivatives of regular expressions. Most of the functions are defined by |
183 derivatives of regular expressions. Most of the functions are defined by |
183 recursion over regular expressions and can be elegantly implemented |
184 recursion over regular expressions and can be elegantly implemented |
239 |
240 |
240 \noindent |
241 \noindent |
241 The alternative regular expressions comes in two versions: one is |
242 The alternative regular expressions comes in two versions: one is |
242 binary (+ / \texttt{ALT}) and the other is n-ary ($\sum$ / |
243 binary (+ / \texttt{ALT}) and the other is n-ary ($\sum$ / |
243 \texttt{ALTs}). The latter takes a list of regular expressions as |
244 \texttt{ALTs}). The latter takes a list of regular expressions as |
244 arguments. In what follows we shall use $rs$ to stand for lists of |
245 argument. In what follows we shall use $rs$ to stand for lists of |
245 regular expressions. The binary alternative can be seen as an abbreviation, |
246 regular expressions. When the list is empty, we shall write $\sum\,[]$; |
|
247 if it is non-empty, we sometimes write $\sum\,[r_1,..., r_n]$. |
|
248 The binary alternative can be seen as an abbreviation, |
246 that is $r_1 + r_2 \dn \sum\,[r_1, r_2]$. As a result we can ignore the |
249 that is $r_1 + r_2 \dn \sum\,[r_1, r_2]$. As a result we can ignore the |
247 binary version and only implement the n-ary alternative. |
250 binary version and only implement the n-ary alternative. |
248 |
251 |
249 \begin{itemize} |
252 \begin{itemize} |
250 \item[(1)] Implement a function, called \textit{nullable}, by |
253 \item[(1)] Implement a function, called \textit{nullable}, by |
265 \end{tabular} |
268 \end{tabular} |
266 \end{center}~\hfill[0.5 Marks] |
269 \end{center}~\hfill[0.5 Marks] |
267 |
270 |
268 \item[(2)] Implement a function, called \textit{der}, by recursion over |
271 \item[(2)] Implement a function, called \textit{der}, by recursion over |
269 regular expressions. It takes a character and a regular expression |
272 regular expressions. It takes a character and a regular expression |
270 as arguments and calculates the derivative of a regular expression according |
273 as arguments and calculates the \emph{derivative} of a regular expression according |
271 to the rules: |
274 to the rules: |
272 |
275 |
273 \begin{center} |
276 \begin{center} |
274 \begin{tabular}{lcl} |
277 \begin{tabular}{lcl} |
275 $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\ |
278 $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\ |
335 3) &$rs = (\sum rs_1) :: rest$ & $\dn$ & $rs_1 ::: \textrm{flatten}\;rest$\\ |
338 3) &$rs = (\sum rs_1) :: rest$ & $\dn$ & $rs_1 ::: \textrm{flatten}\;rest$\\ |
336 4) &$rs = r :: rest$ & $\dn$ & $r :: \textrm{flatten}\;rest$ & (otherwise)\\ |
339 4) &$rs = r :: rest$ & $\dn$ & $r :: \textrm{flatten}\;rest$ & (otherwise)\\ |
337 \end{tabular} |
340 \end{tabular} |
338 \end{center} |
341 \end{center} |
339 |
342 |
340 The first clause just states that empty lists cannot be further |
343 The first clause states that empty lists cannot be further |
341 flattened. The second removes all $\ZERO$s from the list. |
344 flattened. The second removes the first $\ZERO$ from the list and recurses. |
342 The third is when the first regular expression is an \texttt{ALTs}, then |
345 The third is when the first regular expression is an \texttt{ALTs}, then |
343 the content of this alternative should be spilled out and appended |
346 the content of this alternative should be spilled out and appended |
344 with the flattened rest of the list. The last case is for all other |
347 with the flattened rest of the list. The last case is for all other |
345 cases where the head of the list is not $\ZERO$ and not an \texttt{ALTs}, |
348 cases where the head of the list is not $\ZERO$ and not an \texttt{ALTs}, |
346 then we just keep the head of the list and flatten the rest. |
349 then we just keep the head of the list and flatten the rest. |
364 |
367 |
365 The last case is as follows: first apply $simp$ to all regular |
368 The last case is as follows: first apply $simp$ to all regular |
366 expressions $r_1,.. ,r_n$; then flatten the resulting list using |
369 expressions $r_1,.. ,r_n$; then flatten the resulting list using |
367 \texttt{flts}; finally remove all duplicates in this list (this can be |
370 \texttt{flts}; finally remove all duplicates in this list (this can be |
368 done in Scala using the function |
371 done in Scala using the function |
369 \texttt{\_.distinct}). \textcolor{red}{When you perform these |
372 \texttt{\_.distinct}). When you perform these |
370 operations, you end up with three cases, namely where the list is |
373 operations, you end up with three cases, namely where the list is |
371 empty, contains a single element and ``otherwise''. These cases |
374 empty, contains a single element and ``otherwise''. These cases |
372 should be processed as follows} |
375 should be processed as follows |
373 \begin{center} |
376 \begin{center} |
374 \textcolor{red}{\begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll} |
377 \begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll} |
375 $\sum\;[]$ & $\mapsto$ & $\ZERO$\\ |
378 $\sum\;[]$ & $\mapsto$ & $\ZERO$\\ |
376 $\sum\;[r]$ & $\mapsto$ & $r$\\ |
379 $\sum\;[r]$ & $\mapsto$ & $r$\\ |
377 $\sum\;rs$ & $\mapsto$ & $\sum\;rs$ & ``otherwise''\\ |
380 $\sum\;rs$ & $\mapsto$ & $\sum\;rs$ & ``otherwise''\\ |
378 \end{tabular}} |
381 \end{tabular} |
379 \end{center} |
382 \end{center} |
380 |
383 |
381 |
384 |
382 |
385 |
383 For example the regular expression |
386 For example the regular expression |
513 than your off-the-shelf matcher in your bog-standard language. |
516 than your off-the-shelf matcher in your bog-standard language. |
514 |
517 |
515 \begin{center} |
518 \begin{center} |
516 \begin{tabular}{@{}cc@{}} |
519 \begin{tabular}{@{}cc@{}} |
517 \multicolumn{2}{c}{Graph: $(a^*)^*\cdot b$ and strings |
520 \multicolumn{2}{c}{Graph: $(a^*)^*\cdot b$ and strings |
518 $\underbrace{a\ldots a}_{n}$}\bigskip\\ |
521 $\underbrace{a\ldots a}_{n}$}\medskip\\ |
519 |
522 |
520 \begin{tikzpicture} |
523 \begin{tikzpicture} |
521 \begin{axis}[ |
524 \begin{axis}[ |
522 xlabel={$n$}, |
525 xlabel={$n$}, |
523 x label style={at={(1.05,0.0)}}, |
526 x label style={at={(1.05,0.0)}}, |
529 ymax=45, |
532 ymax=45, |
530 ytick={0,5,...,40}, |
533 ytick={0,5,...,40}, |
531 scaled ticks=false, |
534 scaled ticks=false, |
532 axis lines=left, |
535 axis lines=left, |
533 width=6cm, |
536 width=6cm, |
534 height=5.0cm, |
537 height=5.5cm, |
535 legend entries={Python, Java 8, JavaScript, Swift, Dart}, |
538 legend entries={Python, Java 8, JavaScript, Swift, Dart}, |
536 legend pos=north west, |
539 legend pos=north west, |
537 legend cell align=left] |
540 legend cell align=left] |
538 \addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data}; |
541 \addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data}; |
539 \addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data}; |
542 \addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data}; |