6
|
1 |
\documentclass{article}
|
62
|
2 |
\usepackage{../style}
|
6
|
3 |
%%\usepackage{../langs}
|
|
4 |
|
|
5 |
\begin{document}
|
|
6 |
|
62
|
7 |
\section*{Coursework 8 (Scala, Regular Expressions}
|
6
|
8 |
|
68
|
9 |
This coursework is worth 10\%. It is about regular expressions and
|
|
10 |
pattern matching. The first part is due on 30 November at 11pm; the
|
|
11 |
second, more advanced part, is due on 7 December at 11pm. The
|
|
12 |
second part is not yet included. For the first part you are
|
|
13 |
asked to implement a regular expression matcher. Make sure the files
|
62
|
14 |
you submit can be processed by just calling \texttt{scala
|
68
|
15 |
<<filename.scala>>}.\bigskip
|
62
|
16 |
|
|
17 |
\noindent
|
|
18 |
\textbf{Important:} Do not use any mutable data structures in your
|
|
19 |
submissions! They are not needed. This excluded the use of
|
|
20 |
\texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
|
68
|
21 |
code! It has a different meaning in Scala, than in Java. Do not use
|
|
22 |
\texttt{var}! This declares a mutable variable. Make sure the
|
62
|
23 |
functions you submit are defined on the ``top-level'' of Scala, not
|
|
24 |
inside a class or object.
|
6
|
25 |
|
|
26 |
|
68
|
27 |
\subsection*{Disclaimer!!!!!!!!}
|
6
|
28 |
|
|
29 |
It should be understood that the work you submit represents
|
68
|
30 |
your own effort! You have not copied from anyone else. An
|
6
|
31 |
exception is the Scala code I showed during the lectures or
|
|
32 |
uploaded to KEATS, which you can freely use.\bigskip
|
|
33 |
|
|
34 |
|
68
|
35 |
\subsection*{Part 1 (6 Marks)}
|
6
|
36 |
|
|
37 |
The task is to implement a regular expression matcher based on
|
68
|
38 |
derivatives of regular expressions. The implementation can deal
|
|
39 |
with the following regular expressions, which have been predefined
|
|
40 |
file re.scala:
|
6
|
41 |
|
|
42 |
\begin{center}
|
|
43 |
\begin{tabular}{lcll}
|
|
44 |
$r$ & $::=$ & $\ZERO$ & cannot match anything\\
|
|
45 |
& $|$ & $\ONE$ & can only match the empty string\\
|
68
|
46 |
& $|$ & $c$ & can match a character (in this case $c$)\\
|
|
47 |
& $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\
|
|
48 |
& $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\
|
|
49 |
& & & then the second part with $r_2$\\
|
6
|
50 |
& $|$ & $r^*$ & can match zero or more times $r$\\
|
|
51 |
\end{tabular}
|
|
52 |
\end{center}
|
|
53 |
|
68
|
54 |
\noindent
|
|
55 |
Why? Knowing how to match regular expressions and strings fast will
|
|
56 |
let you solve a lot of problems that vex other humans. Regular
|
|
57 |
expressions are one of the fastest and simplest ways to match patterns
|
|
58 |
in text, and are endlessly useful for searching, editing and
|
|
59 |
analysing text in all sorts of places. However, you need to be
|
|
60 |
fast, otherwise you will stumble upon problems such as recently reported at
|
|
61 |
|
|
62 |
{\small
|
|
63 |
\begin{itemize}
|
|
64 |
\item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
|
|
65 |
\item[$\bullet$] \url{https://vimeo.com/112065252}
|
|
66 |
\item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}
|
|
67 |
\end{itemize}}
|
|
68 |
|
|
69 |
\subsection*{Tasks (file re.scala)}
|
|
70 |
|
|
71 |
\begin{itemize}
|
|
72 |
\item[(1a)] Implement a function, called \textit{nullable}, by recursion over
|
|
73 |
regular expressions. This function test whether a regular expression can match
|
|
74 |
the empty string.
|
6
|
75 |
|
|
76 |
\begin{center}
|
|
77 |
\begin{tabular}{lcl}
|
|
78 |
$\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
|
|
79 |
$\textit{nullable}(\ONE)$ & $\dn$ & $\textit{true}$\\
|
|
80 |
$\textit{nullable}(c)$ & $\dn$ & $\textit{false}$\\
|
|
81 |
$\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
|
|
82 |
$\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
|
|
83 |
$\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
|
|
84 |
\end{tabular}
|
68
|
85 |
\end{center}\hfill[1 Mark]
|
|
86 |
|
|
87 |
\item[(1b)] Implement a function, called \textit{der}, by recursion over
|
|
88 |
regular expressions. It takes a character and a regular expression
|
|
89 |
as arguments and calculates the derivative regular expression.
|
6
|
90 |
|
|
91 |
\begin{center}
|
|
92 |
\begin{tabular}{lcl}
|
|
93 |
$\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
|
|
94 |
$\textit{der}\;c\;(\ONE)$ & $\dn$ & $\ZERO$\\
|
|
95 |
$\textit{der}\;c\;(d)$ & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
|
|
96 |
$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
|
|
97 |
$\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
|
|
98 |
& & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
|
|
99 |
& & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
|
|
100 |
$\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
|
|
101 |
\end{tabular}
|
68
|
102 |
\end{center}\hfill[1 Mark]
|
6
|
103 |
|
68
|
104 |
\item[(1c)] Implement the function \textit{simp}, which recursively
|
|
105 |
traverses a regular expression from inside to outside, and
|
|
106 |
simplifies every sub-regular-expressions on the left to
|
|
107 |
the regular expression on the right, except it does not simplify inside
|
|
108 |
${}^*$-regular expressions.
|
6
|
109 |
|
68
|
110 |
\begin{center}
|
6
|
111 |
\begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll}
|
|
112 |
$r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\
|
|
113 |
$\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\
|
|
114 |
$r \cdot \ONE$ & $\mapsto$ & $r$\\
|
|
115 |
$\ONE \cdot r$ & $\mapsto$ & $r$\\
|
|
116 |
$r + \ZERO$ & $\mapsto$ & $r$\\
|
|
117 |
$\ZERO + r$ & $\mapsto$ & $r$\\
|
|
118 |
$r + r$ & $\mapsto$ & $r$\\
|
|
119 |
\end{tabular}
|
68
|
120 |
\end{center}
|
|
121 |
|
|
122 |
For example
|
|
123 |
\[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
|
|
124 |
|
|
125 |
simplifies to just $r_1$.
|
|
126 |
\hfill[1 Mark]
|
|
127 |
|
|
128 |
\item[(1d)] Implement two functions: The first, called \textit{ders},
|
|
129 |
takes a list of characters as arguments and a regular expression and
|
|
130 |
buids the derivative as follows:
|
|
131 |
|
|
132 |
\begin{center}
|
|
133 |
\begin{tabular}{lcl}
|
|
134 |
$\textit{ders}\;Nil\;(r)$ & $\dn$ & $r$\\
|
|
135 |
$\textit{ders}\;c::cs\;(r)$ & $\dn$ &
|
|
136 |
$\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
|
|
137 |
\end{tabular}
|
6
|
138 |
\end{center}
|
|
139 |
|
68
|
140 |
The second, called \textit{matcher}, takes a string and a regular expression
|
|
141 |
as arguments. It builds first the derivatives according to \textit{ders}
|
|
142 |
and at the end tests whether the resulting redular expression can match
|
|
143 |
the empty string (using \textit{nullable}).
|
|
144 |
For example the \textit{matcher} will produce true if given the
|
|
145 |
regular expression $a\cdot b\cdot c$ and the string $abc$.
|
|
146 |
\hfill[1 Mark]
|
6
|
147 |
|
68
|
148 |
\item[(1e)] Implement the function \textit{replace}: it searches (from the left to
|
|
149 |
right) in string $s_1$ all the non-empty substrings that match the
|
|
150 |
regular expression---these substrings are assumed to be
|
|
151 |
the longest substrings matched by the regular expression and
|
|
152 |
assumed to be non-overlapping. All these substrings in $s_1$ are replaced
|
|
153 |
by $s_2$. For example given the regular expression
|
6
|
154 |
|
68
|
155 |
\[(a \cdot a)^* + (b \cdot b)\]
|
6
|
156 |
|
68
|
157 |
\noindent the string $aabbbaaaaaaabaaaaabbaaaabb$ and
|
|
158 |
replacement string $c$ yields the string
|
6
|
159 |
|
68
|
160 |
\[
|
|
161 |
ccbcabcaccc
|
|
162 |
\]
|
6
|
163 |
|
68
|
164 |
\hfill[2 Mark]
|
|
165 |
\end{itemize}
|
6
|
166 |
|
68
|
167 |
\subsection*{Part 2 (4 Marks)}
|
|
168 |
|
|
169 |
Coming soon.
|
6
|
170 |
|
|
171 |
\end{document}
|
|
172 |
|
68
|
173 |
|
6
|
174 |
%%% Local Variables:
|
|
175 |
%%% mode: latex
|
|
176 |
%%% TeX-master: t
|
|
177 |
%%% End:
|