4 |
4 |
5 \begin{document} |
5 \begin{document} |
6 |
6 |
7 \section*{Coursework 1 (Strand 1)} |
7 \section*{Coursework 1 (Strand 1)} |
8 |
8 |
9 This coursework is worth 5\% and is due on 20 October at |
9 This coursework is worth 4\% and is due on 20 October at |
10 16:00. You are asked to implement a regular expression matcher |
10 16:00. You are asked to implement a regular expression matcher |
11 and submit a document containing the answers for the questions |
11 and submit a document containing the answers for the questions |
12 below. You can do the implementation in any programming |
12 below. You can do the implementation in any programming |
13 language you like, but you need to submit the source code with |
13 language you like, but you need to submit the source code with |
14 which you answered the questions, otherwise a mark of 0\% will |
14 which you answered the questions, otherwise a mark of 0\% will |
15 be awarded. You can submit your answers in a txt-file or pdf. |
15 be awarded. You can submit your answers in a txt-file or pdf. |
|
16 Code send as code. |
16 |
17 |
17 |
18 |
18 |
19 |
19 \subsubsection*{Disclaimer} |
20 \subsubsection*{Disclaimer} |
20 |
21 |
21 It should be understood that the work you submit represents |
22 It should be understood that the work you submit represents |
22 your own effort. You have not copied from anyone else. An |
23 your own effort. You have not copied from anyone else. An |
23 exception is the Scala code I showed during the lectures or |
24 exception is the Scala code I showed during the lectures or |
24 uploaded to KEATS, which you can use.\bigskip |
25 uploaded to KEATS, which you can freely use.\bigskip |
25 |
26 |
26 |
27 |
27 \subsubsection*{Tasks} |
28 \subsubsection*{Task} |
28 |
29 |
29 The task is to implement a regular expression matcher based on |
30 The task is to implement a regular expression matcher based on |
30 derivatives. The implementation should be able to deal with the usual |
31 derivatives of regular expressions. The implementation should |
31 (basic) regular expressions |
32 be able to deal with the usual (basic) regular expressions |
32 |
33 |
33 \[ |
34 \[ |
34 \varnothing, \epsilon, c, r_1 + r_2, r_1 \cdot r_2, r^* |
35 \ZERO,\; \ONE,\; c,\; r_1 + r_2,\; r_1 \cdot r_2,\; r^* |
35 \] |
36 \] |
36 |
37 |
37 \noindent |
38 \noindent |
38 but also with the following extended regular expressions: |
39 but also with the following extended regular expressions: |
39 |
40 |
45 $r^{\{n,m\}}$ & at least $n$-times $r$ but no more than $m$-times\\ |
46 $r^{\{n,m\}}$ & at least $n$-times $r$ but no more than $m$-times\\ |
46 $\sim{}r$ & not-regular expression of $r$\\ |
47 $\sim{}r$ & not-regular expression of $r$\\ |
47 \end{tabular} |
48 \end{tabular} |
48 \end{center} |
49 \end{center} |
49 |
50 |
50 \noindent In the case of $r^{\{n,m\}}$ we have the convention |
51 \noindent In the case of $r^{\{n,m\}}$ you can assume the |
51 that $0 \le n \le m$. The meanings of the extended regular |
52 convention that $0 \le n \le m$. The meanings of the extended |
52 expressions are |
53 regular expressions are |
53 |
54 |
54 \begin{center} |
55 \begin{center} |
55 \begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l} |
56 \begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l} |
56 $L([c_1 c_2 \ldots c_n])$ & $\dn$ & $\{[c_1], [c_2], \ldots, [c_n]\}$\\ |
57 $L([c_1 c_2 \ldots c_n])$ & $\dn$ & $\{[c_1], [c_2], \ldots, [c_n]\}$\\ |
57 $L(r^+)$ & $\dn$ & $\bigcup_{1\le i}. L(r)^i$\\ |
58 $L(r^+)$ & $\dn$ & $\bigcup_{1\le i}. L(r)^i$\\ |
62 \end{center} |
63 \end{center} |
63 |
64 |
64 \noindent whereby in the last clause the set $\Sigma^*$ stands |
65 \noindent whereby in the last clause the set $\Sigma^*$ stands |
65 for the set of \emph{all} strings over the alphabet $\Sigma$ |
66 for the set of \emph{all} strings over the alphabet $\Sigma$ |
66 (in the implementation the alphabet can be just what is |
67 (in the implementation the alphabet can be just what is |
67 represented by, say, the type \pcode{char}). So $\sim{}r$ |
68 represented by, say, the type \pcode{Char}). So $\sim{}r$ |
68 means `all the strings that $r$ cannot match'. |
69 means `all the strings that $r$ cannot match'. |
69 |
70 |
70 Be careful that your implementation of \textit{nullable} and |
71 Be careful that your implementation of \textit{nullable} and |
71 \textit{der} satisfies for every $r$ the following two |
72 \textit{der} satisfies for every $r$ the following two |
72 properties (see also Question 2): |
73 properties (see also Question 2): |
84 where you are asked to explicitly give the rules for |
85 where you are asked to explicitly give the rules for |
85 \textit{nullable} and \textit{der} for the extended regular |
86 \textit{nullable} and \textit{der} for the extended regular |
86 expressions. |
87 expressions. |
87 |
88 |
88 |
89 |
89 \subsection*{Question 1 (unmarked)} |
90 \subsection*{Question 1} |
90 |
91 |
91 What is your King's email address (you will need it in |
92 What is your King's email address (you will need it in |
92 Question 3)? |
93 Question 3)? |
93 |
94 |
94 \subsection*{Question 2 (marked with 2\%)} |
95 \subsection*{Question 2} |
95 |
96 |
96 This question does not require any implementation. From the |
97 This question does not require any implementation. From the |
97 lectures you have seen the definitions for the functions |
98 lectures you have seen the definitions for the functions |
98 \textit{nullable} and \textit{der} for the basic regular |
99 \textit{nullable} and \textit{der} for the basic regular |
99 expressions. Give the rules for the extended regular |
100 expressions. Give the rules for the extended regular |
120 \begin{itemize} |
121 \begin{itemize} |
121 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$ |
122 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$ |
122 \item $L(der\,c\,r)) = Der\,c\,(L(r))$ |
123 \item $L(der\,c\,r)) = Der\,c\,(L(r))$ |
123 \end{itemize} |
124 \end{itemize} |
124 |
125 |
125 \subsection*{Question 3 (marked with 1\%)} |
126 \subsection*{Question 3} |
126 |
127 |
127 Implement the following regular expression for email addresses |
128 Implement the following regular expression for email addresses |
128 |
129 |
129 \[ |
130 \[ |
130 ([a\mbox{-}z0\mbox{-}9\_\!\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}}) |
131 ([a\mbox{-}z0\mbox{-}9\_\!\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}}) |
131 \] |
132 \] |
132 |
133 |
133 \noindent and calculate the derivative according to your email |
134 \noindent and calculate the derivative according to your email |
134 address. When calculating the derivative, simplify all regular |
135 address. When calculating the derivative, simplify all regular |
135 expressions as much as possible, but at least apply the |
136 expressions as much as possible by applying the |
136 following six simplification rules: |
137 following 7 simplification rules: |
137 |
138 |
138 \begin{center} |
139 \begin{center} |
139 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll} |
140 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll} |
140 $r \cdot \varnothing$ & $\mapsto$ & $\varnothing$\\ |
141 $r \cdot \varnothing$ & $\mapsto$ & $\varnothing$\\ |
141 $\varnothing \cdot r$ & $\mapsto$ & $\varnothing$\\ |
142 $\varnothing \cdot r$ & $\mapsto$ & $\varnothing$\\ |
145 $\varnothing + r$ & $\mapsto$ & $r$\\ |
146 $\varnothing + r$ & $\mapsto$ & $r$\\ |
146 $r + r$ & $\mapsto$ & $r$\\ |
147 $r + r$ & $\mapsto$ & $r$\\ |
147 \end{tabular} |
148 \end{tabular} |
148 \end{center} |
149 \end{center} |
149 |
150 |
150 \noindent |
151 \noindent Write down your simplified derivative in a readable |
151 Write down your simplified derivative in the ``mathematicical'' |
152 notation using parentheses where necessary. That means you |
152 notation using parentheses where necessary. That means you should |
153 should use the infix notation $+$, $\cdot$, $^*$ and so on, |
153 use the more readable infix notation $+$, $\cdot$ and $^*$, |
|
154 instead of code. |
154 instead of code. |
155 |
155 |
156 \subsection*{Question 4 (marked with 1\%)} |
156 \subsection*{Question 4} |
157 |
157 |
158 Suppose \textit{[a-z]} stands for the range regular expression |
158 Suppose \textit{[a-z]} stands for the range regular expression |
159 $[a,b,c,\ldots,z]$. Consider the regular expression $/ \cdot * \cdot |
159 $[a,b,c,\ldots,z]$. Consider the regular expression $/ \cdot * \cdot |
160 (\sim{}([a\mbox{-}z]^* \cdot * \cdot / \cdot [a\mbox{-}z]^*)) \cdot * |
160 (\sim{}([a\mbox{-}z]^* \cdot * \cdot / \cdot [a\mbox{-}z]^*)) \cdot * |
161 \cdot /$ and decide wether the following four strings are matched by |
161 \cdot /$ and decide wether the following four strings are matched by |
166 \item \texttt{"/*foobar*/"} |
166 \item \texttt{"/*foobar*/"} |
167 \item \texttt{"/*test*/test*/"} |
167 \item \texttt{"/*test*/test*/"} |
168 \item \texttt{"/*test/*test*/"} |
168 \item \texttt{"/*test/*test*/"} |
169 \end{enumerate} |
169 \end{enumerate} |
170 |
170 |
171 \subsection*{Question 5 (marked with 1\%)} |
171 \noindent |
|
172 Also test your regular expression matcher with the regular |
|
173 expression $a^{\{3,5\}}$ and the strings |
|
174 |
|
175 \begin{enumerate} |
|
176 \setcounter{enumi}{4} |
|
177 \item \texttt{aa} |
|
178 \item \texttt{aaa} |
|
179 \item \texttt{aaaaa} |
|
180 \item \texttt{aaaaaa} |
|
181 \end{enumerate} |
|
182 |
|
183 \noindent |
|
184 Does your matcher produce the expected results? |
|
185 |
|
186 \subsection*{Question 5} |
172 |
187 |
173 Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be |
188 Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be |
174 $(a^{\{19,19\}}) \cdot (a^?)$. Decide whether the following three |
189 $(a^{\{19,19\}}) \cdot (a^?)$. Decide whether the following three |
175 strings consisting of $a$s only can be matched by $(r_1^+)^+$. |
190 strings consisting of $a$s only can be matched by $(r_1^+)^+$. |
176 Similarly test them with $(r_2^+)^+$. Again answer in all six cases |
191 Similarly test them with $(r_2^+)^+$. Again answer in all six cases |