|
1 \documentclass{article} |
|
2 \usepackage{style} |
|
3 %%\usepackage{../langs} |
|
4 |
|
5 \begin{document} |
|
6 |
|
7 \section*{Coursework 1 (Strand 1)} |
|
8 |
|
9 This coursework is worth 4\% and is due on 25 October at |
|
10 16:00. You are asked to implement a regular expression matcher |
|
11 and submit a document containing the answers for the questions |
|
12 below. You can do the implementation in any programming |
|
13 language you like, but you need to submit the source code with |
|
14 which you answered the questions, otherwise a mark of 0\% will |
|
15 be awarded. You can submit your answers in a txt-file or pdf. |
|
16 Code send as code. |
|
17 |
|
18 |
|
19 |
|
20 \subsubsection*{Disclaimer} |
|
21 |
|
22 It should be understood that the work you submit represents |
|
23 your own effort. You have not copied from anyone else. An |
|
24 exception is the Scala code I showed during the lectures or |
|
25 uploaded to KEATS, which you can freely use.\bigskip |
|
26 |
|
27 |
|
28 \subsubsection*{Task} |
|
29 |
|
30 The task is to implement a regular expression matcher based on |
|
31 derivatives of regular expressions. The implementation should |
|
32 be able to deal with the usual (basic) regular expressions |
|
33 |
|
34 \[ |
|
35 \ZERO,\; \ONE,\; c,\; r_1 + r_2,\; r_1 \cdot r_2,\; r^* |
|
36 \] |
|
37 |
|
38 \noindent |
|
39 but also with the following extended regular expressions: |
|
40 |
|
41 \begin{center} |
|
42 \begin{tabular}{ll} |
|
43 $[c_1 c_2 \ldots c_n]$ & a range of characters\\ |
|
44 $r^+$ & one or more times $r$\\ |
|
45 $r^?$ & optional $r$\\ |
|
46 $r^{\{n,m\}}$ & at least $n$-times $r$ but no more than $m$-times\\ |
|
47 $\sim{}r$ & not-regular expression of $r$\\ |
|
48 \end{tabular} |
|
49 \end{center} |
|
50 |
|
51 \noindent In the case of $r^{\{n,m\}}$ you can assume the |
|
52 convention that $0 \le n \le m$. The meanings of the extended |
|
53 regular expressions are |
|
54 |
|
55 \begin{center} |
|
56 \begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l} |
|
57 $L([c_1 c_2 \ldots c_n])$ & $\dn$ & $\{[c_1], [c_2], \ldots, [c_n]\}$\\ |
|
58 $L(r^+)$ & $\dn$ & $\bigcup_{1\le i}. L(r)^i$\\ |
|
59 $L(r^?)$ & $\dn$ & $L(r) \cup \{[]\}$\\ |
|
60 $L(r^{\{n,m\}})$ & $\dn$ & $\bigcup_{n\le i \le m}. L(r)^i$\\ |
|
61 $L(\sim{}r)$ & $\dn$ & $\Sigma^* - L(r)$ |
|
62 \end{tabular} |
|
63 \end{center} |
|
64 |
|
65 \noindent whereby in the last clause the set $\Sigma^*$ stands |
|
66 for the set of \emph{all} strings over the alphabet $\Sigma$ |
|
67 (in the implementation the alphabet can be just what is |
|
68 represented by, say, the type \texttt{Char}). So $\sim{}r$ |
|
69 means `all the strings that $r$ cannot match'. |
|
70 |
|
71 Be careful that your implementation of \textit{nullable} and |
|
72 \textit{der} satisfies for every $r$ the following two |
|
73 properties (see also Question 2): |
|
74 |
|
75 \begin{itemize} |
|
76 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$ |
|
77 \item $L(der\,c\,r) = Der\,c\,(L(r))$ |
|
78 \end{itemize} |
|
79 |
|
80 \noindent {\bf Important!} Your implementation should have |
|
81 explicit cases for the basic regular expressions, but also |
|
82 explicit cases for the extended regular expressions. That |
|
83 means do not treat the extended regular expressions by just |
|
84 translating them into the basic ones. See also Question 2, |
|
85 where you are asked to explicitly give the rules for |
|
86 \textit{nullable} and \textit{der} for the extended regular |
|
87 expressions. |
|
88 |
|
89 |
|
90 \subsection*{Question 1} |
|
91 |
|
92 What is your King's email address (you will need it in |
|
93 Question 3)? |
|
94 |
|
95 \subsection*{Question 2} |
|
96 |
|
97 This question does not require any implementation. From the |
|
98 lectures you have seen the definitions for the functions |
|
99 \textit{nullable} and \textit{der} for the basic regular |
|
100 expressions. Give the rules for the extended regular |
|
101 expressions: |
|
102 |
|
103 \begin{center} |
|
104 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}} |
|
105 $\textit{nullable}([c_1 c_2 \ldots c_n])$ & $\dn$ & $?$\\ |
|
106 $\textit{nullable}(r^+)$ & $\dn$ & $?$\\ |
|
107 $\textit{nullable}(r^?)$ & $\dn$ & $?$\\ |
|
108 $\textit{nullable}(r^{\{n,m\}})$ & $\dn$ & $?$\\ |
|
109 $\textit{nullable}(\sim{}r)$ & $\dn$ & $?$\medskip\\ |
|
110 $der\, c\, ([c_1 c_2 \ldots c_n])$ & $\dn$ & $?$\\ |
|
111 $der\, c\, (r^+)$ & $\dn$ & $?$\\ |
|
112 $der\, c\, (r^?)$ & $\dn$ & $?$\\ |
|
113 $der\, c\, (r^{\{n,m\}})$ & $\dn$ & $?$\\ |
|
114 $der\, c\, (\sim{}r)$ & $\dn$ & $?$\\ |
|
115 \end{tabular} |
|
116 \end{center} |
|
117 |
|
118 \noindent |
|
119 Remember your definitions have to satisfy the two properties |
|
120 |
|
121 \begin{itemize} |
|
122 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$ |
|
123 \item $L(der\,c\,r)) = Der\,c\,(L(r))$ |
|
124 \end{itemize} |
|
125 |
|
126 \subsection*{Question 3} |
|
127 |
|
128 Implement the following regular expression for email addresses |
|
129 |
|
130 \[ |
|
131 ([a\mbox{-}z0\mbox{-}9\_\!\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}}) |
|
132 \] |
|
133 |
|
134 \noindent and calculate the derivative according to your email |
|
135 address. When calculating the derivative, simplify all regular |
|
136 expressions as much as possible by applying the |
|
137 following 7 simplification rules: |
|
138 |
|
139 \begin{center} |
|
140 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll} |
|
141 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ |
|
142 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ |
|
143 $r \cdot \ONE$ & $\mapsto$ & $r$\\ |
|
144 $\ONE \cdot r$ & $\mapsto$ & $r$\\ |
|
145 $r + \ZERO$ & $\mapsto$ & $r$\\ |
|
146 $\ZERO + r$ & $\mapsto$ & $r$\\ |
|
147 $r + r$ & $\mapsto$ & $r$\\ |
|
148 \end{tabular} |
|
149 \end{center} |
|
150 |
|
151 \noindent Write down your simplified derivative in a readable |
|
152 notation using parentheses where necessary. That means you |
|
153 should use the infix notation $+$, $\cdot$, $^*$ and so on, |
|
154 instead of code. |
|
155 |
|
156 \subsection*{Question 4} |
|
157 |
|
158 Suppose \textit{[a-z]} stands for the range regular expression |
|
159 $[a,b,c,\ldots,z]$. Consider the regular expression $/ \cdot * \cdot |
|
160 (\sim{}([a\mbox{-}z]^* \cdot * \cdot / \cdot [a\mbox{-}z]^*)) \cdot * |
|
161 \cdot /$ and decide wether the following four strings are matched by |
|
162 this regular expression. Answer yes or no. |
|
163 |
|
164 \begin{enumerate} |
|
165 \item \texttt{"/**/"} |
|
166 \item \texttt{"/*foobar*/"} |
|
167 \item \texttt{"/*test*/test*/"} |
|
168 \item \texttt{"/*test/*test*/"} |
|
169 \end{enumerate} |
|
170 |
|
171 \noindent |
|
172 Also test your regular expression matcher with the regular |
|
173 expression $a^{\{3,5\}}$ and the strings |
|
174 |
|
175 \begin{enumerate} |
|
176 \setcounter{enumi}{4} |
|
177 \item \texttt{aa} |
|
178 \item \texttt{aaa} |
|
179 \item \texttt{aaaaa} |
|
180 \item \texttt{aaaaaa} |
|
181 \end{enumerate} |
|
182 |
|
183 \noindent |
|
184 Does your matcher produce the expected results? |
|
185 |
|
186 \subsection*{Question 5} |
|
187 |
|
188 Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be |
|
189 $(a^{\{19,19\}}) \cdot (a^?)$. Decide whether the following three |
|
190 strings consisting of $a$s only can be matched by $(r_1^+)^+$. |
|
191 Similarly test them with $(r_2^+)^+$. Again answer in all six cases |
|
192 with yes or no. \medskip |
|
193 |
|
194 \noindent |
|
195 These are strings are meant to be entirely made up of $a$s. Be careful |
|
196 when copy-and-pasting the strings so as to not forgetting any $a$ and |
|
197 to not introducing any other character. |
|
198 |
|
199 \begin{enumerate} |
|
200 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
|
201 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
|
202 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} |
|
203 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
|
204 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
|
205 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} |
|
206 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
|
207 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
|
208 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} |
|
209 \end{enumerate} |
|
210 |
|
211 |
|
212 \end{document} |
|
213 |
|
214 %%% Local Variables: |
|
215 %%% mode: latex |
|
216 %%% TeX-master: t |
|
217 %%% End: |