|
1 \documentclass{article} |
|
2 \usepackage{style} |
|
3 %%\usepackage{../langs} |
|
4 |
|
5 \begin{document} |
|
6 |
|
7 \section*{Coursework 3} |
|
8 |
|
9 This coursework is worth XXX\% and is due on XXXX at |
|
10 16:00. You are asked to implement a regular expression matcher. |
|
11 |
|
12 |
|
13 |
|
14 |
|
15 \subsubsection*{Disclaimer} |
|
16 |
|
17 It should be understood that the work you submit represents |
|
18 your own effort. You have not copied from anyone else. An |
|
19 exception is the Scala code I showed during the lectures or |
|
20 uploaded to KEATS, which you can freely use.\bigskip |
|
21 |
|
22 |
|
23 \subsubsection*{Task} |
|
24 |
|
25 The task is to implement a regular expression matcher based on |
|
26 derivatives of regular expressions. The implementation should |
|
27 be able to deal with the usual (basic) regular expressions |
|
28 |
|
29 \begin{center} |
|
30 \begin{tabular}{lcll} |
|
31 $r$ & $::=$ & $\ZERO$ & cannot match anything\\ |
|
32 & $|$ & $\ONE$ & can only match the empty string\\ |
|
33 & $|$ & $c$ & can match a character $c$\\ |
|
34 & $|$ & $r_1 + r_2$ & can match either with $r_1$ or with $r_2$\\ |
|
35 & $|$ & $r_1 \cdot r_2$ & can match first with $r_1$ and then with $r_2$\\ |
|
36 & $|$ & $r^*$ & can match zero or more times $r$\\ |
|
37 & $|$ & $r^{\{\uparrow n\}}$ & can match zero upto $n$ times $r$\\ |
|
38 & $|$ & $r^{\{n\}}$ & can match exactly $n$ times $r$\\ |
|
39 \end{tabular} |
|
40 \end{center} |
|
41 |
|
42 \noindent |
|
43 Implement a function called \textit{nullable} by recursion over |
|
44 regular expressions: |
|
45 |
|
46 \begin{center} |
|
47 \begin{tabular}{lcl} |
|
48 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\ |
|
49 $\textit{nullable}(\ONE)$ & $\dn$ & $\textit{true}$\\ |
|
50 $\textit{nullable}(c)$ & $\dn$ & $\textit{false}$\\ |
|
51 $\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\ |
|
52 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\ |
|
53 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\ |
|
54 $\textit{nullable}(r^{\{\uparrow n\}})$ & $\dn$ & $\textit{true}$\\ |
|
55 $\textit{nullable}(r^{\{n\}})$ & $\dn$ & |
|
56 $\textit{if}\;n = 0\; \textit{then} \; \textit{true} \; \textit{else} \; \textit{nullable}(r)$\\ |
|
57 \end{tabular} |
|
58 \end{center} |
|
59 |
|
60 \begin{center} |
|
61 \begin{tabular}{lcl} |
|
62 $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\ |
|
63 $\textit{der}\;c\;(\ONE)$ & $\dn$ & $\ZERO$\\ |
|
64 $\textit{der}\;c\;(d)$ & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\ |
|
65 $\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\ |
|
66 $\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\ |
|
67 & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\ |
|
68 & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\ |
|
69 $\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\ |
|
70 $\textit{der}\;c\;(r^{\{\uparrow n\}})$ & $\dn$ & $\textit{if}\;n = 0\;\textit{then}\;\ZERO\;\text{else}\; |
|
71 (\textit{der}\;c\;r)\cdot (r^{\{\uparrow n-1\}})$\\ |
|
72 $\textit{der}\;c\;(r^{\{n\}})$ & $\dn$ & |
|
73 $\textit{if}\;n = 0\; \textit{then} \; \ZERO\; \textit{else}\;$\\ |
|
74 & & $\textit{if} \;\textit{nullable}(r)\;\textit{then}\;(\textit{der}\;c\;r)\cdot (r^{\{\uparrow n-1\}})$\\ |
|
75 & & $\textit{else}\;(\textit{der}\;c\;r)\cdot (r^{\{n-1\}})$ |
|
76 \end{tabular} |
|
77 \end{center} |
|
78 |
|
79 |
|
80 Be careful that your implementation of \textit{nullable} and |
|
81 \textit{der}\;c\; satisfies for every $r$ the following two |
|
82 properties (see also Question 2): |
|
83 |
|
84 \begin{itemize} |
|
85 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$ |
|
86 \item $L(der\,c\,r) = Der\,c\,(L(r))$ |
|
87 \end{itemize} |
|
88 |
|
89 \noindent {\bf Important!} Your implementation should have |
|
90 explicit cases for the basic regular expressions, but also |
|
91 explicit cases for the extended regular expressions. That |
|
92 means do not treat the extended regular expressions by just |
|
93 translating them into the basic ones. See also Question 2, |
|
94 where you are asked to explicitly give the rules for |
|
95 \textit{nullable} and \textit{der}\;c\; for the extended regular |
|
96 expressions. |
|
97 |
|
98 |
|
99 \subsection*{Question 1} |
|
100 |
|
101 What is your King's email address (you will need it in |
|
102 Question 3)? |
|
103 |
|
104 \subsection*{Question 2} |
|
105 |
|
106 This question does not require any implementation. From the |
|
107 lectures you have seen the definitions for the functions |
|
108 \textit{nullable} and \textit{der}\;c\; for the basic regular |
|
109 expressions. Give the rules for the extended regular |
|
110 expressions: |
|
111 |
|
112 \begin{center} |
|
113 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}} |
|
114 $\textit{nullable}([c_1 c_2 \ldots c_n])$ & $\dn$ & $?$\\ |
|
115 $\textit{nullable}(r^+)$ & $\dn$ & $?$\\ |
|
116 $\textit{nullable}(r^?)$ & $\dn$ & $?$\\ |
|
117 $\textit{nullable}(r^{\{n,m\}})$ & $\dn$ & $?$\\ |
|
118 $\textit{nullable}(\sim{}r)$ & $\dn$ & $?$\medskip\\ |
|
119 $der\, c\, ([c_1 c_2 \ldots c_n])$ & $\dn$ & $?$\\ |
|
120 $der\, c\, (r^+)$ & $\dn$ & $?$\\ |
|
121 $der\, c\, (r^?)$ & $\dn$ & $?$\\ |
|
122 $der\, c\, (r^{\{n,m\}})$ & $\dn$ & $?$\\ |
|
123 $der\, c\, (\sim{}r)$ & $\dn$ & $?$\\ |
|
124 \end{tabular} |
|
125 \end{center} |
|
126 |
|
127 \noindent |
|
128 Remember your definitions have to satisfy the two properties |
|
129 |
|
130 \begin{itemize} |
|
131 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$ |
|
132 \item $L(der\,c\,r)) = Der\,c\,(L(r))$ |
|
133 \end{itemize} |
|
134 |
|
135 \subsection*{Question 3} |
|
136 |
|
137 Implement the following regular expression for email addresses |
|
138 |
|
139 \[ |
|
140 ([a\mbox{-}z0\mbox{-}9\_\!\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}}) |
|
141 \] |
|
142 |
|
143 \noindent and calculate the derivative according to your email |
|
144 address. When calculating the derivative, simplify all regular |
|
145 expressions as much as possible by applying the |
|
146 following 7 simplification rules: |
|
147 |
|
148 \begin{center} |
|
149 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll} |
|
150 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ |
|
151 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ |
|
152 $r \cdot \ONE$ & $\mapsto$ & $r$\\ |
|
153 $\ONE \cdot r$ & $\mapsto$ & $r$\\ |
|
154 $r + \ZERO$ & $\mapsto$ & $r$\\ |
|
155 $\ZERO + r$ & $\mapsto$ & $r$\\ |
|
156 $r + r$ & $\mapsto$ & $r$\\ |
|
157 \end{tabular} |
|
158 \end{center} |
|
159 |
|
160 \noindent Write down your simplified derivative in a readable |
|
161 notation using parentheses where necessary. That means you |
|
162 should use the infix notation $+$, $\cdot$, $^*$ and so on, |
|
163 instead of code. |
|
164 |
|
165 \subsection*{Question 4} |
|
166 |
|
167 Suppose \textit{[a-z]} stands for the range regular expression |
|
168 $[a,b,c,\ldots,z]$. Consider the regular expression $/ \cdot * \cdot |
|
169 (\sim{}([a\mbox{-}z]^* \cdot * \cdot / \cdot [a\mbox{-}z]^*)) \cdot * |
|
170 \cdot /$ and decide wether the following four strings are matched by |
|
171 this regular expression. Answer yes or no. |
|
172 |
|
173 \begin{enumerate} |
|
174 \item \texttt{"/**/"} |
|
175 \item \texttt{"/*foobar*/"} |
|
176 \item \texttt{"/*test*/test*/"} |
|
177 \item \texttt{"/*test/*test*/"} |
|
178 \end{enumerate} |
|
179 |
|
180 \noindent |
|
181 Also test your regular expression matcher with the regular |
|
182 expression $a^{\{3,5\}}$ and the strings |
|
183 |
|
184 \begin{enumerate} |
|
185 \setcounter{enumi}{4} |
|
186 \item \texttt{aa} |
|
187 \item \texttt{aaa} |
|
188 \item \texttt{aaaaa} |
|
189 \item \texttt{aaaaaa} |
|
190 \end{enumerate} |
|
191 |
|
192 \noindent |
|
193 Does your matcher produce the expected results? |
|
194 |
|
195 \subsection*{Question 5} |
|
196 |
|
197 Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be |
|
198 $(a^{\{19,19\}}) \cdot (a^?)$. Decide whether the following three |
|
199 strings consisting of $a$s only can be matched by $(r_1^+)^+$. |
|
200 Similarly test them with $(r_2^+)^+$. Again answer in all six cases |
|
201 with yes or no. \medskip |
|
202 |
|
203 \noindent |
|
204 These are strings are meant to be entirely made up of $a$s. Be careful |
|
205 when copy-and-pasting the strings so as to not forgetting any $a$ and |
|
206 to not introducing any other character. |
|
207 |
|
208 \begin{enumerate} |
|
209 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
|
210 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
|
211 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} |
|
212 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
|
213 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
|
214 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} |
|
215 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
|
216 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
|
217 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} |
|
218 \end{enumerate} |
|
219 |
|
220 |
|
221 \end{document} |
|
222 |
|
223 %%% Local Variables: |
|
224 %%% mode: latex |
|
225 %%% TeX-master: t |
|
226 %%% End: |