4 |
4 |
5 \begin{document} |
5 \begin{document} |
6 |
6 |
7 \section*{Coursework 1} |
7 \section*{Coursework 1} |
8 |
8 |
9 This coursework is worth 3\% and is due on 12 November at 16:00. You are asked to implement |
9 This coursework is worth 5\% and is due on 13 October at 16:00. You |
10 a regular expression matcher and submit a document containing the answers for the questions |
10 are asked to implement a regular expression matcher and submit a |
11 below. You can do the implementation in any programming language you like, but you need |
11 document containing the answers for the questions below. You can do |
12 to submit the source code with which you answered the questions. However, the coursework |
12 the implementation in any programming language you like, but you need |
13 will \emph{only} be judged according to the answers. You can submit your answers |
13 to submit the source code with which you answered the |
14 in a txt-file or pdf.\bigskip |
14 questions. However, the coursework will \emph{only} be judged |
|
15 according to the answers. You can submit your answers in a txt-file or |
|
16 pdf. |
15 |
17 |
16 \noindent |
18 \subsubsection*{Disclaimer} |
17 The task is to implement a regular expression matcher based on derivatives. The implementation |
19 |
18 should be able to deal with the usual regular expressions |
20 It should be understood that the work submitted represents your own effort. |
|
21 You have not copied from anyone else. An exception is the Scala code I |
|
22 showed during the lectures, which you can use.\bigskip |
|
23 |
|
24 |
|
25 \subsubsection*{Tasks} |
|
26 |
|
27 The task is to implement a regular expression matcher based on |
|
28 derivatives. The implementation should be able to deal with the usual |
|
29 (basic) regular expressions |
19 |
30 |
20 \[ |
31 \[ |
21 \varnothing, \epsilon, c, r_1 + r_2, r_1 \cdot r_2, r^* |
32 \varnothing, \epsilon, c, r_1 + r_2, r_1 \cdot r_2, r^* |
22 \] |
33 \] |
23 |
34 |
24 \noindent |
35 \noindent |
25 but also with |
36 but also with the following extended regular expressions: |
26 |
37 |
27 \begin{center} |
38 \begin{center} |
28 \begin{tabular}{ll} |
39 \begin{tabular}{ll} |
29 $[c_1 c_2 \ldots c_n]$ & a range of characters\\ |
40 $[c_1 c_2 \ldots c_n]$ & a range of characters\\ |
30 $r^+$ & one or more times $r$\\ |
41 $r^+$ & one or more times $r$\\ |
38 In the case of $r^{\{n,m\}}$ we have the convention that $0 \le n \le m$. |
49 In the case of $r^{\{n,m\}}$ we have the convention that $0 \le n \le m$. |
39 The meaning of these regular expressions is |
50 The meaning of these regular expressions is |
40 |
51 |
41 \begin{center} |
52 \begin{center} |
42 \begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l} |
53 \begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l} |
43 $L([c_1 c_2 \ldots c_n])$ & $\dn$ & $\{"c_1", "c_2", \ldots, "c_n"\}$\\ |
54 $L([c_1 c_2 \ldots c_n])$ & $\dn$ & $\{[c_1], [c_2], \ldots, [c_n]\}$\\ |
44 $L(r^+)$ & $\dn$ & $\bigcup_{1\le i}. L(r)^i$\\ |
55 $L(r^+)$ & $\dn$ & $\bigcup_{1\le i}. L(r)^i$\\ |
45 $L(r^?)$ & $\dn$ & $L(r) \cup \{""\}$\\ |
56 $L(r^?)$ & $\dn$ & $L(r) \cup \{[]\}$\\ |
46 $L(r^{\{n,m\}})$ & $\dn$ & $\bigcup_{n\le i \le m}. L(r)^i$\\ |
57 $L(r^{\{n,m\}})$ & $\dn$ & $\bigcup_{n\le i \le m}. L(r)^i$\\ |
47 $L(\sim{}r)$ & $\dn$ & $UNIV - L(r)$ |
58 $L(\sim{}r)$ & $\dn$ & $\mathbb{A} - L(r)$ |
48 \end{tabular} |
59 \end{tabular} |
49 \end{center} |
60 \end{center} |
50 |
61 |
51 \noindent |
62 \noindent |
52 whereby in the last clause the set $UNIV$ stands for the set of \emph{all} strings. |
63 whereby in the last clause the set $\mathbb{A}$ stands for the set of |
53 So $\sim{}r$ means `all the strings that $r$ cannot match'. We assume ranges |
64 \emph{all} strings. So $\sim{}r$ means `all the strings that $r$ |
54 like $[a\mbox{-}z0\mbox{-}9]$ are a shorthand for the regular expression |
65 cannot match'. We assume ranges like $[a\mbox{-}z0\mbox{-}9]$ are a |
|
66 shorthand for the regular expression |
55 |
67 |
56 \[ |
68 \[ |
57 [a b c d\ldots z 0 1\ldots 9]\;. |
69 [a b c d\ldots z 0 1\ldots 9]\;. |
58 \] |
70 \] |
59 |
71 |
60 \noindent |
72 \noindent |
61 Be careful that your implementation of $nullable$ and $der$ satisfies for every $r$ the following two |
73 Be careful that your implementation of $nullable$ and $der$ satisfies |
62 properties: |
74 for every $r$ the following two properties: |
63 |
75 |
64 \begin{itemize} |
76 \begin{itemize} |
65 \item $nullable(r)$ if and only if $""\in L(r)$ |
77 \item $nullable(r)$ if and only if $[]\in L(r)$ |
66 \item $L(der\,c\,r)) = Der\,c\,(L(r))$ |
78 \item $L(der\,c\,r)) = Der\,c\,(L(r))$ |
67 \end{itemize} |
79 \end{itemize} |
68 \newpage |
80 |
|
81 \noindent |
|
82 {\bf Important!} Your implementation should have explicit cases for the |
|
83 basic regular expressions, but also for the extended regular expressions. |
|
84 That means do not treat the extended regular expressions by just translating |
|
85 them into the basic ones. See also Question 2, where you asked to give |
|
86 the rules for \textit{nullable} and \textit{der}. |
|
87 |
69 |
88 |
70 \subsection*{Question 1 (unmarked)} |
89 \subsection*{Question 1 (unmarked)} |
71 |
90 |
72 What is your King's email address (you will need it in the next question)?\bigskip |
91 What is your King's email address (you will need it in Question 2)? |
73 |
92 |
74 \subsection*{Question 2 (marked with 1\%)} |
93 \subsection*{Question 2 (marked with 2\%)} |
|
94 |
|
95 This question does not require any implementation. From the lectures |
|
96 you have seen the definitions for the functions \textit{nullable} and |
|
97 \textit{der}. Give the rules for the extended regular expressions: |
|
98 |
|
99 \begin{center} |
|
100 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}} |
|
101 $nullable([c_1 c_2 \ldots c_n])$ & $\dn$ & $?$\\ |
|
102 $nullable(r^+)$ & $\dn$ & $?$\\ |
|
103 $nullable(r^?)$ & $\dn$ & $?$\\ |
|
104 $nullable(r^{\{n,m\}})$ & $\dn$ & $?$\\ |
|
105 $nullable(\sim{}r)$ & $\dn$ & $?$\medskip\\ |
|
106 $der c ([c_1 c_2 \ldots c_n])$ & $\dn$ & $?$\\ |
|
107 $der c (r^+)$ & $\dn$ & $?$\\ |
|
108 $der c (r^?)$ & $\dn$ & $?$\\ |
|
109 $der c (r^{\{n,m\}})$ & $\dn$ & $?$\\ |
|
110 $der c (\sim{}r)$ & $\dn$ & $?$\\ |
|
111 \end{tabular} |
|
112 \end{center} |
|
113 |
|
114 \subsection*{Question 3 (marked with 1\%)} |
75 |
115 |
76 Implement the following regular expression for email addresses |
116 Implement the following regular expression for email addresses |
77 |
117 |
78 \[ |
118 \[ |
79 ([a\mbox{-}z0\mbox{-}9\_\!\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}}) |
119 ([a\mbox{-}z0\mbox{-}9\_\!\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}}) |
80 \] |
120 \] |
81 |
121 |
82 \noindent |
122 \noindent |
83 and calculate the derivative according to your email address. When calculating |
123 and calculate the derivative according to your email address. When |
84 the derivative, simplify all regular expressions as much as possible, but at least apply the following |
124 calculating the derivative, simplify all regular expressions as much |
85 six simplification rules: |
125 as possible, but at least apply the following six simplification |
|
126 rules: |
86 |
127 |
87 \begin{center} |
128 \begin{center} |
88 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}l} |
129 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}l} |
89 $r \cdot \varnothing$ & $\mapsto$ & $\varnothing$\\ |
130 $r \cdot \varnothing$ & $\mapsto$ & $\varnothing$\\ |
90 $\varnothing \cdot r$ & $\mapsto$ & $\varnothing$\\ |
131 $\varnothing \cdot r$ & $\mapsto$ & $\varnothing$\\ |
94 $\varnothing + r$ & $\mapsto$ & $r$\\ |
135 $\varnothing + r$ & $\mapsto$ & $r$\\ |
95 \end{tabular} |
136 \end{tabular} |
96 \end{center} |
137 \end{center} |
97 |
138 |
98 \noindent |
139 \noindent |
99 Write down your simplified derivative in the ``mathematicical'' notation using parentheses where necessary. |
140 Write down your simplified derivative in the ``mathematicical'' |
|
141 notation using parentheses where necessary. |
100 |
142 |
101 \subsection*{Question 3 (marked with 1\%)} |
143 \subsection*{Question 4 (marked with 1\%)} |
102 |
144 |
103 Consider the regular expression $/ \cdot * \cdot (\sim{}([a\mbox{-}z]^* \cdot * \cdot / \cdot [a\mbox{-}z]^*)) \cdot * \cdot /$ and decide |
145 Consider the regular expression $/ \cdot * \cdot |
104 wether the following four strings are matched by this regular expression. Answer yes or no. |
146 (\sim{}([a\mbox{-}z]^* \cdot * \cdot / \cdot [a\mbox{-}z]^*)) \cdot * |
|
147 \cdot /$ and decide wether the following four strings are matched by |
|
148 this regular expression. Answer yes or no. |
105 |
149 |
106 \begin{enumerate} |
150 \begin{enumerate} |
107 \item \texttt{"/**/"} |
151 \item \texttt{"/**/"} |
108 \item \texttt{"/*foobar*/"} |
152 \item \texttt{"/*foobar*/"} |
109 \item \texttt{"/*test*/test*/"} |
153 \item \texttt{"/*test*/test*/"} |
110 \item \texttt{"/*test/*test*/"} |
154 \item \texttt{"/*test/*test*/"} |
111 \end{enumerate} |
155 \end{enumerate} |
112 |
156 |
113 \subsection*{Question 4 (marked with 1\%)} |
157 \subsection*{Question 5 (marked with 1\%)} |
114 |
158 |
115 Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be $(a^{\{19,19\}}) \cdot (a^?)$. |
159 Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be |
116 Decide whether the following three strings consisting of $a$s only can be matched by $(r_1^+)^+$. |
160 $(a^{\{19,19\}}) \cdot (a^?)$. Decide whether the following three |
117 Similarly test them with $(r_2^+)^+$. Again answer in all six cases with yes or no. \medskip |
161 strings consisting of $a$s only can be matched by $(r_1^+)^+$. |
|
162 Similarly test them with $(r_2^+)^+$. Again answer in all six cases |
|
163 with yes or no. \medskip |
118 |
164 |
119 \noindent |
165 \noindent |
120 These are strings are meant to be entirely made up of $a$s. Be careful when |
166 These are strings are meant to be entirely made up of $a$s. Be careful |
121 copy-and-pasting the strings so as to not forgetting any $a$ and to not introducing any |
167 when copy-and-pasting the strings so as to not forgetting any $a$ and |
122 other character. |
168 to not introducing any other character. |
123 |
169 |
124 \begin{enumerate} |
170 \begin{enumerate} |
125 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
171 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
126 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
172 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ |
127 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} |
173 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} |