|         |      1 % !TEX program = xelatex | 
|         |      2 \documentclass{article} | 
|         |      3 \usepackage{../style} | 
|         |      4 \usepackage{../langs} | 
|         |      5  | 
|         |      6 \usepackage{array} | 
|         |      7  | 
|         |      8  | 
|         |      9 \begin{document} | 
|         |     10 \newcolumntype{C}[1]{>{\centering}m{#1}} | 
|         |     11  | 
|         |     12 \section*{Coursework 1} | 
|         |     13  | 
|         |     14 This coursework is worth 5\% and is due on \cwONE{} at 18:00. You are | 
|         |     15 asked to implement a regular expression matcher and submit a document | 
|         |     16 containing the answers for the questions below. You can do the | 
|         |     17 implementation in any programming language you like, but you need to | 
|         |     18 submit the source code with which you answered the questions, | 
|         |     19 otherwise a mark of 0\% will be awarded. You can submit your answers | 
|         |     20 in a txt-file or pdf.  Code send as code. Please package everything | 
|         |     21 inside a zip-file that creates a directory with the name | 
|         |     22 \[\texttt{YournameYourfamilyname}\] | 
|         |     23  | 
|         |     24 \noindent on my end. Thanks! | 
|         |     25  | 
|         |     26  | 
|         |     27  | 
|         |     28 \subsubsection*{Disclaimer\alert} | 
|         |     29  | 
|         |     30 It should be understood that the work you submit represents | 
|         |     31 your own effort. You have not copied from anyone else. An | 
|         |     32 exception is the Scala code I showed during the lectures or | 
|         |     33 uploaded to KEATS, which you can freely use.\bigskip | 
|         |     34  | 
|         |     35 \noindent | 
|         |     36 If you have any questions, please send me an email in \textbf{good} | 
|         |     37 time.\bigskip | 
|         |     38  | 
|         |     39 \subsection*{Task} | 
|         |     40  | 
|         |     41 The task is to implement a regular expression matcher based on | 
|         |     42 derivatives of regular expressions. The implementation should | 
|         |     43 be able to deal with the usual (basic) regular expressions | 
|         |     44  | 
|         |     45 \[ | 
|         |     46 \ZERO,\; \ONE,\; c,\; r_1 + r_2,\; r_1 \cdot r_2,\; r^* | 
|         |     47 \] | 
|         |     48  | 
|         |     49 \noindent | 
|         |     50 but also with the following extended regular expressions: | 
|         |     51  | 
|         |     52 \begin{center} | 
|         |     53 \begin{tabular}{ll} | 
|         |     54   $[c_1,c_2,\ldots,c_n]$ & a set of characters---for character ranges\\ | 
|         |     55   $r^+$ & one or more times $r$\\ | 
|         |     56   $r^?$ & optional $r$\\ | 
|         |     57   $r^{\{n\}}$ & exactly $n$-times\\ | 
|         |     58   $r^{\{..m\}}$ & zero or more times $r$ but no more than $m$-times\\ | 
|         |     59   $r^{\{n..\}}$ & at least $n$-times $r$\\ | 
|         |     60   $r^{\{n..m\}}$ & at least $n$-times $r$ but no more than $m$-times\\ | 
|         |     61   $\sim{}r$ & not-regular-expression of $r$\\ | 
|         |     62 \end{tabular} | 
|         |     63 \end{center} | 
|         |     64  | 
|         |     65 \noindent You can assume that $n$ and $m$ are greater or equal than | 
|         |     66 $0$. In the case of $r^{\{n,m\}}$ you can also assume $0 \le n \le m$.\bigskip | 
|         |     67  | 
|         |     68 \noindent {\bf Important!} Your implementation should have explicit | 
|         |     69 case classes  for the basic regular expressions, but also explicit case | 
|         |     70 classes for | 
|         |     71 the extended regular expressions.\footnote{Please call them | 
|         |     72   \code{RANGE}, \code{PLUS}, \code{OPTIONAL}, \code{NTIMES}, | 
|         |     73   \code{UPTO}, \code{FROM} and \code{BETWEEN}.}  | 
|         |     74   That means do not treat the extended regular expressions | 
|         |     75 by just translating them into the basic ones. See also Question 3, | 
|         |     76 where you are asked to explicitly give the rules for \textit{nullable} | 
|         |     77 and \textit{der} for the extended regular expressions. Something like | 
|         |     78  | 
|         |     79 \[der\,c\,(r^+) \dn der\,c\,(r\cdot r^*)\] | 
|         |     80  | 
|         |     81 \noindent is \emph{not} allowed as answer in Question 3 and \emph{not} | 
|         |     82 allowed in your code.\medskip | 
|         |     83  | 
|         |     84 \noindent | 
|         |     85 The meanings of the extended regular expressions are | 
|         |     86  | 
|         |     87 \begin{center} | 
|         |     88 \begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l} | 
|         |     89   $L([c_1,c_2,\ldots,c_n])$ & $\dn$ & $\{[c_1], [c_2], \ldots, [c_n]\}$\\  | 
|         |     90   $L(r^+)$                  & $\dn$ & $\bigcup_{1\le i}.\;L(r)^i$\\ | 
|         |     91   $L(r^?)$                  & $\dn$ & $L(r) \cup \{[]\}$\\ | 
|         |     92   $L(r^{\{n\}})$             & $\dn$ & $L(r)^n$\\ | 
|         |     93   $L(r^{\{..m\}})$           & $\dn$ & $\bigcup_{0\le i \le m}.\;L(r)^i$\\ | 
|         |     94   $L(r^{\{n..\}})$           & $\dn$ & $\bigcup_{n\le i}.\;L(r)^i$\\ | 
|         |     95   $L(r^{\{n..m\}})$          & $\dn$ & $\bigcup_{n\le i \le m}.\;L(r)^i$\\ | 
|         |     96   $L(\sim{}r)$              & $\dn$ & $\Sigma^* - L(r)$ | 
|         |     97 \end{tabular} | 
|         |     98 \end{center} | 
|         |     99  | 
|         |    100 \noindent whereby in the last clause the set $\Sigma^*$ stands | 
|         |    101 for the set of \emph{all} strings over the alphabet $\Sigma$ | 
|         |    102 (in the implementation the alphabet can be just what is | 
|         |    103 represented by, say, the type \pcode{Char}). So $\sim{}r$ | 
|         |    104 means in effect ``all the strings that $r$ cannot match''.\medskip  | 
|         |    105  | 
|         |    106 \noindent | 
|         |    107 Be careful that your implementation of \textit{nullable} and | 
|         |    108 \textit{der} satisfies for every regular expression $r$ the following | 
|         |    109 two properties (see also Question 3): | 
|         |    110  | 
|         |    111 \begin{itemize} | 
|         |    112 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$ | 
|         |    113 \item $L(der\,c\,r) = Der\,c\,(L(r))$ | 
|         |    114 \end{itemize} | 
|         |    115  | 
|         |    116  | 
|         |    117  | 
|         |    118 \subsection*{Question 1 (Unmarked)} | 
|         |    119  | 
|         |    120 What is your King's email address (you will need it in | 
|         |    121 Question 5)? | 
|         |    122  | 
|         |    123 \subsection*{Question 2 (Unmarked)} | 
|         |    124  | 
|         |    125 Can you please list all programming languages in which you have | 
|         |    126 already written programs (include only instances where you have spent | 
|         |    127 at least a good working day fiddling with a program)?  This is just | 
|         |    128 for my curiosity to estimate what your background is. | 
|         |    129  | 
|         |    130 \subsection*{Question 3} | 
|         |    131  | 
|         |    132 From the | 
|         |    133 lectures you have seen the definitions for the functions | 
|         |    134 \textit{nullable} and \textit{der} for the basic regular | 
|         |    135 expressions. Implement and write down the rules for the extended | 
|         |    136 regular expressions: | 
|         |    137  | 
|         |    138 \begin{center} | 
|         |    139 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}} | 
|         |    140   $\textit{nullable}([c_1,c_2,\ldots,c_n])$  & $\dn$ & $?$\\ | 
|         |    141   $\textit{nullable}(r^+)$                   & $\dn$ & $?$\\ | 
|         |    142   $\textit{nullable}(r^?)$                   & $\dn$ & $?$\\ | 
|         |    143   $\textit{nullable}(r^{\{n\}})$              & $\dn$ & $?$\\ | 
|         |    144   $\textit{nullable}(r^{\{..m\}})$            & $\dn$ & $?$\\ | 
|         |    145   $\textit{nullable}(r^{\{n..\}})$            & $\dn$ & $?$\\ | 
|         |    146   $\textit{nullable}(r^{\{n..m\}})$           & $\dn$ & $?$\\ | 
|         |    147   $\textit{nullable}(\sim{}r)$              & $\dn$ & $?$ | 
|         |    148 \end{tabular} | 
|         |    149 \end{center} | 
|         |    150  | 
|         |    151 \begin{center} | 
|         |    152 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}} | 
|         |    153   $der\, c\, ([c_1,c_2,\ldots,c_n])$  & $\dn$ & $?$\\ | 
|         |    154   $der\, c\, (r^+)$                   & $\dn$ & $?$\\ | 
|         |    155   $der\, c\, (r^?)$                   & $\dn$ & $?$\\ | 
|         |    156   $der\, c\, (r^{\{n\}})$              & $\dn$ & $?$\\ | 
|         |    157   $der\, c\, (r^{\{..m\}})$           & $\dn$ & $?$\\ | 
|         |    158   $der\, c\, (r^{\{n..\}})$           & $\dn$ & $?$\\ | 
|         |    159   $der\, c\, (r^{\{n..m\}})$           & $\dn$ & $?$\\ | 
|         |    160   $der\, c\, (\sim{}r)$               & $\dn$ & $?$\\ | 
|         |    161 \end{tabular} | 
|         |    162 \end{center} | 
|         |    163  | 
|         |    164 \noindent | 
|         |    165 Remember your definitions have to satisfy the two properties | 
|         |    166  | 
|         |    167 \begin{itemize} | 
|         |    168 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$ | 
|         |    169 \item $L(der\,c\,r)) = Der\,c\,(L(r))$ | 
|         |    170 \end{itemize} | 
|         |    171  | 
|         |    172 \noindent | 
|         |    173 Given the definitions of \textit{nullable} and \textit{der}, it is | 
|         |    174 easy to implement a regular expression matcher.  Test your regular | 
|         |    175 expression matcher with (at least) the examples: | 
|         |    176  | 
|         |    177  | 
|         |    178 \begin{center} | 
|         |    179 \def\arraystretch{1.2}   | 
|         |    180 \begin{tabular}{@{}r|m{3mm}|m{6mm}|m{6mm}|m{10mm}|m{6mm}|m{10mm}|m{10mm}|m{10mm}} | 
|         |    181   string & $a^?$ & $\sim{}a$ & $a^{\{3\}}$ & $(a^?)^{\{3\}}$ & $a^{\{..3\}}$ & | 
|         |    182      $(a^?)^{\{..3\}}$ & $a^{\{3..5\}}$ & $(a^?)^{\{3..5\}}$ \\\hline | 
|         |    183   $[]$           &&&&&&& \\\hline  | 
|         |    184   \texttt{a}     &&&&&&& \\\hline  | 
|         |    185   \texttt{aa}    &&&&&&& \\\hline  | 
|         |    186   \texttt{aaa}   &&&&&&& \\\hline  | 
|         |    187   \texttt{aaaaa} &&&&&&& \\\hline  | 
|         |    188   \texttt{aaaaaa}&&&&&&& \\ | 
|         |    189 \end{tabular} | 
|         |    190 \end{center} | 
|         |    191  | 
|         |    192 \noindent | 
|         |    193 Does your matcher produce the expected results? Make sure you | 
|         |    194 also test corner-cases, like $a^{\{0\}}$! | 
|         |    195  | 
|         |    196 \subsection*{Question 4} | 
|         |    197  | 
|         |    198 As you can see, there are a number of explicit regular expressions | 
|         |    199 that deal with single or several characters, for example: | 
|         |    200  | 
|         |    201 \begin{center} | 
|         |    202 \begin{tabular}{ll} | 
|         |    203   $c$ & matches a single character\\   | 
|         |    204   $[c_1,c_2,\ldots,c_n]$ & matches a set of characters---for character ranges\\ | 
|         |    205   $\textit{ALL}$ & matches any character | 
|         |    206 \end{tabular} | 
|         |    207 \end{center} | 
|         |    208  | 
|         |    209 \noindent | 
|         |    210 The latter is useful for matching any string (for example | 
|         |    211 by using $\textit{ALL}^*$). In order to avoid having an explicit constructor | 
|         |    212 for each case, we can generalise all these cases and introduce a single | 
|         |    213 constructor $\textit{CFUN}(f)$ where $f$ is a function from characters | 
|         |    214 to booleans. In Scala code this would look as follows: | 
|         |    215  | 
|         |    216 \begin{lstlisting}[numbers=none] | 
|         |    217 abstract class Rexp | 
|         |    218 ... | 
|         |    219 case class CFUN(f: Char => Boolean) extends Rexp  | 
|         |    220 \end{lstlisting}\smallskip | 
|         |    221  | 
|         |    222 \noindent | 
|         |    223 The idea is that the function $f$ determines which character(s) | 
|         |    224 are matched, namely those where $f$ returns \texttt{true}. | 
|         |    225 In this question implement \textit{CFUN} and define | 
|         |    226  | 
|         |    227 \begin{center} | 
|         |    228 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}} | 
|         |    229   $\textit{nullable}(\textit{CFUN}(f))$  & $\dn$ & $?$\\ | 
|         |    230   $\textit{der}\,c\,(\textit{CFUN}(f))$  & $\dn$ & $?$ | 
|         |    231 \end{tabular} | 
|         |    232 \end{center} | 
|         |    233  | 
|         |    234 \noindent in your matcher and then also give definitions for | 
|         |    235  | 
|         |    236 \begin{center} | 
|         |    237 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}} | 
|         |    238   $c$  & $\dn$ & $\textit{CFUN}(?)$\\ | 
|         |    239   $[c_1,c_2,\ldots,c_n]$  & $\dn$ & $\textit{CFUN}(?)$\\ | 
|         |    240   $\textit{ALL}$  & $\dn$ & $\textit{CFUN}(?)$ | 
|         |    241 \end{tabular} | 
|         |    242 \end{center} | 
|         |    243  | 
|         |    244 \noindent | 
|         |    245 You can either add the constructor $CFUN$ to your implementation in | 
|         |    246 Question 3, or you can implement this questions first | 
|         |    247 and then use $CFUN$ instead of \code{RANGE} and \code{CHAR} in Question 3. | 
|         |    248  | 
|         |    249  | 
|         |    250 \subsection*{Question 5} | 
|         |    251  | 
|         |    252 Suppose $[a\mbox{-}z0\mbox{-}9\_\,.\mbox{-}]$ stands for the regular expression | 
|         |    253  | 
|         |    254 \[[a,b,c,\ldots,z,0,\dots,9,\_,.,\mbox{-}]\;.\] | 
|         |    255  | 
|         |    256 \noindent | 
|         |    257 Define in your code the following regular expression for email addresses | 
|         |    258  | 
|         |    259 \[ | 
|         |    260 ([a\mbox{-}z0\mbox{-}9\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}}) | 
|         |    261 \] | 
|         |    262  | 
|         |    263 \noindent and calculate the derivative according to your own email | 
|         |    264 address. When calculating the derivative, simplify all regular | 
|         |    265 expressions as much as possible by applying the | 
|         |    266 following 7 simplification rules: | 
|         |    267  | 
|         |    268 \begin{center} | 
|         |    269 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll} | 
|         |    270 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\  | 
|         |    271 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\  | 
|         |    272 $r \cdot \ONE$ & $\mapsto$ & $r$\\  | 
|         |    273 $\ONE \cdot r$ & $\mapsto$ & $r$\\  | 
|         |    274 $r + \ZERO$ & $\mapsto$ & $r$\\  | 
|         |    275 $\ZERO + r$ & $\mapsto$ & $r$\\  | 
|         |    276 $r + r$ & $\mapsto$ & $r$\\  | 
|         |    277 \end{tabular} | 
|         |    278 \end{center} | 
|         |    279  | 
|         |    280 \noindent Write down your simplified derivative in a readable | 
|         |    281 notation using parentheses where necessary. That means you | 
|         |    282 should use the infix notation $+$, $\cdot$, $^*$ and so on, | 
|         |    283 instead of raw code.\bigskip | 
|         |    284  | 
|         |    285  | 
|         |    286 \subsection*{Question 6} | 
|         |    287  | 
|         |    288 Implement the simplification rules in your regular expression matcher. | 
|         |    289 Consider the regular expression $/ \cdot * \cdot | 
|         |    290 (\sim{}(\textit{ALL}^* \cdot * \cdot / \cdot \textit{ALL}^*)) \cdot * | 
|         |    291 \cdot /$ and decide whether the following four strings are matched by | 
|         |    292 this regular expression. Answer yes or no. | 
|         |    293  | 
|         |    294 \begin{enumerate} | 
|         |    295 \item \texttt{"/**/"} | 
|         |    296 \item \texttt{"/*foobar*/"} | 
|         |    297 \item \texttt{"/*test*/test*/"} | 
|         |    298 \item \texttt{"/*test/*test*/"} | 
|         |    299 \end{enumerate} | 
|         |    300  | 
|         |    301 \subsection*{Question 7} | 
|         |    302  | 
|         |    303 Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be | 
|         |    304 $(a^{\{19,19\}}) \cdot (a^?)$.\medskip | 
|         |    305  | 
|         |    306 \noindent | 
|         |    307 Decide whether the following three | 
|         |    308 strings consisting of $a$s only can be matched by $(r_1^+)^+$. | 
|         |    309 Similarly test them with $(r_2^+)^+$. Again answer in all six cases | 
|         |    310 with yes or no. \medskip | 
|         |    311  | 
|         |    312 \noindent | 
|         |    313 These are strings are meant to be entirely made up of $a$s. Be careful | 
|         |    314 when copy-and-pasting the strings so as to not forgetting any $a$ and | 
|         |    315 to not introducing any other character. | 
|         |    316  | 
|         |    317 \begin{enumerate} | 
|         |    318 \setcounter{enumi}{4} | 
|         |    319 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ | 
|         |    320 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ | 
|         |    321 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} | 
|         |    322 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\  | 
|         |    323 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\  | 
|         |    324 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} | 
|         |    325 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\  | 
|         |    326 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\  | 
|         |    327 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"} | 
|         |    328 \end{enumerate} | 
|         |    329  | 
|         |    330  | 
|         |    331  | 
|         |    332 \end{document} | 
|         |    333  | 
|         |    334 %%% Local Variables:  | 
|         |    335 %%% mode: latex | 
|         |    336 %%% TeX-master: t | 
|         |    337 %%% End:  |