cws/cw01.tex
changeset 6 aae256985251
child 9 48a477fdef21
equal deleted inserted replaced
5:c1e6123d02f4 6:aae256985251
       
     1 \documentclass{article}
       
     2 \usepackage{style}
       
     3 %%\usepackage{../langs}
       
     4 
       
     5 \begin{document}
       
     6 
       
     7 \section*{Coursework 1 (Strand 1)}
       
     8 
       
     9 This coursework is worth 4\% and is due on 25 October at
       
    10 16:00. You are asked to implement a regular expression matcher
       
    11 and submit a document containing the answers for the questions
       
    12 below. You can do the implementation in any programming
       
    13 language you like, but you need to submit the source code with
       
    14 which you answered the questions, otherwise a mark of 0\% will
       
    15 be awarded. You can submit your answers in a txt-file or pdf.
       
    16 Code send as code.
       
    17 
       
    18 
       
    19 
       
    20 \subsubsection*{Disclaimer}
       
    21 
       
    22 It should be understood that the work you submit represents
       
    23 your own effort. You have not copied from anyone else. An
       
    24 exception is the Scala code I showed during the lectures or
       
    25 uploaded to KEATS, which you can freely use.\bigskip
       
    26 
       
    27 
       
    28 \subsubsection*{Task}
       
    29 
       
    30 The task is to implement a regular expression matcher based on
       
    31 derivatives of regular expressions. The implementation should
       
    32 be able to deal with the usual (basic) regular expressions
       
    33 
       
    34 \[
       
    35 \ZERO,\; \ONE,\; c,\; r_1 + r_2,\; r_1 \cdot r_2,\; r^*
       
    36 \]
       
    37 
       
    38 \noindent
       
    39 but also with the following extended regular expressions:
       
    40 
       
    41 \begin{center}
       
    42 \begin{tabular}{ll}
       
    43 $[c_1 c_2 \ldots c_n]$ & a range of characters\\
       
    44 $r^+$ & one or more times $r$\\
       
    45 $r^?$ & optional $r$\\
       
    46 $r^{\{n,m\}}$ & at least $n$-times $r$ but no more than $m$-times\\
       
    47 $\sim{}r$ & not-regular expression of $r$\\
       
    48 \end{tabular}
       
    49 \end{center}
       
    50 
       
    51 \noindent In the case of $r^{\{n,m\}}$ you can assume the
       
    52 convention that $0 \le n \le m$. The meanings of the extended
       
    53 regular expressions are
       
    54 
       
    55 \begin{center}
       
    56 \begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l}
       
    57 $L([c_1 c_2 \ldots c_n])$ & $\dn$ & $\{[c_1], [c_2], \ldots, [c_n]\}$\\ 
       
    58 $L(r^+)$                  & $\dn$ & $\bigcup_{1\le i}. L(r)^i$\\
       
    59 $L(r^?)$                  & $\dn$ & $L(r) \cup \{[]\}$\\
       
    60 $L(r^{\{n,m\}})$           & $\dn$ & $\bigcup_{n\le i \le m}. L(r)^i$\\
       
    61 $L(\sim{}r)$              & $\dn$ & $\Sigma^* - L(r)$
       
    62 \end{tabular}
       
    63 \end{center}
       
    64 
       
    65 \noindent whereby in the last clause the set $\Sigma^*$ stands
       
    66 for the set of \emph{all} strings over the alphabet $\Sigma$
       
    67 (in the implementation the alphabet can be just what is
       
    68 represented by, say, the type \texttt{Char}). So $\sim{}r$
       
    69 means `all the strings that $r$ cannot match'. 
       
    70 
       
    71 Be careful that your implementation of \textit{nullable} and
       
    72 \textit{der} satisfies for every $r$ the following two
       
    73 properties (see also Question 2):
       
    74 
       
    75 \begin{itemize}
       
    76 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$
       
    77 \item $L(der\,c\,r) = Der\,c\,(L(r))$
       
    78 \end{itemize}
       
    79 
       
    80 \noindent {\bf Important!} Your implementation should have
       
    81 explicit cases for the basic regular expressions, but also
       
    82 explicit cases for the extended regular expressions. That
       
    83 means do not treat the extended regular expressions by just
       
    84 translating them into the basic ones. See also Question 2,
       
    85 where you are asked to explicitly give the rules for
       
    86 \textit{nullable} and \textit{der} for the extended regular
       
    87 expressions.
       
    88 
       
    89 
       
    90 \subsection*{Question 1}
       
    91 
       
    92 What is your King's email address (you will need it in
       
    93 Question 3)?
       
    94 
       
    95 \subsection*{Question 2}
       
    96 
       
    97 This question does not require any implementation. From the
       
    98 lectures you have seen the definitions for the functions
       
    99 \textit{nullable} and \textit{der} for the basic regular
       
   100 expressions. Give the rules for the extended regular
       
   101 expressions:
       
   102 
       
   103 \begin{center}
       
   104 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
       
   105 $\textit{nullable}([c_1 c_2 \ldots c_n])$  & $\dn$ & $?$\\
       
   106 $\textit{nullable}(r^+)$                   & $\dn$ & $?$\\
       
   107 $\textit{nullable}(r^?)$                   & $\dn$ & $?$\\
       
   108 $\textit{nullable}(r^{\{n,m\}})$            & $\dn$ & $?$\\
       
   109 $\textit{nullable}(\sim{}r)$               & $\dn$ & $?$\medskip\\
       
   110 $der\, c\, ([c_1 c_2 \ldots c_n])$  & $\dn$ & $?$\\
       
   111 $der\, c\, (r^+)$                   & $\dn$ & $?$\\
       
   112 $der\, c\, (r^?)$                   & $\dn$ & $?$\\
       
   113 $der\, c\, (r^{\{n,m\}})$            & $\dn$ & $?$\\
       
   114 $der\, c\, (\sim{}r)$               & $\dn$ & $?$\\
       
   115 \end{tabular}
       
   116 \end{center}
       
   117 
       
   118 \noindent
       
   119 Remember your definitions have to satisfy the two properties
       
   120 
       
   121 \begin{itemize}
       
   122 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$
       
   123 \item $L(der\,c\,r)) = Der\,c\,(L(r))$
       
   124 \end{itemize}
       
   125 
       
   126 \subsection*{Question 3}
       
   127 
       
   128 Implement the following regular expression for email addresses
       
   129 
       
   130 \[
       
   131 ([a\mbox{-}z0\mbox{-}9\_\!\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}})
       
   132 \]
       
   133 
       
   134 \noindent and calculate the derivative according to your email
       
   135 address. When calculating the derivative, simplify all regular
       
   136 expressions as much as possible by applying the
       
   137 following 7 simplification rules:
       
   138 
       
   139 \begin{center}
       
   140 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll}
       
   141 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ 
       
   142 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
       
   143 $r \cdot \ONE$ & $\mapsto$ & $r$\\ 
       
   144 $\ONE \cdot r$ & $\mapsto$ & $r$\\ 
       
   145 $r + \ZERO$ & $\mapsto$ & $r$\\ 
       
   146 $\ZERO + r$ & $\mapsto$ & $r$\\ 
       
   147 $r + r$ & $\mapsto$ & $r$\\ 
       
   148 \end{tabular}
       
   149 \end{center}
       
   150 
       
   151 \noindent Write down your simplified derivative in a readable
       
   152 notation using parentheses where necessary. That means you
       
   153 should use the infix notation $+$, $\cdot$, $^*$ and so on,
       
   154 instead of code.
       
   155  
       
   156 \subsection*{Question 4}
       
   157 
       
   158 Suppose \textit{[a-z]} stands for the range regular expression
       
   159 $[a,b,c,\ldots,z]$.  Consider the regular expression $/ \cdot * \cdot
       
   160 (\sim{}([a\mbox{-}z]^* \cdot * \cdot / \cdot [a\mbox{-}z]^*)) \cdot *
       
   161 \cdot /$ and decide wether the following four strings are matched by
       
   162 this regular expression. Answer yes or no.
       
   163 
       
   164 \begin{enumerate}
       
   165 \item \texttt{"/**/"}
       
   166 \item \texttt{"/*foobar*/"}
       
   167 \item \texttt{"/*test*/test*/"}
       
   168 \item \texttt{"/*test/*test*/"}
       
   169 \end{enumerate}
       
   170 
       
   171 \noindent
       
   172 Also test your regular expression matcher with the regular
       
   173 expression $a^{\{3,5\}}$ and the strings
       
   174 
       
   175 \begin{enumerate}
       
   176 \setcounter{enumi}{4}
       
   177 \item \texttt{aa}
       
   178 \item \texttt{aaa}
       
   179 \item \texttt{aaaaa}
       
   180 \item \texttt{aaaaaa}
       
   181 \end{enumerate}
       
   182 
       
   183 \noindent
       
   184 Does your matcher produce the expected results?
       
   185 
       
   186 \subsection*{Question 5}
       
   187 
       
   188 Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be
       
   189 $(a^{\{19,19\}}) \cdot (a^?)$.  Decide whether the following three
       
   190 strings consisting of $a$s only can be matched by $(r_1^+)^+$.
       
   191 Similarly test them with $(r_2^+)^+$. Again answer in all six cases
       
   192 with yes or no. \medskip
       
   193 
       
   194 \noindent
       
   195 These are strings are meant to be entirely made up of $a$s. Be careful
       
   196 when copy-and-pasting the strings so as to not forgetting any $a$ and
       
   197 to not introducing any other character.
       
   198 
       
   199 \begin{enumerate}
       
   200 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
       
   201 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
       
   202 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
       
   203 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
       
   204 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
       
   205 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
       
   206 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
       
   207 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
       
   208 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
       
   209 \end{enumerate}
       
   210 
       
   211 
       
   212 \end{document}
       
   213 
       
   214 %%% Local Variables: 
       
   215 %%% mode: latex
       
   216 %%% TeX-master: t
       
   217 %%% End: