cws/cw01.tex
changeset 752 c0bdd4ad69ca
parent 750 e93a9e74ca8e
child 772 3bf3f5bb067e
equal deleted inserted replaced
751:4b208d81e002 752:c0bdd4ad69ca
       
     1 % !TEX program = xelatex
       
     2 \documentclass{article}
       
     3 \usepackage{../style}
       
     4 \usepackage{../langs}
       
     5 
       
     6 \usepackage{array}
       
     7 
       
     8 
       
     9 \begin{document}
       
    10 \newcolumntype{C}[1]{>{\centering}m{#1}}
       
    11 
       
    12 \section*{Coursework 1}
       
    13 
       
    14 This coursework is worth 5\% and is due on \cwONE{} at 18:00. You are
       
    15 asked to implement a regular expression matcher and submit a document
       
    16 containing the answers for the questions below. You can do the
       
    17 implementation in any programming language you like, but you need to
       
    18 submit the source code with which you answered the questions,
       
    19 otherwise a mark of 0\% will be awarded. You can submit your answers
       
    20 in a txt-file or pdf.  Code send as code. Please package everything
       
    21 inside a zip-file that creates a directory with the name
       
    22 \[\texttt{YournameYourfamilyname}\]
       
    23 
       
    24 \noindent on my end. Thanks!
       
    25 
       
    26 
       
    27 
       
    28 \subsubsection*{Disclaimer\alert}
       
    29 
       
    30 It should be understood that the work you submit represents
       
    31 your own effort. You have not copied from anyone else. An
       
    32 exception is the Scala code I showed during the lectures or
       
    33 uploaded to KEATS, which you can freely use.\bigskip
       
    34 
       
    35 \noindent
       
    36 If you have any questions, please send me an email in \textbf{good}
       
    37 time.\bigskip
       
    38 
       
    39 \subsection*{Task}
       
    40 
       
    41 The task is to implement a regular expression matcher based on
       
    42 derivatives of regular expressions. The implementation should
       
    43 be able to deal with the usual (basic) regular expressions
       
    44 
       
    45 \[
       
    46 \ZERO,\; \ONE,\; c,\; r_1 + r_2,\; r_1 \cdot r_2,\; r^*
       
    47 \]
       
    48 
       
    49 \noindent
       
    50 but also with the following extended regular expressions:
       
    51 
       
    52 \begin{center}
       
    53 \begin{tabular}{ll}
       
    54   $[c_1,c_2,\ldots,c_n]$ & a set of characters---for character ranges\\
       
    55   $r^+$ & one or more times $r$\\
       
    56   $r^?$ & optional $r$\\
       
    57   $r^{\{n\}}$ & exactly $n$-times\\
       
    58   $r^{\{..m\}}$ & zero or more times $r$ but no more than $m$-times\\
       
    59   $r^{\{n..\}}$ & at least $n$-times $r$\\
       
    60   $r^{\{n..m\}}$ & at least $n$-times $r$ but no more than $m$-times\\
       
    61   $\sim{}r$ & not-regular-expression of $r$\\
       
    62 \end{tabular}
       
    63 \end{center}
       
    64 
       
    65 \noindent You can assume that $n$ and $m$ are greater or equal than
       
    66 $0$. In the case of $r^{\{n,m\}}$ you can also assume $0 \le n \le m$.\bigskip
       
    67 
       
    68 \noindent {\bf Important!} Your implementation should have explicit
       
    69 case classes  for the basic regular expressions, but also explicit case
       
    70 classes for
       
    71 the extended regular expressions.\footnote{Please call them
       
    72   \code{RANGE}, \code{PLUS}, \code{OPTIONAL}, \code{NTIMES},
       
    73   \code{UPTO}, \code{FROM} and \code{BETWEEN}.} 
       
    74   That means do not treat the extended regular expressions
       
    75 by just translating them into the basic ones. See also Question 3,
       
    76 where you are asked to explicitly give the rules for \textit{nullable}
       
    77 and \textit{der} for the extended regular expressions. Something like
       
    78 
       
    79 \[der\,c\,(r^+) \dn der\,c\,(r\cdot r^*)\]
       
    80 
       
    81 \noindent is \emph{not} allowed as answer in Question 3 and \emph{not}
       
    82 allowed in your code.\medskip
       
    83 
       
    84 \noindent
       
    85 The meanings of the extended regular expressions are
       
    86 
       
    87 \begin{center}
       
    88 \begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l}
       
    89   $L([c_1,c_2,\ldots,c_n])$ & $\dn$ & $\{[c_1], [c_2], \ldots, [c_n]\}$\\ 
       
    90   $L(r^+)$                  & $\dn$ & $\bigcup_{1\le i}.\;L(r)^i$\\
       
    91   $L(r^?)$                  & $\dn$ & $L(r) \cup \{[]\}$\\
       
    92   $L(r^{\{n\}})$             & $\dn$ & $L(r)^n$\\
       
    93   $L(r^{\{..m\}})$           & $\dn$ & $\bigcup_{0\le i \le m}.\;L(r)^i$\\
       
    94   $L(r^{\{n..\}})$           & $\dn$ & $\bigcup_{n\le i}.\;L(r)^i$\\
       
    95   $L(r^{\{n..m\}})$          & $\dn$ & $\bigcup_{n\le i \le m}.\;L(r)^i$\\
       
    96   $L(\sim{}r)$              & $\dn$ & $\Sigma^* - L(r)$
       
    97 \end{tabular}
       
    98 \end{center}
       
    99 
       
   100 \noindent whereby in the last clause the set $\Sigma^*$ stands
       
   101 for the set of \emph{all} strings over the alphabet $\Sigma$
       
   102 (in the implementation the alphabet can be just what is
       
   103 represented by, say, the type \pcode{Char}). So $\sim{}r$
       
   104 means in effect ``all the strings that $r$ cannot match''.\medskip 
       
   105 
       
   106 \noindent
       
   107 Be careful that your implementation of \textit{nullable} and
       
   108 \textit{der} satisfies for every regular expression $r$ the following
       
   109 two properties (see also Question 3):
       
   110 
       
   111 \begin{itemize}
       
   112 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$
       
   113 \item $L(der\,c\,r) = Der\,c\,(L(r))$
       
   114 \end{itemize}
       
   115 
       
   116 
       
   117 
       
   118 \subsection*{Question 1 (Unmarked)}
       
   119 
       
   120 What is your King's email address (you will need it in
       
   121 Question 5)?
       
   122 
       
   123 \subsection*{Question 2 (Unmarked)}
       
   124 
       
   125 Can you please list all programming languages in which you have
       
   126 already written programs (include only instances where you have spent
       
   127 at least a good working day fiddling with a program)?  This is just
       
   128 for my curiosity to estimate what your background is.
       
   129 
       
   130 \subsection*{Question 3}
       
   131 
       
   132 From the
       
   133 lectures you have seen the definitions for the functions
       
   134 \textit{nullable} and \textit{der} for the basic regular
       
   135 expressions. Implement and write down the rules for the extended
       
   136 regular expressions:
       
   137 
       
   138 \begin{center}
       
   139 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
       
   140   $\textit{nullable}([c_1,c_2,\ldots,c_n])$  & $\dn$ & $?$\\
       
   141   $\textit{nullable}(r^+)$                   & $\dn$ & $?$\\
       
   142   $\textit{nullable}(r^?)$                   & $\dn$ & $?$\\
       
   143   $\textit{nullable}(r^{\{n\}})$              & $\dn$ & $?$\\
       
   144   $\textit{nullable}(r^{\{..m\}})$            & $\dn$ & $?$\\
       
   145   $\textit{nullable}(r^{\{n..\}})$            & $\dn$ & $?$\\
       
   146   $\textit{nullable}(r^{\{n..m\}})$           & $\dn$ & $?$\\
       
   147   $\textit{nullable}(\sim{}r)$              & $\dn$ & $?$
       
   148 \end{tabular}
       
   149 \end{center}
       
   150 
       
   151 \begin{center}
       
   152 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
       
   153   $der\, c\, ([c_1,c_2,\ldots,c_n])$  & $\dn$ & $?$\\
       
   154   $der\, c\, (r^+)$                   & $\dn$ & $?$\\
       
   155   $der\, c\, (r^?)$                   & $\dn$ & $?$\\
       
   156   $der\, c\, (r^{\{n\}})$              & $\dn$ & $?$\\
       
   157   $der\, c\, (r^{\{..m\}})$           & $\dn$ & $?$\\
       
   158   $der\, c\, (r^{\{n..\}})$           & $\dn$ & $?$\\
       
   159   $der\, c\, (r^{\{n..m\}})$           & $\dn$ & $?$\\
       
   160   $der\, c\, (\sim{}r)$               & $\dn$ & $?$\\
       
   161 \end{tabular}
       
   162 \end{center}
       
   163 
       
   164 \noindent
       
   165 Remember your definitions have to satisfy the two properties
       
   166 
       
   167 \begin{itemize}
       
   168 \item $\textit{nullable}(r)$ if and only if $[]\in L(r)$
       
   169 \item $L(der\,c\,r)) = Der\,c\,(L(r))$
       
   170 \end{itemize}
       
   171 
       
   172 \noindent
       
   173 Given the definitions of \textit{nullable} and \textit{der}, it is
       
   174 easy to implement a regular expression matcher.  Test your regular
       
   175 expression matcher with (at least) the examples:
       
   176 
       
   177 
       
   178 \begin{center}
       
   179 \def\arraystretch{1.2}  
       
   180 \begin{tabular}{@{}r|m{3mm}|m{6mm}|m{6mm}|m{10mm}|m{6mm}|m{10mm}|m{10mm}|m{10mm}}
       
   181   string & $a^?$ & $\sim{}a$ & $a^{\{3\}}$ & $(a^?)^{\{3\}}$ & $a^{\{..3\}}$ &
       
   182      $(a^?)^{\{..3\}}$ & $a^{\{3..5\}}$ & $(a^?)^{\{3..5\}}$ \\\hline
       
   183   $[]$           &&&&&&& \\\hline 
       
   184   \texttt{a}     &&&&&&& \\\hline 
       
   185   \texttt{aa}    &&&&&&& \\\hline 
       
   186   \texttt{aaa}   &&&&&&& \\\hline 
       
   187   \texttt{aaaaa} &&&&&&& \\\hline 
       
   188   \texttt{aaaaaa}&&&&&&& \\
       
   189 \end{tabular}
       
   190 \end{center}
       
   191 
       
   192 \noindent
       
   193 Does your matcher produce the expected results? Make sure you
       
   194 also test corner-cases, like $a^{\{0\}}$!
       
   195 
       
   196 \subsection*{Question 4}
       
   197 
       
   198 As you can see, there are a number of explicit regular expressions
       
   199 that deal with single or several characters, for example:
       
   200 
       
   201 \begin{center}
       
   202 \begin{tabular}{ll}
       
   203   $c$ & matches a single character\\  
       
   204   $[c_1,c_2,\ldots,c_n]$ & matches a set of characters---for character ranges\\
       
   205   $\textit{ALL}$ & matches any character
       
   206 \end{tabular}
       
   207 \end{center}
       
   208 
       
   209 \noindent
       
   210 The latter is useful for matching any string (for example
       
   211 by using $\textit{ALL}^*$). In order to avoid having an explicit constructor
       
   212 for each case, we can generalise all these cases and introduce a single
       
   213 constructor $\textit{CFUN}(f)$ where $f$ is a function from characters
       
   214 to booleans. In Scala code this would look as follows:
       
   215 
       
   216 \begin{lstlisting}[numbers=none]
       
   217 abstract class Rexp
       
   218 ...
       
   219 case class CFUN(f: Char => Boolean) extends Rexp 
       
   220 \end{lstlisting}\smallskip
       
   221 
       
   222 \noindent
       
   223 The idea is that the function $f$ determines which character(s)
       
   224 are matched, namely those where $f$ returns \texttt{true}.
       
   225 In this question implement \textit{CFUN} and define
       
   226 
       
   227 \begin{center}
       
   228 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
       
   229   $\textit{nullable}(\textit{CFUN}(f))$  & $\dn$ & $?$\\
       
   230   $\textit{der}\,c\,(\textit{CFUN}(f))$  & $\dn$ & $?$
       
   231 \end{tabular}
       
   232 \end{center}
       
   233 
       
   234 \noindent in your matcher and then also give definitions for
       
   235 
       
   236 \begin{center}
       
   237 \begin{tabular}{@ {}l@ {\hspace{2mm}}c@ {\hspace{2mm}}l@ {}}
       
   238   $c$  & $\dn$ & $\textit{CFUN}(?)$\\
       
   239   $[c_1,c_2,\ldots,c_n]$  & $\dn$ & $\textit{CFUN}(?)$\\
       
   240   $\textit{ALL}$  & $\dn$ & $\textit{CFUN}(?)$
       
   241 \end{tabular}
       
   242 \end{center}
       
   243 
       
   244 \noindent
       
   245 You can either add the constructor $CFUN$ to your implementation in
       
   246 Question 3, or you can implement this questions first
       
   247 and then use $CFUN$ instead of \code{RANGE} and \code{CHAR} in Question 3.
       
   248 
       
   249 
       
   250 \subsection*{Question 5}
       
   251 
       
   252 Suppose $[a\mbox{-}z0\mbox{-}9\_\,.\mbox{-}]$ stands for the regular expression
       
   253 
       
   254 \[[a,b,c,\ldots,z,0,\dots,9,\_,.,\mbox{-}]\;.\]
       
   255 
       
   256 \noindent
       
   257 Define in your code the following regular expression for email addresses
       
   258 
       
   259 \[
       
   260 ([a\mbox{-}z0\mbox{-}9\_\,.-]^+)\cdot @\cdot ([a\mbox{-}z0\mbox{-}9\,.-]^+)\cdot .\cdot ([a\mbox{-}z\,.]^{\{2,6\}})
       
   261 \]
       
   262 
       
   263 \noindent and calculate the derivative according to your own email
       
   264 address. When calculating the derivative, simplify all regular
       
   265 expressions as much as possible by applying the
       
   266 following 7 simplification rules:
       
   267 
       
   268 \begin{center}
       
   269 \begin{tabular}{l@{\hspace{2mm}}c@{\hspace{2mm}}ll}
       
   270 $r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ 
       
   271 $\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
       
   272 $r \cdot \ONE$ & $\mapsto$ & $r$\\ 
       
   273 $\ONE \cdot r$ & $\mapsto$ & $r$\\ 
       
   274 $r + \ZERO$ & $\mapsto$ & $r$\\ 
       
   275 $\ZERO + r$ & $\mapsto$ & $r$\\ 
       
   276 $r + r$ & $\mapsto$ & $r$\\ 
       
   277 \end{tabular}
       
   278 \end{center}
       
   279 
       
   280 \noindent Write down your simplified derivative in a readable
       
   281 notation using parentheses where necessary. That means you
       
   282 should use the infix notation $+$, $\cdot$, $^*$ and so on,
       
   283 instead of raw code.\bigskip
       
   284 
       
   285 
       
   286 \subsection*{Question 6}
       
   287 
       
   288 Implement the simplification rules in your regular expression matcher.
       
   289 Consider the regular expression $/ \cdot * \cdot
       
   290 (\sim{}(\textit{ALL}^* \cdot * \cdot / \cdot \textit{ALL}^*)) \cdot *
       
   291 \cdot /$ and decide whether the following four strings are matched by
       
   292 this regular expression. Answer yes or no.
       
   293 
       
   294 \begin{enumerate}
       
   295 \item \texttt{"/**/"}
       
   296 \item \texttt{"/*foobar*/"}
       
   297 \item \texttt{"/*test*/test*/"}
       
   298 \item \texttt{"/*test/*test*/"}
       
   299 \end{enumerate}
       
   300 
       
   301 \subsection*{Question 7}
       
   302 
       
   303 Let $r_1$ be the regular expression $a\cdot a\cdot a$ and $r_2$ be
       
   304 $(a^{\{19,19\}}) \cdot (a^?)$.\medskip
       
   305 
       
   306 \noindent
       
   307 Decide whether the following three
       
   308 strings consisting of $a$s only can be matched by $(r_1^+)^+$.
       
   309 Similarly test them with $(r_2^+)^+$. Again answer in all six cases
       
   310 with yes or no. \medskip
       
   311 
       
   312 \noindent
       
   313 These are strings are meant to be entirely made up of $a$s. Be careful
       
   314 when copy-and-pasting the strings so as to not forgetting any $a$ and
       
   315 to not introducing any other character.
       
   316 
       
   317 \begin{enumerate}
       
   318 \setcounter{enumi}{4}
       
   319 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
       
   320 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\
       
   321 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
       
   322 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
       
   323 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
       
   324 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
       
   325 \item \texttt{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
       
   326 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\ 
       
   327 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
       
   328 \end{enumerate}
       
   329 
       
   330 
       
   331 
       
   332 \end{document}
       
   333 
       
   334 %%% Local Variables: 
       
   335 %%% mode: latex
       
   336 %%% TeX-master: t
       
   337 %%% End: