cws/cw03.tex
changeset 78 85f2f75abeeb
parent 75 71e463b33a9e
child 79 2d57b0d43a0f
equal deleted inserted replaced
75:71e463b33a9e 78:85f2f75abeeb
     1 \documentclass{article}
     1 \documentclass{article}
     2 \usepackage{../style}
     2 \usepackage{../style}
     3 %%\usepackage{../langs}
     3 \usepackage{../langs}
     4 
     4 
     5 \begin{document}
     5 \begin{document}
     6 
     6 
     7 \section*{Coursework 8 (Scala, Regular Expressions}
     7 \section*{Coursework 8 (Scala, Regular Expressions}
     8 
     8 
     9 This coursework is worth 10\%. It is about regular expressions and
     9 This coursework is worth 10\%. It is about regular expressions and
    10 pattern matching. The first part is due on 30 November at 11pm; the
    10 pattern matching. The first part is due on 30 November at 11pm; the
    11 second, more advanced part, is due on 7 December at 11pm. The
    11 second, more advanced part, is due on 7 December at 11pm. You are
    12 second part is not yet included. For the first part you are
       
    13 asked to implement a regular expression matcher. Make sure the files
    12 asked to implement a regular expression matcher. Make sure the files
    14 you submit can be processed by just calling \texttt{scala
    13 you submit can be processed by just calling \texttt{scala
    15   <<filename.scala>>}.\bigskip
    14   <<filename.scala>>}.\bigskip
    16 
    15 
    17 \noindent
    16 \noindent
   173   $\textit{ders}\;(c::cs)\;r$  & $\dn$ &
   172   $\textit{ders}\;(c::cs)\;r$  & $\dn$ &
   174     $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
   173     $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
   175 \end{tabular}
   174 \end{tabular}
   176 \end{center}
   175 \end{center}
   177 
   176 
   178 The second, called \textit{matcher}, takes a string and a regular expression
   177 Note that this function is different from \textit{der}, which only
   179 as arguments. It builds first the derivatives according to \textit{ders}
   178 takes a single character.
   180 and after that tests whether the resulting derivative regular expression can match
   179 
   181 the empty string (using \textit{nullable}).
   180 The second function, called \textit{matcher}, takes a string and a
   182 For example the \textit{matcher} will produce true given the
   181 regular expression as arguments. It builds first the derivatives
   183 regular expression $(a\cdot b)\cdot c$ and the string $abc$.
   182 according to \textit{ders} and after that tests whether the resulting
   184 \hfill[1 Mark]
   183 derivative regular expression can match the empty string (using
       
   184 \textit{nullable}).  For example the \textit{matcher} will produce
       
   185 true given the regular expression $(a\cdot b)\cdot c$ and the string
       
   186 $abc$.\\  \mbox{}\hfill[1 Mark]
   185 
   187 
   186 \item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches
   188 \item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches
   187   (from the left to 
   189   (from the left to 
   188 right) in the string $s_1$ all the non-empty substrings that match the 
   190 right) in the string $s_1$ all the non-empty substrings that match the 
   189 regular expression $r$---these substrings are assumed to be
   191 regular expression $r$---these substrings are assumed to be
   192 are replaced by $s_2$. For example given the regular expression
   194 are replaced by $s_2$. For example given the regular expression
   193 
   195 
   194 \[(a \cdot a)^* + (b \cdot b)\]
   196 \[(a \cdot a)^* + (b \cdot b)\]
   195 
   197 
   196 \noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and
   198 \noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and
   197 replacement string $s_2 = c$ yields the string
   199 replacement the string $s_2 = c$ yields the string
   198 
   200 
   199 \[
   201 \[
   200 ccbcabcaccc
   202 ccbcabcaccc
   201 \]
   203 \]
   202 
   204 
   203 \hfill[2 Mark]
   205 \hfill[2 Marks]
   204 \end{itemize}
   206 \end{itemize}
   205 
   207 
       
   208 
       
   209 
       
   210 
       
   211 \subsection*{Part 2 (4 Marks)}
       
   212 
       
   213 You need to copy all the code from \texttt{re.scala} into
       
   214 \texttt{re2.scala} in order to complete Part 2.  Parts (2a) and (2b)
       
   215 give you another method and datapoints for testing the \textit{der}
       
   216 and \textit{simp} functions from Part~1.
       
   217 
       
   218 \subsection*{Tasks (file re2.scala)}
       
   219 
       
   220 \begin{itemize}
       
   221 \item[(2a)] Write a \textbf{polymorphic} function, called
       
   222   \textit{iterT}, that is \textbf{tail-recursive}(!) and takes an
       
   223   integer $n$, a function $f$ and an $x$ as arguments. This function
       
   224   should iterate $f$ $n$-times starting with the argument $x$, like
       
   225 
       
   226   \[\underbrace{f(\ldots (f}_{n\text{-times}}(x)))
       
   227   \]
       
   228 
       
   229   More formally that means \textit{iterT} behaves as follows:
       
   230 
       
   231   \begin{center}
       
   232     \begin{tabular}{lcl}
       
   233       $\textit{iterT}(n, f, x)$ & $\dn$ &
       
   234       $\begin{cases}
       
   235         \;x & \textit{if}\;n = 0\\
       
   236         \;f(\textit{iterT}(n - 1, f, x)) & \textit{otherwise}
       
   237         \end{cases}$      
       
   238     \end{tabular}
       
   239 \end{center}
       
   240 
       
   241   Make sure you write a \textbf{tail-recursive} version of
       
   242   \textit{iterT}.  If you add the annotation \texttt{@tailrec} (see
       
   243   below) you should not get an error message.
       
   244 
       
   245   \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm]  
       
   246   import scala.annotation.tailrec
       
   247 
       
   248   @tailrec
       
   249   def iterT[A](n: Int, f: A => A, x: A): A = ...
       
   250   \end{lstlisting}
       
   251 
       
   252   You can assume that \textit{iterT} will only be called for positive
       
   253   integers $0 \le n$. Given the type variable \texttt{A}, the type of
       
   254   $f$ is \texttt{A => A} and the type of $x$ is \texttt{A}. This means
       
   255   \textit{iterT} can be used, for example, for functions from integers
       
   256   to integers, or strings to strings.  \\ \mbox{}\hfill[2 Marks]
       
   257 
       
   258 \item[(2b)] Implement a function, called \textit{size}, by recursion
       
   259   over regular expressions. If a regular expression is seen as a tree,
       
   260   then \textit{size} should return the number of nodes in such a
       
   261   tree. Therefore this function is defined as follows:
       
   262 
       
   263 \begin{center}
       
   264 \begin{tabular}{lcl}
       
   265 $\textit{size}(\ZERO)$ & $\dn$ & $1$\\
       
   266 $\textit{size}(\ONE)$  & $\dn$ & $1$\\
       
   267 $\textit{size}(c)$     & $\dn$ & $1$\\
       
   268 $\textit{size}(r_1 + r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
       
   269 $\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
       
   270 $\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\
       
   271 \end{tabular}
       
   272 \end{center}
       
   273 
       
   274 You can use \textit{size} and \textit{iterT} in order to test how much
       
   275 the 'evil' regular expression $(a^*)^* \cdot b$ grows when taking
       
   276 successive derivatives according the letter $a$ and compare it to
       
   277 taking the derivative, but simlifying the derivative after each step.
       
   278 For example, the calls
       
   279 
       
   280   \begin{lstlisting}[language=Scala, numbers=none, xleftmargin=-1mm]  
       
   281   size(iterT(20, (r: Rexp) => der('a', r), EVIL))
       
   282   size(iterT(20, (r: Rexp) => simp(der('a', r)), EVIL))
       
   283   \end{lstlisting}
       
   284 
       
   285   produce without simplification a regular expression of size of
       
   286   7340068 for the derivative after 20 iterations, while the latter is
       
   287   just 8.\\ \mbox{}\hfill[1 Mark]
       
   288   
       
   289 
       
   290 \item[(2c)] Write a \textbf{polymorphic} function, called
       
   291   \textit{fixpT}, that takes
       
   292   a function $f$ and an $x$ as arguments. The purpose
       
   293   of \textit{fixpT} is to calculate a fixpoint of the function $f$
       
   294   starting from the argument $x$.
       
   295   A fixpoint, say $y$, is when $f(y) = y$ holds. 
       
   296   That means \textit{fixpT} behaves as follows:
       
   297 
       
   298   \begin{center}
       
   299     \begin{tabular}{lcl}
       
   300       $\textit{fixpT}(f, x)$ & $\dn$ &
       
   301       $\begin{cases}
       
   302         \;x & \textit{if}\;f(x) = x\\
       
   303         \;\textit{fixpT}(f, f(x)) & \textit{otherwise}
       
   304         \end{cases}$      
       
   305     \end{tabular}
       
   306 \end{center}
       
   307 
       
   308   Make sure you calculate in the code of $\textit{fixpT}$ the result
       
   309   of $f(x)$ only once. Given the type variable \texttt{A} in
       
   310   $\textit{fixpT}$, the type of $f$ is \texttt{A => A} and the type of
       
   311   $x$ is \texttt{A}. The file \texttt{re2.scala} gives two example
       
   312   function where in one the fixpoint is 1 and in the other
       
   313   it is the string $a$.\\ \mbox{}\hfill[1 Mark]  
       
   314 \end{itemize}\bigskip  
       
   315 
       
   316 
       
   317 
       
   318 \noindent
   206 \textbf{Background} Although easily implementable in Scala, the idea
   319 \textbf{Background} Although easily implementable in Scala, the idea
   207 behind the derivative function might not so easy to be seen. To
   320 behind the derivative function might not so easy to be seen. To
   208 understand its purpose better, assume a regular expression $r$ can
   321 understand its purpose better, assume a regular expression $r$ can
   209 match strings of the form $c::cs$ (that means strings which start with
   322 match strings of the form $c::cs$ (that means strings which start with
   210 a character $c$ and have some rest $cs$). If you now take the
   323 a character $c$ and have some rest, or tail, $cs$). If you now take the
   211 derivative of $r$ with respect to the character $c$, then you obtain a
   324 derivative of $r$ with respect to the character $c$, then you obtain a
   212 regular expressions that can match all the strings $cs$.  In other
   325 regular expressions that can match all the strings $cs$.  In other
   213 words the regular expression $\textit{der}\;c\;r$ can match the same
   326 words, the regular expression $\textit{der}\;c\;r$ can match the same
   214 strings $c::cs$ that can be matched by $r$, except that the $c$ is
   327 strings $c::cs$ that can be matched by $r$, except that the $c$ is
   215 chopped off.
   328 chopped off.
   216 
   329 
   217 Assume now $r$ can match the string $abc$. If you take the derivative
   330 Assume now $r$ can match the string $abc$. If you take the derivative
   218 according to $a$ then you obtain a regular expression that can match
   331 according to $a$ then you obtain a regular expression that can match
   219 $bc$ (it is $abc$ where the $a$ has been chopped off). If you now
   332 $bc$ (it is $abc$ where the $a$ has been chopped off). If you now
   220 build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you obtain a regular
   333 build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you
   221 expression that can match the string "c" (it is "bc" where 'b' is
   334 obtain a regular expression that can match the string $c$ (it is $bc$
   222 chopped off). If you finally build the derivative of this according
   335 where $b$ is chopped off). If you finally build the derivative of this
   223 'c', that is der('c', der('b', der('a', r))), you obtain a regular
   336 according $c$, that is
   224 expression that can match the empty string. You can test this using
   337 $\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r)))$, you
   225 the function nullable, which is what your matcher is doing.
   338 obtain a regular expression that can match the empty string. You can
   226 
   339 test this using the function nullable, which is what your matcher is
   227 The purpose of the simp function is to keep the regular expression small. Normally the derivative function makes the regular expression bigger (see the SEQ case) and the algorithm would be slower and slower over time. The simp function counters this increase in size and the result is that the algorithm is fast throughout. 
   340 doing.
   228 By the way this whole idea is by Janusz Brzozowski who came up with this in 1964 in his PhD thesis. 
   341 
   229 https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)
   342 The purpose of the simp function is to keep the regular expression
   230 
   343 small. Normally the derivative function makes the regular expression
   231 
   344 bigger (see the SEQ case and the example in (2b)) and the algorithm
   232 
   345 would be slower and slower over time. The simp function counters this
   233 \subsection*{Part 2 (4 Marks)}
   346 increase in size and the result is that the algorithm is fast
   234 
   347 throughout.  By the way, this algorithm is by Janusz Brzozowski who
   235 Coming soon.
   348 came up with the idea of derivatives in 1964 in his PhD thesis.
       
   349 
       
   350 \begin{center}\small
       
   351 \url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}
       
   352 \end{center}
   236 
   353 
   237 \end{document}
   354 \end{document}
   238 
   355 
   239 
   356 
   240 %%% Local Variables: 
   357 %%% Local Variables: