cws/cw03.tex
author Christian Urban <urbanc@in.tum.de>
Sat, 26 Nov 2016 19:12:33 +0000
changeset 75 71e463b33a9e
parent 74 fb2d8012ed08
child 78 85f2f75abeeb
permissions -rw-r--r--
updated
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     1
\documentclass{article}
62
2151c77e1e24 updated
Christian Urban <urbanc@in.tum.de>
parents: 6
diff changeset
     2
\usepackage{../style}
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     3
%%\usepackage{../langs}
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     4
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     5
\begin{document}
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     6
62
2151c77e1e24 updated
Christian Urban <urbanc@in.tum.de>
parents: 6
diff changeset
     7
\section*{Coursework 8 (Scala, Regular Expressions}
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
     8
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
     9
This coursework is worth 10\%. It is about regular expressions and
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    10
pattern matching. The first part is due on 30 November at 11pm; the
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    11
second, more advanced part, is due on 7 December at 11pm. The
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    12
second part is not yet included. For the first part you are
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    13
asked to implement a regular expression matcher. Make sure the files
62
2151c77e1e24 updated
Christian Urban <urbanc@in.tum.de>
parents: 6
diff changeset
    14
you submit can be processed by just calling \texttt{scala
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    15
  <<filename.scala>>}.\bigskip
62
2151c77e1e24 updated
Christian Urban <urbanc@in.tum.de>
parents: 6
diff changeset
    16
2151c77e1e24 updated
Christian Urban <urbanc@in.tum.de>
parents: 6
diff changeset
    17
\noindent
2151c77e1e24 updated
Christian Urban <urbanc@in.tum.de>
parents: 6
diff changeset
    18
\textbf{Important:} Do not use any mutable data structures in your
74
fb2d8012ed08 updated
Christian Urban <urbanc@in.tum.de>
parents: 69
diff changeset
    19
submission! They are not needed. This excludes the use of
62
2151c77e1e24 updated
Christian Urban <urbanc@in.tum.de>
parents: 6
diff changeset
    20
\texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    21
code! It has a different meaning in Scala, than in Java.  Do not use
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    22
\texttt{var}! This declares a mutable variable.  Make sure the
62
2151c77e1e24 updated
Christian Urban <urbanc@in.tum.de>
parents: 6
diff changeset
    23
functions you submit are defined on the ``top-level'' of Scala, not
2151c77e1e24 updated
Christian Urban <urbanc@in.tum.de>
parents: 6
diff changeset
    24
inside a class or object.
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    25
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    26
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    27
\subsection*{Disclaimer!!!!!!!!}
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    28
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    29
It should be understood that the work you submit represents
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    30
your own effort! You have not copied from anyone else. An
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    31
exception is the Scala code I showed during the lectures or
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    32
uploaded to KEATS, which you can freely use.\bigskip
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    33
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    34
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    35
\subsection*{Part 1 (6 Marks)}
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    36
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
    37
The task is to implement a regular expression matcher that is based on
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    38
derivatives of regular expressions. The implementation can deal
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    39
with the following regular expressions, which have been predefined
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
    40
in the file re.scala:
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    41
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    42
\begin{center}
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    43
\begin{tabular}{lcll}
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    44
  $r$ & $::=$ & $\ZERO$     & cannot match anything\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    45
      &   $|$ & $\ONE$      & can only match the empty string\\
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    46
      &   $|$ & $c$         & can match a character (in this case $c$)\\
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    47
      &   $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    48
  &   $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    49
          &  & & then the second part with $r_2$\\
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    50
      &   $|$ & $r^*$       & can match zero or more times $r$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    51
\end{tabular}
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    52
\end{center}
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    53
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    54
\noindent 
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    55
Why? Knowing how to match regular expressions and strings fast will
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    56
let you solve a lot of problems that vex other humans. Regular
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    57
expressions are one of the fastest and simplest ways to match patterns
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    58
in text, and are endlessly useful for searching, editing and
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    59
analysing text in all sorts of places. However, you need to be
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
    60
fast, otherwise you will stumble over problems such as recently reported at
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    61
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    62
{\small
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    63
\begin{itemize}
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    64
\item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    65
\item[$\bullet$] \url{https://vimeo.com/112065252}
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    66
\item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}  
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    67
\end{itemize}}
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    69
\subsection*{Tasks (file re.scala)}
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    70
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    71
\begin{itemize}
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    72
\item[(1a)] Implement a function, called \textit{nullable}, by recursion over
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
    73
  regular expressions. This function tests whether a regular expression can match
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    74
  the empty string.
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    75
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    76
\begin{center}
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    77
\begin{tabular}{lcl}
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    78
$\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    79
$\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    80
$\textit{nullable}(c)$     & $\dn$ & $\textit{false}$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    81
$\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    82
$\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    83
$\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    84
\end{tabular}
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    85
\end{center}\hfill[1 Mark]
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    86
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    87
\item[(1b)] Implement a function, called \textit{der}, by recursion over
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
    88
  regular expressions. It takes a character and a regular expression
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
    89
  as arguments and calculates the derivative regular expression according
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
    90
  to the rules:
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    91
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    92
\begin{center}
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    93
\begin{tabular}{lcl}
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    94
$\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    95
$\textit{der}\;c\;(\ONE)$  & $\dn$ & $\ZERO$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    96
$\textit{der}\;c\;(d)$     & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    97
$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    98
$\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
    99
      & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   100
      & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   101
$\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   102
\end{tabular}
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   103
\end{center}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   104
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   105
For example given the regular expression $r = (a \cdot b) \cdot c$, the derivatives
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   106
w.r.t.~the characters $a$, $b$ and $c$ are
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   107
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   108
\begin{center}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   109
  \begin{tabular}{lcll}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   110
    $\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & ($= r'$)\\
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   111
    $\textit{der}\;b\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$\\
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   112
    $\textit{der}\;c\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   113
  \end{tabular}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   114
\end{center}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   115
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   116
Let $r'$ stand for the first derivative, then taking the derivatives of $r'$
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   117
w.r.t.~the characters $a$, $b$ and $c$ gives
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   118
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   119
\begin{center}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   120
  \begin{tabular}{lcll}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   121
    $\textit{der}\;a\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \\
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   122
    $\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & ($= r''$)\\
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   123
    $\textit{der}\;c\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   124
  \end{tabular}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   125
\end{center}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   126
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   127
One more example: Let $r''$ stand for the second derivative above,
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   128
then taking the derivatives of $r''$ w.r.t.~the characters $a$, $b$
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   129
and $c$ gives
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   130
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   131
\begin{center}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   132
  \begin{tabular}{lcll}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   133
    $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   134
    $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   135
    $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   136
  \end{tabular}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   137
\end{center}
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   138
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   139
Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   140
\mbox{}\hfill\mbox{[1 Mark]}
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   141
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   142
\item[(1c)] Implement the function \textit{simp}, which recursively
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   143
  traverses a regular expression from the inside to the outside, and
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   144
  simplifies every sub-regular-expression on the left (see below) to
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   145
  the regular expression on the right, except it does not simplify inside
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   146
  ${}^*$-regular expressions.
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   147
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   148
  \begin{center}
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   149
\begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   150
$r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\ 
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   151
$\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\ 
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   152
$r \cdot \ONE$ & $\mapsto$ & $r$\\ 
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   153
$\ONE \cdot r$ & $\mapsto$ & $r$\\ 
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   154
$r + \ZERO$ & $\mapsto$ & $r$\\ 
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   155
$\ZERO + r$ & $\mapsto$ & $r$\\ 
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   156
$r + r$ & $\mapsto$ & $r$\\ 
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   157
\end{tabular}
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   158
  \end{center}
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   159
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   160
  For example the regular expression
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   161
  \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   162
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   163
  simplifies to just $r_1$. 
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   164
  \hfill[1 Mark]
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   165
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   166
\item[(1d)] Implement two functions: The first, called \textit{ders},
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   167
  takes a list of characters and a regular expression as arguments, and
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   168
  builds the derivative w.r.t.~the list as follows:
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   169
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   170
\begin{center}
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   171
\begin{tabular}{lcl}
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   172
$\textit{ders}\;(Nil)\;r$ & $\dn$ & $r$\\
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   173
  $\textit{ders}\;(c::cs)\;r$  & $\dn$ &
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   174
    $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   175
\end{tabular}
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   176
\end{center}
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   177
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   178
The second, called \textit{matcher}, takes a string and a regular expression
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   179
as arguments. It builds first the derivatives according to \textit{ders}
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   180
and after that tests whether the resulting derivative regular expression can match
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   181
the empty string (using \textit{nullable}).
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   182
For example the \textit{matcher} will produce true given the
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   183
regular expression $(a\cdot b)\cdot c$ and the string $abc$.
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   184
\hfill[1 Mark]
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   185
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   186
\item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   187
  (from the left to 
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   188
right) in the string $s_1$ all the non-empty substrings that match the 
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   189
regular expression $r$---these substrings are assumed to be
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   190
the longest substrings matched by the regular expression and
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   191
assumed to be non-overlapping. All these substrings in $s_1$ matched by $r$
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   192
are replaced by $s_2$. For example given the regular expression
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   193
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   194
\[(a \cdot a)^* + (b \cdot b)\]
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   195
69
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   196
\noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and
f1295a0ab4ed updated
Christian Urban <urbanc@in.tum.de>
parents: 68
diff changeset
   197
replacement string $s_2 = c$ yields the string
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   198
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   199
\[
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   200
ccbcabcaccc
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   201
\]
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   202
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   203
\hfill[2 Mark]
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   204
\end{itemize}
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   205
75
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   206
\textbf{Background} Although easily implementable in Scala, the idea
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   207
behind the derivative function might not so easy to be seen. To
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   208
understand its purpose better, assume a regular expression $r$ can
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   209
match strings of the form $c::cs$ (that means strings which start with
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   210
a character $c$ and have some rest $cs$). If you now take the
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   211
derivative of $r$ with respect to the character $c$, then you obtain a
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   212
regular expressions that can match all the strings $cs$.  In other
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   213
words the regular expression $\textit{der}\;c\;r$ can match the same
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   214
strings $c::cs$ that can be matched by $r$, except that the $c$ is
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   215
chopped off.
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   216
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   217
Assume now $r$ can match the string $abc$. If you take the derivative
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   218
according to $a$ then you obtain a regular expression that can match
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   219
$bc$ (it is $abc$ where the $a$ has been chopped off). If you now
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   220
build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you obtain a regular
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   221
expression that can match the string "c" (it is "bc" where 'b' is
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   222
chopped off). If you finally build the derivative of this according
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   223
'c', that is der('c', der('b', der('a', r))), you obtain a regular
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   224
expression that can match the empty string. You can test this using
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   225
the function nullable, which is what your matcher is doing.
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   226
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   227
The purpose of the simp function is to keep the regular expression small. Normally the derivative function makes the regular expression bigger (see the SEQ case) and the algorithm would be slower and slower over time. The simp function counters this increase in size and the result is that the algorithm is fast throughout. 
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   228
By the way this whole idea is by Janusz Brzozowski who came up with this in 1964 in his PhD thesis. 
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   229
https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   230
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   231
71e463b33a9e updated
Christian Urban <urbanc@in.tum.de>
parents: 74
diff changeset
   232
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   233
\subsection*{Part 2 (4 Marks)}
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   234
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   235
Coming soon.
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   236
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   237
\end{document}
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   238
68
8da9e0c16194 updated
Christian Urban <urbanc@in.tum.de>
parents: 62
diff changeset
   239
6
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   240
%%% Local Variables: 
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   241
%%% mode: latex
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   242
%%% TeX-master: t
aae256985251 updated
Christian Urban <urbanc@in.tum.de>
parents:
diff changeset
   243
%%% End: