pep-material: cws/cw03.tex@71e463b33a9e (annotated)

6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	1	\documentclass{article}
62 2151c77e1e24 updated Christian Urban <urbanc@in.tum.de> parents: 6 diff changeset	2	\usepackage{../style}
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	3	%%\usepackage{../langs}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	4
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	5	\begin{document}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	6
62 2151c77e1e24 updated Christian Urban <urbanc@in.tum.de> parents: 6 diff changeset	7	\section*{Coursework 8 (Scala, Regular Expressions}
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	8
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	9	This coursework is worth 10\%. It is about regular expressions and
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	10	pattern matching. The first part is due on 30 November at 11pm; the
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	11	second, more advanced part, is due on 7 December at 11pm. The
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	12	second part is not yet included. For the first part you are
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	13	asked to implement a regular expression matcher. Make sure the files
62 2151c77e1e24 updated Christian Urban <urbanc@in.tum.de> parents: 6 diff changeset	14	you submit can be processed by just calling \texttt{scala
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	15	<<filename.scala>>}.\bigskip
62 2151c77e1e24 updated Christian Urban <urbanc@in.tum.de> parents: 6 diff changeset	16
2151c77e1e24 updated Christian Urban <urbanc@in.tum.de> parents: 6 diff changeset	17	\noindent
2151c77e1e24 updated Christian Urban <urbanc@in.tum.de> parents: 6 diff changeset	18	\textbf{Important:} Do not use any mutable data structures in your
74 fb2d8012ed08 updated Christian Urban <urbanc@in.tum.de> parents: 69 diff changeset	19	submission! They are not needed. This excludes the use of
62 2151c77e1e24 updated Christian Urban <urbanc@in.tum.de> parents: 6 diff changeset	20	\texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	21	code! It has a different meaning in Scala, than in Java. Do not use
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	22	\texttt{var}! This declares a mutable variable. Make sure the
62 2151c77e1e24 updated Christian Urban <urbanc@in.tum.de> parents: 6 diff changeset	23	functions you submit are defined on the ``top-level'' of Scala, not
2151c77e1e24 updated Christian Urban <urbanc@in.tum.de> parents: 6 diff changeset	24	inside a class or object.
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	25
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	26
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	27	\subsection*{Disclaimer!!!!!!!!}
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	28
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	29	It should be understood that the work you submit represents
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	30	your own effort! You have not copied from anyone else. An
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	31	exception is the Scala code I showed during the lectures or
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	32	uploaded to KEATS, which you can freely use.\bigskip
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	33
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	34
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	35	\subsection*{Part 1 (6 Marks)}
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	36
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	37	The task is to implement a regular expression matcher that is based on
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	38	derivatives of regular expressions. The implementation can deal
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	39	with the following regular expressions, which have been predefined
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	40	in the file re.scala:
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	41
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	42	\begin{center}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	43	\begin{tabular}{lcll}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	44	$r$ & $::=$ & $\ZERO$ & cannot match anything\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	45	& $\|$ & $\ONE$ & can only match the empty string\\
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	46	& $\|$ & $c$ & can match a character (in this case $c$)\\
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	47	& $\|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	48	& $\|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	49	& & & then the second part with $r_2$\\
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	50	& $\|$ & $r^*$ & can match zero or more times $r$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	51	\end{tabular}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	52	\end{center}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	53
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	54	\noindent
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	55	Why? Knowing how to match regular expressions and strings fast will
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	56	let you solve a lot of problems that vex other humans. Regular
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	57	expressions are one of the fastest and simplest ways to match patterns
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	58	in text, and are endlessly useful for searching, editing and
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	59	analysing text in all sorts of places. However, you need to be
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	60	fast, otherwise you will stumble over problems such as recently reported at
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	61
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	62	{\small
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	63	\begin{itemize}
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	64	\item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	65	\item[$\bullet$] \url{https://vimeo.com/112065252}
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	66	\item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	67	\end{itemize}}
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	68
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	69	\subsection*{Tasks (file re.scala)}
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	70
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	71	\begin{itemize}
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	72	\item[(1a)] Implement a function, called \textit{nullable}, by recursion over
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	73	regular expressions. This function tests whether a regular expression can match
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	74	the empty string.
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	75
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	76	\begin{center}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	77	\begin{tabular}{lcl}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	78	$\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	79	$\textit{nullable}(\ONE)$ & $\dn$ & $\textit{true}$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	80	$\textit{nullable}(c)$ & $\dn$ & $\textit{false}$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	81	$\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	82	$\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	83	$\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	84	\end{tabular}
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	85	\end{center}\hfill[1 Mark]
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	86
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	87	\item[(1b)] Implement a function, called \textit{der}, by recursion over
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	88	regular expressions. It takes a character and a regular expression
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	89	as arguments and calculates the derivative regular expression according
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	90	to the rules:
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	91
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	92	\begin{center}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	93	\begin{tabular}{lcl}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	94	$\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	95	$\textit{der}\;c\;(\ONE)$ & $\dn$ & $\ZERO$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	96	$\textit{der}\;c\;(d)$ & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	97	$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	98	$\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	99	& & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	100	& & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	101	$\textit{der}\;c\;(r^)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^)$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	102	\end{tabular}
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	103	\end{center}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	104
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	105	For example given the regular expression $r = (a \cdot b) \cdot c$, the derivatives
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	106	w.r.t.~the characters $a$, $b$ and $c$ are
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	107
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	108	\begin{center}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	109	\begin{tabular}{lcll}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	110	$\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & ($= r'$)\\
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	111	$\textit{der}\;b\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$\\
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	112	$\textit{der}\;c\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	113	\end{tabular}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	114	\end{center}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	115
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	116	Let $r'$ stand for the first derivative, then taking the derivatives of $r'$
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	117	w.r.t.~the characters $a$, $b$ and $c$ gives
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	118
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	119	\begin{center}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	120	\begin{tabular}{lcll}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	121	$\textit{der}\;a\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \\
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	122	$\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & ($= r''$)\\
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	123	$\textit{der}\;c\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	124	\end{tabular}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	125	\end{center}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	126
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	127	One more example: Let $r''$ stand for the second derivative above,
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	128	then taking the derivatives of $r''$ w.r.t.~the characters $a$, $b$
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	129	and $c$ gives
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	130
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	131	\begin{center}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	132	\begin{tabular}{lcll}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	133	$\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	134	$\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	135	$\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	136	\end{tabular}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	137	\end{center}
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	138
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	139	Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	140	\mbox{}\hfill\mbox{[1 Mark]}
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	141
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	142	\item[(1c)] Implement the function \textit{simp}, which recursively
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	143	traverses a regular expression from the inside to the outside, and
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	144	simplifies every sub-regular-expression on the left (see below) to
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	145	the regular expression on the right, except it does not simplify inside
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	146	${}^*$-regular expressions.
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	147
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	148	\begin{center}
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	149	\begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	150	$r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	151	$\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	152	$r \cdot \ONE$ & $\mapsto$ & $r$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	153	$\ONE \cdot r$ & $\mapsto$ & $r$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	154	$r + \ZERO$ & $\mapsto$ & $r$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	155	$\ZERO + r$ & $\mapsto$ & $r$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	156	$r + r$ & $\mapsto$ & $r$\\
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	157	\end{tabular}
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	158	\end{center}
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	159
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	160	For example the regular expression
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	161	\[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	162
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	163	simplifies to just $r_1$.
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	164	\hfill[1 Mark]
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	165
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	166	\item[(1d)] Implement two functions: The first, called \textit{ders},
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	167	takes a list of characters and a regular expression as arguments, and
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	168	builds the derivative w.r.t.~the list as follows:
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	169
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	170	\begin{center}
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	171	\begin{tabular}{lcl}
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	172	$\textit{ders}\;(Nil)\;r$ & $\dn$ & $r$\\
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	173	$\textit{ders}\;(c::cs)\;r$ & $\dn$ &
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	174	$\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	175	\end{tabular}
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	176	\end{center}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	177
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	178	The second, called \textit{matcher}, takes a string and a regular expression
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	179	as arguments. It builds first the derivatives according to \textit{ders}
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	180	and after that tests whether the resulting derivative regular expression can match
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	181	the empty string (using \textit{nullable}).
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	182	For example the \textit{matcher} will produce true given the
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	183	regular expression $(a\cdot b)\cdot c$ and the string $abc$.
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	184	\hfill[1 Mark]
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	185
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	186	\item[(1e)] Implement the function $\textit{replace}\;r\;s_1\;s_2$: it searches
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	187	(from the left to
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	188	right) in the string $s_1$ all the non-empty substrings that match the
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	189	regular expression $r$---these substrings are assumed to be
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	190	the longest substrings matched by the regular expression and
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	191	assumed to be non-overlapping. All these substrings in $s_1$ matched by $r$
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	192	are replaced by $s_2$. For example given the regular expression
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	193
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	194	\[(a \cdot a)^* + (b \cdot b)\]
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	195
69 f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	196	\noindent the string $s_1 = aabbbaaaaaaabaaaaabbaaaabb$ and
f1295a0ab4ed updated Christian Urban <urbanc@in.tum.de> parents: 68 diff changeset	197	replacement string $s_2 = c$ yields the string
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	198
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	199	\[
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	200	ccbcabcaccc
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	201	\]
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	202
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	203	\hfill[2 Mark]
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	204	\end{itemize}
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	205
75 71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	206	\textbf{Background} Although easily implementable in Scala, the idea
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	207	behind the derivative function might not so easy to be seen. To
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	208	understand its purpose better, assume a regular expression $r$ can
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	209	match strings of the form $c::cs$ (that means strings which start with
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	210	a character $c$ and have some rest $cs$). If you now take the
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	211	derivative of $r$ with respect to the character $c$, then you obtain a
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	212	regular expressions that can match all the strings $cs$. In other
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	213	words the regular expression $\textit{der}\;c\;r$ can match the same
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	214	strings $c::cs$ that can be matched by $r$, except that the $c$ is
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	215	chopped off.
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	216
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	217	Assume now $r$ can match the string $abc$. If you take the derivative
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	218	according to $a$ then you obtain a regular expression that can match
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	219	$bc$ (it is $abc$ where the $a$ has been chopped off). If you now
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	220	build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r))$ you obtain a regular
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	221	expression that can match the string "c" (it is "bc" where 'b' is
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	222	chopped off). If you finally build the derivative of this according
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	223	'c', that is der('c', der('b', der('a', r))), you obtain a regular
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	224	expression that can match the empty string. You can test this using
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	225	the function nullable, which is what your matcher is doing.
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	226
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	227	The purpose of the simp function is to keep the regular expression small. Normally the derivative function makes the regular expression bigger (see the SEQ case) and the algorithm would be slower and slower over time. The simp function counters this increase in size and the result is that the algorithm is fast throughout.
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	228	By the way this whole idea is by Janusz Brzozowski who came up with this in 1964 in his PhD thesis.
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	229	https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	230
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	231
71e463b33a9e updated Christian Urban <urbanc@in.tum.de> parents: 74 diff changeset	232
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	233	\subsection*{Part 2 (4 Marks)}
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	234
8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	235	Coming soon.
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	236
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	237	\end{document}
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	238
68 8da9e0c16194 updated Christian Urban <urbanc@in.tum.de> parents: 62 diff changeset	239
6 aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	240	%%% Local Variables:
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	241	%%% mode: latex
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	242	%%% TeX-master: t
aae256985251 updated Christian Urban <urbanc@in.tum.de> parents: diff changeset	243	%%% End:

author	Christian Urban <urbanc@in.tum.de>
	Sat, 26 Nov 2016 19:12:33 +0000
changeset 75	71e463b33a9e
parent 74	fb2d8012ed08
child 78	85f2f75abeeb
permissions	-rw-r--r--