% !TEX program = xelatex\documentclass{article}\usepackage{../styles/style}\usepackage{../styles/langs}\usepackage{disclaimer}\usepackage{tikz}\usepackage{pgf}\usepackage{pgfplots}\usepackage{stackengine}%% \usepackage{accents}\newcommand\barbelow[1]{\stackunder[1.2pt]{#1}{\raisebox{-4mm}{\boldmath$\uparrow$}}}\begin{filecontents}{re-python2.data}1 0.0335 0.03610 0.03415 0.03618 0.05919 0.084 20 0.14121 0.24822 0.48523 0.87824 1.7125 3.4026 7.0827 14.1228 26.69\end{filecontents}\begin{filecontents}{re-java.data}5 0.0029810 0.0041815 0.0099616 0.0171017 0.0349218 0.0330319 0.0508420 0.1017721 0.1996022 0.4115923 0.8223424 1.7025125 3.3611226 6.6399827 13.3512028 29.81185\end{filecontents}\begin{filecontents}{re-js.data}5 0.06110 0.06115 0.06120 0.07023 0.13125 0.30826 0.56428 1.99430 7.64831 15.881 32 32.190\end{filecontents}\begin{filecontents}{re-java9.data}1000 0.014102000 0.048823000 0.106094000 0.174565000 0.275306000 0.411167000 0.537418000 0.702619000 0.9398110000 0.9741911000 1.2869712000 1.5138714000 2.0707916000 2.6984620000 4.4182324000 6.4607726000 7.6437330000 9.9944634000 12.96688538000 16.28162142000 19.18022846000 21.98472150000 26.95020360000 43.0327746\end{filecontents}\begin{filecontents}{re-swift.data}5 0.00110 0.00115 0.00920 0.17823 1.39924 2.89325 5.67126 11.35727 22.430\end{filecontents}\begin{filecontents}{re-dart.data}20 0.04221 0.08422 0.19023 0.34024 0.67825 1.36926 2.70027 5.46228 10.90829 21.72530 43.492\end{filecontents}\begin{document}% BF IDE% https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5\section*{Main Part 3 (Scala, 7 Marks)}\mbox{}\hfill\textit{``Java is the most distressing thing to happen to computing since MS-DOS.''}\smallskip\\\mbox{}\hfill\textit{ --- Alan Kay, the inventor of object-oriented programming}\bigskip\medskip\noindentThis part is about a regular expression matcher described byBrzozowski in 1964. Thebackground is that ``out-of-the-box'' regular expression matching inmainstream languages like Java, JavaScript and Python can sometimes beexcruciatingly slow. You are supposed to implement a regularexpression matcher that is much, much faster. \bigskip\IMPORTANTNONE{}\noindentAlso note that the running time of each part will be restricted to amaximum of 30 seconds on my laptop. \DISCLAIMER{}\subsection*{Reference Implementation}This Scala assignment comes with a reference implementation in form ofa \texttt{jar}-file. This allows you to run any test cases on your owncomputer. For example you can call \texttt{scala-cli} on the commandline with the option \texttt{--extra-jars re.jar} and then query any functionfrom the \texttt{re.scala} template file. As usual you have to prefixthe calls with \texttt{M3} or import this object. Since some tasksare time sensitive, you can check the reference implementation asfollows: if you want to know, for example, how long it takes to matchstrings of $a$'s using the regular expression $(a^*)^*\cdot b$ you canquery as follows:\begin{lstlisting}[xleftmargin=1mm,numbers=none,basicstyle=\ttfamily\small]$ scala-cli --extra-jars re.jarscala> import M3._ scala> for (i <- 0 to 5000000 by 500000) { println(s"$i: ${time_needed(2, matcher(EVIL, "a" * i))}")}0: 0.00002 secs.500000: 0.10608 secs.1000000: 0.22286 secs.1500000: 0.35982 secs.2000000: 0.45828 secs.2500000: 0.59558 secs.3000000: 0.73191 secs.3500000: 0.83499 secs.4000000: 0.99149 secs.4500000: 1.15395 secs.5000000: 1.29659 secs.\end{lstlisting}%$\noindentFor this you need to copy the \texttt{time\_needed} function and the \texttt{EVIL} regularexpression from the comments given in \texttt{re.scala}.\subsection*{Preliminaries}The task is to implement a regular expression matcher that is based onderivatives of regular expressions. Most of the functions are defined byrecursion over regular expressions and can be elegantly implementedusing Scala's pattern-matching. The implementation should deal with thefollowing regular expressions, which have been predefined in the file\texttt{re.scala}:\begin{center}\begin{tabular}{lcll} $r$ & $::=$ & $\ZERO$ & cannot match anything\\ & $|$ & $\ONE$ & can only match the empty string\\ & $|$ & $c$ & can match a single character (in this case $c$)\\ & $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\ & $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\ & & & then the second part with $r_2$\\ & $|$ & $r^*$ & can match a string with zero or more copies of $r$\\\end{tabular}\end{center}\noindent Why? Regular expressions areone of the simplest ways to match patterns in text, andare endlessly useful for searching, editing and analysing data in allsorts of places (for example analysing network traffic in order todetect security breaches). However, you need to be fast, otherwise youwill stumble over problems such as recently reported at{\small\begin{itemize}\item[$\bullet$] \url{https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019} \item[$\bullet$] \texttt{\href{https://web.archive.org/web/20160801163029/https://www.stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}{https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}}\item[$\bullet$] \url{https://vimeo.com/112065252}\item[$\bullet$] \url{https://davidvgalbraith.com/how-i-fixed-atom} \end{itemize}}% Knowing how to match regular expressions and strings will let you% solve a lot of problems that vex other humans.\subsubsection*{Tasks (file re.scala)}The file \texttt{re.scala} has already a definition for regularexpressions and also defines some handy shorthand notation for regularexpressions. The notation in this coursework description matches upwith the code as follows:\begin{center} \begin{tabular}{rcl@{\hspace{10mm}}l} & & code: & shorthand:\smallskip \\ $\ZERO$ & $\mapsto$ & \texttt{ZERO}\\ $\ONE$ & $\mapsto$ & \texttt{ONE}\\ $c$ & $\mapsto$ & \texttt{CHAR(c)}\\ $\sum rs$ & $\mapsto$ & \texttt{ALTs(rs)}\\ $r_1 + r_2$ & $\mapsto$ & \texttt{ALT(r1, r2)} & \texttt{r1 | r2}\\ $\prod rs$ & $\mapsto$ & \texttt{SEQs(rs)}\\ $r_1 \cdot r_2$ & $\mapsto$ & \texttt{SEQ(r1, r2)} & \texttt{r1 $\sim$ r2}\\ $r^*$ & $\mapsto$ & \texttt{STAR(r)} & \texttt{r.\%}\end{tabular} \end{center} \noindentThe alternative regular expression comes in two versions: one isbinary (+ / \texttt{ALT}) and the other is n-ary ($\sum$ /\texttt{ALTs}). The latter takes a list of regular expressions asargument. In what follows we shall use $rs$ to stand for lists ofregular expressions. When the list is empty, we shall write $\sum\,[]$;if it is non-empty, we sometimes write $\sum\,[r_1,..., r_n]$.The binary alternative can be seen as an abbreviation,that is $r_1 + r_2 \dn \sum\,[r_1, r_2]$. As a result we can ignore thebinary version and only implement the n-ary alternative. Similarlythe sequence regular expression is only implemented with lists and thebinary version can be obtained by defining $r_1 \cdot r_2 \dn \prod\,[r_1, r_2]$.\begin{itemize}\item[(1)] Implement a function, called \textit{nullable}, by recursion over regular expressions. This function tests whether a regular expression can match the empty string. This means given a regular expression, it either returns true or false. The function \textit{nullable} is defined as follows:\begin{center}\begin{tabular}{lcl}$\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\$\textit{nullable}(\ONE)$ & $\dn$ & $\textit{true}$\\$\textit{nullable}(c)$ & $\dn$ & $\textit{false}$\\$\textit{nullable}(\sum rs)$ & $\dn$ & $\exists r \in rs.\;\textit{nullable}(r)$\\$\textit{nullable}(\prod rs)$ & $\dn$ & $\forall r\in rs.\;\textit{nullable}(r)$\\$\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\\end{tabular}\end{center}~\hfill[0.5 Marks]\item[(2)] Implement a function, called \textit{der}, by recursion over regular expressions. It takes a character and a regular expression as arguments and calculates the \emph{derivative} of a regular expression according to the rules:\begin{center}\begin{tabular}{lcl}$\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\$\textit{der}\;c\;(\ONE)$ & $\dn$ & $\ZERO$\\$\textit{der}\;c\;(d)$ & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\$\textit{der}\;c\;(\sum\;[r_1,..,r_n])$ & $\dn$ & $\sum\;[\textit{der}\;c\;r_1,..,\textit{der}\;c\;r_n]$\\$\textit{der}\;c\;(\prod\;[])$ & $\dn$ & $\ZERO$\\ $\textit{der}\;c\;(\prod\;r\!::\!rs)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r)$\\ & & $\textit{then}\;(\prod\;(\textit{der}\;c\;r)\!::\!rs) + (\textit{der}\;c\;(\prod rs))$\\ & & $\textit{else}\;(\prod\;(\textit{der}\;c\;r)\!::\! rs)$\\$\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\\end{tabular}\end{center}\mbox{}\hfill\mbox{[1 Mark]}\item[(3)] We next want to simplify regular expressions: essentially we want to remove $\ZERO$ in regular expressions like $r + \ZERO$ and $\ZERO + r$. However, our n-ary alternative takes a list of regular expressions as argument, and we therefore need a more general ``denesting'' function, which deletes $\ZERO$s and ``spills out'' nested $\sum$s. This function, called \texttt{denest}, should analyse a list of regular expressions, say $rs$, as follows: \begin{center} \begin{tabular}{lllll} 1) &$rs = []$ & $\dn$ & $[]$ & (empty list)\\ 2) &$rs = \ZERO :: rest$ & $\dn$ & $\texttt{denest}\;rest$ & (throw away $\ZERO$)\\ 3) &$rs = (\sum rs) :: rest$ & $\dn$ & $rs ::: \texttt{denest}\;rest$ & (spill out $\sum$)\\ 4) &$rs = r :: rest$ & $\dn$ & $r :: \texttt{denest}\;rest$ & (otherwise)\\ \end{tabular} \end{center} The first clause states that empty lists cannot be further denested. The second removes the first $\ZERO$ from the list and recurses. The third is when the first regular expression is an \texttt{ALTs}, then the content of this alternative should be spilled out and appended with the denested rest of the list. The last case is for all other cases where the head of the list is not $\ZERO$ and not an \texttt{ALTs}, then we just keep the head of the list and denest the rest.\\ \mbox{}\hfill\mbox{[1 Mark]}\item[(4)] Implement the function \texttt{flts} which flattens our n-ary sequence regular expression $\prod$. Like \texttt{denest}, this function is intended to delete $\ONE$s and spill out nested $\prod$s. Unfortunately, there is a special case to do with $\ZERO$: If this function encounters a $\ZERO$, then the whole ``product'' should be $\ZERO$. The problem is that the $\ZERO$ can be anywhere inside the list. The easiest way to implement this function is therefore by using an accumulator, which when called is set to \texttt{Nil}. This means \textit{flts} takes two arguments (which are both lists of regular expressions) \[ \texttt{flts}\;rs\;acc \] This function analyses the list $rs$ as follows \begin{center} \begin{tabular}{@{}lllll@{}} 1) &$rs = []$ & $\dn$ & $acc$ & (empty list)\\ 2) &$rs = \ZERO :: rest$ & $\dn$ & $[\ZERO]$ & (special case for $\ZERO$)\\ 3) &$rs = \ONE :: rest$ & $\dn$ & $\texttt{flts}\,rest\,acc$ & (throw away $\ONE$)\\ 4) &$rs = (\prod rs) :: rest$ & $\dn$ & $\texttt{flts}\;rest\,(acc ::: rs)$ & (spill out $\prod$)\\ 5) &$rs = r :: rest$ & $\dn$ & $\texttt{flts}\;rest\,(acc ::: [r])$& (otherwise)\\ \end{tabular} \end{center} In the first case we just return whatever has accumulated in $acc$. In the fourth case we spill out the $rs$ by appending the $rs$ to the end of the accumulator. Similarly in the last case we append the single regular expression $r$ to the end of the accumulator. I let you think why you have to add it to the end. \mbox{}\hfill\mbox{[1 Mark]}\item[(5)] Before we can simplify regular expressions, we need what is often called \emph{smart constructors} for $\sum$ and $\prod$. While the ``normal'' constructors \texttt{ALTs} and \texttt{SEQs} give us alternatives and sequences, respectively, \emph{smart} constructors might return something different depending on what list of regular expressions they are given as argument. \begin{center} \begin{tabular}{@{}c@{\hspace{9mm}}c@{}} \begin{tabular}{l@{\hspace{2mm}}l@{\hspace{1mm}}ll} & $\sum^{smart}$\smallskip\\ 1) & $rs = []$ & $\dn$ & $\ZERO$\\ 2) & $rs = [r]$ & $\dn$ & $r$\\ \\ 3) & otherwise & $\dn$ & $\sum\,rs$ \end{tabular} & \begin{tabular}{l@{\hspace{2mm}}l@{\hspace{1mm}}ll} & $\prod^{smart}$\smallskip\\ 1) & $rs = []$ & $\dn$ & $\ONE$\\ 2a) & $rs = [\ZERO]$ & $\dn$ & $\ZERO$\\ 2b) & $rs = [r]$ & $\dn$ & $r$\\ 3) & otherwise & $\dn$ & $\prod\,rs$ \end{tabular} \end{tabular} \end{center} \mbox{}\hfill\mbox{[0.5 Marks]}\item[(6)] Implement the function \textit{simp}, which recursively traverses a regular expression, and on the way up simplifies every regular expression on the left (see below) to the regular expression on the right, except it does not simplify inside ${}^*$-regular expressions and also does not simplify $\ZERO$, $\ONE$ and characters. \begin{center} \begin{tabular}{@{}l@{\hspace{3mm}}c@{\hspace{3mm}}ll@{}} LHS: & & RHS:\smallskip\\$\sum\;[r_1,..,r_n]$ & $\mapsto$ & $\sum^{smart}\;(\texttt{(denest + distinct)}[simp(r_1),..,simp(r_n)])$\\ $\prod\;[r_1,..,r_n]$ & $\mapsto$ & $\prod^{smart}\;(\texttt{(flts)}[simp(r_1),..,simp(r_n)])$\\ $r$ & $\mapsto$ & $r$ \quad (all other cases)\end{tabular}\end{center}The first case is as follows: first apply $simp$ to all regularexpressions $r_1,.. ,r_n$; then denest the resulting list using\texttt{denest}; after that remove all duplicates in this list (this can bedone in Scala using the function\texttt{\_.distinct}). Finally, you end up with a list of (simplified)regular expressions; apply the smart constructor $\sum^{smart}$ to this list.Similarly in the $\prod$ case: simplify first all regularexpressions $r_1,.. ,r_n$; then flatten the resulting list using \texttt{flts} and apply thesmart constructor $\prod^{smart}$ to the result. In all other cases, just return theinput $r$ as is. For example the regular expression \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\] simplifies to just $r_1$. \mbox{}\hfill\mbox{[1 Mark]}\item[(7)] Implement two functions: The first, called \textit{ders}, takes a list of characters and a regular expression as arguments, and builds the derivative w.r.t.~the list as follows:\begin{center}\begin{tabular}{lcl}$\textit{ders}\;(Nil)\;r$ & $\dn$ & $r$\\ $\textit{ders}\;(c::cs)\;r$ & $\dn$ & $\textit{ders}\;cs\;(\textit{simp}\,(\textit{der}\;c\;r))$\\\end{tabular}\end{center}Note that this function is different from \textit{der}, which onlytakes a single character.The second function, called \textit{matcher}, takes a string and aregular expression as arguments. It builds first the derivativesaccording to \textit{ders} and after that tests whether the resultingderivative regular expression can match the empty string (using\textit{nullable}). For example the \textit{matcher} will producetrue for the regular expression $(a\cdot b)\cdot c$ and the string$abc$, but false if you give it the string $ab$. \hfill[0.5 Mark]\item[(8)] Implement a function, called \textit{size}, by recursion over regular expressions. If a regular expression is seen as a tree, then \textit{size} should return the number of nodes in such a tree. Therefore this function is defined as follows:\begin{center}\begin{tabular}{lcl}$\textit{size}(\ZERO)$ & $\dn$ & $1$\\$\textit{size}(\ONE)$ & $\dn$ & $1$\\$\textit{size}(c)$ & $\dn$ & $1$\\$\textit{size}(\sum\,[r_1,..,r_n]$ & $\dn$ & $1 + \textit{size}(r_1) + ... + \textit{size}(r_n)$\\$\textit{size}(\prod\,[r_1,..,r_n]$ & $\dn$ & $1 + \textit{size}(r_1) + ... + \textit{size}(r_n)$\\$\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\\end{tabular}\end{center}You can use \textit{size} in order to test how much the ``evil'' regularexpression $(a^*)^* \cdot b$ grows when taking successive derivativesaccording the letter $a$ without simplification and then compare it totaking the derivative, but simplify the result. The sizesare given in \texttt{re.scala}. \hfill[0.5 Marks]\item[(9)] You do not have to implement anything specific under this task. The purpose here is that you will be marked for some ``power'' test cases. For example can your matcher decide within 30 seconds whether the regular expression $(a^*)^*\cdot b$ matches strings of the form $aaa\ldots{}aaaa$, for say 1 Million $a$'s. And does simplification simplify the regular expression \[ \texttt{SEQ(SEQ(SEQ(..., ONE | ONE) , ONE | ONE), ONE | ONE)} \] \noindent correctly to just \texttt{ONE}, where \texttt{SEQ} is nested 50 or more times?\\ \mbox{}\hfill[1 Mark]\end{itemize}\subsection*{Background}Although easily implementable in Scala (ok maybe the \texttt{simp} functions andthe constructors \texttt{ALTs}/\texttt{SEQs} needs a bit more thinking), the idea behind thederivative function might not so easy to be seen. To understand itspurpose better, assume a regular expression $r$ can match strings ofthe form $c\!::\!cs$ (that means strings which start with a character$c$ and have some rest, or tail, $cs$). If you take the derivative of$r$ with respect to the character $c$, then you obtain a regularexpression that can match all the strings $cs$. In other words, theregular expression $\textit{der}\;c\;r$ can match the same strings$c\!::\!cs$ that can be matched by $r$, except that the $c$ is choppedoff.Assume now $r$ can match the string $abc$. If you take the derivativeaccording to $a$ then you obtain a regular expression that can match$bc$ (it is $abc$ where the $a$ has been chopped off). If you nowbuild the derivative $\textit{der}\;b\;(\textit{der}\;a\;r)$ youobtain a regular expression that can match the string $c$ (it is $bc$where $b$ is chopped off). If you finally build the derivative of thisaccording $c$, that is$\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r))$, you obtaina regular expression that can match the empty string. You can testwhether this is indeed the case using the function nullable, which iswhat the matcher you have implemented is doing.The purpose of the $\textit{simp}$ function is to keep the regularexpressions small. Normally the derivative function makes the regularexpression bigger (see the \texttt{SEQs} case and the example in Task (2)) and thealgorithm would be slower and slower over time. The $\textit{simp}$function counters this increase in size and the result is that thealgorithm is fast throughout. By the way, this algorithm is by JanuszBrzozowski who came up with the idea of derivatives in 1964 in his PhDthesis.\begin{center}\small\url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}\end{center}If you want to see how badly the regular expression matchers do inJava\footnote{Version 8 and below; Version 9 and above does not seem to be as catastrophic, but still much worse than the regular expression matcher based on derivatives. BTW, Scala uses the regular expression matcher provided by the Java libraries. So is just as bad.}, JavaScript,Python Swift and Dart with the ``evil'' regular expression$(a^*)^*\cdot b$, then have a look at the graphs below (you can try itout for yourself: have a look at the files\texttt{catastrophic9.java}, \texttt{catastrophic.js},\texttt{catastrophic.py} etc on KEATS). Compare this with the matcheryou have implemented. How long can a string of $a$'s be in yourmatcher and still stay within the 30 seconds time limit? It should bemu(uu)$^*$ch better than your off-the-shelf matcher in yourbog-standard programming language.\begin{center}\begin{tabular}{@{}cc@{}}\multicolumn{2}{c}{Graph: $(a^*)^*\cdot b$ and strings $\underbrace{a\ldots a}_{n}$}\medskip\\\begin{tikzpicture}\begin{axis}[ xlabel={$n$}, x label style={at={(1.05,0.0)}}, ylabel={time in secs}, y label style={at={(0.06,0.5)}}, enlargelimits=false, xtick={0,5,...,30}, xmax=33, ymax=45, ytick={0,5,...,40}, scaled ticks=false, axis lines=left, width=6cm, height=5.5cm, legend entries={Python, Java 8, JavaScript, Swift, Dart}, legend pos=north west, legend cell align=left]\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};\addplot[red,mark=*, mark options={fill=white}] table {re-js.data};\addplot[magenta,mark=*, mark options={fill=white}] table {re-swift.data};\addplot[brown,mark=*, mark options={fill=white}] table {re-dart.data};\end{axis}\end{tikzpicture} & \begin{tikzpicture}\begin{axis}[ xlabel={$n$}, x label style={at={(1.05,0.0)}}, ylabel={time in secs}, y label style={at={(0.06,0.5)}}, %enlargelimits=false, %xtick={0,5000,...,30000}, xmax=65000, ymax=45, ytick={0,5,...,40}, scaled ticks=false, axis lines=left, width=6cm, height=5.5cm, legend entries={Java 9}, legend pos=north west]\addplot[cyan,mark=*, mark options={fill=white}] table {re-java9.data};\end{axis}\end{tikzpicture}\end{tabular} \end{center}%\end{document}\newpage\noindentFor the calculation below, I prefer to use the more ``mathematical''notation for regular expressions. Therefore let us first look at thisnotation and the corresponding Scala code.\begin{center}\begin{tabular}{r@{\hspace{10mm}}l} ``mathematical'' notation & \\ for regular expressions & Scala code\smallskip\\ $\ZERO$ & \texttt{ZERO}\\ $\ONE$ & \texttt{ONE}\\ $c$ & \texttt{CHAR(c)}\\ $\sum rs$ & \texttt{ALTs(rs)}\\ $\prod rs$ & \texttt{SEQs(rs)}\\ $r^*$ & \texttt{STAR(r)}\end{tabular}\end{center}\noindentMy own convention is that \texttt{rs} stands for a list of regularexpressions. Also of note is that these are \textbf{all} regularexpressions in Main 3 and the template file defines them as the(algebraic) datatype \texttt{Rexp}. A confusion might arise from thefact that there is also some shorthand notation for some regularexpressions, namely\begin{lstlisting}[xleftmargin=10mm,numbers=none]def ALT(r1: Rexp, r2: Rexp) = ALTs(List(r1, r2))def SEQ(r1: Rexp, r2: Rexp) = SEQs(List(r1, r2))\end{lstlisting}\noindentSince these are functions, everything of the form \texttt{ALT(r1, r2)}will immediately be translated into the regular expression\texttt{ALTs(List(r1, r2))} (similarly for \texttt{SEQ}). Maybe evenmore confusing is that Scala allows one to define\textit{extensions} that provide an even shorter notation for\texttt{ALT} and \texttt{SEQ}, namely\begin{center}\begin{tabular}{lclcl} \texttt{r1} $\sim$ \texttt{r2} & $\dn$ & \texttt{SEQ(r1, r2)} & $\dn$ & \texttt{SEQs(List(r1, r2))}\\ \texttt{r1} $|$ \texttt{r2} & $\dn$ & \texttt{ALT(r1, r2)} & $\dn$ & \texttt{ALTs(List(r1, r2))}\\\end{tabular}\end{center}\noindentThe right hand sides are the fully expanded definitions.The reason for this even shorter notation is that in the mathematicalnotation one often writes \begin{center}\begin{tabular}{lcl} $r_1 \;\cdot\; r_2$ & $\dn$ & $\prod\;[r_1, r_2]$\\ $r_1 + r_2$ & $\dn$ & $\sum\;[r_1, r_2]$\end{tabular}\end{center}\noindentThe first one is for binary \textit{sequence} regular expressions andthe second for binary \textit{alternative} regular expressions.The regex in question in the shorthand notation is $(a + 1)\cdot a$,which is the same as\[\prod\; [\Sigma\,[a, 1], a]\]\noindentor in Scala code\[\texttt{(CHAR('a') | ONE)} \;\sim\; \texttt{CHAR('a')}\] \noindentUsing the mathematical notation, the definition of $\textit{der}$ isgiven by the rules:\begin{center}\begin{tabular}{llcl}(1) & $\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\(2) & $\textit{der}\;c\;(\ONE)$ & $\dn$ & $\ZERO$\\(3) & $\textit{der}\;c\;(d)$ & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\(4) & $\textit{der}\;c\;(\sum\;[r_1,..,r_n])$ & $\dn$ & $\sum\;[\textit{der}\;c\;r_1,..,\textit{der}\;c\;r_n]$\\(5) & $\textit{der}\;c\;(\prod\;[])$ & $\dn$ & $\ZERO$\\ (6) & $\textit{der}\;c\;(\prod\;r\!::\!rs)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r)$\\ & & & $\textit{then}\;(\prod\;(\textit{der}\;c\;r)\!::\!rs) + (\textit{der}\;c\;(\prod rs))$\\ & & & $\textit{else}\;(\prod\;(\textit{der}\;c\;r)\!::\! rs)$\\(7) & $\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\\end{tabular}\end{center}\noindentLet's finally do the calculation for the derivative of the regularexpression with respect to the letter $a$ (in red is in each line whichregular expression is ana-lysed):\begin{center}\begin{tabular}{cll} & $\textit{der}(a, \textcolor{red}{(a + 1) \cdot a})$ & by (6) and since $a + 1$ is nullable\\$\dn$ & $(\textit{der}(a, \textcolor{red}{a + 1})\cdot a) + \textit{der}(a, \,\prod\,[a])$ & by (4)\\$\dn$ & $((\textit{der}(a, \textcolor{red}{a}) + \texttt{der}(a, \ONE))\cdot a) + \textit{der}(a, \,\prod\,[a])$& by (3)\\$\dn$ & $((\ONE + \texttt{der}(a, \textcolor{red}{1}))\cdot a) + \textit{der}(a, \,\prod\,[a])$ & by (2)\\$\dn$ & $((\ONE + \ZERO)\cdot a) + \textit{der}(a, \textcolor{red}{\prod\,[a]})$ & by (6) and $a$ not being nullable\\$\dn$ & $((\ONE + \ZERO)\cdot a) + \prod\,[\texttt{der}(a, \textcolor{red}{a})]$ & by (3)\\$\dn$ & $((\ONE + \ZERO)\cdot a) + \prod\,[\ONE]$ \\ \end{tabular} \end{center}\noindentTranslating this result back into Scala code gives you\[\texttt{ALT(\,} \underbrace{\texttt{(ONE | ZERO)} \sim \texttt{CHAR('a')}}_{(\textbf{1} + \textbf{0})\,\cdot\, a}\;,\;\underbrace{\texttt{SEQs(List(ONE))}}_{\prod\,[\textbf{1}]}\texttt{)}\]\end{document}For example given the regular expression $r = (a \cdot b) \cdot c$, the derivativesw.r.t.~the characters $a$, $b$ and $c$ are\begin{center} \begin{tabular}{lcll} $\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & \quad($= r'$)\\ $\textit{der}\;b\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$\\ $\textit{der}\;c\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$ \end{tabular}\end{center}Let $r'$ stand for the first derivative, then taking the derivatives of $r'$w.r.t.~the characters $a$, $b$ and $c$ gives\begin{center} \begin{tabular}{lcll} $\textit{der}\;a\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \\ $\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & \quad($= r''$)\\ $\textit{der}\;c\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \end{tabular}\end{center}One more example: Let $r''$ stand for the second derivative above,then taking the derivatives of $r''$ w.r.t.~the characters $a$, $b$and $c$ gives\begin{center} \begin{tabular}{lcll} $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\ $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\ $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$ & (is $\textit{nullable}$) \end{tabular}\end{center}Note, the last derivative can match the empty string, that is it is \textit{nullable}.%%% Local Variables: %%% mode: latex%%% TeX-master: t%%% End: