Binary file cws/cw04.pdf has changed
--- a/cws/cw04.tex Wed Nov 28 23:26:47 2018 +0000
+++ b/cws/cw04.tex Thu Nov 29 17:15:11 2018 +0000
@@ -46,6 +46,20 @@
28 29.81185
\end{filecontents}
+\begin{filecontents}{re-js.data}
+5 0.061
+10 0.061
+15 0.061
+20 0.070
+23 0.131
+25 0.308
+26 0.564
+28 1.994
+30 7.648
+31 15.881
+32 32.190
+\end{filecontents}
+
\begin{filecontents}{re-java9.data}
1000 0.01410
2000 0.04882
@@ -79,16 +93,18 @@
% BF IDE
% https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5
-\section*{Coursework 8 (Regular Expressions and Brainf***)}
+\section*{Coursework 9 (Scala)}
-This coursework is worth 10\%. It is about regular expressions,
-pattern matching and an interpreter. The first part is due on 30
-November at 11pm; the second, more advanced part, is due on 21
-December at 11pm. In the first part, you are asked to implement a
+This coursework is worth 10\%. It is about a regular expression
+matcher and the shunting yard algorithm by Dijkstra. The first part is
+due on 6 December at 11pm; the second, more advanced part, is due on
+21 December at 11pm. In the first part, you are asked to implement a
regular expression matcher based on derivatives of regular
-expressions. The reason is that regular expression matching in Java
-and Python can sometimes be extremely slow. The advanced part is about
-an interpreter for a very simple programming language.\bigskip
+expressions. The background is that regular expression matching in
+languages like Java, JavaScript and Python can sometimes be excruciatingly
+slow. The advanced part is about the shunting yard algorithm that
+transforms the usual infix notation of arithmetic expressions into the
+postfix notation, which is for example used in compilers.\bigskip
\IMPORTANT{}
@@ -98,6 +114,40 @@
\DISCLAIMER{}
+\subsection*{Reference Implementation}
+
+This Scala assignment comes with three reference implementations in form of
+\texttt{jar}-files. This allows you to run any test cases on your own
+computer. For example you can call Scala on the command line with the
+option \texttt{-cp re.jar} and then query any function from the
+\texttt{re.scala} template file. As usual you have to
+prefix the calls with \texttt{CW9a}, \texttt{CW9b} and \texttt{CW9c}.
+Since some tasks are time sensitive, you can check the reference
+implementation as follows: if you want to know how long it takes
+to match strings of $a$'s using the regular expression $(a^*)^*\cdot b$
+you can querry as follows:
+
+
+
+\begin{lstlisting}[language={},xleftmargin=1mm,numbers=none,basicstyle=\ttfamily\small]
+$ scala -cp re.jar
+scala> import CW9a._
+scala> for (i <- 0 to 5000000 by 500000) {
+ | println(i + " " + "%.5f".format(time_needed(2, matcher(EVIL, "a" * i))) + "secs.")
+ | }
+0 0.00002 secs.
+500000 0.10608 secs.
+1000000 0.22286 secs.
+1500000 0.35982 secs.
+2000000 0.45828 secs.
+2500000 0.59558 secs.
+3000000 0.73191 secs.
+3500000 0.83499 secs.
+4000000 0.99149 secs.
+4500000 1.15395 secs.
+5000000 1.29659 secs.
+\end{lstlisting}%$
+
\subsection*{Part 1 (6 Marks)}
@@ -116,14 +166,13 @@
& $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\
& $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\
& & & then the second part with $r_2$\\
- & $|$ & $r^*$ & can match zero or more times $r$\\
+ & $|$ & $r^*$ & can match a string with zero or more copies of $r$\\
\end{tabular}
\end{center}
\noindent
-Why? Knowing how to match regular expressions and strings will let you
-solve a lot of problems that vex other humans. Regular expressions are
-one of the fastest and simplest ways to match patterns in text, and
+Why? Regular expressions are
+one of the simplest ways to match patterns in text, and
are endlessly useful for searching, editing and analysing data in all
sorts of places (for example analysing network traffic in order to
detect security breaches). However, you need to be fast, otherwise you
@@ -136,6 +185,10 @@
\item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}
\end{itemize}}
+% Knowing how to match regular expressions and strings will let you
+% solve a lot of problems that vex other humans.
+
+
\subsubsection*{Tasks (file re.scala)}
The file \texttt{re.scala} has already a definition for regular
@@ -157,7 +210,7 @@
\begin{itemize}
-\item[(1a)] Implement a function, called \textit{nullable}, by
+\item[(1)] Implement a function, called \textit{nullable}, by
recursion over regular expressions. This function tests whether a
regular expression can match the empty string. This means given a
regular expression it either returns true or false. The function
@@ -175,7 +228,7 @@
\end{tabular}
\end{center}~\hfill[1 Mark]
-\item[(1b)] Implement a function, called \textit{der}, by recursion over
+\item[(2)] Implement a function, called \textit{der}, by recursion over
regular expressions. It takes a character and a regular expression
as arguments and calculates the derivative regular expression according
to the rules:
@@ -198,7 +251,7 @@
\begin{center}
\begin{tabular}{lcll}
- $\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & ($= r'$)\\
+ $\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & \quad($= r'$)\\
$\textit{der}\;b\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$\\
$\textit{der}\;c\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$
\end{tabular}
@@ -210,7 +263,7 @@
\begin{center}
\begin{tabular}{lcll}
$\textit{der}\;a\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \\
- $\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & ($= r''$)\\
+ $\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & \quad($= r''$)\\
$\textit{der}\;c\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$
\end{tabular}
\end{center}
@@ -231,7 +284,7 @@
Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
\mbox{}\hfill\mbox{[1 Mark]}
-\item[(1c)] Implement the function \textit{simp}, which recursively
+\item[(3)] Implement the function \textit{simp}, which recursively
traverses a regular expression from the inside to the outside, and
on the way simplifies every regular expression on the left (see
below) to the regular expression on the right, except it does not
@@ -255,7 +308,9 @@
simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be
seen as trees and there are several methods for traversing
trees. One of them corresponds to the inside-out traversal, which is
- sometimes also called post-order traversal. Furthermore,
+ sometimes also called post-order traversal'' you traverse inside the
+ tree and on the way up, you apply simplification rules.
+ Furthermore,
remember numerical expressions from school times: there you had expressions
like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$
and simplification rules that looked very similar to rules
@@ -263,9 +318,9 @@
for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
look whether more rules are applicable. If you organise the
simplification in an inside-out fashion, it is always clear which
- rule should be applied next.\hfill[2 Marks]
+ rule should be applied next.\hfill[1 Mark]
-\item[(1d)] Implement two functions: The first, called \textit{ders},
+\item[(4)] Implement two functions: The first, called \textit{ders},
takes a list of characters and a regular expression as arguments, and
builds the derivative w.r.t.~the list as follows:
@@ -288,7 +343,7 @@
true for the regular expression $(a\cdot b)\cdot c$ and the string
$abc$, but false if you give it the string $ab$. \hfill[1 Mark]
-\item[(1e)] Implement a function, called \textit{size}, by recursion
+\item[(5)] Implement a function, called \textit{size}, by recursion
over regular expressions. If a regular expression is seen as a tree,
then \textit{size} should return the number of nodes in such a
tree. Therefore this function is defined as follows:
@@ -309,6 +364,21 @@
according the letter $a$ without simplification and then compare it to
taking the derivative, but simplify the result. The sizes
are given in \texttt{re.scala}. \hfill[1 Mark]
+
+\item[(6)] You do not have to implement anything specific under this
+ task. The purpose here is that you will be marked for some ``power''
+ test cases. For example can your matcher decide within 30 seconds
+ whether the regular expression $(a^*)^*\cdot b$ matches strings of the
+ form $aaa\ldots{}aaaa$, for say 1 Million $a$'s. And does simplification
+ simplify the regular expression
+
+ \[
+ \texttt{SEQ(SEQ(SEQ(..., ONE | ONE) , ONE | ONE), ONE | ONE)}
+ \]
+
+ \noindent correctly to just \texttt{ONE}, where \texttt{SEQ} is nested
+ 50 or more times.\\
+ \mbox{}\hfill[1 Mark]
\end{itemize}
\subsection*{Background}
@@ -337,7 +407,7 @@
The purpose of the $\textit{simp}$ function is to keep the regular
expressions small. Normally the derivative function makes the regular
-expression bigger (see the SEQ case and the example in (1b)) and the
+expression bigger (see the SEQ case and the example in (2)) and the
algorithm would be slower and slower over time. The $\textit{simp}$
function counters this increase in size and the result is that the
algorithm is fast throughout. By the way, this algorithm is by Janusz
@@ -350,15 +420,15 @@
If you want to see how badly the regular expression matchers do in
-Java\footnote{Version 8 and below; Version 9 does not seem to be as
- catastrophic, but still worse than the regular expression matcher
-based on derivatives.} and in Python with the `evil' regular
-expression $(a^*)^*\cdot b$, then have a look at the graphs below (you
-can try it out for yourself: have a look at the file
-\texttt{catastrophic.java} and \texttt{catastrophic.py} on
-KEATS). Compare this with the matcher you have implemented. How long
-can the string of $a$'s be in your matcher and still stay within the
-30 seconds time limit?
+Java\footnote{Version 8 and below; Version 9 and above does not seem to be as
+ catastrophic, but still much worse than the regular expression
+ matcher based on derivatives.}, JavaScript and Python with the
+`evil' regular expression $(a^*)^*\cdot b$, then have a look at the
+graphs below (you can try it out for yourself: have a look at the file
+\texttt{catastrophic9.java}, \texttt{catastrophic.js} and
+\texttt{catastrophic.py} on KEATS). Compare this with the matcher you
+have implemented. How long can the string of $a$'s be in your matcher
+and still stay within the 30 seconds time limit?
\begin{center}
\begin{tabular}{@{}cc@{}}
@@ -380,10 +450,11 @@
axis lines=left,
width=6cm,
height=5.5cm,
- legend entries={Python, Java 8},
+ legend entries={Python, Java 8, JavaScript},
legend pos=north west]
\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
+\addplot[red,mark=*, mark options={fill=white}] table {re-js.data};
\end{axis}
\end{tikzpicture}
&
@@ -413,231 +484,109 @@
\subsection*{Part 2 (4 Marks)}
-Coming from Java or C++, you might think Scala is a quite esoteric
-programming language. But remember, some serious companies have built
-their business on
-Scala.\footnote{\url{https://en.wikipedia.org/wiki/Scala_(programming_language)\#Companies}}
-And there are far, far more esoteric languages out there. One is
-called \emph{brainf***}. You are asked in this part to implement an
-interpreter for this language.
+The \emph{Shunting Yard Algorithm} has been developed by Edsger Dijkstra,
+an influential computer scientist who developed many well-known
+algorithms. This algorithm transforms the usual infix notation of
+arithmetic expressions into the postfix notation, sometimes also
+called reverse Polish notation.
-Urban M\"uller developed brainf*** in 1993. A close relative of this
-language was already introduced in 1964 by Corado B\"ohm, an Italian
-computer pioneer, who unfortunately died a few months ago. The main
-feature of brainf*** is its minimalistic set of instructions---just 8
-instructions in total and all of which are single characters. Despite
-the minimalism, this language has been shown to be Turing
-complete\ldots{}if this doesn't ring any bell with you: it roughly
-means that every algorithm we know can, in principle, be implemented in
-brainf***. It just takes a lot of determination and quite a lot of
-memory resources. Some relatively sophisticated sample programs in
-brainf*** are given in the file \texttt{bf.scala}.\bigskip
+Why on Earth do people use the postfix notation? It is much more
+convenient to work with the usual infix notation for arithmetic
+expressions. Most modern calculators (as opposed to the ones used 20
+years ago) understand infix notation. So why on Earth? \ldots{}Well,
+many computers under the hood, even nowadays, use postfix notation
+extensively. For example if you give to the Java compiler the
+expression $1 + ((2 * 3) + (4 - 3))$, it will generate for it the Java Byte
+code
+
+\begin{lstlisting}[language=JVMIS,numbers=none]
+ldc 1
+ldc 2
+ldc 3
+imul
+ldc 4
+ldc 3
+isub
+iadd
+iadd
+\end{lstlisting}
\noindent
-As mentioned above, brainf*** has 8 single-character commands, namely
-\texttt{'>'}, \texttt{'<'}, \texttt{'+'}, \texttt{'-'}, \texttt{'.'},
-\texttt{','}, \texttt{'['} and \texttt{']'}. Every other character is
-considered a comment. Brainf*** operates on memory cells containing
-integers. For this it uses a single memory pointer that points at each
-stage to one memory cell. This pointer can be moved forward by one
-memory cell by using the command \texttt{'>'}, and backward by using
-\texttt{'<'}. The commands \texttt{'+'} and \texttt{'-'} increase,
-respectively decrease, by 1 the content of the memory cell to which
-the memory pointer currently points to. The commands for input/output
-are \texttt{','} and \texttt{'.'}. Output works by reading the content
-of the memory cell to which the memory pointer points to and printing
-it out as an ASCII character. Input works the other way, taking some
-user input and storing it in the cell to which the memory pointer
-points to. The commands \texttt{'['} and \texttt{']'} are looping
-constructs. Everything in between \texttt{'['} and \texttt{']'} is
-repeated until a counter (memory cell) reaches zero. A typical
-program in brainf*** looks as follows:
-
-\begin{center}
-\begin{verbatim}
- ++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++
- ..+++.>>.<-.<.+++.------.--------.>>+.>++.
-\end{verbatim}
-\end{center}
+where the command \texttt{ldc} loads a constant onto a stack, and \texttt{imul},
+\texttt{isub} and \texttt{iadd} are commands acting on the stack. Clearly this
+is the arithmetic expression in postfix notation.\bigskip
\noindent
-This one prints out Hello World\ldots{}obviously.
-
-\subsubsection*{Tasks (file bf.scala)}
+The shunting yard algorithm processes an input token list using an
+operator stack and an output list. The input consists of numbers,
+operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of
+the assignment we assume the input is always a well-formed expression
+in infix notation. The algorithm uses information about the
+precedences of the operators (given in the template file). The
+algorithm processes the input token list as follows:
\begin{itemize}
-\item[(2a)] Brainf*** memory is represented by a \texttt{Map} from
- integers to integers. The empty memory is represented by
- \texttt{Map()}, that is nothing is stored in the
- memory. \texttt{Map(0 -> 1, 2 -> 3)} clearly stores \texttt{1} at
- memory location \texttt{0}; at \texttt{2} it stores \texttt{3}. The
- convention is that if we query the memory at a location that is
- \emph{not} defined in the \texttt{Map}, we return \texttt{0}. Write
- a function, \texttt{sread}, that takes a memory (a \texttt{Map}) and
- a memory pointer (an \texttt{Int}) as argument, and safely reads the
- corresponding memory location. If the \texttt{Map} is not defined at
- the memory pointer, \texttt{sread} returns \texttt{0}.
-
- Write another function \texttt{write}, which takes a memory, a
- memory pointer and an integer value as argument and updates the
- \texttt{Map} with the value at the given memory location. As usual
- the \texttt{Map} is not updated `in-place' but a new map is created
- with the same data, except the value is stored at the given memory
- pointer.\hfill[1 Mark]
-
-\item[(2b)] Write two functions, \texttt{jumpRight} and
- \texttt{jumpLeft} that are needed to implement the loop constructs
- of brainf***. They take a program (a \texttt{String}) and a program
- counter (an \texttt{Int}) as argument and move right (respectively
- left) in the string in order to find the \textbf{matching}
- opening/closing bracket. For example, given the following program
- with the program counter indicated by an arrow:
-
- \begin{center}
- \texttt{--[\barbelow{.}.+>--],>,++}
- \end{center}
-
- then the matching closing bracket is in 9th position (counting from 0) and
- \texttt{jumpRight} is supposed to return the position just after this
-
- \begin{center}
- \texttt{--[..+>--]\barbelow{,}>,++}
- \end{center}
+\item If there is a number as input token, then this token is
+ transferred to the output list. Then the rest of the input is
+ processed.
+\item If there is an operator as input token, then you need to check
+ what is on top of the operator stack. If there are operators with
+ a higher or equal precedence, these operators are first popped off
+ the stack and moved to the output list. Then the operator from the input
+ is pushed onto the stack and the rest of the input is processed.
+\item If the input is a left-parenthesis, you push it on to the stack
+ and continue processing the input.
+\item If the input is a right-parenthesis, then you move all operators
+ from the stack to the output list until you reach the left-parenthesis.
+ Then you discharge the $($ and $)$ from the input and stack, and continue
+ processing.
+\item If the input is empty, then you move all remaining operators
+ from the stack to the output list.
+\end{itemize}
- meaning it jumps to after the loop. Similarly, if you are in 8th position
- then \texttt{jumpLeft} is supposed to jump to just after the opening
- bracket (that is jumping to the beginning of the loop):
-
- \begin{center}
- \texttt{--[..+>-\barbelow{-}],>,++}
- \qquad$\stackrel{\texttt{jumpLeft}}{\longrightarrow}$\qquad
- \texttt{--[\barbelow{.}.+>--],>,++}
- \end{center}
-
- Unfortunately we have to take into account that there might be
- other opening and closing brackets on the `way' to find the
- matching bracket. For example in the brainf*** program
-
- \begin{center}
- \texttt{--[\barbelow{.}.[+>]--],>,++}
- \end{center}
+\subsubsection*{Tasks (file postfix.scala)}
- we do not want to return the index for the \texttt{'-'} in the 9th
- position, but the program counter for \texttt{','} in 12th
- position. The easiest to find out whether a bracket is matched is by
- using levels (which are the third argument in \texttt{jumpLeft} and
- \texttt{jumpLeft}). In case of \texttt{jumpRight} you increase the
- level by one whenever you find an opening bracket and decrease by
- one for a closing bracket. Then in \texttt{jumpRight} you are looking
- for the closing bracket on level \texttt{0}. For \texttt{jumpLeft} you
- do the opposite. In this way you can find \textbf{matching} brackets
- in strings such as
-
- \begin{center}
- \texttt{--[\barbelow{.}.[[-]+>[.]]--],>,++}
- \end{center}
-
- for which \texttt{jumpRight} should produce the position:
-
- \begin{center}
- \texttt{--[..[[-]+>[.]]--]\barbelow{,}>,++}
- \end{center}
-
- It is also possible that the position returned by \texttt{jumpRight} or
- \texttt{jumpLeft} is outside the string in cases where there are
- no matching brackets. For example
+\begin{itemize}
+\item[(7)] Implement the shunting yard algorithm outlined above. The
+ function, called \texttt{syard}, takes a list of tokens as first
+ argument. The second and third arguments are the stack and output
+ list represented as Scala lists. The most convenient way to
+ implement this algorithm is to analyse what the input list, stack
+ and output look like in each step using pattern-matching. The
+ algorithm transforms for example the input
- \begin{center}
- \texttt{--[\barbelow{.}.[[-]+>[.]]--,>,++}
- \qquad$\stackrel{\texttt{jumpRight}}{\longrightarrow}$\qquad
- \texttt{--[..[[-]+>[.]]-->,++\barbelow{\;\phantom{+}}}
- \end{center}
- \hfill[1 Mark]
-
+ \[
+ \texttt{List(3, +, 4, *, (, 2, -, 1, ))}
+ \]
-\item[(2c)] Write a recursive function \texttt{run} that executes a
- brainf*** program. It takes a program, a program counter, a memory
- pointer and a memory as arguments. If the program counter is outside
- the program string, the execution stops and \texttt{run} returns the
- memory. If the program counter is inside the string, it reads the
- corresponding character and updates the program counter \texttt{pc},
- memory pointer \texttt{mp} and memory \texttt{mem} according to the
- rules shown in Figure~\ref{comms}. It then calls recursively
- \texttt{run} with the updated data.
+ into the postfix output
- Write another function \texttt{start} that calls \texttt{run} with a
- given brainfu** program and memory, and the program counter and memory pointer
- set to~$0$. Like \texttt{run} it returns the memory after the execution
- of the program finishes. You can test your brainf**k interpreter with the
- Sierpinski triangle or the Hello world programs or have a look at
-
- \begin{center}
- \url{https://esolangs.org/wiki/Brainfuck}
- \end{center}\hfill[2 Marks]
+ \[
+ \texttt{List(3, 4, 2, 1, -, *, +)}
+ \]
- \begin{figure}[p]
- \begin{center}
- \begin{tabular}{|@{}p{0.8cm}|l|}
- \hline
- \hfill\texttt{'>'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
- $\bullet$ & $\texttt{pc} + 1$\\
- $\bullet$ & $\texttt{mp} + 1$\\
- $\bullet$ & \texttt{mem} unchanged
- \end{tabular}\\\hline
- \hfill\texttt{'<'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
- $\bullet$ & $\texttt{pc} + 1$\\
- $\bullet$ & $\texttt{mp} - 1$\\
- $\bullet$ & \texttt{mem} unchanged
- \end{tabular}\\\hline
- \hfill\texttt{'+'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
- $\bullet$ & $\texttt{pc} + 1$\\
- $\bullet$ & $\texttt{mp}$ unchanged\\
- $\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) + 1}\\
- \end{tabular}\\\hline
- \hfill\texttt{'-'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
- $\bullet$ & $\texttt{pc} + 1$\\
- $\bullet$ & $\texttt{mp}$ unchanged\\
- $\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) - 1}\\
- \end{tabular}\\\hline
- \hfill\texttt{'.'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
- $\bullet$ & $\texttt{pc} + 1$\\
- $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
- $\bullet$ & print out \,\texttt{mem(mp)} as a character\\
- \end{tabular}\\\hline
- \hfill\texttt{','} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
- $\bullet$ & $\texttt{pc} + 1$\\
- $\bullet$ & $\texttt{mp}$ unchanged\\
- $\bullet$ & \texttt{mem} updated with \texttt{mp -> \textrm{input}}\\
- \multicolumn{2}{@{}l}{the input is given by \texttt{Console.in.read().toByte}}
- \end{tabular}\\\hline
- \hfill\texttt{'['} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
- \multicolumn{2}{@{}l}{if \texttt{mem(mp) == 0} then}\\
- $\bullet$ & $\texttt{pc = jumpRight(prog, pc + 1, 0)}$\\
- $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\
- \multicolumn{2}{@{}l}{otherwise if \texttt{mem(mp) != 0} then}\\
- $\bullet$ & $\texttt{pc} + 1$\\
- $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
- \end{tabular}
- \\\hline
- \hfill\texttt{']'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
- \multicolumn{2}{@{}l}{if \texttt{mem(mp) != 0} then}\\
- $\bullet$ & $\texttt{pc = jumpLeft(prog, pc - 1, 0)}$\\
- $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\
- \multicolumn{2}{@{}l}{otherwise if \texttt{mem(mp) == 0} then}\\
- $\bullet$ & $\texttt{pc} + 1$\\
- $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
- \end{tabular}\\\hline
- any other char & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
- $\bullet$ & $\texttt{pc} + 1$\\
- $\bullet$ & \texttt{mp} and \texttt{mem} unchanged
- \end{tabular}\\
- \hline
- \end{tabular}
- \end{center}
- \caption{The rules for how commands in the brainf*** language update the program counter \texttt{pc},
- memory pointer \texttt{mp} and memory \texttt{mem}.\label{comms}}
- \end{figure}
-\end{itemize}\bigskip
+ You can assume the input list is always a list representing
+ a well-formed infix arithmetic expression.\hfill[1 Mark]
+
+\item[(8)] Implement a compute function that takes a postfix expression
+ as argument and evaluates it generating an integer as result. It uses a
+ stack to evaluate the postfix expression. The operators $+$, $-$, $*$
+ are as usual; $/$ is division on integers, for example $7 / 3 = 2$.
+ \hfill[1 Mark]
+\end{itemize}
+
+\subsubsection*{Task (file postfix2.scala)}
+
+\begin{itemize}
+\item[(9)] Extend the code in (7) and (8) to include the power
+ operator. This requires proper account of associativity of
+ the operators. The power operator is right-associative, whereas the
+ other operators are left-associative. Left-associative operators
+ are popped off if the precedence is bigger or equal, while
+ right-associative operators are only popped off if the precedence is
+ bigger. The compute function in this task should use
+ \texttt{Long}s, rather than \texttt{Int}s.\hfill[2 Marks]
+\end{itemize}
--- a/cws/cw05.tex Wed Nov 28 23:26:47 2018 +0000
+++ b/cws/cw05.tex Thu Nov 29 17:15:11 2018 +0000
@@ -1,262 +1,680 @@
\documentclass{article}
\usepackage{../style}
\usepackage{../langs}
-\usepackage{../graphics}
+\usepackage{disclaimer}
+\usepackage{tikz}
+\usepackage{pgf}
+\usepackage{pgfplots}
+\usepackage{stackengine}
+%% \usepackage{accents}
+\newcommand\barbelow[1]{\stackunder[1.2pt]{#1}{\raisebox{-4mm}{\boldmath$\uparrow$}}}
+
+\begin{filecontents}{re-python2.data}
+1 0.033
+5 0.036
+10 0.034
+15 0.036
+18 0.059
+19 0.084
+20 0.141
+21 0.248
+22 0.485
+23 0.878
+24 1.71
+25 3.40
+26 7.08
+27 14.12
+28 26.69
+\end{filecontents}
+
+\begin{filecontents}{re-java.data}
+5 0.00298
+10 0.00418
+15 0.00996
+16 0.01710
+17 0.03492
+18 0.03303
+19 0.05084
+20 0.10177
+21 0.19960
+22 0.41159
+23 0.82234
+24 1.70251
+25 3.36112
+26 6.63998
+27 13.35120
+28 29.81185
+\end{filecontents}
+
+\begin{filecontents}{re-js.data}
+5 0.061
+10 0.061
+15 0.061
+20 0.070
+23 0.131
+25 0.308
+26 0.564
+28 1.994
+30 7.648
+31 15.881
+32 32.190
+\end{filecontents}
+
+\begin{filecontents}{re-java9.data}
+1000 0.01410
+2000 0.04882
+3000 0.10609
+4000 0.17456
+5000 0.27530
+6000 0.41116
+7000 0.53741
+8000 0.70261
+9000 0.93981
+10000 0.97419
+11000 1.28697
+12000 1.51387
+14000 2.07079
+16000 2.69846
+20000 4.41823
+24000 6.46077
+26000 7.64373
+30000 9.99446
+34000 12.966885
+38000 16.281621
+42000 19.180228
+46000 21.984721
+50000 26.950203
+60000 43.0327746
+\end{filecontents}
+
\begin{document}
-\section*{Replacement Coursework 2 (Automata)}
+% BF IDE
+% https://www.microsoft.com/en-us/p/brainf-ck/9nblgggzhvq5
+
+\section*{Coursework 9 (Scala)}
-This coursework is worth 10\%. It is about deterministic and
-non-deterministic finite automata. The coursework is due on 21 March
-at 5pm. Make sure the files you submit can be processed by just
-calling \texttt{scala <<filename.scala>>}.\bigskip
+This coursework is worth 10\%. It is about a regular expression
+matcher and the shunting yard algorithm by Dijkstra. The first part is
+due on 6 December at 11pm; the second, more advanced part, is due on
+21 December at 11pm. In the first part, you are asked to implement a
+regular expression matcher based on derivatives of regular
+expressions. The reason is that regular expression matching in
+languages like Java, JavaScipt and Python can sometimes be extremely
+slow. The advanced part is about the shunting yard algorithm that
+transforms the usual infix notation of arithmetic expressions into the
+postfix notation, which is for example used in compilers.\bigskip
+
+\IMPORTANT{}
\noindent
-\textbf{Important:} Do not use any mutable data structures in your
-submission! They are not needed. This means you cannot use
-\texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
-code! It has a different meaning in Scala, than in Java. Do not use
-\texttt{var}! This declares a mutable variable. Make sure the
-functions you submit are defined on the ``top-level'' of Scala, not
-inside a class or object. Also note that when marking, the running time
-will be restricted to a maximum of 360 seconds on my laptop.
+Also note that the running time of each part will be restricted to a
+maximum of 30 seconds on my laptop.
+
+\DISCLAIMER{}
-\subsection*{Disclaimer}
-
-It should be understood that the work you submit represents your own
-effort! You have not copied from anyone else. An exception is the
-Scala code I showed during the lectures or uploaded to KEATS, which
-you can freely use.\bigskip
-
+\subsection*{Part 1 (6 Marks, Regular Expression Matcher)}
-\subsection*{Part 1 (Deterministic Finite Automata)}
-
-\noindent
-There are many uses for Deterministic Finite Automata (DFAs), for
-example for testing whether a string matches a pattern or not. DFAs
-consist of some states (circles) and some transitions (edges) between
-states. For example the DFA
+The task is to implement a regular expression matcher that is based on
+derivatives of regular expressions. Most of the functions are defined by
+recursion over regular expressions and can be elegantly implemented
+using Scala's pattern-matching. The implementation should deal with the
+following regular expressions, which have been predefined in the file
+\texttt{re.scala}:
\begin{center}
-\begin{tikzpicture}[scale=1.5,>=stealth',very thick,auto,
- every state/.style={minimum size=4pt,
- inner sep=4pt,draw=blue!50,very thick,
- fill=blue!20}]
- \node[state, initial] (q0) at ( 0,1) {$Q_0$};
- \node[state] (q1) at ( 1,1) {$Q_1$};
- \node[state, accepting] (q2) at ( 2,1) {$Q_2$};
- \path[->] (q0) edge[bend left] node[above] {$a$} (q1)
- (q1) edge[bend left] node[above] {$b$} (q0)
- (q2) edge[bend left=50] node[below] {$b$} (q0)
- (q1) edge node[above] {$a$} (q2)
- (q2) edge [loop right] node {$a$} ()
- (q0) edge [loop below] node {$b$} ();
-\end{tikzpicture}
+\begin{tabular}{lcll}
+ $r$ & $::=$ & $\ZERO$ & cannot match anything\\
+ & $|$ & $\ONE$ & can only match the empty string\\
+ & $|$ & $c$ & can match a single character (in this case $c$)\\
+ & $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\
+ & $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\
+ & & & then the second part with $r_2$\\
+ & $|$ & $r^*$ & can match a string with zero or more copies of $r$\\
+\end{tabular}
\end{center}
-\noindent
-has three states ($Q_0$, $Q_1$ and $Q_2$), whereby $Q_0$ is the
-starting state of the DFA and $Q_2$ is the accepting state. The latter
-is indicated by double lines. In general, a DFA can have any number of
-accepting states, but only a single starting state.
+\noindent
+Why? Knowing how to match regular expressions and strings will let you
+solve a lot of problems that vex other humans. Regular expressions are
+one of the fastest and simplest ways to match patterns in text, and
+are endlessly useful for searching, editing and analysing data in all
+sorts of places (for example analysing network traffic in order to
+detect security breaches). However, you need to be fast, otherwise you
+will stumble over problems such as recently reported at
+
+{\small
+\begin{itemize}
+\item[$\bullet$] \url{http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016}
+\item[$\bullet$] \url{https://vimeo.com/112065252}
+\item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}
+\end{itemize}}
+
+\subsubsection*{Tasks (file re.scala)}
-Transitions are edges between states labelled with a character. The
-idea is that if we are in state $Q_0$, say, and get an $a$, we can go
-to state $Q_1$. If we are in state $Q_2$ and get an $a$, we can stay
-in state $Q_2$; if we get a $b$ in $Q_2$, then can go to state
-$Q_0$. The main point of DFAs is that if we are in a state and get a
-character, it is always clear which is the next state---there can only
-be at most one. The task of Part 1 is to implement such DFAs in Scala
-using partial functions for the transitions.
+The file \texttt{re.scala} has already a definition for regular
+expressions and also defines some handy shorthand notation for
+regular expressions. The notation in this document matches up
+with the code in the file as follows:
-A string is accepted by a DFA, if we start in the starting state,
-follow all transitions according to the string; if we end up in an
-accepting state, then the string is accepted. If not, the string is
-not accepted. The technical idea is that DFAs can be used to
-accept strings from \emph{regular} languages.
+\begin{center}
+ \begin{tabular}{rcl@{\hspace{10mm}}l}
+ & & code: & shorthand:\smallskip \\
+ $\ZERO$ & $\mapsto$ & \texttt{ZERO}\\
+ $\ONE$ & $\mapsto$ & \texttt{ONE}\\
+ $c$ & $\mapsto$ & \texttt{CHAR(c)}\\
+ $r_1 + r_2$ & $\mapsto$ & \texttt{ALT(r1, r2)} & \texttt{r1 | r2}\\
+ $r_1 \cdot r_2$ & $\mapsto$ & \texttt{SEQ(r1, r2)} & \texttt{r1 $\sim$ r2}\\
+ $r^*$ & $\mapsto$ & \texttt{STAR(r)} & \texttt{r.\%}
+\end{tabular}
+\end{center}
-\subsubsection*{Tasks}
\begin{itemize}
-\item[(1)] Write a polymorphic function, called \texttt{share}, that
- decides whether two sets share some elements (i.e.~the intersection
- is not empty).\hfill[1 Mark]
-
-\item[(2)] The transitions of DFAs will be implemented as partial
- functions. These functions will have the type (state,
- character)-pair to state, that is their input will be a (state,
- character)-pair and they return a state. For example the transitions
- of the DFA shown above can be defined as the following
- partial function:
+\item[(1)] Implement a function, called \textit{nullable}, by
+ recursion over regular expressions. This function tests whether a
+ regular expression can match the empty string. This means given a
+ regular expression it either returns true or false. The function
+ \textit{nullable}
+ is defined as follows:
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
+$\textit{nullable}(\ONE)$ & $\dn$ & $\textit{true}$\\
+$\textit{nullable}(c)$ & $\dn$ & $\textit{false}$\\
+$\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
+$\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
+$\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
+\end{tabular}
+\end{center}~\hfill[1 Mark]
+
+\item[(2)] Implement a function, called \textit{der}, by recursion over
+ regular expressions. It takes a character and a regular expression
+ as arguments and calculates the derivative regular expression according
+ to the rules:
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{der}\;c\;(\ZERO)$ & $\dn$ & $\ZERO$\\
+$\textit{der}\;c\;(\ONE)$ & $\dn$ & $\ZERO$\\
+$\textit{der}\;c\;(d)$ & $\dn$ & $\textit{if}\; c = d\;\textit{then} \;\ONE \; \textit{else} \;\ZERO$\\
+$\textit{der}\;c\;(r_1 + r_2)$ & $\dn$ & $(\textit{der}\;c\;r_1) + (\textit{der}\;c\;r_2)$\\
+$\textit{der}\;c\;(r_1 \cdot r_2)$ & $\dn$ & $\textit{if}\;\textit{nullable}(r_1)$\\
+ & & $\textit{then}\;((\textit{der}\;c\;r_1)\cdot r_2) + (\textit{der}\;c\;r_2)$\\
+ & & $\textit{else}\;(\textit{der}\;c\;r_1)\cdot r_2$\\
+$\textit{der}\;c\;(r^*)$ & $\dn$ & $(\textit{der}\;c\;r)\cdot (r^*)$\\
+\end{tabular}
+\end{center}
+
+For example given the regular expression $r = (a \cdot b) \cdot c$, the derivatives
+w.r.t.~the characters $a$, $b$ and $c$ are
-\begin{lstlisting}[language=Scala,numbers=none]
-val dfa_trans : PartialFunction[(State,Char), State] =
- { case (Q0, 'a') => Q1
- case (Q0, 'b') => Q0
- case (Q1, 'a') => Q2
- case (Q1, 'b') => Q0
- case (Q2, 'a') => Q2
- case (Q2, 'b') => Q0
- }
-\end{lstlisting}
+\begin{center}
+ \begin{tabular}{lcll}
+ $\textit{der}\;a\;r$ & $=$ & $(\ONE \cdot b)\cdot c$ & \quad($= r'$)\\
+ $\textit{der}\;b\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$\\
+ $\textit{der}\;c\;r$ & $=$ & $(\ZERO \cdot b)\cdot c$
+ \end{tabular}
+\end{center}
+
+Let $r'$ stand for the first derivative, then taking the derivatives of $r'$
+w.r.t.~the characters $a$, $b$ and $c$ gives
+
+\begin{center}
+ \begin{tabular}{lcll}
+ $\textit{der}\;a\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$ \\
+ $\textit{der}\;b\;r'$ & $=$ & $((\ZERO \cdot b) + \ONE)\cdot c$ & \quad($= r''$)\\
+ $\textit{der}\;c\;r'$ & $=$ & $((\ZERO \cdot b) + \ZERO)\cdot c$
+ \end{tabular}
+\end{center}
- The main point of partial functions (as opposed to ``normal''
- functions) is that they do not have to be defined everywhere. For
- example the transitions above only mention characters $a$ and $b$,
- but leave out any other characters. Partial functions come with a
- method \texttt{isDefinedAt} that can be used to check whether an
- input produces a result or not. For example
+One more example: Let $r''$ stand for the second derivative above,
+then taking the derivatives of $r''$ w.r.t.~the characters $a$, $b$
+and $c$ gives
+
+\begin{center}
+ \begin{tabular}{lcll}
+ $\textit{der}\;a\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$ \\
+ $\textit{der}\;b\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ZERO$\\
+ $\textit{der}\;c\;r''$ & $=$ & $((\ZERO \cdot b) + \ZERO) \cdot c + \ONE$ &
+ (is $\textit{nullable}$)
+ \end{tabular}
+\end{center}
+
+Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
+\mbox{}\hfill\mbox{[1 Mark]}
+
+\item[(3)] Implement the function \textit{simp}, which recursively
+ traverses a regular expression from the inside to the outside, and
+ on the way simplifies every regular expression on the left (see
+ below) to the regular expression on the right, except it does not
+ simplify inside ${}^*$-regular expressions.
-\begin{lstlisting}[language=Scala,numbers=none]
- dfa_trans.isDefinedAt((Q0, 'a'))
- dfa_trans.isDefinedAt((Q0, 'c'))
-\end{lstlisting}
+ \begin{center}
+\begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll}
+$r \cdot \ZERO$ & $\mapsto$ & $\ZERO$\\
+$\ZERO \cdot r$ & $\mapsto$ & $\ZERO$\\
+$r \cdot \ONE$ & $\mapsto$ & $r$\\
+$\ONE \cdot r$ & $\mapsto$ & $r$\\
+$r + \ZERO$ & $\mapsto$ & $r$\\
+$\ZERO + r$ & $\mapsto$ & $r$\\
+$r + r$ & $\mapsto$ & $r$\\
+\end{tabular}
+ \end{center}
+
+ For example the regular expression
+ \[(r_1 + \ZERO) \cdot \ONE + ((\ONE + r_2) + r_3) \cdot (r_4 \cdot \ZERO)\]
+
+ simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be
+ seen as trees and there are several methods for traversing
+ trees. One of them corresponds to the inside-out traversal, which is
+ sometimes also called post-order traversal'' you traverse inside the
+ tree and on the way up, you apply simplification rules.
+ Furthermore,
+ remember numerical expressions from school times: there you had expressions
+ like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$
+ and simplification rules that looked very similar to rules
+ above. You would simplify such numerical expressions by replacing
+ for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
+ look whether more rules are applicable. If you organise the
+ simplification in an inside-out fashion, it is always clear which
+ rule should be applied next.\hfill[1 Mark]
+
+\item[(4)] Implement two functions: The first, called \textit{ders},
+ takes a list of characters and a regular expression as arguments, and
+ builds the derivative w.r.t.~the list as follows:
+
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{ders}\;(Nil)\;r$ & $\dn$ & $r$\\
+ $\textit{ders}\;(c::cs)\;r$ & $\dn$ &
+ $\textit{ders}\;cs\;(\textit{simp}(\textit{der}\;c\;r))$\\
+\end{tabular}
+\end{center}
+
+Note that this function is different from \textit{der}, which only
+takes a single character.
- \noindent
- gives \texttt{true} in the first case and \texttt{false} in the
- second. There is also a method \texttt{lift} that transforms a
- partial function into a ``normal'' function returning an optional
- value: if the partial function is defined on the input, the lifted
- function will return \texttt{Some}; if it is not defined, then
- \texttt{None}.
-
- Write a function that takes a transition and a (state, character)-pair as arguments
- and produces an optional state (the state specified by the partial transition
- function whenever it is defined; if the transition function is undefined,
- return \texttt{None}).\hfill\mbox{[1 Mark]}
+The second function, called \textit{matcher}, takes a string and a
+regular expression as arguments. It builds first the derivatives
+according to \textit{ders} and after that tests whether the resulting
+derivative regular expression can match the empty string (using
+\textit{nullable}). For example the \textit{matcher} will produce
+true for the regular expression $(a\cdot b)\cdot c$ and the string
+$abc$, but false if you give it the string $ab$. \hfill[1 Mark]
+
+\item[(5)] Implement a function, called \textit{size}, by recursion
+ over regular expressions. If a regular expression is seen as a tree,
+ then \textit{size} should return the number of nodes in such a
+ tree. Therefore this function is defined as follows:
-\item[(3)]
- Write a function that ``lifts'' the function in (2) from characters to strings. That
- is, write a function that takes a transition, a state and a list of characters
- as arguments and produces the state generated by following the transitions for
- each character in the list. For example if you are in state $Q_0$ in the DFA above
- and have the list \texttt{List(a,a,a,b,b,a)}, then you need to return the
- state $Q_1$ (as option since there might not be such a state in general).\\
- \mbox{}\hfill\mbox{[1 Mark]}
-
-\item[(4)] DFAs are defined as a triple: (starting state, transitions,
- set of accepting states). Write a function \texttt{accepts} that tests whether
- a string is accepted by an DFA or not. For this start in the
- starting state of the DFA, use the function under (3) to calculate
- the state after following all transitions according to the
- characters in the string. If the resulting state is an accepting state,
- return \texttt{true}; otherwise \texttt{false}.\\\mbox{}\hfill\mbox{[1 Mark]}
+\begin{center}
+\begin{tabular}{lcl}
+$\textit{size}(\ZERO)$ & $\dn$ & $1$\\
+$\textit{size}(\ONE)$ & $\dn$ & $1$\\
+$\textit{size}(c)$ & $\dn$ & $1$\\
+$\textit{size}(r_1 + r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
+$\textit{size}(r_1 \cdot r_2)$ & $\dn$ & $1 + \textit{size}(r_1) + \textit{size}(r_2)$\\
+$\textit{size}(r^*)$ & $\dn$ & $1 + \textit{size}(r)$\\
+\end{tabular}
+\end{center}
+
+You can use \textit{size} in order to test how much the `evil' regular
+expression $(a^*)^* \cdot b$ grows when taking successive derivatives
+according the letter $a$ without simplification and then compare it to
+taking the derivative, but simplify the result. The sizes
+are given in \texttt{re.scala}. \hfill[1 Mark]
+
+\item[(6)] You do not have to implement anything specific under this
+ task. The purpose is that you will be marked for some ``power''
+ test cases. For example can your matcher decide withing 30 seconds
+ whether the regular expression $(a^*)^*\cdot b$ matches strings of the
+ form $aaa\ldots{}aaaa$, for say 1 Million $a$'s. And does simplification
+ simplify the regular expression
+
+ \[
+ \texttt{SEQ(SEQ(SEQ(..., ONE | ONE) , ONE | ONE), ONE | ONE)}
+ \]
+
+ \noindent correctly to just \texttt{ONE}, where \texttt{SEQ} is nested
+ 50 or more times.\\
+ \mbox{}\hfill[1 Mark]
\end{itemize}
+\subsection*{Background}
-\subsection*{Part 2 (Non-Deterministic Finite Automata)}
+Although easily implementable in Scala, the idea behind the derivative
+function might not so easy to be seen. To understand its purpose
+better, assume a regular expression $r$ can match strings of the form
+$c\!::\!cs$ (that means strings which start with a character $c$ and have
+some rest, or tail, $cs$). If you take the derivative of $r$ with
+respect to the character $c$, then you obtain a regular expression
+that can match all the strings $cs$. In other words, the regular
+expression $\textit{der}\;c\;r$ can match the same strings $c\!::\!cs$
+that can be matched by $r$, except that the $c$ is chopped off.
+
+Assume now $r$ can match the string $abc$. If you take the derivative
+according to $a$ then you obtain a regular expression that can match
+$bc$ (it is $abc$ where the $a$ has been chopped off). If you now
+build the derivative $\textit{der}\;b\;(\textit{der}\;a\;r)$ you
+obtain a regular expression that can match the string $c$ (it is $bc$
+where $b$ is chopped off). If you finally build the derivative of this
+according $c$, that is
+$\textit{der}\;c\;(\textit{der}\;b\;(\textit{der}\;a\;r))$, you obtain
+a regular expression that can match the empty string. You can test
+whether this is indeed the case using the function nullable, which is
+what your matcher is doing.
-The main point of DFAs is that for every given state and character
-there is at most one next state (one if the transition is defined;
-none otherwise). However, this restriction to at most one state can be
-quite limiting for some applications.\footnote{Though there is a
- curious fact that every (less restricted) NFA can be translated into
- an ``equivalent'' DFA, whereby accepting means accepting the same
- set of strings. However this might increase drastically the number
- of states in the DFA.} Non-Deterministic Automata (NFAs) remove
-this restriction: there can be more than one starting state, and given
-a state and a character there can be more than one next
-state. Consider for example the NFA
+The purpose of the $\textit{simp}$ function is to keep the regular
+expressions small. Normally the derivative function makes the regular
+expression bigger (see the SEQ case and the example in (2)) and the
+algorithm would be slower and slower over time. The $\textit{simp}$
+function counters this increase in size and the result is that the
+algorithm is fast throughout. By the way, this algorithm is by Janusz
+Brzozowski who came up with the idea of derivatives in 1964 in his PhD
+thesis.
+
+\begin{center}\small
+\url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}
+\end{center}
+
+
+If you want to see how badly the regular expression matchers do in
+Java\footnote{Version 8 and below; Version 9 and above does not seem to be as
+ catastrophic, but still much worse than the regular expression
+ matcher based on derivatives.}, JavaScript and in Python with the
+`evil' regular expression $(a^*)^*\cdot b$, then have a look at the
+graphs below (you can try it out for yourself: have a look at the file
+\texttt{catastrophic9.java}, \texttt{catastrophic.js} and
+\texttt{catastrophic.py} on KEATS). Compare this with the matcher you
+have implemented. How long can the string of $a$'s be in your matcher
+and still stay within the 30 seconds time limit?
\begin{center}
-\begin{tikzpicture}[scale=0.7,>=stealth',very thick,
- every state/.style={minimum size=0pt,
- draw=blue!50,very thick,fill=blue!20},]
-\node[state,initial] (R_1) {$R_1$};
-\node[state,initial] (R_2) [above=of R_1] {$R_2$};
-\node[state, accepting] (R_3) [right=of R_1] {$R_3$};
-\path[->] (R_1) edge node [below] {$b$} (R_3);
-\path[->] (R_2) edge [bend left] node [above] {$a$} (R_3);
-\path[->] (R_1) edge [bend left] node [left] {$c$} (R_2);
-\path[->] (R_2) edge [bend left] node [right] {$a$} (R_1);
+\begin{tabular}{@{}cc@{}}
+\multicolumn{2}{c}{Graph: $(a^*)^*\cdot b$ and strings
+ $\underbrace{a\ldots a}_{n}$}\bigskip\\
+
+\begin{tikzpicture}
+\begin{axis}[
+ xlabel={$n$},
+ x label style={at={(1.05,0.0)}},
+ ylabel={time in secs},
+ y label style={at={(0.06,0.5)}},
+ enlargelimits=false,
+ xtick={0,5,...,30},
+ xmax=33,
+ ymax=45,
+ ytick={0,5,...,40},
+ scaled ticks=false,
+ axis lines=left,
+ width=6cm,
+ height=5.5cm,
+ legend entries={Python, Java 8, JavaScript},
+ legend pos=north west]
+\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
+\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
+\addplot[red,mark=*, mark options={fill=white}] table {re-js.data};
+\end{axis}
\end{tikzpicture}
+ &
+\begin{tikzpicture}
+\begin{axis}[
+ xlabel={$n$},
+ x label style={at={(1.05,0.0)}},
+ ylabel={time in secs},
+ y label style={at={(0.06,0.5)}},
+ %enlargelimits=false,
+ %xtick={0,5000,...,30000},
+ xmax=65000,
+ ymax=45,
+ ytick={0,5,...,40},
+ scaled ticks=false,
+ axis lines=left,
+ width=6cm,
+ height=5.5cm,
+ legend entries={Java 9},
+ legend pos=north west]
+\addplot[cyan,mark=*, mark options={fill=white}] table {re-java9.data};
+\end{axis}
+\end{tikzpicture}
+\end{tabular}
\end{center}
+\newpage
+
+\subsection*{Part 2 (4 Marks)}
+
+Coming from Java or C++, you might think Scala is a quite esoteric
+programming language. But remember, some serious companies have built
+their business on
+Scala.\footnote{\url{https://en.wikipedia.org/wiki/Scala_(programming_language)\#Companies}}
+And there are far, far more esoteric languages out there. One is
+called \emph{brainf***}. You are asked in this part to implement an
+interpreter for this language.
+
+Urban M\"uller developed brainf*** in 1993. A close relative of this
+language was already introduced in 1964 by Corado B\"ohm, an Italian
+computer pioneer, who unfortunately died a few months ago. The main
+feature of brainf*** is its minimalistic set of instructions---just 8
+instructions in total and all of which are single characters. Despite
+the minimalism, this language has been shown to be Turing
+complete\ldots{}if this doesn't ring any bell with you: it roughly
+means that every algorithm we know can, in principle, be implemented in
+brainf***. It just takes a lot of determination and quite a lot of
+memory resources. Some relatively sophisticated sample programs in
+brainf*** are given in the file \texttt{bf.scala}.\bigskip
\noindent
-where in state $R_2$ if we get an $a$, we can go to state $R_1$
-\emph{or} $R_3$. If we want to find out whether an NFA accepts a
-string, then we need to explore both possibilities. We will do this
-``exploration'' in the tasks below in a breadth-first manner.
-
+As mentioned above, brainf*** has 8 single-character commands, namely
+\texttt{'>'}, \texttt{'<'}, \texttt{'+'}, \texttt{'-'}, \texttt{'.'},
+\texttt{','}, \texttt{'['} and \texttt{']'}. Every other character is
+considered a comment. Brainf*** operates on memory cells containing
+integers. For this it uses a single memory pointer that points at each
+stage to one memory cell. This pointer can be moved forward by one
+memory cell by using the command \texttt{'>'}, and backward by using
+\texttt{'<'}. The commands \texttt{'+'} and \texttt{'-'} increase,
+respectively decrease, by 1 the content of the memory cell to which
+the memory pointer currently points to. The commands for input/output
+are \texttt{','} and \texttt{'.'}. Output works by reading the content
+of the memory cell to which the memory pointer points to and printing
+it out as an ASCII character. Input works the other way, taking some
+user input and storing it in the cell to which the memory pointer
+points to. The commands \texttt{'['} and \texttt{']'} are looping
+constructs. Everything in between \texttt{'['} and \texttt{']'} is
+repeated until a counter (memory cell) reaches zero. A typical
+program in brainf*** looks as follows:
-The feature of having more than one next state in NFAs will be
-implemented by having a \emph{set} of partial transition functions
-(DFAs had only one). For example the NFA shown above will be
-represented by the set of partial functions
-
-\begin{lstlisting}[language=Scala,numbers=none]
-val nfa_trans : NTrans = Set(
- { case (R1, 'c') => R2 },
- { case (R1, 'b') => R3 },
- { case (R2, 'a') => R1 },
- { case (R2, 'a') => R3 }
-)
-\end{lstlisting}
+\begin{center}
+\begin{verbatim}
+ ++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++
+ ..+++.>>.<-.<.+++.------.--------.>>+.>++.
+\end{verbatim}
+\end{center}
\noindent
-The point is that the 3rd element in this set makes sure that in state $R_2$ and
-given an $a$, we can go to state $R_1$; and the 4th element, in $R_2$,
-given an $a$, we can also go to state $R_3$. When following
-transitions from a state, we have to look at all partial functions in
-the set and generate the set of \emph{all} possible next states.
+This one prints out Hello World\ldots{}obviously.
-\subsubsection*{Tasks}
+\subsubsection*{Tasks (file bf.scala)}
\begin{itemize}
-\item[(5)]
- Write a function \texttt{nnext} which takes a transition set, a state
- and a character as arguments, and calculates all possible next states
- (returned as set).\\
- \mbox{}\hfill\mbox{[1 Mark]}
+\item[(2a)] Brainf*** memory is represented by a \texttt{Map} from
+ integers to integers. The empty memory is represented by
+ \texttt{Map()}, that is nothing is stored in the
+ memory. \texttt{Map(0 -> 1, 2 -> 3)} clearly stores \texttt{1} at
+ memory location \texttt{0}; at \texttt{2} it stores \texttt{3}. The
+ convention is that if we query the memory at a location that is
+ \emph{not} defined in the \texttt{Map}, we return \texttt{0}. Write
+ a function, \texttt{sread}, that takes a memory (a \texttt{Map}) and
+ a memory pointer (an \texttt{Int}) as argument, and safely reads the
+ corresponding memory location. If the \texttt{Map} is not defined at
+ the memory pointer, \texttt{sread} returns \texttt{0}.
+
+ Write another function \texttt{write}, which takes a memory, a
+ memory pointer and an integer value as argument and updates the
+ \texttt{Map} with the value at the given memory location. As usual
+ the \texttt{Map} is not updated `in-place' but a new map is created
+ with the same data, except the value is stored at the given memory
+ pointer.\hfill[1 Mark]
-\item[(6)] Write a function \texttt{nnexts} which takes a transition
- set, a \emph{set} of states and a character as arguments, and
- calculates \emph{all} possible next states that can be reached from
- any state in the set.\mbox{}\hfill\mbox{[1 Mark]}
+\item[(2b)] Write two functions, \texttt{jumpRight} and
+ \texttt{jumpLeft} that are needed to implement the loop constructs
+ of brainf***. They take a program (a \texttt{String}) and a program
+ counter (an \texttt{Int}) as argument and move right (respectively
+ left) in the string in order to find the \textbf{matching}
+ opening/closing bracket. For example, given the following program
+ with the program counter indicated by an arrow:
+
+ \begin{center}
+ \texttt{--[\barbelow{.}.+>--],>,++}
+ \end{center}
+
+ then the matching closing bracket is in 9th position (counting from 0) and
+ \texttt{jumpRight} is supposed to return the position just after this
+
+ \begin{center}
+ \texttt{--[..+>--]\barbelow{,}>,++}
+ \end{center}
+
+ meaning it jumps to after the loop. Similarly, if you are in 8th position
+ then \texttt{jumpLeft} is supposed to jump to just after the opening
+ bracket (that is jumping to the beginning of the loop):
-\item[(7)] Like in (3), write a function \texttt{nnextss} that lifts
- \texttt{nnexts} from (6) from single characters to lists of characters.
- \mbox{}\hfill\mbox{[1 Mark]}
-
-\item[(8)] NFAs are also defined as a triple: (set of staring states,
- set of transitions, set of accepting states). Write a function
- \texttt{naccepts} that tests whether a string is accepted by an NFA
- or not. For this start in all starting states of the NFA, use the
- function under (7) to calculate the set of states following all
- transitions according to the characters in the string. If the
- resulting set of states shares at least a single state with the set
- of accepting states, return \texttt{true}; otherwise \texttt{false}.
- Use the function under (1) in order to test whether these two sets
- of states share any states or not.\mbox{}\hfill\mbox{[1 Mark]}
+ \begin{center}
+ \texttt{--[..+>-\barbelow{-}],>,++}
+ \qquad$\stackrel{\texttt{jumpLeft}}{\longrightarrow}$\qquad
+ \texttt{--[\barbelow{.}.+>--],>,++}
+ \end{center}
+
+ Unfortunately we have to take into account that there might be
+ other opening and closing brackets on the `way' to find the
+ matching bracket. For example in the brainf*** program
+
+ \begin{center}
+ \texttt{--[\barbelow{.}.[+>]--],>,++}
+ \end{center}
+
+ we do not want to return the index for the \texttt{'-'} in the 9th
+ position, but the program counter for \texttt{','} in 12th
+ position. The easiest to find out whether a bracket is matched is by
+ using levels (which are the third argument in \texttt{jumpLeft} and
+ \texttt{jumpLeft}). In case of \texttt{jumpRight} you increase the
+ level by one whenever you find an opening bracket and decrease by
+ one for a closing bracket. Then in \texttt{jumpRight} you are looking
+ for the closing bracket on level \texttt{0}. For \texttt{jumpLeft} you
+ do the opposite. In this way you can find \textbf{matching} brackets
+ in strings such as
+
+ \begin{center}
+ \texttt{--[\barbelow{.}.[[-]+>[.]]--],>,++}
+ \end{center}
+
+ for which \texttt{jumpRight} should produce the position:
+
+ \begin{center}
+ \texttt{--[..[[-]+>[.]]--]\barbelow{,}>,++}
+ \end{center}
+
+ It is also possible that the position returned by \texttt{jumpRight} or
+ \texttt{jumpLeft} is outside the string in cases where there are
+ no matching brackets. For example
+
+ \begin{center}
+ \texttt{--[\barbelow{.}.[[-]+>[.]]--,>,++}
+ \qquad$\stackrel{\texttt{jumpRight}}{\longrightarrow}$\qquad
+ \texttt{--[..[[-]+>[.]]-->,++\barbelow{\;\phantom{+}}}
+ \end{center}
+ \hfill[1 Mark]
-\item[(9)] Since we explore in functions (6) and (7) all possible next
- states, we decide whether a string is accepted in a breadth-first
- manner. (Depth-first would be to choose one state, follow all next
- states of this single state; check whether it leads to an accepting
- state. If not, we backtrack and choose another state). The
- disadvantage of breadth-first search is that at every step a
- non-empty set of states are ``active''\ldots{} states that need to
- be followed at the same time. Write similar functions as in (7) and
- (8), but instead of returning states or a boolean, calculate the
- number of states that need to be followed in each step. The function
- \texttt{max\_accept} should then return the maximum of all these
- numbers.
+
+\item[(2c)] Write a recursive function \texttt{run} that executes a
+ brainf*** program. It takes a program, a program counter, a memory
+ pointer and a memory as arguments. If the program counter is outside
+ the program string, the execution stops and \texttt{run} returns the
+ memory. If the program counter is inside the string, it reads the
+ corresponding character and updates the program counter \texttt{pc},
+ memory pointer \texttt{mp} and memory \texttt{mem} according to the
+ rules shown in Figure~\ref{comms}. It then calls recursively
+ \texttt{run} with the updated data.
+
+ Write another function \texttt{start} that calls \texttt{run} with a
+ given brainfu** program and memory, and the program counter and memory pointer
+ set to~$0$. Like \texttt{run} it returns the memory after the execution
+ of the program finishes. You can test your brainf**k interpreter with the
+ Sierpinski triangle or the Hello world programs or have a look at
- As a test case, consider again the NFA shown above. At the beginning
- the number of active states will be 2 (since there are two starting
- states, namely $R_1$ and $R_2$). If we get an $a$, there will be
- still 2 active states, namely $R_1$ and $R_3$ both reachable from
- $R_2$. There is no transition for $a$ and $R_1$. So for a string,
- say, $ab$ which is accepted by the NFA, the maximum number of active
- states is 2 (it is not possible that all three states of this NFA
- are active at the same time; is it possible that no state is
- active?). \hfill\mbox{[2 Marks]}
+ \begin{center}
+ \url{https://esolangs.org/wiki/Brainfuck}
+ \end{center}\hfill[2 Marks]
+
+ \begin{figure}[p]
+ \begin{center}
+ \begin{tabular}{|@{}p{0.8cm}|l|}
+ \hline
+ \hfill\texttt{'>'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ $\bullet$ & $\texttt{pc} + 1$\\
+ $\bullet$ & $\texttt{mp} + 1$\\
+ $\bullet$ & \texttt{mem} unchanged
+ \end{tabular}\\\hline
+ \hfill\texttt{'<'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ $\bullet$ & $\texttt{pc} + 1$\\
+ $\bullet$ & $\texttt{mp} - 1$\\
+ $\bullet$ & \texttt{mem} unchanged
+ \end{tabular}\\\hline
+ \hfill\texttt{'+'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ $\bullet$ & $\texttt{pc} + 1$\\
+ $\bullet$ & $\texttt{mp}$ unchanged\\
+ $\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) + 1}\\
+ \end{tabular}\\\hline
+ \hfill\texttt{'-'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ $\bullet$ & $\texttt{pc} + 1$\\
+ $\bullet$ & $\texttt{mp}$ unchanged\\
+ $\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) - 1}\\
+ \end{tabular}\\\hline
+ \hfill\texttt{'.'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ $\bullet$ & $\texttt{pc} + 1$\\
+ $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
+ $\bullet$ & print out \,\texttt{mem(mp)} as a character\\
+ \end{tabular}\\\hline
+ \hfill\texttt{','} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ $\bullet$ & $\texttt{pc} + 1$\\
+ $\bullet$ & $\texttt{mp}$ unchanged\\
+ $\bullet$ & \texttt{mem} updated with \texttt{mp -> \textrm{input}}\\
+ \multicolumn{2}{@{}l}{the input is given by \texttt{Console.in.read().toByte}}
+ \end{tabular}\\\hline
+ \hfill\texttt{'['} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ \multicolumn{2}{@{}l}{if \texttt{mem(mp) == 0} then}\\
+ $\bullet$ & $\texttt{pc = jumpRight(prog, pc + 1, 0)}$\\
+ $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\
+ \multicolumn{2}{@{}l}{otherwise if \texttt{mem(mp) != 0} then}\\
+ $\bullet$ & $\texttt{pc} + 1$\\
+ $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
+ \end{tabular}
+ \\\hline
+ \hfill\texttt{']'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ \multicolumn{2}{@{}l}{if \texttt{mem(mp) != 0} then}\\
+ $\bullet$ & $\texttt{pc = jumpLeft(prog, pc - 1, 0)}$\\
+ $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\
+ \multicolumn{2}{@{}l}{otherwise if \texttt{mem(mp) == 0} then}\\
+ $\bullet$ & $\texttt{pc} + 1$\\
+ $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
+ \end{tabular}\\\hline
+ any other char & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
+ $\bullet$ & $\texttt{pc} + 1$\\
+ $\bullet$ & \texttt{mp} and \texttt{mem} unchanged
+ \end{tabular}\\
+ \hline
+ \end{tabular}
+ \end{center}
+ \caption{The rules for how commands in the brainf*** language update the program counter \texttt{pc},
+ memory pointer \texttt{mp} and memory \texttt{mem}.\label{comms}}
+ \end{figure}
+\end{itemize}\bigskip
-
-\end{itemize}
-
+
+
\end{document}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/cws/cw08.tex Thu Nov 29 17:15:11 2018 +0000
@@ -0,0 +1,267 @@
+\documentclass{article}
+\usepackage{../style}
+\usepackage{../langs}
+\usepackage{../graphics}
+
+\begin{document}
+
+\section*{Replacement Coursework 2 (Automata)}
+
+This coursework is worth 10\%. It is about deterministic and
+non-deterministic finite automata. The coursework is due on 21 March
+at 5pm. Make sure the files you submit can be processed by just
+calling \texttt{scala <<filename.scala>>}.\bigskip
+
+\noindent
+\textbf{Important:} Do not use any mutable data structures in your
+submission! They are not needed. This means you cannot use
+\texttt{ListBuffer}s, for example. Do not use \texttt{return} in your
+code! It has a different meaning in Scala, than in Java. Do not use
+\texttt{var}! This declares a mutable variable. Make sure the
+functions you submit are defined on the ``top-level'' of Scala, not
+inside a class or object. Also note that when marking, the running time
+will be restricted to a maximum of 360 seconds on my laptop.
+
+
+\subsection*{Disclaimer}
+
+It should be understood that the work you submit represents your own
+effort! You have not copied from anyone else. An exception is the
+Scala code I showed during the lectures or uploaded to KEATS, which
+you can freely use.\bigskip
+
+
+\subsection*{Part 1 (Deterministic Finite Automata)}
+
+\noindent
+There are many uses for Deterministic Finite Automata (DFAs), for
+example for testing whether a string matches a pattern or not. DFAs
+consist of some states (circles) and some transitions (edges) between
+states. For example the DFA
+
+\begin{center}
+\begin{tikzpicture}[scale=1.5,>=stealth',very thick,auto,
+ every state/.style={minimum size=4pt,
+ inner sep=4pt,draw=blue!50,very thick,
+ fill=blue!20}]
+ \node[state, initial] (q0) at ( 0,1) {$Q_0$};
+ \node[state] (q1) at ( 1,1) {$Q_1$};
+ \node[state, accepting] (q2) at ( 2,1) {$Q_2$};
+ \path[->] (q0) edge[bend left] node[above] {$a$} (q1)
+ (q1) edge[bend left] node[above] {$b$} (q0)
+ (q2) edge[bend left=50] node[below] {$b$} (q0)
+ (q1) edge node[above] {$a$} (q2)
+ (q2) edge [loop right] node {$a$} ()
+ (q0) edge [loop below] node {$b$} ();
+\end{tikzpicture}
+\end{center}
+
+\noindent
+has three states ($Q_0$, $Q_1$ and $Q_2$), whereby $Q_0$ is the
+starting state of the DFA and $Q_2$ is the accepting state. The latter
+is indicated by double lines. In general, a DFA can have any number of
+accepting states, but only a single starting state.
+
+Transitions are edges between states labelled with a character. The
+idea is that if we are in state $Q_0$, say, and get an $a$, we can go
+to state $Q_1$. If we are in state $Q_2$ and get an $a$, we can stay
+in state $Q_2$; if we get a $b$ in $Q_2$, then can go to state
+$Q_0$. The main point of DFAs is that if we are in a state and get a
+character, it is always clear which is the next state---there can only
+be at most one. The task of Part 1 is to implement such DFAs in Scala
+using partial functions for the transitions.
+
+A string is accepted by a DFA, if we start in the starting state,
+follow all transitions according to the string; if we end up in an
+accepting state, then the string is accepted. If not, the string is
+not accepted. The technical idea is that DFAs can be used to
+accept strings from \emph{regular} languages.
+
+\subsubsection*{Tasks}
+
+\begin{itemize}
+\item[(1)] Write a polymorphic function, called \texttt{share}, that
+ decides whether two sets share some elements (i.e.~the intersection
+ is not empty).\hfill[1 Mark]
+
+\item[(2)] The transitions of DFAs will be implemented as partial
+ functions. These functions will have the type (state,
+ character)-pair to state, that is their input will be a (state,
+ character)-pair and they return a state. For example the transitions
+ of the DFA shown above can be defined as the following
+ partial function:
+
+\begin{lstlisting}[language=Scala,numbers=none]
+val dfa_trans : PartialFunction[(State,Char), State] =
+ { case (Q0, 'a') => Q1
+ case (Q0, 'b') => Q0
+ case (Q1, 'a') => Q2
+ case (Q1, 'b') => Q0
+ case (Q2, 'a') => Q2
+ case (Q2, 'b') => Q0
+ }
+\end{lstlisting}
+
+ The main point of partial functions (as opposed to ``normal''
+ functions) is that they do not have to be defined everywhere. For
+ example the transitions above only mention characters $a$ and $b$,
+ but leave out any other characters. Partial functions come with a
+ method \texttt{isDefinedAt} that can be used to check whether an
+ input produces a result or not. For example
+
+\begin{lstlisting}[language=Scala,numbers=none]
+ dfa_trans.isDefinedAt((Q0, 'a'))
+ dfa_trans.isDefinedAt((Q0, 'c'))
+\end{lstlisting}
+
+ \noindent
+ gives \texttt{true} in the first case and \texttt{false} in the
+ second. There is also a method \texttt{lift} that transforms a
+ partial function into a ``normal'' function returning an optional
+ value: if the partial function is defined on the input, the lifted
+ function will return \texttt{Some}; if it is not defined, then
+ \texttt{None}.
+
+ Write a function that takes a transition and a (state, character)-pair as arguments
+ and produces an optional state (the state specified by the partial transition
+ function whenever it is defined; if the transition function is undefined,
+ return \texttt{None}).\hfill\mbox{[1 Mark]}
+
+\item[(3)]
+ Write a function that ``lifts'' the function in (2) from characters to strings. That
+ is, write a function that takes a transition, a state and a list of characters
+ as arguments and produces the state generated by following the transitions for
+ each character in the list. For example if you are in state $Q_0$ in the DFA above
+ and have the list \texttt{List(a,a,a,b,b,a)}, then you need to return the
+ state $Q_1$ (as option since there might not be such a state in general).\\
+ \mbox{}\hfill\mbox{[1 Mark]}
+
+\item[(4)] DFAs are defined as a triple: (starting state, transitions,
+ set of accepting states). Write a function \texttt{accepts} that tests whether
+ a string is accepted by an DFA or not. For this start in the
+ starting state of the DFA, use the function under (3) to calculate
+ the state after following all transitions according to the
+ characters in the string. If the resulting state is an accepting state,
+ return \texttt{true}; otherwise \texttt{false}.\\\mbox{}\hfill\mbox{[1 Mark]}
+\end{itemize}
+
+
+\subsection*{Part 2 (Non-Deterministic Finite Automata)}
+
+The main point of DFAs is that for every given state and character
+there is at most one next state (one if the transition is defined;
+none otherwise). However, this restriction to at most one state can be
+quite limiting for some applications.\footnote{Though there is a
+ curious fact that every (less restricted) NFA can be translated into
+ an ``equivalent'' DFA, whereby accepting means accepting the same
+ set of strings. However this might increase drastically the number
+ of states in the DFA.} Non-Deterministic Automata (NFAs) remove
+this restriction: there can be more than one starting state, and given
+a state and a character there can be more than one next
+state. Consider for example the NFA
+
+\begin{center}
+\begin{tikzpicture}[scale=0.7,>=stealth',very thick,
+ every state/.style={minimum size=0pt,
+ draw=blue!50,very thick,fill=blue!20},]
+\node[state,initial] (R_1) {$R_1$};
+\node[state,initial] (R_2) [above=of R_1] {$R_2$};
+\node[state, accepting] (R_3) [right=of R_1] {$R_3$};
+\path[->] (R_1) edge node [below] {$b$} (R_3);
+\path[->] (R_2) edge [bend left] node [above] {$a$} (R_3);
+\path[->] (R_1) edge [bend left] node [left] {$c$} (R_2);
+\path[->] (R_2) edge [bend left] node [right] {$a$} (R_1);
+\end{tikzpicture}
+\end{center}
+
+\noindent
+where in state $R_2$ if we get an $a$, we can go to state $R_1$
+\emph{or} $R_3$. If we want to find out whether an NFA accepts a
+string, then we need to explore both possibilities. We will do this
+``exploration'' in the tasks below in a breadth-first manner.
+
+
+The feature of having more than one next state in NFAs will be
+implemented by having a \emph{set} of partial transition functions
+(DFAs had only one). For example the NFA shown above will be
+represented by the set of partial functions
+
+\begin{lstlisting}[language=Scala,numbers=none]
+val nfa_trans : NTrans = Set(
+ { case (R1, 'c') => R2 },
+ { case (R1, 'b') => R3 },
+ { case (R2, 'a') => R1 },
+ { case (R2, 'a') => R3 }
+)
+\end{lstlisting}
+
+\noindent
+The point is that the 3rd element in this set makes sure that in state $R_2$ and
+given an $a$, we can go to state $R_1$; and the 4th element, in $R_2$,
+given an $a$, we can also go to state $R_3$. When following
+transitions from a state, we have to look at all partial functions in
+the set and generate the set of \emph{all} possible next states.
+
+\subsubsection*{Tasks}
+
+\begin{itemize}
+\item[(5)]
+ Write a function \texttt{nnext} which takes a transition set, a state
+ and a character as arguments, and calculates all possible next states
+ (returned as set).\\
+ \mbox{}\hfill\mbox{[1 Mark]}
+
+\item[(6)] Write a function \texttt{nnexts} which takes a transition
+ set, a \emph{set} of states and a character as arguments, and
+ calculates \emph{all} possible next states that can be reached from
+ any state in the set.\mbox{}\hfill\mbox{[1 Mark]}
+
+\item[(7)] Like in (3), write a function \texttt{nnextss} that lifts
+ \texttt{nnexts} from (6) from single characters to lists of characters.
+ \mbox{}\hfill\mbox{[1 Mark]}
+
+\item[(8)] NFAs are also defined as a triple: (set of staring states,
+ set of transitions, set of accepting states). Write a function
+ \texttt{naccepts} that tests whether a string is accepted by an NFA
+ or not. For this start in all starting states of the NFA, use the
+ function under (7) to calculate the set of states following all
+ transitions according to the characters in the string. If the
+ resulting set of states shares at least a single state with the set
+ of accepting states, return \texttt{true}; otherwise \texttt{false}.
+ Use the function under (1) in order to test whether these two sets
+ of states share any states or not.\mbox{}\hfill\mbox{[1 Mark]}
+
+\item[(9)] Since we explore in functions (6) and (7) all possible next
+ states, we decide whether a string is accepted in a breadth-first
+ manner. (Depth-first would be to choose one state, follow all next
+ states of this single state; check whether it leads to an accepting
+ state. If not, we backtrack and choose another state). The
+ disadvantage of breadth-first search is that at every step a
+ non-empty set of states are ``active''\ldots{} states that need to
+ be followed at the same time. Write similar functions as in (7) and
+ (8), but instead of returning states or a boolean, calculate the
+ number of states that need to be followed in each step. The function
+ \texttt{max\_accept} should then return the maximum of all these
+ numbers.
+
+ As a test case, consider again the NFA shown above. At the beginning
+ the number of active states will be 2 (since there are two starting
+ states, namely $R_1$ and $R_2$). If we get an $a$, there will be
+ still 2 active states, namely $R_1$ and $R_3$ both reachable from
+ $R_2$. There is no transition for $a$ and $R_1$. So for a string,
+ say, $ab$ which is accepted by the NFA, the maximum number of active
+ states is 2 (it is not possible that all three states of this NFA
+ are active at the same time; is it possible that no state is
+ active?). \hfill\mbox{[2 Marks]}
+
+
+\end{itemize}
+
+
+\end{document}
+
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: t
+%%% End:
Binary file solutions3/knight2.jar has changed
--- a/solutions3/knight2.scala Wed Nov 28 23:26:47 2018 +0000
+++ b/solutions3/knight2.scala Thu Nov 29 17:15:11 2018 +0000
@@ -64,6 +64,8 @@
def first_closed_tour_heuristics(dim: Int, path: Path) =
time_needed(tfirst_closed_tour_heuristics(dim: Int, path: Path))
+def first_closed_tour_heuristic(dim: Int, path: Path) =
+ time_needed(tfirst_closed_tour_heuristics(dim: Int, path: Path))
// heuristic cannot be used to search for closed tours on 7 x 7 an beyond
//for (dim <- 1 to 6) {
@@ -82,6 +84,8 @@
def first_tour_heuristics(dim: Int, path: Path) =
time_needed(tfirst_tour_heuristics(dim: Int, path: Path))
+def first_tour_heuristic(dim: Int, path: Path) =
+ time_needed(tfirst_tour_heuristics(dim: Int, path: Path))
// will be called with boards up to 30 x 30
Binary file solutions3/knight3.jar has changed
--- a/testing3/knight1.scala Wed Nov 28 23:26:47 2018 +0000
+++ b/testing3/knight1.scala Thu Nov 29 17:15:11 2018 +0000
@@ -1,13 +1,105 @@
-// Part 1 about finding and counting Knight's tours
-//==================================================
+// Part 1 about finding Knight's tours
+//=====================================
-//object CW8a { // for preparing the jar
+// If you need any auxiliary function, feel free to
+// implement it, but do not make any changes to the
+// templates below. Also have a look whether the functions
+// at the end are of any help.
+
type Pos = (Int, Int) // a position on a chessboard
type Path = List[Pos] // a path...a list of positions
+//(1) Complete the function that tests whether the position x
+// is inside the board and not yet element in the path.
-// for measuring time in the JAR
+def is_legal(dim: Int, path: Path, x: Pos) : Boolean = ((!(path.contains(x))) && (x._1 < dim) && (x._2 < dim))
+
+
+
+//(2) Complete the function that calculates for a position x
+// all legal onward moves that are not already in the path.
+// The moves should be ordered in a "clockwise" manner.
+
+
+def legal_moves(dim: Int, path: Path, x: Pos) : List[Pos] ={
+ val y = List((x._1 + 1, x._2 + 2),
+ (x._1 + 2, x._2 + 1),
+ (x._1 + 2, x._2 - 1),
+ (x._1 + 1, x._2 - 2),
+ (x._1 - 1, x._2 - 2),
+ (x._1 - 2, x._2 - 1),
+ (x._1 - 2, x._2 + 1),
+ (x._1 - 1, x._2 + 2)
+ )
+ y.filter(next => is_legal(dim, path, next))
+}
+
+//some test cases
+//
+//assert(legal_moves(8, Nil, (2,2)) == List((3,4), (4,3), (4,1), (3,0), (1,0), (0,1), (0,3), (1,4)))
+//assert(legal_moves(8, Nil, (7,7)) == List((6,5), (5,6)))
+//assert(legal_moves(8, List((4,1), (1,0)), (2,2)) == List((3,4), (4,3), (3,0), (0,1), (0,3), (1,4)))
+//assert(legal_moves(8, List((6,6)), (7,7)) == List((6,5), (5,6)))
+
+
+//(3) Complete the two recursive functions below.
+// They exhaustively search for knight's tours starting from the
+// given path. The first function counts all possible tours,
+// and the second collects all tours in a list of paths.
+
+def count_tours(dim: Int, path: Path) : Int = {
+ if(path.length == dim*dim) 1 else
+ (for(i <- legal_moves(dim, path, path.head)) yield
+ count_tours(dim, i :: path)
+ ).sum
+}
+
+def enum_tours(dim: Int, path: Path) : List[Path] ={
+ if(path.length == dim*dim) List(path) else
+ (for(i <- legal_moves(dim, path, path.head)) yield
+ enum_tours(dim, i :: path)
+ ).flatten
+}
+
+//(5) Implement a first-function that finds the first
+// element, say x, in the list xs where f is not None.
+// In that case Return f(x), otherwise None. If possible,
+// calculate f(x) only once.
+
+def first(xs: List[Pos], f: Pos => Option[Path]) : Option[Path] = {
+ if(xs == Nil) None
+ else(
+ for(x <- xs) yield{
+ val a = f(x)
+ if(a != None) a
+ else first(xs.drop(1), f)
+ }
+ ).head
+}
+
+// test cases
+//def foo(x: (Int, Int)) = if (x._1 > 3) Some(List(x)) else None
+//
+//first(List((1, 0),(2, 0),(3, 0),(4, 0)), foo) // Some(List((4,0)))
+//first(List((1, 0),(2, 0),(3, 0)), foo) // None
+
+
+
+
+//(6) Implement a function that uses the first-function from (5) for
+// trying out onward moves, and searches recursively for a
+// knight tour on a dim * dim-board.
+
+
+// def first_tour(dim: Int, path: Path) : Option[Path] = {
+// first(legal_moves(dim, path, path.head), (x : Pos => ))
+// }
+
+/* Helper functions
+
+
+// for measuring time
def time_needed[T](code: => T) : T = {
val start = System.nanoTime()
val result = code
@@ -16,6 +108,11 @@
result
}
+// can be called for example with
+// time_needed(count_tours(dim, List((0, 0))))
+// in order to print out the time that is needed for
+// running count_tours
+
// for printing a board
def print_board(dim: Int, path: Path): Unit = {
println
@@ -27,144 +124,5 @@
}
}
-def is_legal(dim: Int, path: Path, x: Pos): Boolean =
- 0 <= x._1 && 0 <= x._2 && x._1 < dim && x._2 < dim && !path.contains(x)
-// testcases
-//assert(is_legal(8, Nil, (3, 4)) == true)
-//assert(is_legal(8, List((4, 1), (1, 0)), (4, 1)) == false)
-//assert(is_legal(2, Nil, (0, 0)) == true)
-
-
-def add_pair(x: Pos, y: Pos): Pos =
- (x._1 + y._1, x._2 + y._2)
-
-def moves(x: Pos): List[Pos] =
- List(( 1, 2),( 2, 1),( 2, -1),( 1, -2),
- (-1, -2),(-2, -1),(-2, 1),(-1, 2)).map(add_pair(x, _))
-
-// 1 mark
-
-def legal_moves(dim: Int, path: Path, x: Pos): List[Pos] =
- moves(x).filter(is_legal(dim, path, _))
-
-// testcases
-//assert(legal_moves(8, Nil, (2,2)) ==
-// List((3,4), (4,3), (4,1), (3,0), (1,0), (0,1), (0,3), (1,4)))
-//assert(legal_moves(8, Nil, (7,7)) == List((6,5), (5,6)))
-//assert(legal_moves(8, List((4,1), (1,0)), (2,2)) ==
-// List((3,4), (4,3), (3,0), (0,1), (0,3), (1,4)))
-//assert(legal_moves(8, List((6,6)), (7,7)) == List((6,5), (5,6)))
-//assert(legal_moves(1, Nil, (0,0)) == List())
-//assert(legal_moves(2, Nil, (0,0)) == List())
-//assert(legal_moves(3, Nil, (0,0)) == List((1,2), (2,1)))
-
-// 2 marks
-
-def tcount_tours(dim: Int, path: Path): Int = {
- if (path.length == dim * dim) 1
- else
- (for (x <- legal_moves(dim, path, path.head)) yield tcount_tours(dim, x::path)).sum
-}
-
-def count_tours(dim: Int, path: Path) =
- time_needed(tcount_tours(dim: Int, path: Path))
-
-
-def tenum_tours(dim: Int, path: Path): List[Path] = {
- if (path.length == dim * dim) List(path)
- else
- (for (x <- legal_moves(dim, path, path.head)) yield tenum_tours(dim, x::path)).flatten
-}
-
-def enum_tours(dim: Int, path: Path) =
- time_needed(tenum_tours(dim: Int, path: Path))
-
-// test cases
-
-/*
-def count_all_tours(dim: Int) = {
- for (i <- (0 until dim).toList;
- j <- (0 until dim).toList) yield count_tours(dim, List((i, j)))
-}
-
-def enum_all_tours(dim: Int): List[Path] = {
- (for (i <- (0 until dim).toList;
- j <- (0 until dim).toList) yield enum_tours(dim, List((i, j)))).flatten
-}
-
-
-println("Number of tours starting from (0, 0)")
-
-for (dim <- 1 to 5) {
- println(s"${dim} x ${dim} " + time_needed(0, count_tours(dim, List((0, 0)))))
-}
-
-println("Number of tours starting from all fields")
-
-for (dim <- 1 to 5) {
- println(s"${dim} x ${dim} " + time_needed(0, count_all_tours(dim)))
-}
-
-for (dim <- 1 to 5) {
- val ts = enum_tours(dim, List((0, 0)))
- println(s"${dim} x ${dim} ")
- if (ts != Nil) {
- print_board(dim, ts.head)
- println(ts.head)
- }
-}
*/
-
-// 1 mark
-
-def first(xs: List[Pos], f: Pos => Option[Path]): Option[Path] = xs match {
- case Nil => None
- case x::xs => {
- val result = f(x)
- if (result.isDefined) result else first(xs, f)
- }
-}
-
-// test cases
-//def foo(x: (Int, Int)) = if (x._1 > 3) Some(List(x)) else None
-//
-//first(List((1, 0),(2, 0),(3, 0),(4, 0)), foo)
-//first(List((1, 0),(2, 0),(3, 0)), foo)
-
-
-// 1 mark
-
-def tfirst_tour(dim: Int, path: Path): Option[Path] = {
- if (path.length == dim * dim) Some(path)
- else
- first(legal_moves(dim, path, path.head), (x:Pos) => tfirst_tour(dim, x::path))
-}
-
-def first_tour(dim: Int, path: Path) =
- time_needed(tfirst_tour(dim: Int, path: Path))
-
-
-/*
-for (dim <- 1 to 8) {
- val t = first_tour(dim, List((0, 0)))
- println(s"${dim} x ${dim} " + (if (t == None) "" else { print_board(dim, t.get) ; "" }))
-}
-*/
-
-// 15 secs for 8 x 8
-//val ts1 = time_needed(0,first_tour(8, List((0, 0))).get)
-
-// no result for 4 x 4
-//val ts2 = time_needed(0, first_tour(4, List((0, 0))))
-
-// 0.3 secs for 6 x 6
-//val ts3 = time_needed(0, first_tour(6, List((0, 0))))
-
-// 15 secs for 8 x 8
-//time_needed(0, print_board(8, first_tour(8, List((0, 0))).get))
-
-
-//}
-
-
--- a/testing4/re.scala Wed Nov 28 23:26:47 2018 +0000
+++ b/testing4/re.scala Thu Nov 29 17:15:11 2018 +0000
@@ -1,8 +1,9 @@
// Part 1 about Regular Expression Matching
//==========================================
-object CW8a {
+//object CW9a {
+// Regular Expressions
abstract class Rexp
case object ZERO extends Rexp
case object ONE extends Rexp
@@ -38,10 +39,11 @@
def ~ (r: String) = SEQ(s, r)
}
-// (1a) Complete the function nullable according to
+// (1) Complete the function nullable according to
// the definition given in the coursework; this
// function checks whether a regular expression
-// can match the empty string
+// can match the empty string and Returns a boolean
+// accordingly.
def nullable (r: Rexp) : Boolean = r match {
case ZERO => false
@@ -52,10 +54,10 @@
case STAR(_) => true
}
-// (1b) Complete the function der according to
+// (2) Complete the function der according to
// the definition given in the coursework; this
// function calculates the derivative of a
-// regular expression w.r.t. a character
+// regular expression w.r.t. a character.
def der (c: Char, r: Rexp) : Rexp = r match {
case ZERO => ZERO
@@ -68,11 +70,12 @@
case STAR(r1) => SEQ(der(c, r1), STAR(r1))
}
-// (1c) Complete the function der according to
+// (3) Complete the simp function according to
// the specification given in the coursework; this
-// function simplifies a regular expression;
-// however it does not simplify inside STAR-regular
-// expressions
+// function simplifies a regular expression from
+// the inside out, like you would simplify arithmetic
+// expressions; however it does not simplify inside
+// STAR-regular expressions.
def simp(r: Rexp) : Rexp = r match {
case ALT(r1, r2) => (simp(r1), simp(r2)) match {
@@ -90,11 +93,12 @@
case r => r
}
-// (1d) Complete the two functions below; the first
+
+// (4) Complete the two functions below; the first
// calculates the derivative w.r.t. a string; the second
// is the regular expression matcher taking a regular
// expression and a string and checks whether the
-// string matches the regular expression
+// string matches the regular expression.
def ders (s: List[Char], r: Rexp) : Rexp = s match {
case Nil => r
@@ -102,12 +106,13 @@
}
// main matcher function
-def matcher(r: Rexp, s: String): Boolean = nullable(ders(s.toList, r))
+def matcher(r: Rexp, s: String) = nullable(ders(s.toList, r))
-// (1e) Complete the size function for regular
-// expressions according to the specification
+// (5) Complete the size function for regular
+// expressions according to the specification
// given in the coursework.
+
def size(r: Rexp): Int = r match {
case ZERO => 1
case ONE => 1
@@ -138,11 +143,13 @@
size(simp(der('a', der('a', EVIL)))) // => 8
size(simp(der('a', der('a', der('a', EVIL))))) // => 8
-// Java needs around 30 seconds for matching 28 a's with EVIL.
+// Python needs around 30 seconds for matching 28 a's with EVIL.
+// Java 9 and later increase this to an "astonishing" 40000 a's in
+// around 30 seconds.
//
// Lets see how long it takes to match strings with
-// 0.5 Million a's...it should be in the range of some
-// seconds.
+// 5 Million a's...it should be in the range of a
+// couple of seconds.
def time_needed[T](i: Int, code: => T) = {
val start = System.nanoTime()
@@ -154,6 +161,16 @@
for (i <- 0 to 5000000 by 500000) {
println(i + " " + "%.5f".format(time_needed(2, matcher(EVIL, "a" * i))))
}
+
+// another "power" test case
+simp(Iterator.iterate(ONE:Rexp)(r => SEQ(r, ONE | ONE)).drop(100).next) == ONE
+
+// the Iterator produces the rexp
+//
+// SEQ(SEQ(SEQ(..., ONE | ONE) , ONE | ONE), ONE | ONE)
+//
+// where SEQ is nested 100 times.
+
*/
-}
+//}
--- a/testing4/re1a_test.scala Wed Nov 28 23:26:47 2018 +0000
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,21 +0,0 @@
-
-//import scala.concurrent._
-//import scala.concurrent.duration._
-//import ExecutionContext.Implicits.global
-//import scala.language.postfixOps
-//import scala.language.reflectiveCalls
-
-//lazy val f = Future {
- import CW8a._
-
- assert(nullable(ZERO) == false)
- assert(nullable(ONE) == true)
- assert(nullable(CHAR('a')) == false)
- assert(nullable(ZERO | ONE) == true)
- assert(nullable(ZERO | CHAR('a')) == false)
- assert(nullable(ONE ~ ONE) == true)
- assert(nullable(ONE ~ CHAR('a')) == false)
- assert(nullable(STAR(ZERO)) == true)
-//}
-
-//Await.result(f, 120 second)
--- a/testing4/re1b_test.scala Wed Nov 28 23:26:47 2018 +0000
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,18 +0,0 @@
-
-//import scala.concurrent._
-//import scala.concurrent.duration._
-//import ExecutionContext.Implicits.global
-//import scala.language.postfixOps
-//import scala.language.reflectiveCalls
-
-
-//lazy val f = Future {
- import CW8a._
-
- assert(der('a', ZERO | ONE) == (ZERO | ZERO))
- assert(der('a', (CHAR('a') | ONE) ~ CHAR('a')) == ALT((ONE | ZERO) ~ CHAR('a'), ONE))
- assert(der('a', STAR(CHAR('a'))) == (ONE ~ STAR(CHAR('a'))))
- assert(der('b', STAR(CHAR('a'))) == (ZERO ~ STAR(CHAR('a'))))
-//}
-
-//Await.result(f, 120 second)
--- a/testing4/re1c_test.scala Wed Nov 28 23:26:47 2018 +0000
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,24 +0,0 @@
-
-//import scala.concurrent._
-//import scala.concurrent.duration._
-//import ExecutionContext.Implicits.global
-//import scala.language.postfixOps
-//import scala.language.reflectiveCalls
-
-
-//lazy val f = Future {
- import CW8a._
-
- assert(simp(ZERO | ONE) == ONE)
- assert(simp(STAR(ZERO | ONE)) == STAR(ZERO | ONE))
- assert(simp(ONE ~ (ONE ~ (ONE ~ CHAR('a')))) == CHAR('a'))
- assert(simp(ONE ~ (ONE ~ (ONE ~ ZERO))) == ZERO)
- assert(simp(ALT(ONE ~ (ONE ~ (ONE ~ ZERO)), CHAR('a'))) == CHAR('a'))
- assert(simp(CHAR('a') | CHAR('a')) == CHAR('a'))
- assert(simp(ONE | CHAR('a')) == (ONE | CHAR('a')))
- assert(simp(ALT((CHAR('a') | ZERO) ~ ONE,
- ((ONE | CHAR('b')) | CHAR('c')) ~ (CHAR('d') ~ ZERO))) == CHAR('a'))
-
-//}
-
-//Await.result(f, 30 second)
--- a/testing4/re1d_test.scala Wed Nov 28 23:26:47 2018 +0000
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,41 +0,0 @@
-
-//import scala.concurrent._
-//import scala.concurrent.duration._
-//import ExecutionContext.Implicits.global
-//import scala.language.postfixOps
-//import scala.language.reflectiveCalls
-
-
-
-//lazy val f = Future {
- import CW8a._
-
- val EVIL_urban = SEQ(STAR(STAR(CHAR('a'))), CHAR('b'))
-
- //println("1")
- assert(ders(List.fill(5)('a'), EVIL_urban) == SEQ(SEQ(STAR(CHAR('a')),STAR(STAR(CHAR('a')))),CHAR('b')))
- //println("2")
- assert(ders(List('b'), EVIL_urban) == ONE)
- //println("3")
- assert(ders(List('b','b'), EVIL_urban) == ZERO)
- //println("4")
- assert(matcher(EVIL_urban, "a" * 5 ++ "b") == true)
- //println("5")
- assert(matcher(EVIL_urban, "b") == true)
- //println("6")
- assert(matcher(EVIL_urban, "bb") == false)
- //println("7")
- assert(matcher("abc", "abc") == true)
- //println("8")
- assert(matcher(("ab" | "a") ~ (ONE | "bc"), "abc") == true)
- //println("9")
- assert(matcher(ONE, "") == true)
- //println("10")
- assert(matcher(ZERO, "") == false)
- //println("11")
- assert(matcher(ONE | CHAR('a'), "") == true)
- //println("12")
- assert(matcher(ONE | CHAR('a'), "a") == true)
-//}
-
-//Await.result(f, 90 second)
--- a/testing4/re1e_test.scala Wed Nov 28 23:26:47 2018 +0000
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,19 +0,0 @@
-
-//import scala.concurrent._
-//import scala.concurrent.duration._
-//import ExecutionContext.Implicits.global
-//import scala.language.postfixOps
-//import scala.language.reflectiveCalls
-
-//lazy val f = Future {
- import CW8a._
-
- val EVIL_urban = SEQ(STAR(STAR(CHAR('a'))), CHAR('b'))
-
- assert(size(der('a', der('a', EVIL_urban))) == 28)
- assert(size(der('a', der('a', der('a', EVIL_urban)))) == 58)
-
- assert(size(ders("aaaaaa".toList, EVIL_urban)) == 8)
-//}
-
-//Await.result(f, 120 second)
--- a/testing4/re_test.sh Wed Nov 28 23:26:47 2018 +0000
+++ b/testing4/re_test.sh Thu Nov 29 17:15:11 2018 +0000
@@ -1,24 +1,26 @@
#!/bin/bash
-set -e
+set -euo pipefail
+
out=${1:-output}
-echo "" > $out
+echo -e "" > $out
-echo "Below is the feedback for your submission of CW 8, Part 1." >> $out
-echo "" >> $out
+echo -e "Below is the feedback for your submission of CW 9, Part 1." >> $out
+echo -e "" >> $out
# compilation tests
function scala_compile {
- (ulimit -t 30 -m 1024000 ; scala "$1" 2>> $out 1>> $out)
+ (ulimit -t 30; JAVA_OPTS="-Xmx1g" scala "$1" 2>> $out 1>> $out)
}
# functional tests
function scala_assert {
- (ulimit -t 30 -m 1024000 ; scala -i "$1" "$2" -e "" 2> /dev/null 1> /dev/null)
+ (ulimit -t 30; JAVA_OPTS="-Xmx1g" scala -nc -i "$1" "$2" -e "" 2> /dev/null 1> /dev/null)
+
}
# purity test
@@ -30,53 +32,54 @@
# var, return, ListBuffer test
#
-echo "re.scala does not contain vars, returns etc?" >> $out
+echo -e "re.scala does not contain vars, returns etc?" >> $out
if (scala_vars re.scala)
then
- echo " --> fail" >> $out
- tsts0=$(( 1 ))
+ echo -e " --> fail (make triple-sure your program conforms to the required format)" >> $out
+ tsts0=$(( 0 ))
else
- echo " --> yes" >> $out
+ echo -e " --> success" >> $out
tsts0=$(( 0 ))
fi
# compilation test
+
if [ $tsts0 -eq 0 ]
then
- echo "re.scala runs?" >> $out
+ echo -e "re.scala runs?" >> $out
if (scala_compile re.scala)
then
- echo " --> yes" >> $out
+ echo -e " --> yes" >> $out
tsts1=$(( 0 ))
else
- echo " --> scala re.scala did not run successfully" >> $out
+ echo -e " --> SCALA DID NOT RUN RE.SCALA\n" >> $out
tsts1=$(( 1 ))
fi
else
tsts1=$(( 1 ))
fi
-
+### re tests
if [ $tsts1 -eq 0 ]
then
- echo " nullable(ZERO) == false" >> $out
- echo " nullable(ONE) == true" >> $out
- echo " nullable(CHAR('a')) == false" >> $out
- echo " nullable(ZERO | ONE) == true" >> $out
- echo " nullable(ZERO | CHAR('a')) == false" >> $out
- echo " nullable(ONE ~ ONE) == true" >> $out
- echo " nullable(ONE ~ CHAR('a')) == false" >> $out
- echo " nullable(STAR(ZERO)) == true" >> $out
+ echo -e " nullable(ZERO) == false" >> $out
+ echo -e " nullable(ONE) == true" >> $out
+ echo -e " nullable(CHAR('a')) == false" >> $out
+ echo -e " nullable(ZERO | ONE) == true" >> $out
+ echo -e " nullable(ZERO | CHAR('a')) == false" >> $out
+ echo -e " nullable(ONE ~ ONE) == true" >> $out
+ echo -e " nullable(ONE ~ CHAR('a')) == false" >> $out
+ echo -e " nullable(STAR(ZERO)) == true" >> $out
- if (scala_assert "re.scala" "re1a_test.scala")
+ if (scala_assert "re.scala" "re_test1.scala")
then
- echo " --> success" >> $out
+ echo -e " --> success" >> $out
else
- echo " --> test failed" >> $out
+ echo -e " --> \n ONE TEST FAILED\n" >> $out
fi
fi
@@ -84,16 +87,17 @@
if [ $tsts1 -eq 0 ]
then
- echo " der('a', ZERO | ONE) == (ZERO | ZERO)" >> $out
- echo " der('a', (CHAR('a') | ONE) ~ CHAR('a')) == ALT((ONE | ZERO) ~ CHAR('a'), ONE)" >> $out
- echo " der('a', STAR(CHAR('a'))) == (ONE ~ STAR(CHAR('a')))" >> $out
- echo " der('b', STAR(CHAR('a'))) == (ZERO ~ STAR(CHAR('a')))" >> $out
+ echo -e " der('a', ZERO | ONE) == (ZERO | ZERO)" >> $out
+ echo -e " der('a', (CHAR('a') | ONE) ~ CHAR('a')) ==" >> $out
+ echo -e " ALT((ONE | ZERO) ~ CHAR('a'), ONE)" >> $out
+ echo -e " der('a', STAR(CHAR('a'))) == (ONE ~ STAR(CHAR('a')))" >> $out
+ echo -e " der('b', STAR(CHAR('a'))) == (ZERO ~ STAR(CHAR('a')))" >> $out
- if (scala_assert "re.scala" "re1b_test.scala")
+ if (scala_assert "re.scala" "re_test2.scala")
then
- echo " --> success" >> $out
+ echo -e " --> success" >> $out
else
- echo " --> test failed" >> $out
+ echo -e " --> \n ONE TEST FAILED\n" >> $out
fi
fi
@@ -101,61 +105,61 @@
if [ $tsts1 -eq 0 ]
then
- echo " simp(ZERO | ONE) == ONE" >> $out
- echo " simp(STAR(ZERO | ONE)) == STAR(ZERO | ONE)" >> $out
- echo " simp(ONE ~ (ONE ~ (ONE ~ CHAR('a')))) == CHAR('a')" >> $out
- echo " simp(ONE ~ (ONE ~ (ONE ~ ZERO))) == ZERO" >> $out
- echo " simp(ALT(ONE ~ (ONE ~ (ONE ~ ZERO)), CHAR('a'))) == CHAR('a')" >> $out
- echo " simp(CHAR('a') | CHAR('a')) == CHAR('a')" >> $out
- echo " simp(ONE | CHAR('a')) == (ONE | CHAR('a'))" >> $out
- echo " simp(ALT((CHAR('a') | ZERO) ~ ONE," >> $out
- echo " ((ONE | CHAR('b')) | CHAR('c')) ~ (CHAR('d') ~ ZERO))) == CHAR('a')" >> $out
- if (scala_assert "re.scala" "re1c_test.scala")
+ echo -e " simp(ZERO | ONE) == ONE" >> $out
+ echo -e " simp(STAR(ZERO | ONE)) == STAR(ZERO | ONE)" >> $out
+ echo -e " simp(ONE ~ (ONE ~ (ONE ~ CHAR('a')))) == CHAR('a')" >> $out
+ echo -e " simp(ONE ~ (ONE ~ (ONE ~ ZERO))) == ZERO" >> $out
+ echo -e " simp(ALT(ONE ~ (ONE ~ (ONE ~ ZERO)), CHAR('a'))) == CHAR('a')" >> $out
+ echo -e " simp(CHAR('a') | CHAR('a')) == CHAR('a')" >> $out
+ echo -e " simp(ONE | CHAR('a')) == (ONE | CHAR('a'))" >> $out
+ echo -e " simp(ALT((CHAR('a') | ZERO) ~ ONE," >> $out
+ echo -e " ((ONE | CHAR('b')) | CHAR('c')) ~ (CHAR('d') ~ ZERO))) == CHAR('a')" >> $out
+ if (scala_assert "re.scala" "re_test3.scala")
then
- echo " --> success" >> $out
+ echo -e " --> success" >> $out
else
- echo " --> test failed" >> $out
+ echo -e " --> \n ONE TEST FAILED\n" >> $out
fi
fi
if [ $tsts1 -eq 0 ]
then
- echo " val EVIL = SEQ(STAR(STAR(CHAR('a'))), CHAR('b'))" >> $out
- echo " ders(List.fill(5)('a'),EVIL) == SEQ(SEQ(STAR(CHAR('a')),STAR(STAR(CHAR('a')))),CHAR('b'))" >> $out
- echo " ders(List('b'),EVIL) == ONE" >> $out
- echo " ders(List('b','b'),EVIL) == ZERO" >> $out
- echo " matcher(EVIL, \"a\" * 5 ++ \"b\") == true" >> $out
- echo " matcher(EVIL, \"b\") == true" >> $out
- echo " matcher(EVIL, \"bb\") == false" >> $out
- echo " matcher(\"abc\", \"abc\") == true" >> $out
- echo " matcher((\"ab\" | \"a\") ~ (ONE | \"bc\"), \"abc\") == true" >> $out
- echo " matcher(ONE, \"\") == true" >> $out
- echo " matcher(ZERO, \"\") == false" >> $out
- echo " matcher(ONE | CHAR('a'), \"\") == true" >> $out
- echo " matcher(ONE | CHAR('a'), \"a\") == true" >> $out
+ echo -e " val EVIL = SEQ(STAR(STAR(CHAR('a'))), CHAR('b'))" >> $out
+ echo -e " ders(\"aaaaa\".toList, EVIL) == SEQ(SEQ(STAR(CHAR('a')),STAR(STAR(CHAR('a')))),CHAR('b'))" >> $out
+ echo -e " ders(List('b'), EVIL) == ONE" >> $out
+ echo -e " ders(\"bb\".toList, EVIL) == ZERO" >> $out
+ echo -e " matcher(EVIL, \"a\" * 5 ++ \"b\") == true" >> $out
+ echo -e " matcher(EVIL, \"b\") == true" >> $out
+ echo -e " matcher(EVIL, \"bb\") == false" >> $out
+ echo -e " matcher(\"abc\", \"abc\") == true" >> $out
+ echo -e " matcher((\"ab\" | \"a\") ~ (ONE | \"bc\"), \"abc\") == true" >> $out
+ echo -e " matcher(ONE, \"\") == true" >> $out
+ echo -e " matcher(ZERO, \"\") == false" >> $out
+ echo -e " matcher(ONE | CHAR('a'), \"\") == true" >> $out
+ echo -e " matcher(ONE | CHAR('a'), \"a\") == true" >> $out
- if (scala_assert "re.scala" "re1d_test.scala")
+ if (scala_assert "re.scala" "re_test4.scala")
then
- echo " --> success" >> $out
+ echo -e " --> success" >> $out
else
- echo " --> test failed" >> $out
+ echo -e " --> \n ONE TEST FAILED\n" >> $out
fi
fi
if [ $tsts1 -eq 0 ]
then
- echo " val EVIL = SEQ(STAR(STAR(CHAR('a'))), CHAR('b'))" >> $out
- echo " size(der('a', der('a', EVIL))) == 28" >> $out
- echo " size(der('a', der('a', der('a', EVIL)))) == 58" >> $out
- echo " size(ders(\"aaaaaa\".toList, EVIL)) == 8" >> $out
+ echo -e " val EVIL = SEQ(STAR(STAR(CHAR('a'))), CHAR('b'))" >> $out
+ echo -e " size(der('a', der('a', EVIL))) == 28" >> $out
+ echo -e " size(der('a', der('a', der('a', EVIL)))) == 58" >> $out
+ echo -e " size(ders(\"aaaaaa\".toList, EVIL)) == 8" >> $out
- if (scala_assert "re.scala" "re1e_test.scala")
+ if (scala_assert "re.scala" "re_test5.scala")
then
- echo " --> success" >> $out
+ echo -e " --> success" >> $out
else
- echo " --> test failed" >> $out
+ echo -e " --> \n ONE TEST FAILED\n" >> $out
fi
fi
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/testing4/re_test1.scala Thu Nov 29 17:15:11 2018 +0000
@@ -0,0 +1,24 @@
+
+
+
+assert(nullable(ZERO) == false)
+assert(nullable(ONE) == true)
+assert(nullable(CHAR('a')) == false)
+assert(nullable(ZERO | ONE) == true)
+assert(nullable(ZERO | CHAR('a')) == false)
+assert(nullable(ONE ~ ONE) == true)
+assert(nullable(ONE ~ CHAR('a')) == false)
+assert(nullable(STAR(ZERO)) == true)
+
+
+
+//import scala.concurrent._
+//import scala.concurrent.duration._
+//import ExecutionContext.Implicits.global
+//import scala.language.postfixOps
+//import scala.language.reflectiveCalls
+
+//lazy val f = Future {
+
+//}
+//Await.result(f,30 second)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/testing4/re_test2.scala Thu Nov 29 17:15:11 2018 +0000
@@ -0,0 +1,6 @@
+
+
+assert(der('a', ZERO | ONE) == (ZERO | ZERO))
+assert(der('a', (CHAR('a') | ONE) ~ CHAR('a')) == ALT((ONE | ZERO) ~ CHAR('a'), ONE))
+assert(der('a', STAR(CHAR('a'))) == (ONE ~ STAR(CHAR('a'))))
+assert(der('b', STAR(CHAR('a'))) == (ZERO ~ STAR(CHAR('a'))))
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/testing4/re_test3.scala Thu Nov 29 17:15:11 2018 +0000
@@ -0,0 +1,13 @@
+
+
+
+assert(simp(ZERO | ONE) == ONE)
+assert(simp(STAR(ZERO | ONE)) == STAR(ZERO | ONE))
+assert(simp(ONE ~ (ONE ~ (ONE ~ CHAR('a')))) == CHAR('a'))
+assert(simp(ONE ~ (ONE ~ (ONE ~ ZERO))) == ZERO)
+assert(simp(ALT(ONE ~ (ONE ~ (ONE ~ ZERO)), CHAR('a'))) == CHAR('a'))
+assert(simp(CHAR('a') | CHAR('a')) == CHAR('a'))
+assert(simp(ONE | CHAR('a')) == (ONE | CHAR('a')))
+assert(simp(ALT((CHAR('a') | ZERO) ~ ONE,
+ ((ONE | CHAR('b')) | CHAR('c')) ~ (CHAR('d') ~ ZERO))) == CHAR('a'))
+
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/testing4/re_test4.scala Thu Nov 29 17:15:11 2018 +0000
@@ -0,0 +1,27 @@
+
+val EVIL_urban = SEQ(STAR(STAR(CHAR('a'))), CHAR('b'))
+
+//println("1")
+assert(ders("aaaaa".toList, EVIL_urban) == SEQ(SEQ(STAR(CHAR('a')),STAR(STAR(CHAR('a')))),CHAR('b')))
+//println("2")
+assert(ders(List('b'), EVIL_urban) == ONE)
+//println("3")
+assert(ders("bb".toList, EVIL_urban) == ZERO)
+//println("4")
+assert(matcher(EVIL_urban, "a" * 5 ++ "b") == true)
+//println("5")
+assert(matcher(EVIL_urban, "b") == true)
+//println("6")
+assert(matcher(EVIL_urban, "bb") == false)
+//println("7")
+assert(matcher("abc", "abc") == true)
+//println("8")
+assert(matcher(("ab" | "a") ~ (ONE | "bc"), "abc") == true)
+//println("9")
+assert(matcher(ONE, "") == true)
+//println("10")
+assert(matcher(ZERO, "") == false)
+//println("11")
+assert(matcher(ONE | CHAR('a'), "") == true)
+//println("12")
+assert(matcher(ONE | CHAR('a'), "a") == true)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/testing4/re_test5.scala Thu Nov 29 17:15:11 2018 +0000
@@ -0,0 +1,7 @@
+
+val EVIL_urban = SEQ(STAR(STAR(CHAR('a'))), CHAR('b'))
+
+assert(size(der('a', der('a', EVIL_urban))) == 28)
+assert(size(der('a', der('a', der('a', EVIL_urban)))) == 58)
+
+assert(size(ders("aaaaaa".toList, EVIL_urban)) == 8)