--- a/cws/cw04.tex Tue Oct 29 23:56:13 2019 +0000
+++ b/cws/cw04.tex Wed Oct 30 11:28:44 2019 +0000
@@ -99,21 +99,22 @@
\mbox{}\hfill\textit{``[Google’s MapReduce] abstraction is inspired by the}\\
\mbox{}\hfill\textit{map and reduce primitives present in Lisp and many}\\
\mbox{}\hfill\textit{other functional language.''}\smallskip\\
-\mbox{}\hfill\textit{ --- Dean and Ghemawat, who designed this concept at Google}\bigskip
+\mbox{}\hfill\textit{ --- Dean and Ghemawat, who designed this concept at Google}
+\bigskip\medskip
\noindent
-This coursework is worth 10\%. It is about a regular expression
-matcher and the shunting yard algorithm by Dijkstra. The first part is
-due on \cwNINE{} at 11pm; the second, more advanced part, is due on
-\cwNINEa{} at 11pm. In the first part, you are asked to implement a
-regular expression matcher based on derivatives of regular
-expressions. The background is that ``out-of-the-box'' regular
-expression matching in mainstream languages like Java, JavaScript and
-Python can sometimes be excruciatingly slow. You are supposed to implement
-an regular expression matcher that is much, much faster. The advanced part is
-about the shunting yard algorithm that transforms the usual infix
-notation of arithmetic expressions into the postfix notation, which is
-for example used in compilers.\bigskip
+This coursework is about the shunting yard algorithm by Dijkstra and a
+regular expression matcher by Brzozowski. The preliminary part is due on
+\cwNINE{} at 4pm; the core, more advanced part, is due on \cwNINEa{}
+at 4pm. The preliminary part is about the shunting yard algorithm that
+transforms the usual infix notation of arithmetic expressions into the
+postfix notation, which is for example used in compilers. In the core
+part, you are asked to implement a regular expression matcher based on
+derivatives of regular expressions. The background is that
+``out-of-the-box'' regular expression matching in mainstream languages
+like Java, JavaScript and Python can sometimes be excruciatingly slow.
+You are supposed to implement an regular expression matcher that is
+much, much faster. \bigskip
\IMPORTANT{}
@@ -159,7 +160,116 @@
\end{lstlisting}%$
-\subsection*{Part 1 (6 Marks)}
+\subsection*{Preliminary Part (4 Marks)}
+
+The \emph{Shunting Yard Algorithm} has been developed by Edsger Dijkstra,
+an influential computer scientist who developed many well-known
+algorithms. This algorithm transforms the usual infix notation of
+arithmetic expressions into the postfix notation, sometimes also
+called reverse Polish notation.
+
+Why on Earth do people use the postfix notation? It is much more
+convenient to work with the usual infix notation for arithmetic
+expressions. Most modern calculators (as opposed to the ones used 20
+years ago) understand infix notation. So why on Earth? \ldots{}Well,
+many computers under the hood, even nowadays, use postfix notation
+extensively. For example if you give to the Java compiler the
+expression $1 + ((2 * 3) + (4 - 3))$, it will generate the Java Byte
+code
+
+\begin{lstlisting}[language=JVMIS,numbers=none]
+ldc 1
+ldc 2
+ldc 3
+imul
+ldc 4
+ldc 3
+isub
+iadd
+iadd
+\end{lstlisting}
+
+\noindent
+where the command \texttt{ldc} loads a constant onto the stack, and \texttt{imul},
+\texttt{isub} and \texttt{iadd} are commands acting on the stack. Clearly this
+is the arithmetic expression in postfix notation.\bigskip
+
+\noindent
+The shunting yard algorithm processes an input token list using an
+operator stack and an output list. The input consists of numbers,
+operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of
+the assignment we assume the input is always a well-formed expression
+in infix notation. The calculation in the shunting yard algorithm uses
+information about the
+precedences of the operators (given in the template file). The
+algorithm processes the input token list as follows:
+
+\begin{itemize}
+\item If there is a number as input token, then this token is
+ transferred directly to the output list. Then the rest of the input is
+ processed.
+\item If there is an operator as input token, then you need to check
+ what is on top of the operator stack. If there are operators with
+ a higher or equal precedence, these operators are first popped off
+ from the stack and moved to the output list. Then the operator from the input
+ is pushed onto the stack and the rest of the input is processed.
+\item If the input is a left-parenthesis, you push it on to the stack
+ and continue processing the input.
+\item If the input is a right-parenthesis, then you pop off all operators
+ from the stack to the output list until you reach the left-parenthesis.
+ Then you discharge the $($ and $)$ from the input and stack, and continue
+ processing the input list.
+\item If the input is empty, then you move all remaining operators
+ from the stack to the output list.
+\end{itemize}
+
+\subsubsection*{Tasks (file postfix.scala)}
+
+\begin{itemize}
+\item[(1)] Implement the shunting yard algorithm described above. The
+ function, called \texttt{syard}, takes a list of tokens as first
+ argument. The second and third arguments are the stack and output
+ list represented as Scala lists. The most convenient way to
+ implement this algorithm is to analyse what the input list, stack
+ and output list look like in each step using pattern-matching. The
+ algorithm transforms for example the input
+
+ \[
+ \texttt{List(3, +, 4, *, (, 2, -, 1, ))}
+ \]
+
+ into the postfix output
+
+ \[
+ \texttt{List(3, 4, 2, 1, -, *, +)}
+ \]
+
+ You can assume the input list is always a list representing
+ a well-formed infix arithmetic expression.\hfill[1 Mark]
+
+\item[(2)] Implement a compute function that takes a postfix expression
+ as argument and evaluates it generating an integer as result. It uses a
+ stack to evaluate the postfix expression. The operators $+$, $-$, $*$
+ are as usual; $/$ is division on integers, for example $7 / 3 = 2$.
+ \hfill[1 Mark]
+\end{itemize}
+
+\subsubsection*{Task (file postfix2.scala)}
+
+\begin{itemize}
+\item[(3)] Extend the code in (7) and (8) to include the power
+ operator. This requires proper account of associativity of
+ the operators. The power operator is right-associative, whereas the
+ other operators are left-associative. Left-associative operators
+ are popped off if the precedence is bigger or equal, while
+ right-associative operators are only popped off if the precedence is
+ bigger. The compute function in this task should use
+ \texttt{Long}s, rather than \texttt{Int}s.\hfill[2 Marks]
+\end{itemize}
+
+
+
+\subsection*{Core Part (6 Marks)}
The task is to implement a regular expression matcher that is based on
derivatives of regular expressions. Most of the functions are defined by
@@ -220,7 +330,7 @@
\begin{itemize}
-\item[(1)] Implement a function, called \textit{nullable}, by
+\item[(5)] Implement a function, called \textit{nullable}, by
recursion over regular expressions. This function tests whether a
regular expression can match the empty string. This means given a
regular expression it either returns true or false. The function
@@ -238,7 +348,7 @@
\end{tabular}
\end{center}~\hfill[1 Mark]
-\item[(2)] Implement a function, called \textit{der}, by recursion over
+\item[(6)] Implement a function, called \textit{der}, by recursion over
regular expressions. It takes a character and a regular expression
as arguments and calculates the derivative of a xregular expression according
to the rules:
@@ -294,7 +404,7 @@
Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
\mbox{}\hfill\mbox{[1 Mark]}
-\item[(3)] Implement the function \textit{simp}, which recursively
+\item[(7)] Implement the function \textit{simp}, which recursively
traverses a regular expression, and on the way up simplifies every
regular expression on the left (see below) to the regular expression
on the right, except it does not simplify inside ${}^*$-regular
@@ -330,7 +440,7 @@
simplification in an inside-out fashion, it is always clear which
simplification should be applied next.\hfill[1 Mark]
-\item[(4)] Implement two functions: The first, called \textit{ders},
+\item[(8)] Implement two functions: The first, called \textit{ders},
takes a list of characters and a regular expression as arguments, and
builds the derivative w.r.t.~the list as follows:
@@ -353,7 +463,7 @@
true for the regular expression $(a\cdot b)\cdot c$ and the string
$abc$, but false if you give it the string $ab$. \hfill[1 Mark]
-\item[(5)] Implement a function, called \textit{size}, by recursion
+\item[(9)] Implement a function, called \textit{size}, by recursion
over regular expressions. If a regular expression is seen as a tree,
then \textit{size} should return the number of nodes in such a
tree. Therefore this function is defined as follows:
@@ -493,112 +603,6 @@
\end{center}
\newpage
-\subsection*{Part 2 (4 Marks)}
-
-The \emph{Shunting Yard Algorithm} has been developed by Edsger Dijkstra,
-an influential computer scientist who developed many well-known
-algorithms. This algorithm transforms the usual infix notation of
-arithmetic expressions into the postfix notation, sometimes also
-called reverse Polish notation.
-
-Why on Earth do people use the postfix notation? It is much more
-convenient to work with the usual infix notation for arithmetic
-expressions. Most modern calculators (as opposed to the ones used 20
-years ago) understand infix notation. So why on Earth? \ldots{}Well,
-many computers under the hood, even nowadays, use postfix notation
-extensively. For example if you give to the Java compiler the
-expression $1 + ((2 * 3) + (4 - 3))$, it will generate the Java Byte
-code
-
-\begin{lstlisting}[language=JVMIS,numbers=none]
-ldc 1
-ldc 2
-ldc 3
-imul
-ldc 4
-ldc 3
-isub
-iadd
-iadd
-\end{lstlisting}
-
-\noindent
-where the command \texttt{ldc} loads a constant onto the stack, and \texttt{imul},
-\texttt{isub} and \texttt{iadd} are commands acting on the stack. Clearly this
-is the arithmetic expression in postfix notation.\bigskip
-
-\noindent
-The shunting yard algorithm processes an input token list using an
-operator stack and an output list. The input consists of numbers,
-operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of
-the assignment we assume the input is always a well-formed expression
-in infix notation. The calculation in the shunting yard algorithm uses
-information about the
-precedences of the operators (given in the template file). The
-algorithm processes the input token list as follows:
-
-\begin{itemize}
-\item If there is a number as input token, then this token is
- transferred directly to the output list. Then the rest of the input is
- processed.
-\item If there is an operator as input token, then you need to check
- what is on top of the operator stack. If there are operators with
- a higher or equal precedence, these operators are first popped off
- from the stack and moved to the output list. Then the operator from the input
- is pushed onto the stack and the rest of the input is processed.
-\item If the input is a left-parenthesis, you push it on to the stack
- and continue processing the input.
-\item If the input is a right-parenthesis, then you pop off all operators
- from the stack to the output list until you reach the left-parenthesis.
- Then you discharge the $($ and $)$ from the input and stack, and continue
- processing the input list.
-\item If the input is empty, then you move all remaining operators
- from the stack to the output list.
-\end{itemize}
-
-\subsubsection*{Tasks (file postfix.scala)}
-
-\begin{itemize}
-\item[(7)] Implement the shunting yard algorithm described above. The
- function, called \texttt{syard}, takes a list of tokens as first
- argument. The second and third arguments are the stack and output
- list represented as Scala lists. The most convenient way to
- implement this algorithm is to analyse what the input list, stack
- and output list look like in each step using pattern-matching. The
- algorithm transforms for example the input
-
- \[
- \texttt{List(3, +, 4, *, (, 2, -, 1, ))}
- \]
-
- into the postfix output
-
- \[
- \texttt{List(3, 4, 2, 1, -, *, +)}
- \]
-
- You can assume the input list is always a list representing
- a well-formed infix arithmetic expression.\hfill[1 Mark]
-
-\item[(8)] Implement a compute function that takes a postfix expression
- as argument and evaluates it generating an integer as result. It uses a
- stack to evaluate the postfix expression. The operators $+$, $-$, $*$
- are as usual; $/$ is division on integers, for example $7 / 3 = 2$.
- \hfill[1 Mark]
-\end{itemize}
-
-\subsubsection*{Task (file postfix2.scala)}
-
-\begin{itemize}
-\item[(9)] Extend the code in (7) and (8) to include the power
- operator. This requires proper account of associativity of
- the operators. The power operator is right-associative, whereas the
- other operators are left-associative. Left-associative operators
- are popped off if the precedence is bigger or equal, while
- right-associative operators are only popped off if the precedence is
- bigger. The compute function in this task should use
- \texttt{Long}s, rather than \texttt{Int}s.\hfill[2 Marks]
-\end{itemize}