diff -r ef5f62bf5987 -r 9a04eb6a2291 cws/cw04.tex --- a/cws/cw04.tex Tue Oct 29 23:56:13 2019 +0000 +++ b/cws/cw04.tex Wed Oct 30 11:28:44 2019 +0000 @@ -99,21 +99,22 @@ \mbox{}\hfill\textit{``[Google’s MapReduce] abstraction is inspired by the}\\ \mbox{}\hfill\textit{map and reduce primitives present in Lisp and many}\\ \mbox{}\hfill\textit{other functional language.''}\smallskip\\ -\mbox{}\hfill\textit{ --- Dean and Ghemawat, who designed this concept at Google}\bigskip +\mbox{}\hfill\textit{ --- Dean and Ghemawat, who designed this concept at Google} +\bigskip\medskip \noindent -This coursework is worth 10\%. It is about a regular expression -matcher and the shunting yard algorithm by Dijkstra. The first part is -due on \cwNINE{} at 11pm; the second, more advanced part, is due on -\cwNINEa{} at 11pm. In the first part, you are asked to implement a -regular expression matcher based on derivatives of regular -expressions. The background is that ``out-of-the-box'' regular -expression matching in mainstream languages like Java, JavaScript and -Python can sometimes be excruciatingly slow. You are supposed to implement -an regular expression matcher that is much, much faster. The advanced part is -about the shunting yard algorithm that transforms the usual infix -notation of arithmetic expressions into the postfix notation, which is -for example used in compilers.\bigskip +This coursework is about the shunting yard algorithm by Dijkstra and a +regular expression matcher by Brzozowski. The preliminary part is due on +\cwNINE{} at 4pm; the core, more advanced part, is due on \cwNINEa{} +at 4pm. The preliminary part is about the shunting yard algorithm that +transforms the usual infix notation of arithmetic expressions into the +postfix notation, which is for example used in compilers. In the core +part, you are asked to implement a regular expression matcher based on +derivatives of regular expressions. The background is that +``out-of-the-box'' regular expression matching in mainstream languages +like Java, JavaScript and Python can sometimes be excruciatingly slow. +You are supposed to implement an regular expression matcher that is +much, much faster. \bigskip \IMPORTANT{} @@ -159,7 +160,116 @@ \end{lstlisting}%$ -\subsection*{Part 1 (6 Marks)} +\subsection*{Preliminary Part (4 Marks)} + +The \emph{Shunting Yard Algorithm} has been developed by Edsger Dijkstra, +an influential computer scientist who developed many well-known +algorithms. This algorithm transforms the usual infix notation of +arithmetic expressions into the postfix notation, sometimes also +called reverse Polish notation. + +Why on Earth do people use the postfix notation? It is much more +convenient to work with the usual infix notation for arithmetic +expressions. Most modern calculators (as opposed to the ones used 20 +years ago) understand infix notation. So why on Earth? \ldots{}Well, +many computers under the hood, even nowadays, use postfix notation +extensively. For example if you give to the Java compiler the +expression $1 + ((2 * 3) + (4 - 3))$, it will generate the Java Byte +code + +\begin{lstlisting}[language=JVMIS,numbers=none] +ldc 1 +ldc 2 +ldc 3 +imul +ldc 4 +ldc 3 +isub +iadd +iadd +\end{lstlisting} + +\noindent +where the command \texttt{ldc} loads a constant onto the stack, and \texttt{imul}, +\texttt{isub} and \texttt{iadd} are commands acting on the stack. Clearly this +is the arithmetic expression in postfix notation.\bigskip + +\noindent +The shunting yard algorithm processes an input token list using an +operator stack and an output list. The input consists of numbers, +operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of +the assignment we assume the input is always a well-formed expression +in infix notation. The calculation in the shunting yard algorithm uses +information about the +precedences of the operators (given in the template file). The +algorithm processes the input token list as follows: + +\begin{itemize} +\item If there is a number as input token, then this token is + transferred directly to the output list. Then the rest of the input is + processed. +\item If there is an operator as input token, then you need to check + what is on top of the operator stack. If there are operators with + a higher or equal precedence, these operators are first popped off + from the stack and moved to the output list. Then the operator from the input + is pushed onto the stack and the rest of the input is processed. +\item If the input is a left-parenthesis, you push it on to the stack + and continue processing the input. +\item If the input is a right-parenthesis, then you pop off all operators + from the stack to the output list until you reach the left-parenthesis. + Then you discharge the $($ and $)$ from the input and stack, and continue + processing the input list. +\item If the input is empty, then you move all remaining operators + from the stack to the output list. +\end{itemize} + +\subsubsection*{Tasks (file postfix.scala)} + +\begin{itemize} +\item[(1)] Implement the shunting yard algorithm described above. The + function, called \texttt{syard}, takes a list of tokens as first + argument. The second and third arguments are the stack and output + list represented as Scala lists. The most convenient way to + implement this algorithm is to analyse what the input list, stack + and output list look like in each step using pattern-matching. The + algorithm transforms for example the input + + \[ + \texttt{List(3, +, 4, *, (, 2, -, 1, ))} + \] + + into the postfix output + + \[ + \texttt{List(3, 4, 2, 1, -, *, +)} + \] + + You can assume the input list is always a list representing + a well-formed infix arithmetic expression.\hfill[1 Mark] + +\item[(2)] Implement a compute function that takes a postfix expression + as argument and evaluates it generating an integer as result. It uses a + stack to evaluate the postfix expression. The operators $+$, $-$, $*$ + are as usual; $/$ is division on integers, for example $7 / 3 = 2$. + \hfill[1 Mark] +\end{itemize} + +\subsubsection*{Task (file postfix2.scala)} + +\begin{itemize} +\item[(3)] Extend the code in (7) and (8) to include the power + operator. This requires proper account of associativity of + the operators. The power operator is right-associative, whereas the + other operators are left-associative. Left-associative operators + are popped off if the precedence is bigger or equal, while + right-associative operators are only popped off if the precedence is + bigger. The compute function in this task should use + \texttt{Long}s, rather than \texttt{Int}s.\hfill[2 Marks] +\end{itemize} + + + +\subsection*{Core Part (6 Marks)} The task is to implement a regular expression matcher that is based on derivatives of regular expressions. Most of the functions are defined by @@ -220,7 +330,7 @@ \begin{itemize} -\item[(1)] Implement a function, called \textit{nullable}, by +\item[(5)] Implement a function, called \textit{nullable}, by recursion over regular expressions. This function tests whether a regular expression can match the empty string. This means given a regular expression it either returns true or false. The function @@ -238,7 +348,7 @@ \end{tabular} \end{center}~\hfill[1 Mark] -\item[(2)] Implement a function, called \textit{der}, by recursion over +\item[(6)] Implement a function, called \textit{der}, by recursion over regular expressions. It takes a character and a regular expression as arguments and calculates the derivative of a xregular expression according to the rules: @@ -294,7 +404,7 @@ Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\ \mbox{}\hfill\mbox{[1 Mark]} -\item[(3)] Implement the function \textit{simp}, which recursively +\item[(7)] Implement the function \textit{simp}, which recursively traverses a regular expression, and on the way up simplifies every regular expression on the left (see below) to the regular expression on the right, except it does not simplify inside ${}^*$-regular @@ -330,7 +440,7 @@ simplification in an inside-out fashion, it is always clear which simplification should be applied next.\hfill[1 Mark] -\item[(4)] Implement two functions: The first, called \textit{ders}, +\item[(8)] Implement two functions: The first, called \textit{ders}, takes a list of characters and a regular expression as arguments, and builds the derivative w.r.t.~the list as follows: @@ -353,7 +463,7 @@ true for the regular expression $(a\cdot b)\cdot c$ and the string $abc$, but false if you give it the string $ab$. \hfill[1 Mark] -\item[(5)] Implement a function, called \textit{size}, by recursion +\item[(9)] Implement a function, called \textit{size}, by recursion over regular expressions. If a regular expression is seen as a tree, then \textit{size} should return the number of nodes in such a tree. Therefore this function is defined as follows: @@ -493,112 +603,6 @@ \end{center} \newpage -\subsection*{Part 2 (4 Marks)} - -The \emph{Shunting Yard Algorithm} has been developed by Edsger Dijkstra, -an influential computer scientist who developed many well-known -algorithms. This algorithm transforms the usual infix notation of -arithmetic expressions into the postfix notation, sometimes also -called reverse Polish notation. - -Why on Earth do people use the postfix notation? It is much more -convenient to work with the usual infix notation for arithmetic -expressions. Most modern calculators (as opposed to the ones used 20 -years ago) understand infix notation. So why on Earth? \ldots{}Well, -many computers under the hood, even nowadays, use postfix notation -extensively. For example if you give to the Java compiler the -expression $1 + ((2 * 3) + (4 - 3))$, it will generate the Java Byte -code - -\begin{lstlisting}[language=JVMIS,numbers=none] -ldc 1 -ldc 2 -ldc 3 -imul -ldc 4 -ldc 3 -isub -iadd -iadd -\end{lstlisting} - -\noindent -where the command \texttt{ldc} loads a constant onto the stack, and \texttt{imul}, -\texttt{isub} and \texttt{iadd} are commands acting on the stack. Clearly this -is the arithmetic expression in postfix notation.\bigskip - -\noindent -The shunting yard algorithm processes an input token list using an -operator stack and an output list. The input consists of numbers, -operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of -the assignment we assume the input is always a well-formed expression -in infix notation. The calculation in the shunting yard algorithm uses -information about the -precedences of the operators (given in the template file). The -algorithm processes the input token list as follows: - -\begin{itemize} -\item If there is a number as input token, then this token is - transferred directly to the output list. Then the rest of the input is - processed. -\item If there is an operator as input token, then you need to check - what is on top of the operator stack. If there are operators with - a higher or equal precedence, these operators are first popped off - from the stack and moved to the output list. Then the operator from the input - is pushed onto the stack and the rest of the input is processed. -\item If the input is a left-parenthesis, you push it on to the stack - and continue processing the input. -\item If the input is a right-parenthesis, then you pop off all operators - from the stack to the output list until you reach the left-parenthesis. - Then you discharge the $($ and $)$ from the input and stack, and continue - processing the input list. -\item If the input is empty, then you move all remaining operators - from the stack to the output list. -\end{itemize} - -\subsubsection*{Tasks (file postfix.scala)} - -\begin{itemize} -\item[(7)] Implement the shunting yard algorithm described above. The - function, called \texttt{syard}, takes a list of tokens as first - argument. The second and third arguments are the stack and output - list represented as Scala lists. The most convenient way to - implement this algorithm is to analyse what the input list, stack - and output list look like in each step using pattern-matching. The - algorithm transforms for example the input - - \[ - \texttt{List(3, +, 4, *, (, 2, -, 1, ))} - \] - - into the postfix output - - \[ - \texttt{List(3, 4, 2, 1, -, *, +)} - \] - - You can assume the input list is always a list representing - a well-formed infix arithmetic expression.\hfill[1 Mark] - -\item[(8)] Implement a compute function that takes a postfix expression - as argument and evaluates it generating an integer as result. It uses a - stack to evaluate the postfix expression. The operators $+$, $-$, $*$ - are as usual; $/$ is division on integers, for example $7 / 3 = 2$. - \hfill[1 Mark] -\end{itemize} - -\subsubsection*{Task (file postfix2.scala)} - -\begin{itemize} -\item[(9)] Extend the code in (7) and (8) to include the power - operator. This requires proper account of associativity of - the operators. The power operator is right-associative, whereas the - other operators are left-associative. Left-associative operators - are popped off if the precedence is bigger or equal, while - right-associative operators are only popped off if the precedence is - bigger. The compute function in this task should use - \texttt{Long}s, rather than \texttt{Int}s.\hfill[2 Marks] -\end{itemize}