cws/cw04.tex
changeset 284 9a04eb6a2291
parent 275 eb1b4ad23941
child 289 08b5ddbc7e55
equal deleted inserted replaced
283:ef5f62bf5987 284:9a04eb6a2291
    97 \section*{Coursework 9 (Scala)}
    97 \section*{Coursework 9 (Scala)}
    98 
    98 
    99 \mbox{}\hfill\textit{``[Google’s MapReduce] abstraction is inspired by the}\\
    99 \mbox{}\hfill\textit{``[Google’s MapReduce] abstraction is inspired by the}\\
   100 \mbox{}\hfill\textit{map and reduce primitives present in Lisp and many}\\
   100 \mbox{}\hfill\textit{map and reduce primitives present in Lisp and many}\\
   101 \mbox{}\hfill\textit{other functional language.''}\smallskip\\
   101 \mbox{}\hfill\textit{other functional language.''}\smallskip\\
   102 \mbox{}\hfill\textit{ --- Dean and Ghemawat, who designed this concept at Google}\bigskip
   102 \mbox{}\hfill\textit{ --- Dean and Ghemawat, who designed this concept at Google}
       
   103 \bigskip\medskip
   103 
   104 
   104 \noindent
   105 \noindent
   105 This coursework is worth 10\%. It is about a regular expression
   106 This coursework is about the shunting yard algorithm by Dijkstra and a
   106 matcher and the shunting yard algorithm by Dijkstra. The first part is
   107 regular expression matcher by Brzozowski. The preliminary part is due on
   107 due on \cwNINE{} at 11pm; the second, more advanced part, is due on
   108 \cwNINE{} at 4pm; the core, more advanced part, is due on \cwNINEa{}
   108 \cwNINEa{} at 11pm. In the first part, you are asked to implement a
   109 at 4pm. The preliminary part is about the shunting yard algorithm that
   109 regular expression matcher based on derivatives of regular
   110 transforms the usual infix notation of arithmetic expressions into the
   110 expressions. The background is that ``out-of-the-box'' regular
   111 postfix notation, which is for example used in compilers. In the core
   111 expression matching in mainstream languages like Java, JavaScript and
   112 part, you are asked to implement a regular expression matcher based on
   112 Python can sometimes be excruciatingly slow. You are supposed to implement
   113 derivatives of regular expressions. The background is that
   113 an regular expression matcher that is much, much faster. The advanced part is
   114 ``out-of-the-box'' regular expression matching in mainstream languages
   114 about the shunting yard algorithm that transforms the usual infix
   115 like Java, JavaScript and Python can sometimes be excruciatingly slow.
   115 notation of arithmetic expressions into the postfix notation, which is
   116 You are supposed to implement an regular expression matcher that is
   116 for example used in compilers.\bigskip
   117 much, much faster. \bigskip
   117 
   118 
   118 \IMPORTANT{}
   119 \IMPORTANT{}
   119 
   120 
   120 \noindent
   121 \noindent
   121 Also note that the running time of each part will be restricted to a
   122 Also note that the running time of each part will be restricted to a
   157 4500000 1.15395 secs.
   158 4500000 1.15395 secs.
   158 5000000 1.29659 secs.
   159 5000000 1.29659 secs.
   159 \end{lstlisting}%$
   160 \end{lstlisting}%$
   160 
   161 
   161 
   162 
   162 \subsection*{Part 1 (6 Marks)}
   163 \subsection*{Preliminary Part (4 Marks)}
       
   164 
       
   165 The \emph{Shunting Yard Algorithm} has been developed by Edsger Dijkstra,
       
   166 an influential computer scientist who developed many well-known
       
   167 algorithms. This algorithm transforms the usual infix notation of
       
   168 arithmetic expressions into the postfix notation, sometimes also
       
   169 called reverse Polish notation.
       
   170 
       
   171 Why on Earth do people use the postfix notation? It is much more
       
   172 convenient to work with the usual infix notation for arithmetic
       
   173 expressions. Most modern calculators (as opposed to the ones used 20
       
   174 years ago) understand infix notation. So why on Earth? \ldots{}Well,
       
   175 many computers under the hood, even nowadays, use postfix notation
       
   176 extensively. For example if you give to the Java compiler the
       
   177 expression $1 + ((2 * 3) + (4 - 3))$, it will generate the Java Byte
       
   178 code
       
   179 
       
   180 \begin{lstlisting}[language=JVMIS,numbers=none]
       
   181 ldc 1 
       
   182 ldc 2 
       
   183 ldc 3 
       
   184 imul 
       
   185 ldc 4 
       
   186 ldc 3 
       
   187 isub 
       
   188 iadd 
       
   189 iadd
       
   190 \end{lstlisting}
       
   191 
       
   192 \noindent
       
   193 where the command \texttt{ldc} loads a constant onto the stack, and \texttt{imul},
       
   194 \texttt{isub} and \texttt{iadd} are commands acting on the stack. Clearly this
       
   195 is the arithmetic expression in postfix notation.\bigskip
       
   196 
       
   197 \noindent
       
   198 The shunting yard algorithm processes an input token list using an
       
   199 operator stack and an output list. The input consists of numbers,
       
   200 operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of
       
   201 the assignment we assume the input is always a well-formed expression
       
   202 in infix notation.  The calculation in the shunting yard algorithm uses
       
   203 information about the
       
   204 precedences of the operators (given in the template file). The
       
   205 algorithm processes the input token list as follows:
       
   206 
       
   207 \begin{itemize}
       
   208 \item If there is a number as input token, then this token is
       
   209   transferred directly to the output list. Then the rest of the input is
       
   210   processed.
       
   211 \item If there is an operator as input token, then you need to check
       
   212   what is on top of the operator stack. If there are operators with
       
   213   a higher or equal precedence, these operators are first popped off
       
   214   from the stack and moved to the output list. Then the operator from the input
       
   215   is pushed onto the stack and the rest of the input is processed.
       
   216 \item If the input is a left-parenthesis, you push it on to the stack
       
   217   and continue processing the input.
       
   218 \item If the input is a right-parenthesis, then you pop off all operators
       
   219   from the stack to the output list until you reach the left-parenthesis.
       
   220   Then you discharge the $($ and $)$ from the input and stack, and continue
       
   221   processing the input list.
       
   222 \item If the input is empty, then you move all remaining operators
       
   223   from the stack to the output list.  
       
   224 \end{itemize}  
       
   225 
       
   226 \subsubsection*{Tasks (file postfix.scala)}
       
   227 
       
   228 \begin{itemize}
       
   229 \item[(1)] Implement the shunting yard algorithm described above. The
       
   230   function, called \texttt{syard}, takes a list of tokens as first
       
   231   argument. The second and third arguments are the stack and output
       
   232   list represented as Scala lists. The most convenient way to
       
   233   implement this algorithm is to analyse what the input list, stack
       
   234   and output list look like in each step using pattern-matching. The
       
   235   algorithm transforms for example the input
       
   236 
       
   237   \[
       
   238   \texttt{List(3, +, 4, *, (, 2, -, 1, ))}
       
   239   \]
       
   240 
       
   241   into the postfix output
       
   242 
       
   243   \[
       
   244   \texttt{List(3, 4, 2, 1, -, *, +)}
       
   245   \]  
       
   246   
       
   247   You can assume the input list is always a  list representing
       
   248   a well-formed infix arithmetic expression.\hfill[1 Mark]
       
   249 
       
   250 \item[(2)] Implement a compute function that takes a postfix expression
       
   251   as argument and evaluates it generating an integer as result. It uses a
       
   252   stack to evaluate the postfix expression. The operators $+$, $-$, $*$
       
   253   are as usual; $/$ is division on integers, for example $7 / 3 = 2$.
       
   254   \hfill[1 Mark]
       
   255 \end{itemize}
       
   256 
       
   257 \subsubsection*{Task (file postfix2.scala)}
       
   258 
       
   259 \begin{itemize}
       
   260 \item[(3)] Extend the code in (7) and (8) to include the power
       
   261   operator.  This requires proper account of associativity of
       
   262   the operators. The power operator is right-associative, whereas the
       
   263   other operators are left-associative.  Left-associative operators
       
   264   are popped off if the precedence is bigger or equal, while
       
   265   right-associative operators are only popped off if the precedence is
       
   266   bigger. The compute function in this task should use
       
   267   \texttt{Long}s, rather than \texttt{Int}s.\hfill[2 Marks]
       
   268 \end{itemize}
       
   269 
       
   270 
       
   271 
       
   272 \subsection*{Core Part (6 Marks)}
   163 
   273 
   164 The task is to implement a regular expression matcher that is based on
   274 The task is to implement a regular expression matcher that is based on
   165 derivatives of regular expressions. Most of the functions are defined by
   275 derivatives of regular expressions. Most of the functions are defined by
   166 recursion over regular expressions and can be elegantly implemented
   276 recursion over regular expressions and can be elegantly implemented
   167 using Scala's pattern-matching. The implementation should deal with the
   277 using Scala's pattern-matching. The implementation should deal with the
   218 \end{tabular}    
   328 \end{tabular}    
   219 \end{center}  
   329 \end{center}  
   220 
   330 
   221 
   331 
   222 \begin{itemize}
   332 \begin{itemize}
   223 \item[(1)] Implement a function, called \textit{nullable}, by
   333 \item[(5)] Implement a function, called \textit{nullable}, by
   224   recursion over regular expressions. This function tests whether a
   334   recursion over regular expressions. This function tests whether a
   225   regular expression can match the empty string. This means given a
   335   regular expression can match the empty string. This means given a
   226   regular expression it either returns true or false. The function
   336   regular expression it either returns true or false. The function
   227   \textit{nullable}
   337   \textit{nullable}
   228   is defined as follows:
   338   is defined as follows:
   236 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
   346 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
   237 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
   347 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
   238 \end{tabular}
   348 \end{tabular}
   239 \end{center}~\hfill[1 Mark]
   349 \end{center}~\hfill[1 Mark]
   240 
   350 
   241 \item[(2)] Implement a function, called \textit{der}, by recursion over
   351 \item[(6)] Implement a function, called \textit{der}, by recursion over
   242   regular expressions. It takes a character and a regular expression
   352   regular expressions. It takes a character and a regular expression
   243   as arguments and calculates the derivative of a xregular expression according
   353   as arguments and calculates the derivative of a xregular expression according
   244   to the rules:
   354   to the rules:
   245 
   355 
   246 \begin{center}
   356 \begin{center}
   292 \end{center}
   402 \end{center}
   293 
   403 
   294 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
   404 Note, the last derivative can match the empty string, that is it is \textit{nullable}.\\
   295 \mbox{}\hfill\mbox{[1 Mark]}
   405 \mbox{}\hfill\mbox{[1 Mark]}
   296 
   406 
   297 \item[(3)] Implement the function \textit{simp}, which recursively
   407 \item[(7)] Implement the function \textit{simp}, which recursively
   298   traverses a regular expression, and on the way up simplifies every
   408   traverses a regular expression, and on the way up simplifies every
   299   regular expression on the left (see below) to the regular expression
   409   regular expression on the left (see below) to the regular expression
   300   on the right, except it does not simplify inside ${}^*$-regular
   410   on the right, except it does not simplify inside ${}^*$-regular
   301   expressions.
   411   expressions.
   302 
   412 
   328   for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
   438   for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then
   329   look whether more rules are applicable. If you organise the
   439   look whether more rules are applicable. If you organise the
   330   simplification in an inside-out fashion, it is always clear which
   440   simplification in an inside-out fashion, it is always clear which
   331   simplification should be applied next.\hfill[1 Mark]
   441   simplification should be applied next.\hfill[1 Mark]
   332 
   442 
   333 \item[(4)] Implement two functions: The first, called \textit{ders},
   443 \item[(8)] Implement two functions: The first, called \textit{ders},
   334   takes a list of characters and a regular expression as arguments, and
   444   takes a list of characters and a regular expression as arguments, and
   335   builds the derivative w.r.t.~the list as follows:
   445   builds the derivative w.r.t.~the list as follows:
   336 
   446 
   337 \begin{center}
   447 \begin{center}
   338 \begin{tabular}{lcl}
   448 \begin{tabular}{lcl}
   351 derivative regular expression can match the empty string (using
   461 derivative regular expression can match the empty string (using
   352 \textit{nullable}).  For example the \textit{matcher} will produce
   462 \textit{nullable}).  For example the \textit{matcher} will produce
   353 true for the regular expression $(a\cdot b)\cdot c$ and the string
   463 true for the regular expression $(a\cdot b)\cdot c$ and the string
   354 $abc$, but false if you give it the string $ab$. \hfill[1 Mark]
   464 $abc$, but false if you give it the string $ab$. \hfill[1 Mark]
   355 
   465 
   356 \item[(5)] Implement a function, called \textit{size}, by recursion
   466 \item[(9)] Implement a function, called \textit{size}, by recursion
   357   over regular expressions. If a regular expression is seen as a tree,
   467   over regular expressions. If a regular expression is seen as a tree,
   358   then \textit{size} should return the number of nodes in such a
   468   then \textit{size} should return the number of nodes in such a
   359   tree. Therefore this function is defined as follows:
   469   tree. Therefore this function is defined as follows:
   360 
   470 
   361 \begin{center}
   471 \begin{center}
   491 \end{tikzpicture}
   601 \end{tikzpicture}
   492 \end{tabular}  
   602 \end{tabular}  
   493 \end{center}
   603 \end{center}
   494 \newpage
   604 \newpage
   495 
   605 
   496 \subsection*{Part 2 (4 Marks)}
       
   497 
       
   498 The \emph{Shunting Yard Algorithm} has been developed by Edsger Dijkstra,
       
   499 an influential computer scientist who developed many well-known
       
   500 algorithms. This algorithm transforms the usual infix notation of
       
   501 arithmetic expressions into the postfix notation, sometimes also
       
   502 called reverse Polish notation.
       
   503 
       
   504 Why on Earth do people use the postfix notation? It is much more
       
   505 convenient to work with the usual infix notation for arithmetic
       
   506 expressions. Most modern calculators (as opposed to the ones used 20
       
   507 years ago) understand infix notation. So why on Earth? \ldots{}Well,
       
   508 many computers under the hood, even nowadays, use postfix notation
       
   509 extensively. For example if you give to the Java compiler the
       
   510 expression $1 + ((2 * 3) + (4 - 3))$, it will generate the Java Byte
       
   511 code
       
   512 
       
   513 \begin{lstlisting}[language=JVMIS,numbers=none]
       
   514 ldc 1 
       
   515 ldc 2 
       
   516 ldc 3 
       
   517 imul 
       
   518 ldc 4 
       
   519 ldc 3 
       
   520 isub 
       
   521 iadd 
       
   522 iadd
       
   523 \end{lstlisting}
       
   524 
       
   525 \noindent
       
   526 where the command \texttt{ldc} loads a constant onto the stack, and \texttt{imul},
       
   527 \texttt{isub} and \texttt{iadd} are commands acting on the stack. Clearly this
       
   528 is the arithmetic expression in postfix notation.\bigskip
       
   529 
       
   530 \noindent
       
   531 The shunting yard algorithm processes an input token list using an
       
   532 operator stack and an output list. The input consists of numbers,
       
   533 operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of
       
   534 the assignment we assume the input is always a well-formed expression
       
   535 in infix notation.  The calculation in the shunting yard algorithm uses
       
   536 information about the
       
   537 precedences of the operators (given in the template file). The
       
   538 algorithm processes the input token list as follows:
       
   539 
       
   540 \begin{itemize}
       
   541 \item If there is a number as input token, then this token is
       
   542   transferred directly to the output list. Then the rest of the input is
       
   543   processed.
       
   544 \item If there is an operator as input token, then you need to check
       
   545   what is on top of the operator stack. If there are operators with
       
   546   a higher or equal precedence, these operators are first popped off
       
   547   from the stack and moved to the output list. Then the operator from the input
       
   548   is pushed onto the stack and the rest of the input is processed.
       
   549 \item If the input is a left-parenthesis, you push it on to the stack
       
   550   and continue processing the input.
       
   551 \item If the input is a right-parenthesis, then you pop off all operators
       
   552   from the stack to the output list until you reach the left-parenthesis.
       
   553   Then you discharge the $($ and $)$ from the input and stack, and continue
       
   554   processing the input list.
       
   555 \item If the input is empty, then you move all remaining operators
       
   556   from the stack to the output list.  
       
   557 \end{itemize}  
       
   558 
       
   559 \subsubsection*{Tasks (file postfix.scala)}
       
   560 
       
   561 \begin{itemize}
       
   562 \item[(7)] Implement the shunting yard algorithm described above. The
       
   563   function, called \texttt{syard}, takes a list of tokens as first
       
   564   argument. The second and third arguments are the stack and output
       
   565   list represented as Scala lists. The most convenient way to
       
   566   implement this algorithm is to analyse what the input list, stack
       
   567   and output list look like in each step using pattern-matching. The
       
   568   algorithm transforms for example the input
       
   569 
       
   570   \[
       
   571   \texttt{List(3, +, 4, *, (, 2, -, 1, ))}
       
   572   \]
       
   573 
       
   574   into the postfix output
       
   575 
       
   576   \[
       
   577   \texttt{List(3, 4, 2, 1, -, *, +)}
       
   578   \]  
       
   579   
       
   580   You can assume the input list is always a  list representing
       
   581   a well-formed infix arithmetic expression.\hfill[1 Mark]
       
   582 
       
   583 \item[(8)] Implement a compute function that takes a postfix expression
       
   584   as argument and evaluates it generating an integer as result. It uses a
       
   585   stack to evaluate the postfix expression. The operators $+$, $-$, $*$
       
   586   are as usual; $/$ is division on integers, for example $7 / 3 = 2$.
       
   587   \hfill[1 Mark]
       
   588 \end{itemize}
       
   589 
       
   590 \subsubsection*{Task (file postfix2.scala)}
       
   591 
       
   592 \begin{itemize}
       
   593 \item[(9)] Extend the code in (7) and (8) to include the power
       
   594   operator.  This requires proper account of associativity of
       
   595   the operators. The power operator is right-associative, whereas the
       
   596   other operators are left-associative.  Left-associative operators
       
   597   are popped off if the precedence is bigger or equal, while
       
   598   right-associative operators are only popped off if the precedence is
       
   599   bigger. The compute function in this task should use
       
   600   \texttt{Long}s, rather than \texttt{Int}s.\hfill[2 Marks]
       
   601 \end{itemize}
       
   602 
   606 
   603 
   607 
   604 
   608 
   605 
   609 
   606 \end{document}
   610 \end{document}