97 \section*{Coursework 9 (Scala)} |
97 \section*{Coursework 9 (Scala)} |
98 |
98 |
99 \mbox{}\hfill\textit{``[Google’s MapReduce] abstraction is inspired by the}\\ |
99 \mbox{}\hfill\textit{``[Google’s MapReduce] abstraction is inspired by the}\\ |
100 \mbox{}\hfill\textit{map and reduce primitives present in Lisp and many}\\ |
100 \mbox{}\hfill\textit{map and reduce primitives present in Lisp and many}\\ |
101 \mbox{}\hfill\textit{other functional language.''}\smallskip\\ |
101 \mbox{}\hfill\textit{other functional language.''}\smallskip\\ |
102 \mbox{}\hfill\textit{ --- Dean and Ghemawat, who designed this concept at Google}\bigskip |
102 \mbox{}\hfill\textit{ --- Dean and Ghemawat, who designed this concept at Google} |
|
103 \bigskip\medskip |
103 |
104 |
104 \noindent |
105 \noindent |
105 This coursework is worth 10\%. It is about a regular expression |
106 This coursework is about the shunting yard algorithm by Dijkstra and a |
106 matcher and the shunting yard algorithm by Dijkstra. The first part is |
107 regular expression matcher by Brzozowski. The preliminary part is due on |
107 due on \cwNINE{} at 11pm; the second, more advanced part, is due on |
108 \cwNINE{} at 4pm; the core, more advanced part, is due on \cwNINEa{} |
108 \cwNINEa{} at 11pm. In the first part, you are asked to implement a |
109 at 4pm. The preliminary part is about the shunting yard algorithm that |
109 regular expression matcher based on derivatives of regular |
110 transforms the usual infix notation of arithmetic expressions into the |
110 expressions. The background is that ``out-of-the-box'' regular |
111 postfix notation, which is for example used in compilers. In the core |
111 expression matching in mainstream languages like Java, JavaScript and |
112 part, you are asked to implement a regular expression matcher based on |
112 Python can sometimes be excruciatingly slow. You are supposed to implement |
113 derivatives of regular expressions. The background is that |
113 an regular expression matcher that is much, much faster. The advanced part is |
114 ``out-of-the-box'' regular expression matching in mainstream languages |
114 about the shunting yard algorithm that transforms the usual infix |
115 like Java, JavaScript and Python can sometimes be excruciatingly slow. |
115 notation of arithmetic expressions into the postfix notation, which is |
116 You are supposed to implement an regular expression matcher that is |
116 for example used in compilers.\bigskip |
117 much, much faster. \bigskip |
117 |
118 |
118 \IMPORTANT{} |
119 \IMPORTANT{} |
119 |
120 |
120 \noindent |
121 \noindent |
121 Also note that the running time of each part will be restricted to a |
122 Also note that the running time of each part will be restricted to a |
157 4500000 1.15395 secs. |
158 4500000 1.15395 secs. |
158 5000000 1.29659 secs. |
159 5000000 1.29659 secs. |
159 \end{lstlisting}%$ |
160 \end{lstlisting}%$ |
160 |
161 |
161 |
162 |
162 \subsection*{Part 1 (6 Marks)} |
163 \subsection*{Preliminary Part (4 Marks)} |
|
164 |
|
165 The \emph{Shunting Yard Algorithm} has been developed by Edsger Dijkstra, |
|
166 an influential computer scientist who developed many well-known |
|
167 algorithms. This algorithm transforms the usual infix notation of |
|
168 arithmetic expressions into the postfix notation, sometimes also |
|
169 called reverse Polish notation. |
|
170 |
|
171 Why on Earth do people use the postfix notation? It is much more |
|
172 convenient to work with the usual infix notation for arithmetic |
|
173 expressions. Most modern calculators (as opposed to the ones used 20 |
|
174 years ago) understand infix notation. So why on Earth? \ldots{}Well, |
|
175 many computers under the hood, even nowadays, use postfix notation |
|
176 extensively. For example if you give to the Java compiler the |
|
177 expression $1 + ((2 * 3) + (4 - 3))$, it will generate the Java Byte |
|
178 code |
|
179 |
|
180 \begin{lstlisting}[language=JVMIS,numbers=none] |
|
181 ldc 1 |
|
182 ldc 2 |
|
183 ldc 3 |
|
184 imul |
|
185 ldc 4 |
|
186 ldc 3 |
|
187 isub |
|
188 iadd |
|
189 iadd |
|
190 \end{lstlisting} |
|
191 |
|
192 \noindent |
|
193 where the command \texttt{ldc} loads a constant onto the stack, and \texttt{imul}, |
|
194 \texttt{isub} and \texttt{iadd} are commands acting on the stack. Clearly this |
|
195 is the arithmetic expression in postfix notation.\bigskip |
|
196 |
|
197 \noindent |
|
198 The shunting yard algorithm processes an input token list using an |
|
199 operator stack and an output list. The input consists of numbers, |
|
200 operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of |
|
201 the assignment we assume the input is always a well-formed expression |
|
202 in infix notation. The calculation in the shunting yard algorithm uses |
|
203 information about the |
|
204 precedences of the operators (given in the template file). The |
|
205 algorithm processes the input token list as follows: |
|
206 |
|
207 \begin{itemize} |
|
208 \item If there is a number as input token, then this token is |
|
209 transferred directly to the output list. Then the rest of the input is |
|
210 processed. |
|
211 \item If there is an operator as input token, then you need to check |
|
212 what is on top of the operator stack. If there are operators with |
|
213 a higher or equal precedence, these operators are first popped off |
|
214 from the stack and moved to the output list. Then the operator from the input |
|
215 is pushed onto the stack and the rest of the input is processed. |
|
216 \item If the input is a left-parenthesis, you push it on to the stack |
|
217 and continue processing the input. |
|
218 \item If the input is a right-parenthesis, then you pop off all operators |
|
219 from the stack to the output list until you reach the left-parenthesis. |
|
220 Then you discharge the $($ and $)$ from the input and stack, and continue |
|
221 processing the input list. |
|
222 \item If the input is empty, then you move all remaining operators |
|
223 from the stack to the output list. |
|
224 \end{itemize} |
|
225 |
|
226 \subsubsection*{Tasks (file postfix.scala)} |
|
227 |
|
228 \begin{itemize} |
|
229 \item[(1)] Implement the shunting yard algorithm described above. The |
|
230 function, called \texttt{syard}, takes a list of tokens as first |
|
231 argument. The second and third arguments are the stack and output |
|
232 list represented as Scala lists. The most convenient way to |
|
233 implement this algorithm is to analyse what the input list, stack |
|
234 and output list look like in each step using pattern-matching. The |
|
235 algorithm transforms for example the input |
|
236 |
|
237 \[ |
|
238 \texttt{List(3, +, 4, *, (, 2, -, 1, ))} |
|
239 \] |
|
240 |
|
241 into the postfix output |
|
242 |
|
243 \[ |
|
244 \texttt{List(3, 4, 2, 1, -, *, +)} |
|
245 \] |
|
246 |
|
247 You can assume the input list is always a list representing |
|
248 a well-formed infix arithmetic expression.\hfill[1 Mark] |
|
249 |
|
250 \item[(2)] Implement a compute function that takes a postfix expression |
|
251 as argument and evaluates it generating an integer as result. It uses a |
|
252 stack to evaluate the postfix expression. The operators $+$, $-$, $*$ |
|
253 are as usual; $/$ is division on integers, for example $7 / 3 = 2$. |
|
254 \hfill[1 Mark] |
|
255 \end{itemize} |
|
256 |
|
257 \subsubsection*{Task (file postfix2.scala)} |
|
258 |
|
259 \begin{itemize} |
|
260 \item[(3)] Extend the code in (7) and (8) to include the power |
|
261 operator. This requires proper account of associativity of |
|
262 the operators. The power operator is right-associative, whereas the |
|
263 other operators are left-associative. Left-associative operators |
|
264 are popped off if the precedence is bigger or equal, while |
|
265 right-associative operators are only popped off if the precedence is |
|
266 bigger. The compute function in this task should use |
|
267 \texttt{Long}s, rather than \texttt{Int}s.\hfill[2 Marks] |
|
268 \end{itemize} |
|
269 |
|
270 |
|
271 |
|
272 \subsection*{Core Part (6 Marks)} |
163 |
273 |
164 The task is to implement a regular expression matcher that is based on |
274 The task is to implement a regular expression matcher that is based on |
165 derivatives of regular expressions. Most of the functions are defined by |
275 derivatives of regular expressions. Most of the functions are defined by |
166 recursion over regular expressions and can be elegantly implemented |
276 recursion over regular expressions and can be elegantly implemented |
167 using Scala's pattern-matching. The implementation should deal with the |
277 using Scala's pattern-matching. The implementation should deal with the |
351 derivative regular expression can match the empty string (using |
461 derivative regular expression can match the empty string (using |
352 \textit{nullable}). For example the \textit{matcher} will produce |
462 \textit{nullable}). For example the \textit{matcher} will produce |
353 true for the regular expression $(a\cdot b)\cdot c$ and the string |
463 true for the regular expression $(a\cdot b)\cdot c$ and the string |
354 $abc$, but false if you give it the string $ab$. \hfill[1 Mark] |
464 $abc$, but false if you give it the string $ab$. \hfill[1 Mark] |
355 |
465 |
356 \item[(5)] Implement a function, called \textit{size}, by recursion |
466 \item[(9)] Implement a function, called \textit{size}, by recursion |
357 over regular expressions. If a regular expression is seen as a tree, |
467 over regular expressions. If a regular expression is seen as a tree, |
358 then \textit{size} should return the number of nodes in such a |
468 then \textit{size} should return the number of nodes in such a |
359 tree. Therefore this function is defined as follows: |
469 tree. Therefore this function is defined as follows: |
360 |
470 |
361 \begin{center} |
471 \begin{center} |
491 \end{tikzpicture} |
601 \end{tikzpicture} |
492 \end{tabular} |
602 \end{tabular} |
493 \end{center} |
603 \end{center} |
494 \newpage |
604 \newpage |
495 |
605 |
496 \subsection*{Part 2 (4 Marks)} |
|
497 |
|
498 The \emph{Shunting Yard Algorithm} has been developed by Edsger Dijkstra, |
|
499 an influential computer scientist who developed many well-known |
|
500 algorithms. This algorithm transforms the usual infix notation of |
|
501 arithmetic expressions into the postfix notation, sometimes also |
|
502 called reverse Polish notation. |
|
503 |
|
504 Why on Earth do people use the postfix notation? It is much more |
|
505 convenient to work with the usual infix notation for arithmetic |
|
506 expressions. Most modern calculators (as opposed to the ones used 20 |
|
507 years ago) understand infix notation. So why on Earth? \ldots{}Well, |
|
508 many computers under the hood, even nowadays, use postfix notation |
|
509 extensively. For example if you give to the Java compiler the |
|
510 expression $1 + ((2 * 3) + (4 - 3))$, it will generate the Java Byte |
|
511 code |
|
512 |
|
513 \begin{lstlisting}[language=JVMIS,numbers=none] |
|
514 ldc 1 |
|
515 ldc 2 |
|
516 ldc 3 |
|
517 imul |
|
518 ldc 4 |
|
519 ldc 3 |
|
520 isub |
|
521 iadd |
|
522 iadd |
|
523 \end{lstlisting} |
|
524 |
|
525 \noindent |
|
526 where the command \texttt{ldc} loads a constant onto the stack, and \texttt{imul}, |
|
527 \texttt{isub} and \texttt{iadd} are commands acting on the stack. Clearly this |
|
528 is the arithmetic expression in postfix notation.\bigskip |
|
529 |
|
530 \noindent |
|
531 The shunting yard algorithm processes an input token list using an |
|
532 operator stack and an output list. The input consists of numbers, |
|
533 operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of |
|
534 the assignment we assume the input is always a well-formed expression |
|
535 in infix notation. The calculation in the shunting yard algorithm uses |
|
536 information about the |
|
537 precedences of the operators (given in the template file). The |
|
538 algorithm processes the input token list as follows: |
|
539 |
|
540 \begin{itemize} |
|
541 \item If there is a number as input token, then this token is |
|
542 transferred directly to the output list. Then the rest of the input is |
|
543 processed. |
|
544 \item If there is an operator as input token, then you need to check |
|
545 what is on top of the operator stack. If there are operators with |
|
546 a higher or equal precedence, these operators are first popped off |
|
547 from the stack and moved to the output list. Then the operator from the input |
|
548 is pushed onto the stack and the rest of the input is processed. |
|
549 \item If the input is a left-parenthesis, you push it on to the stack |
|
550 and continue processing the input. |
|
551 \item If the input is a right-parenthesis, then you pop off all operators |
|
552 from the stack to the output list until you reach the left-parenthesis. |
|
553 Then you discharge the $($ and $)$ from the input and stack, and continue |
|
554 processing the input list. |
|
555 \item If the input is empty, then you move all remaining operators |
|
556 from the stack to the output list. |
|
557 \end{itemize} |
|
558 |
|
559 \subsubsection*{Tasks (file postfix.scala)} |
|
560 |
|
561 \begin{itemize} |
|
562 \item[(7)] Implement the shunting yard algorithm described above. The |
|
563 function, called \texttt{syard}, takes a list of tokens as first |
|
564 argument. The second and third arguments are the stack and output |
|
565 list represented as Scala lists. The most convenient way to |
|
566 implement this algorithm is to analyse what the input list, stack |
|
567 and output list look like in each step using pattern-matching. The |
|
568 algorithm transforms for example the input |
|
569 |
|
570 \[ |
|
571 \texttt{List(3, +, 4, *, (, 2, -, 1, ))} |
|
572 \] |
|
573 |
|
574 into the postfix output |
|
575 |
|
576 \[ |
|
577 \texttt{List(3, 4, 2, 1, -, *, +)} |
|
578 \] |
|
579 |
|
580 You can assume the input list is always a list representing |
|
581 a well-formed infix arithmetic expression.\hfill[1 Mark] |
|
582 |
|
583 \item[(8)] Implement a compute function that takes a postfix expression |
|
584 as argument and evaluates it generating an integer as result. It uses a |
|
585 stack to evaluate the postfix expression. The operators $+$, $-$, $*$ |
|
586 are as usual; $/$ is division on integers, for example $7 / 3 = 2$. |
|
587 \hfill[1 Mark] |
|
588 \end{itemize} |
|
589 |
|
590 \subsubsection*{Task (file postfix2.scala)} |
|
591 |
|
592 \begin{itemize} |
|
593 \item[(9)] Extend the code in (7) and (8) to include the power |
|
594 operator. This requires proper account of associativity of |
|
595 the operators. The power operator is right-associative, whereas the |
|
596 other operators are left-associative. Left-associative operators |
|
597 are popped off if the precedence is bigger or equal, while |
|
598 right-associative operators are only popped off if the precedence is |
|
599 bigger. The compute function in this task should use |
|
600 \texttt{Long}s, rather than \texttt{Int}s.\hfill[2 Marks] |
|
601 \end{itemize} |
|
602 |
606 |
603 |
607 |
604 |
608 |
605 |
609 |
606 \end{document} |
610 \end{document} |