# HG changeset patch # User Christian Urban # Date 1543567562 0 # Node ID 42d7609844969adbcc2bf14e11f6a3a04a0ef3b8 # Parent c6453f3547ec51220130053bf92ef5447122164a updated diff -r c6453f3547ec -r 42d760984496 cws/cw04.pdf Binary file cws/cw04.pdf has changed diff -r c6453f3547ec -r 42d760984496 cws/cw04.tex --- a/cws/cw04.tex Fri Nov 30 07:54:49 2018 +0000 +++ b/cws/cw04.tex Fri Nov 30 08:46:02 2018 +0000 @@ -98,7 +98,7 @@ This coursework is worth 10\%. It is about a regular expression matcher and the shunting yard algorithm by Dijkstra. The first part is due on 6 December at 11pm; the second, more advanced part, is due on -21 December at 11pm. In the first part, you are asked to implement a +20 December at 11pm. In the first part, you are asked to implement a regular expression matcher based on derivatives of regular expressions. The background is that regular expression matching in languages like Java, JavaScript and Python can sometimes be excruciatingly @@ -117,13 +117,14 @@ \subsection*{Reference Implementation} This Scala assignment comes with three reference implementations in form of -\texttt{jar}-files. This allows you to run any test cases on your own +\texttt{jar}-files you can download from KEATS. This allows you to run any +test cases on your own computer. For example you can call Scala on the command line with the option \texttt{-cp re.jar} and then query any function from the \texttt{re.scala} template file. As usual you have to prefix the calls with \texttt{CW9a}, \texttt{CW9b} and \texttt{CW9c}. Since some tasks are time sensitive, you can check the reference -implementation as follows: if you want to know how long it takes +implementation as follows: if you want to know, for example, how long it takes to match strings of $a$'s using the regular expression $(a^*)^*\cdot b$ you can querry as follows: @@ -285,10 +286,10 @@ \mbox{}\hfill\mbox{[1 Mark]} \item[(3)] Implement the function \textit{simp}, which recursively - traverses a regular expression from the inside to the outside, and - on the way simplifies every regular expression on the left (see - below) to the regular expression on the right, except it does not - simplify inside ${}^*$-regular expressions. + traverses a regular expression, and on the way up simplifies every + regular expression on the left (see below) to the regular expression + on the right, except it does not simplify inside ${}^*$-regular + expressions. \begin{center} \begin{tabular}{l@{\hspace{4mm}}c@{\hspace{4mm}}ll} @@ -308,8 +309,8 @@ simplifies to just $r_1$. \textbf{Hint:} Regular expressions can be seen as trees and there are several methods for traversing trees. One of them corresponds to the inside-out traversal, which is - sometimes also called post-order traversal'' you traverse inside the - tree and on the way up, you apply simplification rules. + sometimes also called post-order traversal: you traverse inside the + tree and on the way up you apply simplification rules. Furthermore, remember numerical expressions from school times: there you had expressions like $u + \ldots + (1 \cdot x) - \ldots (z + (y \cdot 0)) \ldots$ @@ -318,7 +319,7 @@ for example the $y \cdot 0$ by $0$, or $1\cdot x$ by $x$, and then look whether more rules are applicable. If you organise the simplification in an inside-out fashion, it is always clear which - rule should be applied next.\hfill[1 Mark] + simplification should be applied next.\hfill[1 Mark] \item[(4)] Implement two functions: The first, called \textit{ders}, takes a list of characters and a regular expression as arguments, and @@ -359,7 +360,7 @@ \end{tabular} \end{center} -You can use \textit{size} in order to test how much the `evil' regular +You can use \textit{size} in order to test how much the ``evil'' regular expression $(a^*)^* \cdot b$ grows when taking successive derivatives according the letter $a$ without simplification and then compare it to taking the derivative, but simplify the result. The sizes @@ -497,7 +498,7 @@ years ago) understand infix notation. So why on Earth? \ldots{}Well, many computers under the hood, even nowadays, use postfix notation extensively. For example if you give to the Java compiler the -expression $1 + ((2 * 3) + (4 - 3))$, it will generate for it the Java Byte +expression $1 + ((2 * 3) + (4 - 3))$, it will generate the Java Byte code \begin{lstlisting}[language=JVMIS,numbers=none] @@ -522,25 +523,26 @@ operator stack and an output list. The input consists of numbers, operators ($+$, $-$, $*$, $/$) and parentheses, and for the purpose of the assignment we assume the input is always a well-formed expression -in infix notation. The algorithm uses information about the +in infix notation. The calculation in the shunting yard algorithm uses +information about the precedences of the operators (given in the template file). The algorithm processes the input token list as follows: \begin{itemize} \item If there is a number as input token, then this token is - transferred to the output list. Then the rest of the input is + transferred directly to the output list. Then the rest of the input is processed. \item If there is an operator as input token, then you need to check what is on top of the operator stack. If there are operators with a higher or equal precedence, these operators are first popped off - the stack and moved to the output list. Then the operator from the input + from the stack and moved to the output list. Then the operator from the input is pushed onto the stack and the rest of the input is processed. \item If the input is a left-parenthesis, you push it on to the stack and continue processing the input. -\item If the input is a right-parenthesis, then you move all operators +\item If the input is a right-parenthesis, then you pop off all operators from the stack to the output list until you reach the left-parenthesis. Then you discharge the $($ and $)$ from the input and stack, and continue - processing. + processing the input list. \item If the input is empty, then you move all remaining operators from the stack to the output list. \end{itemize} @@ -548,12 +550,12 @@ \subsubsection*{Tasks (file postfix.scala)} \begin{itemize} -\item[(7)] Implement the shunting yard algorithm outlined above. The +\item[(7)] Implement the shunting yard algorithm described above. The function, called \texttt{syard}, takes a list of tokens as first argument. The second and third arguments are the stack and output list represented as Scala lists. The most convenient way to implement this algorithm is to analyse what the input list, stack - and output look like in each step using pattern-matching. The + and output list look like in each step using pattern-matching. The algorithm transforms for example the input \[ diff -r c6453f3547ec -r 42d760984496 progs/lecture4.scala --- a/progs/lecture4.scala Fri Nov 30 07:54:49 2018 +0000 +++ b/progs/lecture4.scala Fri Nov 30 08:46:02 2018 +0000 @@ -9,6 +9,21 @@ // length and so on for every type of lists. + + + + + + + + + + + + + + + def length_string_list(lst: List[String]): Int = lst match { case Nil => 0 case x::xs => 1 + length_string_list(xs) @@ -333,7 +348,7 @@ // Q: Why the kerfuffle about the polymorphic types in DFAs/NFAs? -// A: Subset construction +// A: Subset construction. def subset[A, C](nfa: NFA[A, C]) : DFA[Set[A], C] = { DFA(nfa.starts, @@ -405,7 +420,8 @@ case c::Nil => CHAR(c) case c::s => SEQ(CHAR(c), charlist2rexp(s)) } -implicit def string2rexp(s: String): Rexp = charlist2rexp(s.toList) +implicit def string2rexp(s: String): Rexp = + charlist2rexp(s.toList) val r1 = STAR("ab")