cws/cw03.tex
changeset 158 94b11ac19b41
parent 157 15f1fca879c5
child 159 92c31cbb1952
equal deleted inserted replaced
157:15f1fca879c5 158:94b11ac19b41
   108 
   108 
   109 \begin{center}
   109 \begin{center}
   110 \begin{tabular}{lcll}
   110 \begin{tabular}{lcll}
   111   $r$ & $::=$ & $\ZERO$     & cannot match anything\\
   111   $r$ & $::=$ & $\ZERO$     & cannot match anything\\
   112       &   $|$ & $\ONE$      & can only match the empty string\\
   112       &   $|$ & $\ONE$      & can only match the empty string\\
   113       &   $|$ & $c$         & can match a character (in this case $c$)\\
   113       &   $|$ & $c$         & can match a single character (in this case $c$)\\
   114       &   $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\
   114       &   $|$ & $r_1 + r_2$ & can match a string either with $r_1$ or with $r_2$\\
   115   &   $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\
   115   &   $|$ & $r_1\cdot r_2$ & can match the first part of a string with $r_1$ and\\
   116           &  & & then the second part with $r_2$\\
   116           &  & & then the second part with $r_2$\\
   117       &   $|$ & $r^*$       & can match zero or more times $r$\\
   117       &   $|$ & $r^*$       & can match zero or more times $r$\\
   118 \end{tabular}
   118 \end{tabular}
   134 \item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}  
   134 \item[$\bullet$] \url{http://davidvgalbraith.com/how-i-fixed-atom/}  
   135 \end{itemize}}
   135 \end{itemize}}
   136 
   136 
   137 \subsubsection*{Tasks (file re.scala)}
   137 \subsubsection*{Tasks (file re.scala)}
   138 
   138 
       
   139 The file \texttt{re.scala} has already a definition for regular
       
   140 expressions and also defines some handy shorthand notation for
       
   141 regular expressions. The notation in this document matches up
       
   142 with the code in the file as follows:
       
   143 
       
   144 \begin{center}
       
   145   \begin{tabular}{rcl@{\hspace{10mm}}l}
       
   146     & & code: & shorthand:\smallskip \\ 
       
   147   $\ZERO$ & $\mapsto$ & \texttt{ZERO}\\
       
   148   $\ONE$  & $\mapsto$ & \texttt{ONE}\\
       
   149   $c$     & $\mapsto$ & \texttt{CHAR(c)}\\
       
   150   $r_1 + r_2$ & $\mapsto$ & \texttt{ALT(r1, r2)} & \texttt{r1 | r2}\\
       
   151   $r_1 \cdot r_2$ & $\mapsto$ & \texttt{SEQ(r1, r2)} & \texttt{r1 $\sim$ r2}\\
       
   152   $r^*$ & $\mapsto$ &  \texttt{STAR(r)} & \texttt{r.\%}
       
   153 \end{tabular}    
       
   154 \end{center}  
       
   155 
       
   156 
   139 \begin{itemize}
   157 \begin{itemize}
   140 \item[(1a)] Implement a function, called \textit{nullable}, by
   158 \item[(1a)] Implement a function, called \textit{nullable}, by
   141   recursion over regular expressions. This function tests whether a
   159   recursion over regular expressions. This function tests whether a
   142   regular expression can match the empty string. This means given a
   160   regular expression can match the empty string. This means given a
   143   regular expression it either returns true or false.
   161   regular expression it either returns true or false. The function
       
   162   \textit{nullable}
       
   163   is defined as follows:
   144 
   164 
   145 \begin{center}
   165 \begin{center}
   146 \begin{tabular}{lcl}
   166 \begin{tabular}{lcl}
   147 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
   167 $\textit{nullable}(\ZERO)$ & $\dn$ & $\textit{false}$\\
   148 $\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
   168 $\textit{nullable}(\ONE)$  & $\dn$ & $\textit{true}$\\
   149 $\textit{nullable}(c)$     & $\dn$ & $\textit{false}$\\
   169 $\textit{nullable}(c)$     & $\dn$ & $\textit{false}$\\
   150 $\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
   170 $\textit{nullable}(r_1 + r_2)$ & $\dn$ & $\textit{nullable}(r_1) \vee \textit{nullable}(r_2)$\\
   151 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
   171 $\textit{nullable}(r_1 \cdot r_2)$ & $\dn$ & $\textit{nullable}(r_1) \wedge \textit{nullable}(r_2)$\\
   152 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
   172 $\textit{nullable}(r^*)$ & $\dn$ & $\textit{true}$\\
   153 \end{tabular}
   173 \end{tabular}
   154 \end{center}\hfill[1 Mark]
   174 \end{center}~\hfill[1 Mark]
   155 
   175 
   156 \item[(1b)] Implement a function, called \textit{der}, by recursion over
   176 \item[(1b)] Implement a function, called \textit{der}, by recursion over
   157   regular expressions. It takes a character and a regular expression
   177   regular expressions. It takes a character and a regular expression
   158   as arguments and calculates the derivative regular expression according
   178   as arguments and calculates the derivative regular expression according
   159   to the rules:
   179   to the rules:
   312 a regular expression that can match the empty string. You can test
   332 a regular expression that can match the empty string. You can test
   313 whether this is indeed the case using the function nullable, which is
   333 whether this is indeed the case using the function nullable, which is
   314 what your matcher is doing.
   334 what your matcher is doing.
   315 
   335 
   316 The purpose of the $\textit{simp}$ function is to keep the regular
   336 The purpose of the $\textit{simp}$ function is to keep the regular
   317 expression small. Normally the derivative function makes the regular
   337 expressions small. Normally the derivative function makes the regular
   318 expression bigger (see the SEQ case and the example in (1b)) and the
   338 expression bigger (see the SEQ case and the example in (1b)) and the
   319 algorithm would be slower and slower over time. The $\textit{simp}$
   339 algorithm would be slower and slower over time. The $\textit{simp}$
   320 function counters this increase in size and the result is that the
   340 function counters this increase in size and the result is that the
   321 algorithm is fast throughout.  By the way, this algorithm is by Janusz
   341 algorithm is fast throughout.  By the way, this algorithm is by Janusz
   322 Brzozowski who came up with the idea of derivatives in 1964 in his PhD
   342 Brzozowski who came up with the idea of derivatives in 1964 in his PhD
   326 \url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}
   346 \url{https://en.wikipedia.org/wiki/Janusz_Brzozowski_(computer_scientist)}
   327 \end{center}
   347 \end{center}
   328 
   348 
   329 
   349 
   330 If you want to see how badly the regular expression matchers do in
   350 If you want to see how badly the regular expression matchers do in
   331 Java\footnote{Version 8 and below} and in Python with the `evil' regular
   351 Java\footnote{Version 8 and below; Version 9 does not seem to be as
       
   352   catastrophic, but still worse than the regular expression matcher
       
   353 based on derivatives.} and in Python with the `evil' regular
   332 expression $(a^*)^*\cdot b$, then have a look at the graphs below (you
   354 expression $(a^*)^*\cdot b$, then have a look at the graphs below (you
   333 can try it out for yourself: have a look at the file
   355 can try it out for yourself: have a look at the file
   334 \texttt{catastrophic.java} and \texttt{catastrophic.py} on
   356 \texttt{catastrophic.java} and \texttt{catastrophic.py} on
   335 KEATS). Compare this with the matcher you have implemented. How long
   357 KEATS). Compare this with the matcher you have implemented. How long
   336 can the string of $a$'s be in your matcher and still stay within the
   358 can the string of $a$'s be in your matcher and still stay within the
   350     ymax=35,
   372     ymax=35,
   351     ytick={0,5,...,30},
   373     ytick={0,5,...,30},
   352     scaled ticks=false,
   374     scaled ticks=false,
   353     axis lines=left,
   375     axis lines=left,
   354     width=6cm,
   376     width=6cm,
   355     height=5.0cm, 
   377     height=5.5cm, 
   356     legend entries={Python, Java 8},  
   378     legend entries={Python, Java 8},  
   357     legend pos=outer north east]
   379     legend pos=outer north east]
   358 \addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
   380 \addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
   359 \addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
   381 \addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
   360 \end{axis}
   382 \end{axis}
   377 computer pioneer, who unfortunately died a few months ago. The main
   399 computer pioneer, who unfortunately died a few months ago. The main
   378 feature of brainf*** is its minimalistic set of instructions---just 8
   400 feature of brainf*** is its minimalistic set of instructions---just 8
   379 instructions in total and all of which are single characters. Despite
   401 instructions in total and all of which are single characters. Despite
   380 the minimalism, this language has been shown to be Turing
   402 the minimalism, this language has been shown to be Turing
   381 complete\ldots{}if this doesn't ring any bell with you: it roughly
   403 complete\ldots{}if this doesn't ring any bell with you: it roughly
   382 that means every algorithm we know can, in principle, be implemented in
   404 means that every algorithm we know can, in principle, be implemented in
   383 brainf***. It just takes a lot of determination and quite a lot of
   405 brainf***. It just takes a lot of determination and quite a lot of
   384 memory resources. Some relatively sophisticated sample programs in
   406 memory resources. Some relatively sophisticated sample programs in
   385 brainf*** are given in the file \texttt{bf.scala}.\bigskip
   407 brainf*** are given in the file \texttt{bf.scala}.\bigskip
   386 
   408 
   387 \noindent
   409 \noindent
   418 
   440 
   419 \begin{itemize}
   441 \begin{itemize}
   420 \item[(2a)] Brainf*** memory is represented by a \texttt{Map} from
   442 \item[(2a)] Brainf*** memory is represented by a \texttt{Map} from
   421   integers to integers. The empty memory is represented by
   443   integers to integers. The empty memory is represented by
   422   \texttt{Map()}, that is nothing is stored in the
   444   \texttt{Map()}, that is nothing is stored in the
   423   memory. \texttt{Map(0 -> 1, 2 -> 3)} clearly has stored \texttt{1}
   445   memory. \texttt{Map(0 -> 1, 2 -> 3)} clearly stores \texttt{1} at
   424   at memory location \texttt{0}; at \texttt{2} it stores
   446   memory location \texttt{0}; at \texttt{2} it stores \texttt{3}. The
   425   \texttt{3}. The convention is that if we query the memory at a
   447   convention is that if we query the memory at a location that is
   426   location that is not defined in the \texttt{Map}, we return
   448   \emph{not} defined in the \texttt{Map}, we return \texttt{0}. Write
   427   \texttt{0}. Write a function, \texttt{sread}, that takes a memory (a
   449   a function, \texttt{sread}, that takes a memory (a \texttt{Map}) and
   428   \texttt{Map}) and a memory pointer (an \texttt{Int}) as argument,
   450   a memory pointer (an \texttt{Int}) as argument, and safely reads the
   429   and safely reads the corresponding memory location. If the map is not
   451   corresponding memory location. If the \texttt{Map} is not defined at
   430   defined at the memory pointer, \texttt{sread} returns \texttt{0}.
   452   the memory pointer, \texttt{sread} returns \texttt{0}.
   431 
   453 
   432   Write another function \texttt{write}, which takes a memory, a
   454   Write another function \texttt{write}, which takes a memory, a
   433   memory pointer and an integer value as argument and updates the map
   455   memory pointer and an integer value as argument and updates the
   434   with the value at the given memory location. As usual the map is not
   456   \texttt{Map} with the value at the given memory location. As usual
   435   updated `in-place' but a new map is created with the same data,
   457   the \texttt{Map} is not updated `in-place' but a new map is created
   436   except the value is stored at the given memory pointer.\hfill[1 Mark]
   458   with the same data, except the value is stored at the given memory
       
   459   pointer.\hfill[1 Mark]
   437 
   460 
   438 \item[(2b)] Write two functions, \texttt{jumpRight} and
   461 \item[(2b)] Write two functions, \texttt{jumpRight} and
   439   \texttt{jumpLeft} that are needed to implement the loop constructs
   462   \texttt{jumpLeft} that are needed to implement the loop constructs
   440   of brainf***. They take a program (a \texttt{String}) and a program
   463   of brainf***. They take a program (a \texttt{String}) and a program
   441   counter (an \texttt{Int}) as argument and move right (respectively
   464   counter (an \texttt{Int}) as argument and move right (respectively
   452   
   475   
   453   \begin{center}
   476   \begin{center}
   454   \texttt{--[..+>--]\barbelow{,}>,++}
   477   \texttt{--[..+>--]\barbelow{,}>,++}
   455   \end{center}
   478   \end{center}
   456 
   479 
   457   meaning it jumps to after the loop. Similarly, if you in 8th position
   480   meaning it jumps to after the loop. Similarly, if you are in 8th position
   458   then \texttt{jumpLeft} is supposed to jump to just after the opening
   481   then \texttt{jumpLeft} is supposed to jump to just after the opening
   459   bracket (that is jumping to the beginning of the loop):
   482   bracket (that is jumping to the beginning of the loop):
   460 
   483 
   461   \begin{center}
   484   \begin{center}
   462     \texttt{--[..+>-\barbelow{-}],>,++}
   485     \texttt{--[..+>-\barbelow{-}],>,++}
   550                        $\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) - 1}\\
   573                        $\bullet$ & \texttt{mem} updated with \texttt{mp -> mem(mp) - 1}\\
   551                      \end{tabular}\\\hline   
   574                      \end{tabular}\\\hline   
   552       \hfill\texttt{'.'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   575       \hfill\texttt{'.'} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   553                        $\bullet$ & $\texttt{pc} + 1$\\
   576                        $\bullet$ & $\texttt{pc} + 1$\\
   554                        $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
   577                        $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\\
   555                        $\bullet$ & print out\texttt{mem(mp)} as a character\\
   578                        $\bullet$ & print out \,\texttt{mem(mp)} as a character\\
   556                      \end{tabular}\\\hline   
   579                      \end{tabular}\\\hline   
   557       \hfill\texttt{','} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   580       \hfill\texttt{','} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   558                        $\bullet$ & $\texttt{pc} + 1$\\
   581                        $\bullet$ & $\texttt{pc} + 1$\\
   559                        $\bullet$ & $\texttt{mp}$ unchanged\\
   582                        $\bullet$ & $\texttt{mp}$ unchanged\\
   560                        $\bullet$ & \texttt{mem} updated with \texttt{mp -> \textrm{input}}\\
   583                        $\bullet$ & \texttt{mem} updated with \texttt{mp -> \textrm{input}}\\
   561                        \multicolumn{2}{@{}l}{input given by \texttt{Console.in.read().toByte}}
   584                        \multicolumn{2}{@{}l}{the input is given by \texttt{Console.in.read().toByte}}
   562                      \end{tabular}\\\hline   
   585                      \end{tabular}\\\hline   
   563       \hfill\texttt{'['} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   586       \hfill\texttt{'['} & \begin{tabular}[t]{@{}l@{\hspace{2mm}}l@{}}
   564                        \multicolumn{2}{@{}l}{if \texttt{mem(mp) == 0} then}\\
   587                        \multicolumn{2}{@{}l}{if \texttt{mem(mp) == 0} then}\\
   565                        $\bullet$ & $\texttt{pc = jumpRight(prog, pc + 1, 0)}$\\
   588                        $\bullet$ & $\texttt{pc = jumpRight(prog, pc + 1, 0)}$\\
   566                        $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\
   589                        $\bullet$ & $\texttt{mp}$ and \texttt{mem} unchanged\medskip\\