coursework/cw04.tex
changeset 309 640e4a05cd9b
parent 298 bdf84605b6cd
child 313 90ccc385c547
equal deleted inserted replaced
308:3703ade9b17c 309:640e4a05cd9b
     2 \usepackage{../style}
     2 \usepackage{../style}
     3 \usepackage{../langs}
     3 \usepackage{../langs}
     4 
     4 
     5 \begin{document}
     5 \begin{document}
     6 
     6 
     7 \section*{Coursework (Strand 2)}
     7 \section*{Coursework 4 (Strand 1)}
     8 
     8 
     9 \noindent This coursework is worth 25\% and is due on 12
     9 \noindent This coursework is worth 10\% and is due on 12th
    10 December at 16:00. You are asked to prove the correctness of a
    10 December at 16:00. You are asked to implement a compiler for
    11 regular expression matcher from the lectures using the
    11 the WHILE language that targets the assembler language
    12 Isabelle theorem prover. You need to submit a theory file
    12 provided by Jasmin. This assembler is available from
    13 containing this proof. The Isabelle theorem prover is
    13 
    14 available from 
    14 \begin{center}
    15 
    15 \url{http://jasmin.sourceforge.net}
    16 \begin{center}
    16 \end{center}
    17 \url{http://isabelle.in.tum.de}
    17 
    18 \end{center}
    18 \noindent
    19 
    19 There is a user guide for Jasmin
    20 \noindent This is an interactive theorem prover, meaning that
    20 
    21 you can make definitions and state properties, and then help
    21 \begin{center}
    22 the system with proving these properties. Sometimes the proofs
    22 \url{http://jasmin.sourceforge.net/guide.html}
    23 are also automatic. There is a shortish user guide for
    23 \end{center}
    24 Isabelle, called ``Programming and Proving in Isabelle/HOL''
    24 
    25 at
    25 \noindent
    26 
    26 and also a description of some of the instructions that the JVM understands
    27 \begin{center}
    27 
    28 \url{http://isabelle.in.tum.de/documentation.html}
    28 \begin{center}
    29 \end{center}
    29 \url{http://jasmin.sourceforge.net/instructions.html}
    30 
    30 \end{center}
    31 \noindent
    31 
    32 and also a longer (free) book at
    32 \noindent
    33 
    33 If you generated a correct assembler file for Jasmin, for example
    34 \begin{center}
    34 \texttt{loops.j}, you can use
    35 \url{http://www.concrete-semantics.org}
    35 
    36 \end{center}
    36 \begin{center}
    37 
    37 \texttt{java -jar jasmin-2.4/jasmin.jar loops.j}
    38 \noindent The Isabelle theorem prover is operated through the
    38 \end{center}
    39 jEdit IDE, which might not be an editor that is widely known.
    39 
    40 JEdit is documented in
    40 \noindent
    41 
    41 in order to translate it to Java byte code. The resulting class file can be
    42 \begin{center}
    42 run with
    43 \url{http://isabelle.in.tum.de/dist/Isabelle2014/doc/jedit.pdf}
    43 
    44 \end{center}
    44 \begin{center}
    45 
    45 \texttt{java loops}
    46 
    46 \end{center}
    47 \noindent If you need more help or you are stuck somewhere,
    47 
    48 please feel free to contact me (christian.urban@kcl.ac.uk). I
    48 \noindent
    49 am a main developer of Isabelle and have used it for
    49 where you might need to give the correct path to the class file. There
    50 approximately the 14 years. One of the success stories of
    50 are also other resources about Jasmin on the Internet, for example
    51 Isabelle is the recent verification of a microkernel operating
    51 \mbox{\url{http://goo.gl/Qj8TeK}} and \mbox{\url{http://goo.gl/fpVNyT}}\;.\bigskip
    52 system by an Australian group, see \url{http://sel4.systems}.
    52 
    53 Their operating system is the only one that has been proved
    53 \noindent
    54 correct according to its specification and is used for
    54 You need to submit a document containing the answers for the two questions 
    55 application where high assurance, security and reliability is
    55 below. You can do the implementation in any programming language you like, but you need 
    56 needed. 
    56 to submit the source code with which you answered the questions. Otherwise
    57 
    57 the submission will not be counted.  However, the coursework 
    58 
    58 will \emph{only} be judged according to the answers. You can submit your answers
    59 \subsection*{The Task}
    59 in a txt-file or as pdf.\bigskip
    60 
    60 
    61 In this coursework you are asked to prove the correctness of
    61 
    62 the regular expression matcher from the lectures in Isabelle.
    62 \subsection*{Question 1 (marked with 2\%)}
    63 For this you need to first specify what the matcher is
    63 
    64 supposed to do and then to implement the algorithm. Finally
    64 You need to lex and parse WHILE programs and submit the assembler 
    65 you need to prove that the algorithm meets the specification.
    65 instructions for the Fibonacci program and for the program you submitted
    66 The first two parts are relatively easy, because the
    66 in Coursework 2 in Question 3. The latter should be so modified that 
    67 definitions in Isabelle will look very similar to the
    67 a user can input the upper bound on the console (in the original question
    68 mathematical definitions from the lectures or the Scala code
    68 it was fixed to 100).
    69 that is supplied at KEATS. For example very similar to Scala,
    69 
    70 regular expressions are defined in Isabelle as an inductive
    70 \subsection*{Question 2 (marked with 2\%)}
    71 datatype:
    71 
    72 
    72 Extend the syntax of you language so that it contains also \texttt{for}-loops, like
    73 \begin{lstlisting}[language={},numbers=none]
    73 
    74 datatype rexp =
    74 \begin{center}
    75   NULL
    75 \texttt{for} \;\textit{Id} \texttt{:=} \textit{AExp}\; \texttt{upto} \;\textit{AExp}\; \texttt{do} \textit{Block} 
    76 | EMPTY
    76 \end{center}
    77 | CHAR char
    77 
    78 | SEQ rexp rexp
    78 \noindent
    79 | ALT rexp rexp
    79 The intended meaning is to first assign the variable \textit{Id} the value of the first arithmetic 
    80 | STAR rexp
    80 expression, then go through the loop, at the end increase the value of the variable by 1, 
    81 \end{lstlisting}
    81 and finally test wether the value is not less or equal to the value of the second
    82 
    82 arithmetic expression. For example the following instance of a \texttt{for}-loop 
    83 \noindent The meaning of regular expressions is given as 
    83 is supposed to print out the numbers \texttt{2}, \texttt{3}, \texttt{4}.
    84 usual:
    84 
    85 
    85 
    86 \begin{center}
    86 \begin{center}
    87 \begin{tabular}{rcl@{\hspace{10mm}}l}
    87 \begin{minipage}{6cm}
    88 $L(\varnothing)$  & $\dn$ & $\varnothing$   & \pcode{NULL}\\
    88 \begin{lstlisting}[language=While,basicstyle=\ttfamily, numbers=none]
    89 $L(\epsilon)$     & $\dn$ & $\{[]\}$        & \pcode{EMPTY}\\ 
    89 for i := 2 upto 4 do {
    90 $L(c)$            & $\dn$ & $\{[c]\}$       & \pcode{CHAR}\\
    90     write i	
    91 $L(r_1 + r_2)$     & $\dn$ & $L(r_1) \cup L(r_2)$ & \pcode{ALT}\\
    91 }
    92 $L(r_1 \cdot r_2)$ & $\dn$ & $L(r_1) \,@\, L(r_2)$ & \pcode{SEQ}\\
    92 \end{lstlisting}
    93 $L(r^*)$           & $\dn$ & $(L(r))^*$ & \pcode{STAR}\\
    93 \end{minipage}
    94 \end{tabular}
    94 \end{center}
    95 \end{center}
    95 
    96 
    96 \noindent
    97 \noindent You would need to implement this function in order
    97 There are two ways how this can be implemented: one is to adapt the code generation 
    98 to state the theorem about the correctness of the algorithm.
    98 part of the compiler and generate specific code for \texttt{for}-loops; the other is to
    99 The function $L$ should in Isabelle take a \pcode{rexp} as
    99 translate the abstract syntax tree of \texttt{for}-loops into an abstract syntax tree using
   100 input and return a set of strings. Its type is
   100 existing language constructs. For example the loop above could be translated
   101 therefore 
   101 to the following \texttt{while}-loop:
   102 
   102 
   103 \begin{center}
   103 \begin{center}
   104 \pcode{L} \pcode{::} \pcode{rexp} $\Rightarrow$ \pcode{string set}
   104 \begin{minipage}{6cm}
   105 \end{center}
   105 \begin{lstlisting}[language=While,basicstyle=\ttfamily, numbers=none]
   106 
   106 i := 2;
   107 \noindent Isabelle treats strings as an abbreviation for lists
   107 while (i <= 4) do {
   108 of characters. This means you can pattern-match strings like
   108     write i;
   109 lists. The union operation on sets (for the \pcode{ALT}-case)
   109     i := i + 1;
   110 is a standard definition in Isabelle, but not the
   110 }
   111 concatenation operation on sets and also not the
   111 \end{lstlisting}
   112 star-operation. You would have to supply these definitions.
   112 \end{minipage}
   113 The concatenation operation can be defined in terms of the
   113 \end{center}
   114 append function, written \code{_ @ _} in Isabelle, for lists.
   114 
   115 The star-operation can be defined as a ``big-union'' of 
   115 \noindent
   116 powers, like in the lectures, or directly as an inductive set.
   116 In this question you are supposed to give the assembler instructions for the
   117 
   117 program
   118 The functions for the matcher are shown in
   118 
   119 Figure~\ref{matcher}. The theorem that needs to be proved is
   119 \begin{center}
   120 
   120 \begin{minipage}{6cm}
   121 \begin{lstlisting}[numbers=none,language={},keywordstyle=\color{black}\ttfamily,mathescape]
   121 \begin{lstlisting}[language=While,basicstyle=\ttfamily, numbers=none]
   122 theorem 
   122 for i := 1 upto 10000 do {
   123   "matches r s $\longleftrightarrow$ s $\in$ L r"
   123   for i := 1 upto 10000 do {
   124 \end{lstlisting}
   124   skip
   125 
   125   }
   126 \noindent which states that the function \emph{matches} is
   126 } 
   127 true if and only if the string is in the language of the
   127 \end{lstlisting}
   128 regular expression. A proof for this lemma will need
   128 \end{minipage}
   129 side-lemmas about \pcode{nullable} and \pcode{der}. An example
   129 \end{center}
   130 proof in Isabelle that will not be relevant for the theorem
   130 
   131 above is given in Figure~\ref{proof}.
   131 
   132 
   132 
   133 \begin{figure}[p]
   133 \subsection*{Further Information}
   134 \begin{lstlisting}[language={},keywordstyle=\color{black}\ttfamily,mathescape]
   134 
   135 fun 
   135 The Java infrastructure unfortunately does not contain an assembler out-of-the-box
   136   nullable :: "rexp $\Rightarrow$ bool"
   136 (therefore
   137 where
   137 you need to download the additional package Jasmin---see above). But it does contain a 
   138   "nullable NULL = False"
   138 disassembler, called \texttt{javap}. A dissembler does the ``opposite'' of an assembler: it
   139 | "nullable EMPTY = True"
   139 generates readable assembler code from Java byte code. Have a look at the
   140 | "nullable (CHAR _) = False"
   140 following example: Compile using the usual Java compiler the simple Hello World 
   141 | "nullable (ALT r1 r2) = (nullable(r1) $\vee$ nullable(r2))"
   141 program below:
   142 | "nullable (SEQ r1 r2) = (nullable(r1) $\wedge$ nullable(r2))"
   142 
   143 | "nullable (STAR _) = True"
   143 \begin{center}
   144 
   144 \begin{minipage}{10cm}
   145 fun 
   145 \begin{lstlisting}[language=Java,basicstyle=\ttfamily]
   146   der :: "char $\Rightarrow$ rexp $\Rightarrow$ rexp"
   146 class HelloWorld {
   147 where
   147     public static void main(String[] args) {
   148   "der c NULL = NULL"
   148         System.out.println("Hello World!");
   149 | "der c EMPTY = NULL"
   149     }
   150 | "der c (CHAR d) = (if c = d then EMPTY else NULL)"
   150 }
   151 | "der c (ALT r1 r2) = ALT (der c r1) (der c r2)"
   151 \end{lstlisting}
   152 | "der c (SEQ r1 r2) = 
   152 \end{minipage}
   153      (if (nullable r1) then ALT (SEQ (der c r1) r2) (der c r2)
   153 \end{center}
   154                        else SEQ (der c r1) r2)"
   154 
   155 | "der c (STAR r) = SEQ (der c r) (STAR r)"
   155 \noindent
   156 
   156 You can use the command
   157 fun 
   157 
   158   ders :: "rexp $\Rightarrow$ string $\Rightarrow$ rexp"
   158 \begin{center}
   159 where
   159 \texttt{javap -v HelloWorld}
   160   "ders r [] = r"
   160 \end{center}
   161 | "ders r (c # s) = ders (der c r) s"
   161 
   162 
   162 \noindent
   163 fun 
   163 to see the assembler instructions of the Java byte code that has been generated for this
   164   matches :: "rexp $\Rightarrow$ string $\Rightarrow$ bool"
   164 program. You can compare this with the code generated for the Scala
   165 where
   165 version of Hello World.
   166   "matches r s = nullable (ders r s)" 
   166 
   167 \end{lstlisting}
   167 \begin{center}
   168 \caption{The definition of the matcher algorithm in 
   168 \begin{minipage}{10cm}
   169 Isabelle.\label{matcher}}
   169 \begin{lstlisting}[language=Scala,basicstyle=\ttfamily]
   170 \end{figure}
   170 object HelloWorld {
   171 
   171    def main(args: Array[String]) {
   172 \begin{figure}[p]
   172       println("Hello World!")
   173 \begin{lstlisting}[language={},keywordstyle=\color{black}\ttfamily,mathescape]
   173   }
   174 fun 
   174 }
   175   zeroable :: "rexp $\Rightarrow$ bool"
   175 \end{lstlisting}
   176 where
   176 \end{minipage}
   177   "zeroable NULL = True"
   177 \end{center}
   178 | "zeroable EMPTY = False"
   178 
   179 | "zeroable (CHAR _) = False"
   179 
   180 | "zeroable (ALT r1 r2) = (zeroable(r1) $\wedge$ zeroable(r2))"
   180 \subsection*{Library Functions}
   181 | "zeroable (SEQ r1 r2) = (zeroable(r1) $\vee$ zeroable(r2))"
   181 
   182 | "zeroable (STAR _) = False"
   182 You need to generate code for the commands \texttt{write} and \texttt{read}. This
   183 
   183 will require the addition of some ``library'' functions to your generated code. The first
   184 lemma
   184 command even needs two versions, because you might want to write out an
   185   "zeroable r $\longleftrightarrow$ L r = {}"
   185 integer or a string. The Java byte code will need two separate functions for this.
   186 proof (induct)
   186 For writing out an integer, you can use the assembler code
   187   case (NULL)
   187 
   188   have "zeroable NULL" "L NULL = {}" by simp_all
   188 \begin{center}
   189   then show "zeroable NULL $\longleftrightarrow$ (L NULL = {})" by simp
   189 \begin{minipage}{12cm}
   190 next
   190 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none]
   191   case (EMPTY)
   191 .method public static write(I)V 
   192   have "$\neg$ zeroable EMPTY" "L EMPTY = {[]}" by simp_all
   192     .limit locals 5 
   193   then show "zeroable EMPTY $\longleftrightarrow$ (L EMPTY = {})" by simp
   193     .limit stack 5 
   194 next
   194     iload 0 
   195   case (CHAR c)
   195     getstatic java/lang/System/out Ljava/io/PrintStream; 
   196   have "$\neg$ zeroable (CHAR c)" "L (CHAR c) = {[c]}" by simp_all
   196     swap 
   197   then show "zeroable (CHAR c) $\longleftrightarrow$ (L (CHAR c) = {})" by simp
   197     invokevirtual java/io/PrintStream/println(I)V 
   198 next 
   198     return 
   199   case (ALT r1 r2)
   199 .end method
   200   have ih1: "zeroable r1 $\longleftrightarrow$ L r1 = {}" by fact
   200 \end{lstlisting}
   201   have ih2: "zeroable r2 $\longleftrightarrow$ L r2 = {}" by fact
   201 \end{minipage}
   202   show "zeroable (ALT r1 r2) $\longleftrightarrow$ (L (ALT r1 r2) = {})" 
   202 \end{center}
   203     using ih1 ih2 by simp
   203 
   204 next
   204 \noindent 
   205   case (SEQ r1 r2)
   205 This function will invoke Java's \texttt{println} function for integers. Then if you need
   206   have ih1: "zeroable r1 $\longleftrightarrow$ L r1 = {}" by fact
   206 to generate code for \texttt{write x} where \texttt{x} is an integer variable, you can generate
   207   have ih2: "zeroable r2 $\longleftrightarrow$ L r2 = {}" by fact
   207 
   208   show "zeroable (SEQ r1 r2) $\longleftrightarrow$ (L (SEQ r1 r2) = {})" 
   208 \begin{center}
   209     using ih1 ih2 by (auto simp add: Conc_def)
   209 \begin{minipage}{8cm}
   210 next
   210 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none]
   211   case (STAR r)
   211 iload n 
   212   have "$\neg$ zeroable (STAR r)" "[] $\in$ L (r) ^ 0" by simp_all
   212 invokestatic XXX/XXX/write(I)V
   213   then show "zeroable (STAR r) $\longleftrightarrow$ (L (STAR r) = {})" 
   213 \end{lstlisting}
   214     by (simp (no_asm) add: Star_def) blast
   214 \end{minipage}
   215 qed
   215 \end{center}
   216 \end{lstlisting}
   216 
   217 \caption{An Isabelle proof about the function \pcode{zeroable}.\label{proof}}
   217 \noindent
       
   218 where \texttt{n} is the index where the value of the variable \texttt{x} is
       
   219 stored. The \texttt{XXX/XXX} needs to be replaced with the class name 
       
   220 which you use to generate the code (for example \texttt{fib/fib} in case
       
   221 of the Fibonacci numbers).
       
   222 
       
   223 Writing out a string is similar. The corresponding library function uses strings 
       
   224 instead of integers:
       
   225 
       
   226 \begin{center}
       
   227 \begin{minipage}{12cm}
       
   228 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none]
       
   229 .method public static writes(Ljava/lang/String;)V
       
   230    .limit stack 2
       
   231    .limit locals 2
       
   232    getstatic java/lang/System/out Ljava/io/PrintStream;
       
   233    aload 0
       
   234    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
       
   235    return
       
   236 .end method
       
   237 \end{lstlisting}
       
   238 \end{minipage}
       
   239 \end{center}
       
   240 
       
   241 \noindent
       
   242 The code that needs to be generated for \texttt{write "some\_string"} commands 
       
   243 is
       
   244 
       
   245 \begin{center}
       
   246 \begin{minipage}{8cm}
       
   247 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none]
       
   248 ldc "some_string"
       
   249 invokestatic XXX/XXX/writes(Ljava/lang/String;)V
       
   250 \end{lstlisting}
       
   251 \end{minipage}
       
   252 \end{center}
       
   253 
       
   254 \noindent
       
   255 Again you need to adjust the \texttt{XXX/XXX} part in each call.
       
   256 
       
   257 The code for \texttt{read} is more complicated. The reason is that inputting a string
       
   258 will need to be transformed into an integer. The code in Figure~\ref{read} does this.
       
   259 It can be called with
       
   260 
       
   261 \begin{center}
       
   262 \begin{minipage}{8cm}
       
   263 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none]
       
   264 invokestatic XXX/XXX/read()I 
       
   265 istore n
       
   266 \end{lstlisting}
       
   267 \end{minipage}
       
   268 \end{center}
       
   269 
       
   270 \noindent 
       
   271 where \texttt{n} is the index of the variable that requires an input.
       
   272 
       
   273 
       
   274 \begin{figure}[p]\small
       
   275 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none]
       
   276 .method public static read()I 
       
   277       .limit locals 10 
       
   278       .limit stack 10
       
   279 
       
   280       ldc 0 
       
   281       istore 1  ; this will hold our final integer 
       
   282 Label1: 
       
   283       getstatic java/lang/System/in Ljava/io/InputStream; 
       
   284       invokevirtual java/io/InputStream/read()I 
       
   285       istore 2 
       
   286       iload 2 
       
   287       ldc 10   ; the newline delimiter 
       
   288       isub 
       
   289       ifeq Label2 
       
   290       iload 2 
       
   291       ldc 32   ; the space delimiter 
       
   292       isub 
       
   293       ifeq Label2
       
   294 
       
   295       iload 2 
       
   296       ldc 48   ; we have our digit in ASCII, have to subtract it from 48 
       
   297       isub 
       
   298       ldc 10 
       
   299       iload 1 
       
   300       imul 
       
   301       iadd 
       
   302       istore 1 
       
   303       goto Label1 
       
   304 Label2: 
       
   305       ;when we come here we have our integer computed in Local Variable 1 
       
   306       iload 1 
       
   307       ireturn 
       
   308 .end method
       
   309 \end{lstlisting}\normalsize
       
   310 \caption{Assembler code for reading an integer from the console.\label{read}}
   218 \end{figure}
   311 \end{figure}
   219 
   312 
   220 \end{document}
   313 \end{document}
   221 
   314 
   222 %%% Local Variables: 
   315 %%% Local Variables: