2 \usepackage{../style} |
2 \usepackage{../style} |
3 \usepackage{../langs} |
3 \usepackage{../langs} |
4 |
4 |
5 \begin{document} |
5 \begin{document} |
6 |
6 |
7 \section*{Coursework (Strand 2)} |
7 \section*{Coursework 4 (Strand 1)} |
8 |
8 |
9 \noindent This coursework is worth 25\% and is due on 12 |
9 \noindent This coursework is worth 10\% and is due on 12th |
10 December at 16:00. You are asked to prove the correctness of a |
10 December at 16:00. You are asked to implement a compiler for |
11 regular expression matcher from the lectures using the |
11 the WHILE language that targets the assembler language |
12 Isabelle theorem prover. You need to submit a theory file |
12 provided by Jasmin. This assembler is available from |
13 containing this proof. The Isabelle theorem prover is |
13 |
14 available from |
14 \begin{center} |
15 |
15 \url{http://jasmin.sourceforge.net} |
16 \begin{center} |
16 \end{center} |
17 \url{http://isabelle.in.tum.de} |
17 |
18 \end{center} |
18 \noindent |
19 |
19 There is a user guide for Jasmin |
20 \noindent This is an interactive theorem prover, meaning that |
20 |
21 you can make definitions and state properties, and then help |
21 \begin{center} |
22 the system with proving these properties. Sometimes the proofs |
22 \url{http://jasmin.sourceforge.net/guide.html} |
23 are also automatic. There is a shortish user guide for |
23 \end{center} |
24 Isabelle, called ``Programming and Proving in Isabelle/HOL'' |
24 |
25 at |
25 \noindent |
26 |
26 and also a description of some of the instructions that the JVM understands |
27 \begin{center} |
27 |
28 \url{http://isabelle.in.tum.de/documentation.html} |
28 \begin{center} |
29 \end{center} |
29 \url{http://jasmin.sourceforge.net/instructions.html} |
30 |
30 \end{center} |
31 \noindent |
31 |
32 and also a longer (free) book at |
32 \noindent |
33 |
33 If you generated a correct assembler file for Jasmin, for example |
34 \begin{center} |
34 \texttt{loops.j}, you can use |
35 \url{http://www.concrete-semantics.org} |
35 |
36 \end{center} |
36 \begin{center} |
37 |
37 \texttt{java -jar jasmin-2.4/jasmin.jar loops.j} |
38 \noindent The Isabelle theorem prover is operated through the |
38 \end{center} |
39 jEdit IDE, which might not be an editor that is widely known. |
39 |
40 JEdit is documented in |
40 \noindent |
41 |
41 in order to translate it to Java byte code. The resulting class file can be |
42 \begin{center} |
42 run with |
43 \url{http://isabelle.in.tum.de/dist/Isabelle2014/doc/jedit.pdf} |
43 |
44 \end{center} |
44 \begin{center} |
45 |
45 \texttt{java loops} |
46 |
46 \end{center} |
47 \noindent If you need more help or you are stuck somewhere, |
47 |
48 please feel free to contact me (christian.urban@kcl.ac.uk). I |
48 \noindent |
49 am a main developer of Isabelle and have used it for |
49 where you might need to give the correct path to the class file. There |
50 approximately the 14 years. One of the success stories of |
50 are also other resources about Jasmin on the Internet, for example |
51 Isabelle is the recent verification of a microkernel operating |
51 \mbox{\url{http://goo.gl/Qj8TeK}} and \mbox{\url{http://goo.gl/fpVNyT}}\;.\bigskip |
52 system by an Australian group, see \url{http://sel4.systems}. |
52 |
53 Their operating system is the only one that has been proved |
53 \noindent |
54 correct according to its specification and is used for |
54 You need to submit a document containing the answers for the two questions |
55 application where high assurance, security and reliability is |
55 below. You can do the implementation in any programming language you like, but you need |
56 needed. |
56 to submit the source code with which you answered the questions. Otherwise |
57 |
57 the submission will not be counted. However, the coursework |
58 |
58 will \emph{only} be judged according to the answers. You can submit your answers |
59 \subsection*{The Task} |
59 in a txt-file or as pdf.\bigskip |
60 |
60 |
61 In this coursework you are asked to prove the correctness of |
61 |
62 the regular expression matcher from the lectures in Isabelle. |
62 \subsection*{Question 1 (marked with 2\%)} |
63 For this you need to first specify what the matcher is |
63 |
64 supposed to do and then to implement the algorithm. Finally |
64 You need to lex and parse WHILE programs and submit the assembler |
65 you need to prove that the algorithm meets the specification. |
65 instructions for the Fibonacci program and for the program you submitted |
66 The first two parts are relatively easy, because the |
66 in Coursework 2 in Question 3. The latter should be so modified that |
67 definitions in Isabelle will look very similar to the |
67 a user can input the upper bound on the console (in the original question |
68 mathematical definitions from the lectures or the Scala code |
68 it was fixed to 100). |
69 that is supplied at KEATS. For example very similar to Scala, |
69 |
70 regular expressions are defined in Isabelle as an inductive |
70 \subsection*{Question 2 (marked with 2\%)} |
71 datatype: |
71 |
72 |
72 Extend the syntax of you language so that it contains also \texttt{for}-loops, like |
73 \begin{lstlisting}[language={},numbers=none] |
73 |
74 datatype rexp = |
74 \begin{center} |
75 NULL |
75 \texttt{for} \;\textit{Id} \texttt{:=} \textit{AExp}\; \texttt{upto} \;\textit{AExp}\; \texttt{do} \textit{Block} |
76 | EMPTY |
76 \end{center} |
77 | CHAR char |
77 |
78 | SEQ rexp rexp |
78 \noindent |
79 | ALT rexp rexp |
79 The intended meaning is to first assign the variable \textit{Id} the value of the first arithmetic |
80 | STAR rexp |
80 expression, then go through the loop, at the end increase the value of the variable by 1, |
81 \end{lstlisting} |
81 and finally test wether the value is not less or equal to the value of the second |
82 |
82 arithmetic expression. For example the following instance of a \texttt{for}-loop |
83 \noindent The meaning of regular expressions is given as |
83 is supposed to print out the numbers \texttt{2}, \texttt{3}, \texttt{4}. |
84 usual: |
84 |
85 |
85 |
86 \begin{center} |
86 \begin{center} |
87 \begin{tabular}{rcl@{\hspace{10mm}}l} |
87 \begin{minipage}{6cm} |
88 $L(\varnothing)$ & $\dn$ & $\varnothing$ & \pcode{NULL}\\ |
88 \begin{lstlisting}[language=While,basicstyle=\ttfamily, numbers=none] |
89 $L(\epsilon)$ & $\dn$ & $\{[]\}$ & \pcode{EMPTY}\\ |
89 for i := 2 upto 4 do { |
90 $L(c)$ & $\dn$ & $\{[c]\}$ & \pcode{CHAR}\\ |
90 write i |
91 $L(r_1 + r_2)$ & $\dn$ & $L(r_1) \cup L(r_2)$ & \pcode{ALT}\\ |
91 } |
92 $L(r_1 \cdot r_2)$ & $\dn$ & $L(r_1) \,@\, L(r_2)$ & \pcode{SEQ}\\ |
92 \end{lstlisting} |
93 $L(r^*)$ & $\dn$ & $(L(r))^*$ & \pcode{STAR}\\ |
93 \end{minipage} |
94 \end{tabular} |
94 \end{center} |
95 \end{center} |
95 |
96 |
96 \noindent |
97 \noindent You would need to implement this function in order |
97 There are two ways how this can be implemented: one is to adapt the code generation |
98 to state the theorem about the correctness of the algorithm. |
98 part of the compiler and generate specific code for \texttt{for}-loops; the other is to |
99 The function $L$ should in Isabelle take a \pcode{rexp} as |
99 translate the abstract syntax tree of \texttt{for}-loops into an abstract syntax tree using |
100 input and return a set of strings. Its type is |
100 existing language constructs. For example the loop above could be translated |
101 therefore |
101 to the following \texttt{while}-loop: |
102 |
102 |
103 \begin{center} |
103 \begin{center} |
104 \pcode{L} \pcode{::} \pcode{rexp} $\Rightarrow$ \pcode{string set} |
104 \begin{minipage}{6cm} |
105 \end{center} |
105 \begin{lstlisting}[language=While,basicstyle=\ttfamily, numbers=none] |
106 |
106 i := 2; |
107 \noindent Isabelle treats strings as an abbreviation for lists |
107 while (i <= 4) do { |
108 of characters. This means you can pattern-match strings like |
108 write i; |
109 lists. The union operation on sets (for the \pcode{ALT}-case) |
109 i := i + 1; |
110 is a standard definition in Isabelle, but not the |
110 } |
111 concatenation operation on sets and also not the |
111 \end{lstlisting} |
112 star-operation. You would have to supply these definitions. |
112 \end{minipage} |
113 The concatenation operation can be defined in terms of the |
113 \end{center} |
114 append function, written \code{_ @ _} in Isabelle, for lists. |
114 |
115 The star-operation can be defined as a ``big-union'' of |
115 \noindent |
116 powers, like in the lectures, or directly as an inductive set. |
116 In this question you are supposed to give the assembler instructions for the |
117 |
117 program |
118 The functions for the matcher are shown in |
118 |
119 Figure~\ref{matcher}. The theorem that needs to be proved is |
119 \begin{center} |
120 |
120 \begin{minipage}{6cm} |
121 \begin{lstlisting}[numbers=none,language={},keywordstyle=\color{black}\ttfamily,mathescape] |
121 \begin{lstlisting}[language=While,basicstyle=\ttfamily, numbers=none] |
122 theorem |
122 for i := 1 upto 10000 do { |
123 "matches r s $\longleftrightarrow$ s $\in$ L r" |
123 for i := 1 upto 10000 do { |
124 \end{lstlisting} |
124 skip |
125 |
125 } |
126 \noindent which states that the function \emph{matches} is |
126 } |
127 true if and only if the string is in the language of the |
127 \end{lstlisting} |
128 regular expression. A proof for this lemma will need |
128 \end{minipage} |
129 side-lemmas about \pcode{nullable} and \pcode{der}. An example |
129 \end{center} |
130 proof in Isabelle that will not be relevant for the theorem |
130 |
131 above is given in Figure~\ref{proof}. |
131 |
132 |
132 |
133 \begin{figure}[p] |
133 \subsection*{Further Information} |
134 \begin{lstlisting}[language={},keywordstyle=\color{black}\ttfamily,mathescape] |
134 |
135 fun |
135 The Java infrastructure unfortunately does not contain an assembler out-of-the-box |
136 nullable :: "rexp $\Rightarrow$ bool" |
136 (therefore |
137 where |
137 you need to download the additional package Jasmin---see above). But it does contain a |
138 "nullable NULL = False" |
138 disassembler, called \texttt{javap}. A dissembler does the ``opposite'' of an assembler: it |
139 | "nullable EMPTY = True" |
139 generates readable assembler code from Java byte code. Have a look at the |
140 | "nullable (CHAR _) = False" |
140 following example: Compile using the usual Java compiler the simple Hello World |
141 | "nullable (ALT r1 r2) = (nullable(r1) $\vee$ nullable(r2))" |
141 program below: |
142 | "nullable (SEQ r1 r2) = (nullable(r1) $\wedge$ nullable(r2))" |
142 |
143 | "nullable (STAR _) = True" |
143 \begin{center} |
144 |
144 \begin{minipage}{10cm} |
145 fun |
145 \begin{lstlisting}[language=Java,basicstyle=\ttfamily] |
146 der :: "char $\Rightarrow$ rexp $\Rightarrow$ rexp" |
146 class HelloWorld { |
147 where |
147 public static void main(String[] args) { |
148 "der c NULL = NULL" |
148 System.out.println("Hello World!"); |
149 | "der c EMPTY = NULL" |
149 } |
150 | "der c (CHAR d) = (if c = d then EMPTY else NULL)" |
150 } |
151 | "der c (ALT r1 r2) = ALT (der c r1) (der c r2)" |
151 \end{lstlisting} |
152 | "der c (SEQ r1 r2) = |
152 \end{minipage} |
153 (if (nullable r1) then ALT (SEQ (der c r1) r2) (der c r2) |
153 \end{center} |
154 else SEQ (der c r1) r2)" |
154 |
155 | "der c (STAR r) = SEQ (der c r) (STAR r)" |
155 \noindent |
156 |
156 You can use the command |
157 fun |
157 |
158 ders :: "rexp $\Rightarrow$ string $\Rightarrow$ rexp" |
158 \begin{center} |
159 where |
159 \texttt{javap -v HelloWorld} |
160 "ders r [] = r" |
160 \end{center} |
161 | "ders r (c # s) = ders (der c r) s" |
161 |
162 |
162 \noindent |
163 fun |
163 to see the assembler instructions of the Java byte code that has been generated for this |
164 matches :: "rexp $\Rightarrow$ string $\Rightarrow$ bool" |
164 program. You can compare this with the code generated for the Scala |
165 where |
165 version of Hello World. |
166 "matches r s = nullable (ders r s)" |
166 |
167 \end{lstlisting} |
167 \begin{center} |
168 \caption{The definition of the matcher algorithm in |
168 \begin{minipage}{10cm} |
169 Isabelle.\label{matcher}} |
169 \begin{lstlisting}[language=Scala,basicstyle=\ttfamily] |
170 \end{figure} |
170 object HelloWorld { |
171 |
171 def main(args: Array[String]) { |
172 \begin{figure}[p] |
172 println("Hello World!") |
173 \begin{lstlisting}[language={},keywordstyle=\color{black}\ttfamily,mathescape] |
173 } |
174 fun |
174 } |
175 zeroable :: "rexp $\Rightarrow$ bool" |
175 \end{lstlisting} |
176 where |
176 \end{minipage} |
177 "zeroable NULL = True" |
177 \end{center} |
178 | "zeroable EMPTY = False" |
178 |
179 | "zeroable (CHAR _) = False" |
179 |
180 | "zeroable (ALT r1 r2) = (zeroable(r1) $\wedge$ zeroable(r2))" |
180 \subsection*{Library Functions} |
181 | "zeroable (SEQ r1 r2) = (zeroable(r1) $\vee$ zeroable(r2))" |
181 |
182 | "zeroable (STAR _) = False" |
182 You need to generate code for the commands \texttt{write} and \texttt{read}. This |
183 |
183 will require the addition of some ``library'' functions to your generated code. The first |
184 lemma |
184 command even needs two versions, because you might want to write out an |
185 "zeroable r $\longleftrightarrow$ L r = {}" |
185 integer or a string. The Java byte code will need two separate functions for this. |
186 proof (induct) |
186 For writing out an integer, you can use the assembler code |
187 case (NULL) |
187 |
188 have "zeroable NULL" "L NULL = {}" by simp_all |
188 \begin{center} |
189 then show "zeroable NULL $\longleftrightarrow$ (L NULL = {})" by simp |
189 \begin{minipage}{12cm} |
190 next |
190 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none] |
191 case (EMPTY) |
191 .method public static write(I)V |
192 have "$\neg$ zeroable EMPTY" "L EMPTY = {[]}" by simp_all |
192 .limit locals 5 |
193 then show "zeroable EMPTY $\longleftrightarrow$ (L EMPTY = {})" by simp |
193 .limit stack 5 |
194 next |
194 iload 0 |
195 case (CHAR c) |
195 getstatic java/lang/System/out Ljava/io/PrintStream; |
196 have "$\neg$ zeroable (CHAR c)" "L (CHAR c) = {[c]}" by simp_all |
196 swap |
197 then show "zeroable (CHAR c) $\longleftrightarrow$ (L (CHAR c) = {})" by simp |
197 invokevirtual java/io/PrintStream/println(I)V |
198 next |
198 return |
199 case (ALT r1 r2) |
199 .end method |
200 have ih1: "zeroable r1 $\longleftrightarrow$ L r1 = {}" by fact |
200 \end{lstlisting} |
201 have ih2: "zeroable r2 $\longleftrightarrow$ L r2 = {}" by fact |
201 \end{minipage} |
202 show "zeroable (ALT r1 r2) $\longleftrightarrow$ (L (ALT r1 r2) = {})" |
202 \end{center} |
203 using ih1 ih2 by simp |
203 |
204 next |
204 \noindent |
205 case (SEQ r1 r2) |
205 This function will invoke Java's \texttt{println} function for integers. Then if you need |
206 have ih1: "zeroable r1 $\longleftrightarrow$ L r1 = {}" by fact |
206 to generate code for \texttt{write x} where \texttt{x} is an integer variable, you can generate |
207 have ih2: "zeroable r2 $\longleftrightarrow$ L r2 = {}" by fact |
207 |
208 show "zeroable (SEQ r1 r2) $\longleftrightarrow$ (L (SEQ r1 r2) = {})" |
208 \begin{center} |
209 using ih1 ih2 by (auto simp add: Conc_def) |
209 \begin{minipage}{8cm} |
210 next |
210 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none] |
211 case (STAR r) |
211 iload n |
212 have "$\neg$ zeroable (STAR r)" "[] $\in$ L (r) ^ 0" by simp_all |
212 invokestatic XXX/XXX/write(I)V |
213 then show "zeroable (STAR r) $\longleftrightarrow$ (L (STAR r) = {})" |
213 \end{lstlisting} |
214 by (simp (no_asm) add: Star_def) blast |
214 \end{minipage} |
215 qed |
215 \end{center} |
216 \end{lstlisting} |
216 |
217 \caption{An Isabelle proof about the function \pcode{zeroable}.\label{proof}} |
217 \noindent |
|
218 where \texttt{n} is the index where the value of the variable \texttt{x} is |
|
219 stored. The \texttt{XXX/XXX} needs to be replaced with the class name |
|
220 which you use to generate the code (for example \texttt{fib/fib} in case |
|
221 of the Fibonacci numbers). |
|
222 |
|
223 Writing out a string is similar. The corresponding library function uses strings |
|
224 instead of integers: |
|
225 |
|
226 \begin{center} |
|
227 \begin{minipage}{12cm} |
|
228 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none] |
|
229 .method public static writes(Ljava/lang/String;)V |
|
230 .limit stack 2 |
|
231 .limit locals 2 |
|
232 getstatic java/lang/System/out Ljava/io/PrintStream; |
|
233 aload 0 |
|
234 invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V |
|
235 return |
|
236 .end method |
|
237 \end{lstlisting} |
|
238 \end{minipage} |
|
239 \end{center} |
|
240 |
|
241 \noindent |
|
242 The code that needs to be generated for \texttt{write "some\_string"} commands |
|
243 is |
|
244 |
|
245 \begin{center} |
|
246 \begin{minipage}{8cm} |
|
247 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none] |
|
248 ldc "some_string" |
|
249 invokestatic XXX/XXX/writes(Ljava/lang/String;)V |
|
250 \end{lstlisting} |
|
251 \end{minipage} |
|
252 \end{center} |
|
253 |
|
254 \noindent |
|
255 Again you need to adjust the \texttt{XXX/XXX} part in each call. |
|
256 |
|
257 The code for \texttt{read} is more complicated. The reason is that inputting a string |
|
258 will need to be transformed into an integer. The code in Figure~\ref{read} does this. |
|
259 It can be called with |
|
260 |
|
261 \begin{center} |
|
262 \begin{minipage}{8cm} |
|
263 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none] |
|
264 invokestatic XXX/XXX/read()I |
|
265 istore n |
|
266 \end{lstlisting} |
|
267 \end{minipage} |
|
268 \end{center} |
|
269 |
|
270 \noindent |
|
271 where \texttt{n} is the index of the variable that requires an input. |
|
272 |
|
273 |
|
274 \begin{figure}[p]\small |
|
275 \begin{lstlisting}[basicstyle=\ttfamily, numbers=none] |
|
276 .method public static read()I |
|
277 .limit locals 10 |
|
278 .limit stack 10 |
|
279 |
|
280 ldc 0 |
|
281 istore 1 ; this will hold our final integer |
|
282 Label1: |
|
283 getstatic java/lang/System/in Ljava/io/InputStream; |
|
284 invokevirtual java/io/InputStream/read()I |
|
285 istore 2 |
|
286 iload 2 |
|
287 ldc 10 ; the newline delimiter |
|
288 isub |
|
289 ifeq Label2 |
|
290 iload 2 |
|
291 ldc 32 ; the space delimiter |
|
292 isub |
|
293 ifeq Label2 |
|
294 |
|
295 iload 2 |
|
296 ldc 48 ; we have our digit in ASCII, have to subtract it from 48 |
|
297 isub |
|
298 ldc 10 |
|
299 iload 1 |
|
300 imul |
|
301 iadd |
|
302 istore 1 |
|
303 goto Label1 |
|
304 Label2: |
|
305 ;when we come here we have our integer computed in Local Variable 1 |
|
306 iload 1 |
|
307 ireturn |
|
308 .end method |
|
309 \end{lstlisting}\normalsize |
|
310 \caption{Assembler code for reading an integer from the console.\label{read}} |
218 \end{figure} |
311 \end{figure} |
219 |
312 |
220 \end{document} |
313 \end{document} |
221 |
314 |
222 %%% Local Variables: |
315 %%% Local Variables: |