theory Parsing
imports Base "Package/Simple_Inductive_Package"
begin

chapter {* Parsing *}

text {*

Isabelle distinguishes between \emph{outer} and \emph{inner} syntax.
Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so
on, belong to the outer syntax, whereas items inside double quotation marks, such
as terms, types and so on, belong to the inner syntax. For parsing inner syntax,
Isabelle uses a rather general and sophisticated algorithm, which
is driven by priority grammars. Parsers for outer syntax are built up by functional
parsing combinators. These combinators are a well-established technique for parsing,
which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}.
Isabelle developers are usually concerned with writing these outer syntax parsers,
either for new definitional packages or for calling tactics with specific arguments.

\begin{readmore}
The library
for writing parser combinators is split up, roughly, into two parts.
The first part consists of a collection of generic parser combinators defined
in the structure @{ML_struct Scan} in the file
@{ML_file "Pure/General/scan.ML"}. The second part of the library consists of
combinators for dealing with specific token types, which are defined in the
structure @{ML_struct OuterParse} in the file
@{ML_file "Pure/Isar/outer_parse.ML"}.
\end{readmore}

*}

|
section {* Building Generic Parsers *}

text {*

Let us first have a look at parsing strings using generic parsing combinators.
The function @{ML "$$"} takes a string as argument and will ``consume'' this string from
a given input list of strings. ``Consume'' in this context means that it will
return a pair consisting of this string and the rest of the input list.
For example:

@{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}

@{ML_response [display,gray] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}

The function @{ML "$$"} will either succeed (as in the two examples above) or raise the exception
@{text "FAIL"} if no string can be consumed. For example, trying to parse

@{ML_response_fake [display,gray] "($$ \"x\") (explode \"world\")"
"Exception FAIL raised"}

will raise the exception @{text "FAIL"}.
Three exceptions are used in the parsing combinators:

\begin{itemize}
\item @{text "FAIL"} is used to indicate that alternative routes of parsing
might be explored.
\item @{text "MORE"} indicates that there is not enough input for the parser. For example
in @{text "($$ \"h\") []"}.
\item @{text "ABORT"} is the exception that is raised when a dead end is reached.
It is used, for example, in the function @{ML "!!"} (see below).
\end{itemize}

Note, however, that these exceptions are private to the parser and cannot be accessed
by the programmer (for example to handle them).
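
Although these exceptions are hidden from the programmer, their control flow can be illustrated outside Isabelle. The following is a hypothetical Python sketch (none of these names exist in Isabelle; the real combinators are ML functions) in which a parser is a function from a token list to a pair of result and remaining input:

```python
# Python analogy of Isabelle's string parsers; all names here are invented.
class FAIL(Exception):  pass  # alternative routes may still be explored
class MORE(Exception):  pass  # the parser ran out of input
class ABORT(Exception): pass  # dead end; parsing stops altogether

def exact(s):
    """Analogue of ($$ s): consume exactly the single symbol s."""
    def parse(toks):
        if not toks:
            raise MORE()
        if toks[0] == s:
            return toks[0], toks[1:]
        raise FAIL()
    return parse

# analogue of ($$ "h") (explode "hello")
print(exact("h")(list("hello")))   # ('h', ['e', 'l', 'l', 'o'])
```

As in Isabelle, consuming a symbol returns it paired with the rest of the input, and an impossible consumption raises the failure exception.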
|

Slightly more general than the parser @{ML "$$"} is the function @{ML
Scan.one}, in that it takes a predicate as argument and then parses exactly
one item from the input list satisfying this predicate. For example the
following parser either consumes an @{text [quotes] "h"} or a @{text
[quotes] "w"}:

@{ML_response [display,gray]
"let
  val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
  val input1 = (explode \"hello\")
  val input2 = (explode \"world\")
in
  (hw input1, hw input2)
end"
"((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

Two parsers can be connected in sequence by using the function @{ML "--"}.
For example, parsing @{text "h"}, @{text "e"} and @{text "l"} in this
sequence can be achieved by:

@{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
"(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}

Note how the result of consumed strings builds up on the left as nested pairs.
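
The left-nesting of the result pairs can be seen in a hypothetical Python sketch of the sequencing combinator (again, all names are invented; this is not Isabelle code):

```python
# Python analogy of the sequencing combinator --; names are invented.
class FAIL(Exception): pass

def exact(s):
    """Analogue of ($$ s)."""
    def parse(toks):
        if toks and toks[0] == s:
            return toks[0], toks[1:]
        raise FAIL()
    return parse

def seq(p, q):
    """Analogue of (p -- q): run p, then q on what p left over."""
    def parse(toks):
        x, rest = p(toks)
        y, rest = q(rest)
        return (x, y), rest
    return parse

# analogue of ($$ "h") -- ($$ "e") -- ($$ "l")
hel = seq(seq(exact("h"), exact("e")), exact("l"))
print(hel(list("hello")))   # ((('h', 'e'), 'l'), ['l', 'o'])
```

Because the combinator is applied left-associatively, the pairs nest on the left exactly as in the Isabelle output above.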
|

If, as in the previous example, you want to parse a particular string,
then you should use the function @{ML Scan.this_string}:

@{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
"(\"hell\", [\"o\"])"}

Parsers that explore alternatives can be constructed using the function @{ML
"||"}. The parser @{ML "(p || q)" for p q} returns the
result of @{text "p"}, in case it succeeds, otherwise it returns the
result of @{text "q"}. For example:

@{ML_response [display,gray]
"let
  val hw = ($$ \"h\") || ($$ \"w\")
  val input1 = (explode \"hello\")
  val input2 = (explode \"world\")
in
  (hw input1, hw input2)
end"
"((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

The functions @{ML "|--"} and @{ML "--|"} work like the sequencing function
for parsers, except that they discard the item parsed by the first (respectively second)
parser. For example:

@{ML_response [display,gray]
"let
  val just_e = ($$ \"h\") |-- ($$ \"e\")
  val just_h = ($$ \"h\") --| ($$ \"e\")
  val input = (explode \"hello\")
in
  (just_e input, just_h input)
end"
"((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"}

The parser @{ML "Scan.optional p x" for p x} returns the result of the parser
@{text "p"}, if it succeeds; otherwise it returns
the default value @{text "x"}. For example:

@{ML_response [display,gray]
"let
  val p = Scan.optional ($$ \"h\") \"x\"
  val input1 = (explode \"hello\")
  val input2 = (explode \"world\")
in
  (p input1, p input2)
end"
"((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}

The function @{ML Scan.option} works similarly, except that no default value can
be given. Instead, the result is wrapped as an @{text "option"}-type. For example:

@{ML_response [display,gray]
"let
  val p = Scan.option ($$ \"h\")
  val input1 = (explode \"hello\")
  val input2 = (explode \"world\")
in
  (p input1, p input2)
end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
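
The difference between the two recovery strategies can be sketched in the same hypothetical Python analogy (invented names, not Isabelle code): @{ML Scan.optional} substitutes a caller-supplied default, whereas @{ML Scan.option} records the absence in the result itself, here modelled with @{text None}.

```python
# Python analogy of Scan.optional and Scan.option; names are invented.
class FAIL(Exception): pass

def exact(s):
    """Analogue of ($$ s)."""
    def parse(toks):
        if toks and toks[0] == s:
            return toks[0], toks[1:]
        raise FAIL()
    return parse

def optional(p, default):
    """Analogue of Scan.optional p x: fall back to a default result."""
    def parse(toks):
        try:
            return p(toks)
        except FAIL:
            return default, toks
    return parse

def option(p):
    """Analogue of Scan.option p: None plays the role of NONE."""
    def parse(toks):
        try:
            return p(toks)          # plays the role of SOME x
        except FAIL:
            return None, toks       # plays the role of NONE
    return parse

print(optional(exact("h"), "x")(list("world")))  # ('x', ['w', 'o', 'r', 'l', 'd'])
print(option(exact("h"))(list("world")))         # (None, ['w', 'o', 'r', 'l', 'd'])
```

In both cases the input is left untouched when the embedded parser fails.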
|

The function @{ML "!!"} helps to produce appropriate error messages
during parsing. For example, if you want to parse @{text p} immediately
followed by @{text q}, or start a completely different parser @{text r},
you might write:

@{ML [display,gray] "(p -- q) || r" for p q r}

However, this parser is problematic for producing an appropriate error
message in case the parsing of @{ML "(p -- q)" for p q} fails, because
you then lose the information that @{text p} should be followed by
@{text q}. To see this, consider the case in which @{text p} is present in
the input, but not @{text q}. Then @{ML "(p -- q)" for p q} will fail
and the alternative parser @{text r} will be tried. However, in many
circumstances this will be the wrong parser for the input ``p-followed-by-q''
and therefore will also fail. The error message is then caused by the
failure of @{text r}, not by the absence of @{text q} in the input. This
kind of situation can be avoided by using the function @{ML "!!"}.
This function aborts the whole process of parsing in case of a
failure and prints an error message. For example, if you invoke the parser

@{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}

on @{text [quotes] "hello"}, the parsing succeeds

@{ML_response [display,gray]
"(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")"
"(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}

but if you invoke it on @{text [quotes] "world"}

@{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
"Exception ABORT raised"}

then the parsing aborts and the error message @{text "foo"} is printed. In order to
see the error message properly, you need to prefix the parser with the function
@{ML "Scan.error"}. For example:

@{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
"Exception ERROR \"foo\" raised"}

This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"}
(see Section~\ref{sec:newcommand}, which explains this function in more detail).

Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
r}. If you want to generate the correct error message for p-followed-by-q,
then you have to write:
*}

ML{*fun p_followed_by_q p q r =
let
  val err_msg = (fn _ => p ^ " is not followed by " ^ q)
in
  ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r)
end *}

text {*
Running this parser with @{text [quotes] "h"} and @{text [quotes] "e"}, and
the input @{text [quotes] "holle"}

@{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"
"Exception ERROR \"h is not followed by e\" raised"}

produces the correct error message. Running it with

@{ML_response [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
"((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}

yields the expected parsing.
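
The key point of the cut is that a hard failure must not be caught by the alternative combinator. A hypothetical Python sketch (invented names, not Isabelle code) of @{ML "!!"} together with @{ML "||"} makes this visible:

```python
# Python analogy of !! combined with ||; all names are invented.
class FAIL(Exception): pass   # soft failure: alternatives are tried
class ABORT(Exception): pass  # hard failure: parsing stops

def exact(s):
    def parse(toks):
        if toks and toks[0] == s:
            return toks[0], toks[1:]
        raise FAIL()
    return parse

def seq(p, q):
    def parse(toks):
        x, rest = p(toks)
        y, rest = q(rest)
        return (x, y), rest
    return parse

def cut(msg, p):
    """Analogue of (!! msg p): a FAIL of p becomes a hard ABORT."""
    def parse(toks):
        try:
            return p(toks)
        except FAIL:
            raise ABORT(msg(toks))
    return parse

def alt(p, q):
    """Analogue of (p || q): only FAIL triggers the alternative."""
    def parse(toks):
        try:
            return p(toks)
        except FAIL:
            return q(toks)
    return parse

# analogue of p_followed_by_q "h" "e" "w"
p = alt(seq(exact("h"), cut(lambda _: "h is not followed by e", exact("e"))),
        seq(exact("w"), exact("w")))
print(p(list("wworld")))   # (('w', 'w'), ['o', 'r', 'l', 'd'])
```

On input @{text [quotes] "holle"} the cut converts the missing @{text "e"} into an abort carrying the intended message, instead of letting the @{text "w"}-branch produce a misleading error.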
|

The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as
often as it succeeds. For example:

@{ML_response [display,gray] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")"
"([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}

Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
@{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"}
succeeds at least once.

Also note that the parser would have aborted with the exception @{text MORE}, if
you had run it on just @{text [quotes] "hhhh"}. This can be avoided by using
the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
them you can write:

@{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")"
"([\"h\", \"h\", \"h\", \"h\"], [])"}

@{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
other stoppers need to be used when parsing, for example, tokens. However, this kind of
manual wrapping is often already done by the surrounding infrastructure.
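
The effect of wrapping a repetition with @{ML Scan.finite} can be sketched in the Python analogy used earlier (invented names; in particular, catching the out-of-input exception here merely models what the stopper wrapper achieves, not how Isabelle implements it):

```python
# Python analogy of Scan.repeat wrapped in Scan.finite; names are invented.
class FAIL(Exception): pass
class MORE(Exception): pass

def exact(s):
    def parse(toks):
        if not toks:
            raise MORE()
        if toks[0] == s:
            return toks[0], toks[1:]
        raise FAIL()
    return parse

def finite_repeat(p):
    """Repeat p as often as it succeeds; treating MORE like end-of-input
    plays the role of the Scan.finite/Symbol.stopper wrapper."""
    def parse(toks):
        results = []
        while True:
            try:
                x, toks = p(toks)
                results.append(x)
            except (FAIL, MORE):
                return results, toks
    return parse

print(finite_repeat(exact("h"))(list("hhhh")))   # (['h', 'h', 'h', 'h'], [])
```

Without the wrapper, the bare repetition would run off the end of @{text [quotes] "hhhh"} and raise the out-of-input exception instead of returning the four parsed items.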
|

The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any
string as in

@{ML_response [display,gray]
"let
  val p = Scan.repeat (Scan.one Symbol.not_eof)
  val input = (explode \"foo bar foo\")
in
  Scan.finite Symbol.stopper p input
end"
"([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}

where the function @{ML Symbol.not_eof} ensures that we do not read beyond the
end of the input string (i.e.~the stopper symbol).

The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can
parse the input, then the whole parser fails; if not, then the second one is tried. Therefore

@{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"
"Exception FAIL raised"}

fails, while

@{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"world\")"
"(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}

succeeds.
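
The negative test performed by @{ML Scan.unless} can be sketched in the same hypothetical Python analogy (invented names, not Isabelle code); note that the first parser is only used as a guard and never consumes input:

```python
# Python analogy of Scan.unless; names are invented.
class FAIL(Exception): pass

def exact(s):
    def parse(toks):
        if toks and toks[0] == s:
            return toks[0], toks[1:]
        raise FAIL()
    return parse

def unless(p, q):
    """Analogue of Scan.unless p q: fail whenever p would succeed,
    otherwise behave like q."""
    def parse(toks):
        try:
            p(toks)
        except FAIL:
            return q(toks)       # p failed, so q is tried
        raise FAIL()             # p succeeded, so the whole parser fails
    return parse

print(unless(exact("h"), exact("w"))(list("world")))  # ('w', ['o', 'r', 'l', 'd'])
```

This guard-then-parse behaviour is what makes the read-until-marker pattern in the next example work.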
|

The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any
input until a certain marker symbol is reached. In the example below the marker
symbol is a @{text [quotes] "*"}.

@{ML_response [display,gray]
"let
  val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
  val input1 = (explode \"fooooo\")
  val input2 = (explode \"foo*ooo\")
in
  (Scan.finite Symbol.stopper p input1,
   Scan.finite Symbol.stopper p input2)
end"
"(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
|

After parsing is done, you nearly always want to apply a function to the parsed
items. One way to do this is the function @{ML "(p >> f)" for p f}, which runs
first the parser @{text p} and upon successful completion applies the
function @{text f} to the result. For example

@{ML_response [display,gray]
"let
  fun double (x, y) = (x ^ x, y ^ y)
in
  (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
end"
"((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}

doubles the two parsed input strings; or

@{ML_response [display,gray]
"let
  val p = Scan.repeat (Scan.one Symbol.not_eof)
  val input = (explode \"foo bar foo\")
in
  Scan.finite Symbol.stopper (p >> implode) input
end"
"(\"foo bar foo\", [])"}

where the single-character strings in the parsed output are transformed
back into one string.
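
The transformation combinator only touches the result, never the remaining input. A hypothetical Python sketch (invented names, not Isabelle code) of @{ML ">>"} applied to the doubling example:

```python
# Python analogy of the transformation combinator >>; names are invented.
class FAIL(Exception): pass

def exact(s):
    def parse(toks):
        if toks and toks[0] == s:
            return toks[0], toks[1:]
        raise FAIL()
    return parse

def seq(p, q):
    def parse(toks):
        x, rest = p(toks)
        y, rest = q(rest)
        return (x, y), rest
    return parse

def transform(p, f):
    """Analogue of (p >> f): apply f to the result of a successful parse."""
    def parse(toks):
        x, rest = p(toks)
        return f(x), rest
    return parse

double = lambda xy: (xy[0] * 2, xy[1] * 2)  # analogue of fun double
p = transform(seq(exact("h"), exact("e")), double)
print(p(list("hello")))   # (('hh', 'ee'), ['l', 'l', 'o'])
```

If the embedded parser fails, the failure propagates unchanged; the function is only applied on success.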
|

The function @{ML Scan.ahead} parses some input, but leaves the original
input unchanged. For example:

@{ML_response [display,gray]
"Scan.ahead (Scan.this_string \"foo\") (explode \"foo\")"
"(\"foo\", [\"f\", \"o\", \"o\"])"}

The function @{ML Scan.lift} takes a parser and a pair as arguments. It applies
the given parser to the second component of the pair and leaves the first component
untouched. For example

@{ML_response [display,gray]
"Scan.lift (($$ \"h\") -- ($$ \"e\")) (1, (explode \"hello\"))"
"((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}

(FIXME: In which situations is this useful? Give examples.)

\begin{exercise}\label{ex:scancmts}
Write a parser that parses an input string so that any comment enclosed
inside @{text "(*\<dots>*)"} is replaced by the same comment but enclosed inside
@{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
function @{ML "enclose s1 s2 s" for s1 s2 s}, which produces the string @{ML
"s1 ^ s ^ s2" for s1 s2 s}.
\end{exercise}
*}
|

section {* Parsing Theory Syntax *}

text {*
(FIXME: context parser)

Most of the time, however, Isabelle developers have to deal with parsing
tokens, not strings. These token parsers have the type:
*}

ML{*type 'a parser = OuterLex.token list -> 'a * OuterLex.token list*}

text {*
The reason for using token parsers is that theory syntax, as well as the
parsers for the arguments of proof methods, use the type @{ML_type
OuterLex.token} (which is identical to the type @{ML_type
OuterParse.token}). However, there are also handy parsers for
ML-expressions and ML-files.

\begin{readmore}
The parser functions for the theory syntax are contained in the structure
@{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
\end{readmore}

The structure @{ML_struct OuterLex} defines several kinds of tokens (for example
@{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in OuterLex} for keywords and
@{ML "Command" in OuterLex} for commands). Some token parsers take into account the
kind of tokens.
*}
|

text {*
The first example shows how to generate a token list out of a string using
the function @{ML "OuterSyntax.scan"}. It is given the argument @{ML "Position.none"}
since, at the moment, we are not interested in generating
precise error messages. The following code

@{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\""
"[Token (\<dots>,(Ident, \"hello\"),\<dots>),
 Token (\<dots>,(Space, \" \"),\<dots>),
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

produces three tokens where the first and the last are identifiers, since
@{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
other syntactic category.\footnote{Note that because of a possible bug in
the PolyML runtime system the result is printed as @{text [quotes] "?"}, instead of
the tokens.} The second indicates a space.

Many parsing functions later on will require spaces, comments and the like
to have already been filtered out. So from now on we are going to use the
functions @{ML filter} and @{ML OuterLex.is_proper} to do this. For example:

@{ML_response_fake [display,gray]
"let
  val input = OuterSyntax.scan Position.none \"hello world\"
in
  filter OuterLex.is_proper input
end"
"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}

For convenience we define the function:
*}

ML{*fun filtered_input str =
  filter OuterLex.is_proper (OuterSyntax.scan Position.none str) *}
|

text {*

If you now parse

@{ML_response_fake [display,gray]
"filtered_input \"inductive | for\""
"[Token (\<dots>,(Command, \"inductive\"),\<dots>),
 Token (\<dots>,(Keyword, \"|\"),\<dots>),
 Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}

you obtain a list consisting of only a command and two keyword tokens.
If you want to see which keywords and commands are currently known to Isabelle, type in
the following code (you might have to adjust the @{ML print_depth} in order to
see the complete list):

@{ML_response_fake [display,gray]
"let
  val (keywords, commands) = OuterKeyword.get_lexicons ()
in
  (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
end"
"([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}

The parser @{ML "OuterParse.$$$"} parses a single keyword. For example:

@{ML_response [display,gray]
"let
  val input1 = filtered_input \"where for\"
  val input2 = filtered_input \"| in\"
in
  (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
end"
"((\"where\",\<dots>), (\"|\",\<dots>))"}

As before, you can sequentially connect parsers with @{ML "--"}. For example:

@{ML_response [display,gray]
"let
  val input = filtered_input \"| in\"
in
  (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
end"
"((\"|\", \"in\"), [])"}

The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty
list of items recognised by the parser @{text p}, where the items
are separated by the string @{text s}. For example:

@{ML_response [display,gray]
"let
  val input = filtered_input \"in | in | in foo\"
in
  (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end"
"([\"in\", \"in\", \"in\"], [\<dots>])"}

@{ML "OuterParse.enum1"} works similarly, except that the parsed list must
be non-empty. Note that we had to add the string @{text [quotes] "foo"} at the
end of the parsed string, otherwise the parser would have consumed all
tokens and then failed with the exception @{text "MORE"}. As in the
previous section, we can avoid this exception using the wrapper @{ML
Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
OuterLex.stopper}. We can write:

@{ML_response [display,gray]
"let
  val input = filtered_input \"in | in | in\"
in
  Scan.finite OuterLex.stopper
    (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end"
"([\"in\", \"in\", \"in\"], [])"}

The following function will help to run examples.
*}
|

ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}

text {*
The function @{ML "OuterParse.!!!"} can be used to force termination of the
parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section),
except that the error message is fixed to be @{text [quotes] "Outer syntax error"}
together with a relatively precise description of the failure. For example:

@{ML_response_fake [display,gray]
"let
  val input = filtered_input \"in |\"
  val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"
in
  parse (OuterParse.!!! parse_bar_then_in) input
end"
"Exception ERROR \"Outer syntax error: keyword \"|\" expected,
but keyword in was found\" raised"
}

\begin{exercise} (FIXME)
A type-identifier, for example @{typ "'a"}, is a token of
kind @{ML "Keyword" in OuterLex}. It can be parsed using
the function @{ML OuterParse.type_ident}.
\end{exercise}

(FIXME: or give parser for numbers)

Whenever there is a possibility that the processing of user input can fail,
it is a good idea to give as much information as possible about where the error
occurred. For this Isabelle can attach positional information to tokens
and then thread this information up the processing chain. To see this,
modify the function @{ML filtered_input} described earlier to
*}
|

ML{*fun filtered_input' str =
  filter OuterLex.is_proper (OuterSyntax.scan (Position.line 7) str) *}

text {*
where we pretend the parsed string starts on line 7. An example is

@{ML_response_fake [display,gray]
"filtered_input' \"foo \\n bar\""
"[Token ((\"foo\", ({line=7, end_line=7}, {line=7})), (Ident, \"foo\"), \<dots>),
 Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), \<dots>)]"}

in which the @{text [quotes] "\\n"} causes the second token to be in
line 8.

By using the parser @{ML OuterParse.position} you can decode the positional
information and return it as part of the parsed input. For example

@{ML_response_fake [display,gray]
"let
  val input = (filtered_input' \"where\")
in
  parse (OuterParse.position (OuterParse.$$$ \"where\")) input
end"
"((\"where\", {line=7, end_line=7}), [])"}

\begin{readmore}
The functions related to positions are implemented in the file
@{ML_file "Pure/General/position.ML"}.
\end{readmore}

*}
|

section {* Parsing Inner Syntax *}

text {*
There is usually no need to write your own parser for parsing inner syntax, that is
for terms and types: you can just call the predefined parsers. Terms can
be parsed using the function @{ML OuterParse.term}. For example:

@{ML_response [display,gray]
"let
  val input = OuterSyntax.scan Position.none \"foo\"
in
  OuterParse.term input
end"
"(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}

The function @{ML OuterParse.prop} is similar, except that it gives a different
error message when parsing fails. As you can see, the parser does not just return
the parsed string, but also some encoded information. You can decode the
information with the function @{ML YXML.parse}. For example

@{ML_response [display,gray]
"YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
"XML.Elem (\"token\", [], [XML.Text \"foo\"])"}

The result of the decoding is an XML-tree. You can see better what is going on if
you replace @{ML Position.none} by @{ML "Position.line 42"}, say:

@{ML_response [display,gray]
"let
  val input = OuterSyntax.scan (Position.line 42) \"foo\"
in
  YXML.parse (fst (OuterParse.term input))
end"
"XML.Elem (\"token\", [(\"line\", \"42\"), (\"end_line\", \"42\")], [XML.Text \"foo\"])"}

The positional information is stored as part of an XML-tree so that code
called later on will be able to give more precise error messages.

\begin{readmore}
The functions for input and output of XML and YXML are defined
in @{ML_file "Pure/General/xml.ML"} and @{ML_file "Pure/General/yxml.ML"}.
\end{readmore}

*}
|

section {* Parsing Specifications\label{sec:parsingspecs} *}

text {*
There are a number of special purpose parsers that help with parsing
specifications of function definitions, inductive predicates and so on. In
Chapter~\ref{chp:package}, for example, we will need to parse specifications
for inductive predicates of the form:
*}

simple_inductive
  even and odd
where
  even0: "even 0"
| evenS: "odd n \<Longrightarrow> even (Suc n)"
| oddS: "even n \<Longrightarrow> odd (Suc n)"

text {*
For this we are going to use the parser:
*}

ML %linenosgray{*val spec_parser =
     OuterParse.fixes --
     Scan.optional
       (OuterParse.$$$ "where" |--
          OuterParse.!!!
            (OuterParse.enum1 "|"
               (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []*}
|

text {*
Note that the parser does not parse the keyword \simpleinductive, even if it is
meant to process definitions as shown above. The parser for the keyword
will be given by the infrastructure that will eventually call @{ML spec_parser}.

To see what the parser returns, let us parse the string corresponding to the
definition of @{term even} and @{term odd}:

@{ML_response [display,gray]
"let
  val input = filtered_input
     (\"even and odd \" ^
      \"where \" ^
      \"  even0[intro]: \\\"even 0\\\" \" ^
      \"| evenS[intro]: \\\"odd n \<Longrightarrow> even (Suc n)\\\" \" ^
      \"| oddS[intro]: \\\"even n \<Longrightarrow> odd (Suc n)\\\"\")
in
  parse spec_parser input
end"
"(([(even, NONE, NoSyn), (odd, NONE, NoSyn)],
   [((even0,\<dots>), \"\\^E\\^Ftoken\\^Eeven 0\\^E\\^F\\^E\"),
    ((evenS,\<dots>), \"\\^E\\^Ftoken\\^Eodd n \<Longrightarrow> even (Suc n)\\^E\\^F\\^E\"),
    ((oddS,\<dots>), \"\\^E\\^Ftoken\\^Eeven n \<Longrightarrow> odd (Suc n)\\^E\\^F\\^E\")]), [])"}

As you can see, the result is a pair consisting of a list of
variables with optional type-annotation and syntax-annotation, and a list of
rules where every rule has optionally a name and an attribute.

The function @{ML OuterParse.fixes} in Line 2 of the parser reads an
\isacommand{and}-separated
list of variables that can include optional type annotations and syntax translations.
For example:\footnote{Note that in the code we need to write
@{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
in the compound type.}

@{ML_response [display,gray]
"let
  val input = filtered_input
    \"foo::\\\"int \<Rightarrow> bool\\\" and bar::nat (\\\"BAR\\\" 100) and blonk\"
in
  parse OuterParse.fixes input
end"
"([(foo, SOME \"\\^E\\^Ftoken\\^Eint \<Rightarrow> bool\\^E\\^F\\^E\", NoSyn),
  (bar, SOME \"\\^E\\^Ftoken\\^Enat\\^E\\^F\\^E\", Mixfix (\"BAR\", [], 100)),
  (blonk, NONE, NoSyn)], [])"}
*}
|

text {*
Whenever types are given, they are stored in the @{ML SOME}s. The types are
not yet used to type the variables: this must be done by type-inference later
on. Since types are part of the inner syntax, they are strings with some
encoded information (see previous section). If a syntax translation is
present for a variable, then it is stored in the @{ML Mixfix} datastructure;
no syntax translation is indicated by @{ML NoSyn}.

\begin{readmore}
The datastructure for syntax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
\end{readmore}

Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
list of introduction rules, that is propositions with theorem
annotations. The introduction rules are propositions parsed by @{ML
OuterParse.prop}. However, they can include an optional theorem name plus
some attributes. For example

@{ML_response [display,gray] "let
  val input = filtered_input \"foo_lemma[intro,dest!]:\"
  val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input
in
  (name, map Args.dest_src attrib)
end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}

The function @{ML opt_thm_name in SpecParse} is the ``optional'' variant of
@{ML thm_name in SpecParse}. Theorem names can contain attributes. The name
has to end with @{text [quotes] ":"}---see the argument of
the function @{ML SpecParse.opt_thm_name} in Line 7.

\begin{readmore}
Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"}
and @{ML_file "Pure/Isar/args.ML"}.
\end{readmore}
*}
|

section {* New Commands and Keyword Files\label{sec:newcommand} *}

text {*
(FIXME: update to the right command setup)

Often new commands, for example for providing new definitional principles,
need to be implemented. While this is not difficult on the ML-level,
new commands, in order to be useful, need to be recognised by
ProofGeneral. This results in some subtle configuration issues, which we
will explain in this section.

To keep things simple, let us start with a ``silly'' command that does nothing
at all. We shall name this command \isacommand{foobar}. On the ML-level it can be
defined as:
*}

ML{*let
  val do_nothing = Scan.succeed (Toplevel.theory I)
  val kind = OuterKeyword.thy_decl
in
  OuterSyntax.command "foobar" "description of foobar" kind do_nothing
end *}

text {*
The crucial function @{ML OuterSyntax.command} expects a name for the command, a
short description, a kind indicator (which we will explain more thoroughly later on) and a
parser producing a top-level transition function (its purpose will also be explained
later).

While this is everything you have to do on the ML-level, you need a keyword
file that can be loaded by ProofGeneral. This is to enable ProofGeneral to
recognise \isacommand{foobar} as a command. Such a keyword file can be
generated with the command-line:

@{text [display] "$ isabelle keywords -k foobar some_log_files"}

The option @{text "-k foobar"} determines the postfix of the keyword file's
name. In the case above the file will be named @{text
"isar-keywords-foobar.el"}. This command requires log files to be
present (in order to extract the keywords from them). To generate these log
files, you first need to package the code above into a separate theory file named
@{text "Command.thy"}, say---see Figure~\ref{fig:commandtheory} for the
complete code.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure}[t]
\begin{graybox}\small
\isacommand{theory}~@{text Command}\\
\isacommand{imports}~@{text Main}\\
\isacommand{begin}\\
\isacommand{ML}~@{text "\<verbopen>"}\\
@{ML
"let
  val do_nothing = Scan.succeed (Toplevel.theory I)
  val kind = OuterKeyword.thy_decl
in
  OuterSyntax.command \"foobar\" \"description of foobar\" kind do_nothing
end"}\\
@{text "\<verbclose>"}\\
\isacommand{end}
\end{graybox}
\caption{\small The file @{text "Command.thy"} is necessary for generating a log
file. This log file enables Isabelle to generate a keyword file containing
the command \isacommand{foobar}.\label{fig:commandtheory}}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

For our purposes it is sufficient to use the log files of the theories
@{text "Pure"}, @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as
the log file for the theory @{text "Command.thy"}, which contains the new
\isacommand{foobar}-command. If you target other logics besides HOL, such
as Nominal or ZF, then you need to adapt the log files appropriately.

@{text Pure} and @{text HOL} are usually compiled during the installation of
Isabelle, so log files for them should already be available. If not, then
they can be conveniently compiled with the help of the build-script from the Isabelle
distribution.

@{text [display]
"$ ./build -m \"Pure\"
$ ./build -m \"HOL\""}

The @{text "Pure-ProofGeneral"} theory needs to be compiled with:

@{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""}

For the theory @{text "Command.thy"}, you first need to create a ``managed'' subdirectory
with:

@{text [display] "$ isabelle mkdir FoobarCommand"}

This generates a directory containing the files:

@{text [display]
"./IsaMakefile
./FoobarCommand/ROOT.ML
./FoobarCommand/document
./FoobarCommand/document/root.tex"}

You need to copy the file @{text "Command.thy"} into the directory @{text "FoobarCommand"}
and add the line

@{text [display] "use_thy \"Command\";"}

to the file @{text "./FoobarCommand/ROOT.ML"}. You can now compile the theory by just typing:

@{text [display] "$ isabelle make"}

If the compilation succeeds, you have finally created all the necessary log files.
They are stored in the directory

@{text [display] "~/.isabelle/heaps/Isabelle2008/polyml-5.2.1_x86-linux/log"}

or something similar depending on your Isabelle distribution and architecture.
One quick way to assign a shell variable to this directory is by typing

@{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"}

on the Unix prompt. If you now type @{text "ls $ISABELLE_LOGS"}, then the
directory should include the files:

@{text [display]
"Pure.gz
HOL.gz
Pure-ProofGeneral.gz
HOL-FoobarCommand.gz"}

From these you can create the keyword file. Assuming the name
of the directory is in @{text "$ISABELLE_LOGS"},
the Unix command for creating the keyword file is:

@{text [display]
"$ isabelle keywords -k foobar
   $ISABELLE_LOGS/{Pure.gz,HOL.gz,Pure-ProofGeneral.gz,HOL-FoobarCommand.gz}"}

The result is the file @{text "isar-keywords-foobar.el"}. It should contain
the string @{text "foobar"} twice.\footnote{To see whether things are fine, check
that @{text "grep foobar"} on this file returns something
non-empty.} This keyword file needs to
be copied into the directory @{text "~/.isabelle/etc"}. To make Isabelle
aware of this keyword file, you have to start Isabelle with the option @{text
"-k foobar"}, that is:

@{text [display] "$ isabelle emacs -k foobar a_theory_file"}

If you now build a theory on top of @{text "Command.thy"},
then the command \isacommand{foobar} can be used.
The same procedure applies to any other new command.

At the moment \isacommand{foobar} is not very useful. Let us next refine it
a bit by letting it take a proposition as argument and print this
proposition in the tracing buffer.

The crucial part of a command is the function that determines the behaviour
of the command. In the code above we used a ``do-nothing''-function, which
because of @{ML Scan.succeed} does not parse any argument, but immediately
returns the simple top-level function @{ML "Toplevel.theory I"}. We can
replace this code by a function that first parses a proposition (using the
parser @{ML OuterParse.prop}), then prints out the tracing
information (using a new top-level function @{text trace_top_lvl}) and
finally does nothing. For this you can write:
*}

ML{*let
  fun trace_top_lvl str =
    Toplevel.theory (fn thy => (tracing str; thy))

  val trace_prop = OuterParse.prop >> trace_top_lvl

  val kind = OuterKeyword.thy_decl
in
  OuterSyntax.command "foobar" "traces a proposition" kind trace_prop
end *}

text {*
Now you can type

\begin{isabelle}
\isacommand{foobar}~@{text [quotes] "True \<and> False"}\\
@{text "> \"True \<and> False\""}
\end{isabelle}

and see the proposition in the tracing buffer.

Note that so far we used @{ML thy_decl in OuterKeyword} as the kind indicator
for the command. This means that the command finishes as soon as the
arguments are processed. Examples of commands of this kind are
\isacommand{definition} and \isacommand{declare}. In other cases,
commands are expected to parse some arguments, for example a proposition,
and then ``open up'' a proof in order to prove the proposition (for example
\isacommand{lemma}) or prove some other properties (for example
\isacommand{function}). To achieve this kind of behaviour, you have to use the kind
indicator @{ML thy_goal in OuterKeyword}. Note, however, that once you change the
``kind'' of a command from @{ML thy_decl in OuterKeyword} to @{ML thy_goal in OuterKeyword}
the keyword file needs to be re-created!

Below we change \isacommand{foobar} so that it takes a proposition as
argument and then starts a proof in order to prove it. Therefore in Line 13,
we set the kind indicator to @{ML thy_goal in OuterKeyword}.
*}

ML%linenosgray{*let
  fun set_up_thm str ctxt =
    let
      val prop = Syntax.read_prop ctxt str
    in
      Proof.theorem_i NONE (K I) [[(prop,[])]] ctxt
    end;

  val prove_prop = OuterParse.prop >>
    (fn str => Toplevel.print o
                 Toplevel.local_theory_to_proof NONE (set_up_thm str))

  val kind = OuterKeyword.thy_goal
in
  OuterSyntax.command "foobar" "proving a proposition" kind prove_prop
end *}

text {*
The function @{text set_up_thm} in Lines 2 to 7 takes a string (the proposition to be
proved) and a context as argument. The context is necessary in order to be able to use
@{ML Syntax.read_prop}, which converts a string into a proper proposition.
In Line 6 the function @{ML Proof.theorem_i} starts the proof for the
proposition. Its argument @{ML NONE} stands for a locale (which we chose to
omit); the argument @{ML "(K I)"} stands for a function that determines what
should be done with the theorem once it is proved (we chose to just forget
about it). Lines 9 to 11 contain the parser for the proposition.

If you now type \isacommand{foobar}~@{text [quotes] "True \<and> True"}, you obtain the following
proof state:

\begin{isabelle}
\isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
@{text "goal (1 subgoal):"}\\
@{text "1. True \<and> True"}
\end{isabelle}

and you can then build the following proof:

\begin{isabelle}
\isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
\isacommand{apply}@{text "(rule conjI)"}\\
\isacommand{apply}@{text "(rule TrueI)+"}\\
\isacommand{done}
\end{isabelle}

(FIXME What do @{ML "Toplevel.theory"}
@{ML "Toplevel.print"}
@{ML Toplevel.local_theory} do?)

(FIXME read a name and show how to store theorems)
*}

section {* Methods *}

text {*
Methods are a central concept in Isabelle. They are what you use, for
example, in \isacommand{apply}. To print out all currently known methods
you can use the Isabelle command:
*}

print_methods

text {*
An example of a very simple method is the following code.
*}

method_setup %gray foobar_meth =
 {* Scan.succeed
      (K (SIMPLE_METHOD ((etac @{thm conjE} THEN' rtac @{thm conjI}) 1))) *}
 "foobar method"

text {*
It defines the method @{text foobar_meth}, which takes no arguments (therefore the
parser @{ML Scan.succeed}) and
only applies the rules @{thm [source] conjE} and then @{thm [source] conjI}
(via @{ML etac} and @{ML rtac}, respectively).
This method can be used in the following proof:
*}

lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
apply(foobar_meth)
txt {*
where it results in the goal state

\begin{minipage}{\textwidth}
@{subgoals}
\end{minipage} *}
(*<*)oops(*>*)

text {*
(FIXME: explain a version of rule-tac)
*}
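
text {*
A first step towards a @{text rule_tac}-like method is a method that takes
a theorem as argument and applies it with @{ML rtac}. The following is an
untested sketch (the name @{text foobar_rule} is made up): it assumes that
the theorem parser @{ML Attrib.thm} can be given to
\isacommand{method\_setup} in the same way as @{ML Scan.succeed} is above.
*}

method_setup %gray foobar_rule =
 {* Attrib.thm >> (fn thm => K (SIMPLE_METHOD' (rtac thm))) *}
 "apply a theorem given as argument (sketch)"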

(*<*)

chapter {* Parsing *}

text {*

Lots of Standard ML code is given in this document, for various reasons,
including:
\begin{itemize}
\item direct quotation of code found in the Isabelle source files,
  or simplified versions of such code
\item identifiers found in the Isabelle source code, with their types
  (or specialisations of their types)
\item code examples, which can be run by the reader, to help illustrate the
  behaviour of functions found in the Isabelle source code
\item ancillary functions, not from the Isabelle source code,
  which enable the reader to run relevant code examples
\item type abbreviations, which help explain the uses of certain functions
\end{itemize}

*}

section {* Parsing Isar input *}

text {*

The typical parsing function has the type
\texttt{'src -> 'res * 'src}, with input
of type \texttt{'src}, returning a result
of type \texttt{'res}, which is (or is derived from) the first part of the
input, and also returning the remainder of the input.
(In the common case, when it is clear what the ``remainder of the input''
means, we will just say that the function ``returns'' the
value of type \texttt{'res}.)
An exception is raised if an appropriate value
cannot be produced from the input.
A range of exceptions can be used to identify different reasons
for the failure of a parse.

This contrasts with the standard parsing function type in Standard ML,
\texttt{type ('res, 'src) reader = 'src -> ('res * 'src) option}
(used, for example, by \texttt{List.getItem} and \texttt{Substring.getc}).
However, much of the discussion at
FIX file:/home/jeremy/html/ml/SMLBasis/string-cvt.html
is relevant.

Naturally one may convert between the two different sorts of parsing functions
as follows:
\begin{verbatim}
open StringCvt ;
type ('res, 'src) ex_reader = 'src -> 'res * 'src
(* ex_reader : ('res, 'src) reader -> ('res, 'src) ex_reader *)
fun ex_reader rdr src = Option.valOf (rdr src) ;
(* reader : ('res, 'src) ex_reader -> ('res, 'src) reader *)
fun reader exrdr src = SOME (exrdr src) handle _ => NONE ;
\end{verbatim}
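For example, \texttt{List.getItem} can then be used in the exception style
(an untested sketch; the expected result is shown in the comment):
\begin{verbatim}
ex_reader List.getItem [1, 2, 3] ;   (* (1, [2, 3]) *)
\end{verbatim}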

*}

section{* The \texttt{Scan} structure *}

text {*
The source file is \texttt{src/Pure/General/scan.ML}.
This structure provides functions for using and combining parsing functions
of the type \texttt{'src -> 'res * 'src}.
Three exceptions are used:
\begin{verbatim}
exception MORE of string option;  (*need more input (prompt)*)
exception FAIL of string option;  (*try alternatives (reason of failure)*)
exception ABORT of string;        (*dead end*)
\end{verbatim}
Many functions in this structure (generally those with names composed of
symbols) are declared as infix.

Some functions from that structure are
\begin{verbatim}
|-- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
      'src -> 'res2 * 'src''
--| : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
      'src -> 'res1 * 'src''
-- : ('src -> 'res1 * 'src') * ('src' -> 'res2 * 'src'') ->
      'src -> ('res1 * 'res2) * 'src''
^^ : ('src -> string * 'src') * ('src' -> string * 'src'') ->
      'src -> string * 'src''
\end{verbatim}
These combinators each parse two results off the input source, one after
the other.

\texttt{|--} and \texttt{--|}
return the second result and the first result, respectively.

\texttt{--} returns both.

\verb|^^| returns the result of concatenating the two results
(which must be strings).

Note how, although the types
\texttt{'src}, \texttt{'src'} and \texttt{'src''} will normally be the same,
the types as shown help suggest the behaviour of the functions.
\begin{verbatim}
:-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') ->
      'src -> ('res1 * 'res2) * 'src''
:|-- : ('src -> 'res1 * 'src') * ('res1 -> 'src' -> 'res2 * 'src'') ->
      'src -> 'res2 * 'src''
\end{verbatim}
These are similar to \texttt{--} and \texttt{|--},
except that the second parsing function can depend on the result of the first.
\begin{verbatim}
>> : ('src -> 'res1 * 'src') * ('res1 -> 'res2) -> 'src -> 'res2 * 'src'
|| : ('src -> 'res_src) * ('src -> 'res_src) -> 'src -> 'res_src
\end{verbatim}
\texttt{p >> f} applies a function \texttt{f} to the result of a parse.

\texttt{||} tries a second parsing function if the first one
fails by raising an exception of the form \texttt{FAIL \_}.
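For instance, with input given as a list of strings and the generic
single-item parser \texttt{\$\$} described at the start of this chapter,
one would expect (untested sketch; expected results in the comments):
\begin{verbatim}
($$ "a" || $$ "b") ["b", "c"] ;           (* ("b", ["c"]) *)
(($$ "a" -- $$ "b") >> op ^) ["a", "b"] ; (* ("ab", []) *)
\end{verbatim}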

\begin{verbatim}
succeed : 'res -> ('src -> 'res * 'src) ;
fail : ('src -> 'res_src) ;
!! : ('src * string option -> string) ->
     ('src -> 'res_src) -> ('src -> 'res_src) ;
\end{verbatim}
\texttt{succeed r} returns \texttt{r}, with the input unchanged.
\texttt{fail} always fails, raising exception \texttt{FAIL NONE}.
\texttt{!! f} only affects the failure mode, turning a failure that
raises \texttt{FAIL \_} into a failure that raises \texttt{ABORT ...}.
This is used to prevent recovery from the failure ---
thus, in \texttt{(!! f parse1) || parse2}, if \texttt{parse1} fails,
it won't recover by trying \texttt{parse2}.
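For example, in the following sketch (untested) the first alternative
commits once it has seen \texttt{"a"}: applied to \texttt{["a", "c"]} it
raises \texttt{ABORT ...} instead of trying the second alternative, whereas
applied to \texttt{["c"]} the first alternative fails softly with
\texttt{FAIL \_} and \texttt{\$\$ "c"} is tried:
\begin{verbatim}
val p = ($$ "a" |-- !! (fn _ => "b expected") ($$ "b")) || $$ "c" ;
\end{verbatim}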

\begin{verbatim}
one : ('si -> bool) -> ('si list -> 'si * 'si list) ;
some : ('si -> 'res option) -> ('si list -> 'res * 'si list) ;
\end{verbatim}
These require the input to be a list of items:
they fail, raising \texttt{MORE NONE}, if the list is empty.
On other failures they raise \texttt{FAIL NONE}.

\texttt{one p} takes the first
item from the list if it satisfies \texttt{p}, otherwise fails.

\texttt{some f} takes the first
item from the list and applies \texttt{f} to it, failing if this returns
\texttt{NONE}.

\begin{verbatim}
many : ('si -> bool) -> 'si list -> 'si list * 'si list ;
\end{verbatim}
\texttt{many p} takes items from the input until it encounters one
which does not satisfy \texttt{p}. If it reaches the end of the input
it fails, raising \texttt{MORE NONE}.

\texttt{many1} (with the same type) fails if the first item
does not satisfy \texttt{p}.

\begin{verbatim}
option : ('src -> 'res * 'src) -> ('src -> 'res option * 'src)
optional : ('src -> 'res * 'src) -> 'res -> ('src -> 'res * 'src)
\end{verbatim}
\texttt{option}:
where the parser \texttt{f} succeeds with result \texttt{r}
or raises \texttt{FAIL \_},
\texttt{option f} gives the result \texttt{SOME r} or \texttt{NONE}, respectively.

\texttt{optional}: if parser \texttt{f} fails by raising \texttt{FAIL \_},
\texttt{optional f default} provides the result \texttt{default}.
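Again in terms of the generic \texttt{\$\$} parser, one would expect
(untested sketch; expected results in the comments):
\begin{verbatim}
Scan.option ($$ "a") ["a", "b"] ;    (* (SOME "a", ["b"]) *)
Scan.option ($$ "a") ["b"] ;         (* (NONE, ["b"]) *)
Scan.optional ($$ "a") "x" ["b"] ;   (* ("x", ["b"]) *)
\end{verbatim}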

\begin{verbatim}
repeat : ('src -> 'res * 'src) -> 'src -> 'res list * 'src
repeat1 : ('src -> 'res * 'src) -> 'src -> 'res list * 'src
bulk : ('src -> 'res * 'src) -> 'src -> 'res list * 'src
\end{verbatim}
\texttt{repeat f} repeatedly parses an item off the remaining input until
\texttt{f} fails with \texttt{FAIL \_}.

\texttt{repeat1} is as for \texttt{repeat}, but requires at least one
successful parse.

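For example, one would expect (untested sketch; note that only
\texttt{FAIL \_} is caught, so some input is left over here deliberately to
avoid \texttt{MORE \_} at the end of the input):
\begin{verbatim}
Scan.repeat ($$ "a") ["a", "a", "b"] ;   (* (["a", "a"], ["b"]) *)
Scan.repeat1 ($$ "a") ["b", "b"] ;       (* raises FAIL _ *)
\end{verbatim}
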
\begin{verbatim}
lift : ('src -> 'res * 'src) -> ('ex * 'src -> 'res * ('ex * 'src))
\end{verbatim}
\texttt{lift} changes the source type of a parser by putting in an extra
component \texttt{'ex}, which is ignored in the parsing.

The \texttt{Scan} structure also provides the type \texttt{lexicon},
HOW DO THEY WORK ?? TO BE COMPLETED
\begin{verbatim}
dest_lexicon: lexicon -> string list ;
make_lexicon: string list list -> lexicon ;
empty_lexicon: lexicon ;
extend_lexicon: string list list -> lexicon -> lexicon ;
merge_lexicons: lexicon -> lexicon -> lexicon ;
is_literal: lexicon -> string list -> bool ;
literal: lexicon -> string list -> string list * string list ;
\end{verbatim}
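A guess at basic usage (untested; this assumes \texttt{make\_lexicon}
takes each entry as a list of symbols, cf.\ the signature above):
\begin{verbatim}
val lex = Scan.make_lexicon [Symbol.explode "foo", Symbol.explode "bar"] ;
Scan.dest_lexicon lex ;                       (* ["bar", "foo"] or similar *)
Scan.is_literal lex (Symbol.explode "foo") ;  (* true *)
\end{verbatim}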
Two lexicons, for the commands and keywords, are stored and can be retrieved
by:
\begin{verbatim}
val (command_lexicon, keyword_lexicon) = OuterSyntax.get_lexicons () ;
val commands = Scan.dest_lexicon command_lexicon ;
val keywords = Scan.dest_lexicon keyword_lexicon ;
\end{verbatim}
*}

section{* The \texttt{OuterLex} structure *}

text {*
The source file is @{text "src/Pure/Isar/outer_lex.ML"}.
In some other source files its name is abbreviated:
\begin{verbatim}
structure T = OuterLex;
\end{verbatim}
This structure defines the type \texttt{token}.
(The types
\texttt{OuterLex.token},
\texttt{OuterParse.token} and
\texttt{SpecParse.token} are all the same.)

Input text is split up into tokens, and the input source type for many parsing
functions is \texttt{token list}.

The datatype definition (which is not published in the signature) is
\begin{verbatim}
datatype token = Token of Position.T * (token_kind * string);
\end{verbatim}
but here are some runnable examples for viewing tokens:
*}

ML{*
val toks = OuterSyntax.scan Position.none
  "theory,imports;begin x.y.z apply ?v1 ?'a 'a -- || 44 simp (* xx *) { * fff * }" ;
*}

ML{*
print_depth 20 ;
*}

ML{*
map OuterLex.text_of toks ;
*}

ML{*
val proper_toks = filter OuterLex.is_proper toks ;
*}

ML{*
map OuterLex.kind_of proper_toks
*}

ML{*
map OuterLex.unparse proper_toks ;
*}

ML{*
OuterLex.stopper
*}

text {*

The function \texttt{is\_proper : token -> bool} identifies tokens which are
not white space or comments: many parsing functions require white space and
comments to have been filtered out.

There is a special end-of-file token:
\begin{verbatim}
val (tok_eof : token, is_eof : token -> bool) = T.stopper ;
(* end of file token *)
\end{verbatim}
*}

section {* The \texttt{OuterParse} structure *}

text {*
The source file is \texttt{src/Pure/Isar/outer\_parse.ML}.
In some other source files its name is abbreviated:
\begin{verbatim}
structure P = OuterParse;
\end{verbatim}
Here the parsers use \texttt{token list} as the input source type.

Some of the parsers simply select the first token, provided that it is of the
right kind (as returned by \texttt{T.kind\_of}): these are
\texttt{command, keyword, short\_ident, long\_ident, sym\_ident, term\_var,
type\_ident, type\_var, number, string, alt\_string, verbatim, sync, eof}.
Others select the first token, provided that it is one of several kinds
(e.g., \texttt{name, xname, text, typ}).

\begin{verbatim}
type 'a tlp = token list -> 'a * token list ; (* token list parser *)
$$$ : string -> string tlp
nat : int tlp ;
maybe : 'a tlp -> 'a option tlp ;
\end{verbatim}

\texttt{\$\$\$ s} returns the first token,
if it equals \texttt{s} \emph{and} \texttt{s} is a keyword.

\texttt{nat} returns the first token, if it is a number, and evaluates it.

\texttt{maybe}: if \texttt{p} returns \texttt{r},
then \texttt{maybe p} returns \texttt{SOME r};
if the first token is an underscore, it returns \texttt{NONE}.
A few examples:
\begin{verbatim}
P.list : 'a tlp -> 'a list tlp ; (* likewise P.list1 *)
P.and_list : 'a tlp -> 'a list tlp ; (* likewise P.and_list1 *)
val toks : token list = OuterSyntax.scan "44 ,_, 66,77" ;
val proper_toks = List.filter T.is_proper toks ;
P.list P.nat toks ; (* OK, doesn't recognize white space *)
P.list P.nat proper_toks ; (* fails, doesn't recognize what follows ',' *)
P.list (P.maybe P.nat) proper_toks ; (* fails, end of input *)
P.list (P.maybe P.nat) (proper_toks @ [tok_eof]) ; (* OK *)
val toks : token list = OuterSyntax.scan "44 and 55 and 66 and 77" ;
P.and_list P.nat (List.filter T.is_proper toks @ [tok_eof]) ; (* ??? *)
\end{verbatim}

The following code helps run examples:
\begin{verbatim}
fun parse_str tlp str =
  let val toks : token list = OuterSyntax.scan str ;
      val proper_toks = List.filter T.is_proper toks @ [tok_eof] ;
      val (res, rem_toks) = tlp proper_toks ;
      val rem_str = String.concat
        (Library.separate " " (List.map T.unparse rem_toks)) ;
  in (res, rem_str) end ;
\end{verbatim}

1348 \begin{verbatim} |
|
1349 val type_args = |
|
1350 type_ident >> Library.single || |
|
1351 $$$ "(" |-- !!! (list1 type_ident --| $$$ ")") || |
|
1352 Scan.succeed []; |
|
1353 \end{verbatim} |
|
1354 There are three ways parsing a list of type arguments can succeed. |
|
1355 The first line reads a single type argument, and turns it into a singleton |
|
1356 list. |
|
1357 The second line reads "(", and then the remainder, ignoring the "(" ; |
|
1358 the remainder consists of a list of type identifiers (at least one), |
|
1359 and then a ")" which is also ignored. |
|
1360 The \texttt{!!!} ensures that if the parsing proceeds this far and then fails, |
|
1361 it won't try the third line (see the description of \texttt{Scan.!!}). |
|
1362 The third line consumes no input and returns the empty list. |
|
1363 |
|
1364 \begin{verbatim} |
|
1365 fun triple2 (x, (y, z)) = (x, y, z); |
|
1366 val arity = xname -- ($$$ "::" |-- !!! ( |
|
1367 Scan.optional ($$$ "(" |-- !!! (list1 sort --| $$$ ")")) [] |
|
1368 -- sort)) >> triple2; |
|
1369 \end{verbatim} |
|
1370 The parser \texttt{arity} reads a typename $t$, then ``\texttt{::}'' (which is |
|
1371 ignored), then optionally a list $ss$ of sorts and then another sort $s$. |
|
1372 The result $(t, (ss, s))$ is transformed by \texttt{triple2} to $(t, ss, s)$. |
|
1373 The second line reads the optional list of sorts: |
|
1374 it reads first ``\texttt{(}'' and last ``\texttt{)}'', which are both ignored, |
|
1375 and between them a comma-separated list of sorts. |
|
1376 If this list is absent, the default \texttt{[]} provides the list of sorts. |
|
1377 |
|
1378 \begin{verbatim} |
|
1379 parse_str P.type_args "('a, 'b) ntyp" ; |
|
1380 parse_str P.type_args "'a ntyp" ; |
|
1381 parse_str P.type_args "ntyp" ; |
|
1382 parse_str P.arity "ty :: tycl" ; |
|
1383 parse_str P.arity "ty :: (tycl1, tycl2) tycl" ; |
|
1384 \end{verbatim} |
|
1385 |
|
1386 *} |
|
1387 |
|
section {* The \texttt{SpecParse} structure *}

text {*
The source file is \texttt{src/Pure/Isar/spec\_parse.ML}.
This structure contains token list parsers for more complicated values.
For example,
\begin{verbatim}
open SpecParse ;
attrib : Attrib.src tok_rdr ;
attribs : Attrib.src list tok_rdr ;
opt_attribs : Attrib.src list tok_rdr ;
xthm : (thmref * Attrib.src list) tok_rdr ;
xthms1 : (thmref * Attrib.src list) list tok_rdr ;

parse_str attrib "simp" ;
parse_str opt_attribs "hello" ;
val (ass, "") = parse_str attribs "[standard, xxxx, simp, intro, OF sym]" ;
map Args.dest_src ass ;
val (asrc, "") = parse_str attrib "THEN trans [THEN sym]" ;

parse_str xthm "mythm [attr]" ;
parse_str xthms1 "thm1 [attr] thms2" ;
\end{verbatim}

As you can see, attributes are described using types of the \texttt{Args}
structure, described below.
*}

section{* The \texttt{Args} structure *}

text {*
The source file is \texttt{src/Pure/Isar/args.ML}.
The primary type of this structure is the \texttt{src} datatype;
its single constructor is not published in the signature, but
\texttt{Args.src} and \texttt{Args.dest\_src}
are in fact the constructor and destructor functions.
Note that the types \texttt{Attrib.src} and \texttt{Method.src}
are in fact \texttt{Args.src}.

\begin{verbatim}
src : (string * Args.T list) * Position.T -> Args.src ;
dest_src : Args.src -> (string * Args.T list) * Position.T ;
Args.pretty_src : Proof.context -> Args.src -> Pretty.T ;
fun pr_src ctxt src = Pretty.string_of (Args.pretty_src ctxt src) ;

val thy = ML_Context.the_context () ;
val ctxt = ProofContext.init thy ;
map (pr_src ctxt) ass ;
\end{verbatim}

1438 So an \texttt{Args.src} consists of the first word, then a list of further |
|
1439 ``arguments'', of type \texttt{Args.T}, with information about position in the |
|
1440 input. |
|
1441 \begin{verbatim} |
|
1442 (* how an Args.src is parsed *) |
|
1443 P.position : 'a tlp -> ('a * Position.T) tlp ; |
|
1444 P.arguments : Args.T list tlp ; |
|
1445 |
|
1446 val parse_src : Args.src tlp = |
|
1447 P.position (P.xname -- P.arguments) >> Args.src ; |
|
1448 \end{verbatim} |
|
1449 |
|
1450 \begin{verbatim} |
|
1451 val ((first_word, args), pos) = Args.dest_src asrc ; |
|
1452 map Args.string_of args ; |
|
1453 \end{verbatim} |
|
The \texttt{Args} structure contains more parsers and parser transformers
for which the input source type is \texttt{Args.T list}. For example,
\begin{verbatim}
type 'a atlp = Args.T list -> 'a * Args.T list ;
open Args ;
nat : int atlp ; (* also Args.int *)
thm_sel : PureThy.interval list atlp ;
list : 'a atlp -> 'a list atlp ;
attribs : (string -> string) -> Args.src list atlp ;
opt_attribs : (string -> string) -> Args.src list atlp ;

(* parse_atl_str : 'a atlp -> (string -> 'a * string) ;
   given an Args.T list parser, to get a string parser *)
fun parse_atl_str atlp str =
  let val (ats, rem_str) = parse_str P.arguments str ;
    val (res, rem_ats) = atlp ats ;
  in (res, String.concat (Library.separate " "
    (List.map Args.string_of rem_ats @ [rem_str]))) end ;

parse_atl_str Args.int "-1-," ;
parse_atl_str (Scan.option Args.int) "x1-," ;
parse_atl_str Args.thm_sel "(1-,4,13-22)" ;

val (ats as atsrc :: _, "") = parse_atl_str (Args.attribs I)
  "[THEN trans [THEN sym], simp, OF sym]" ;
\end{verbatim}
|
From here, an attribute is interpreted using \texttt{Attrib.attribute}.

\texttt{Args} also has a large number of functions which parse an
\texttt{Args.src} and refer to a generic context; note the use of
\texttt{Scan.lift} to turn an \texttt{Args.T list} parser into a parser
of this kind. (The \texttt{Attrib} structure uses the same pattern.)

(The type of \texttt{Args.syntax} shown below is specialised.)
|
\begin{verbatim}
type ('res, 'src) parse_fn = 'src -> 'res * 'src ;
type 'a cgatlp = ('a, Context.generic * Args.T list) parse_fn ;
Scan.lift : 'a atlp -> 'a cgatlp ;
term : term cgatlp ;
typ : typ cgatlp ;

Args.syntax : string -> 'res cgatlp -> src -> ('res, Context.generic) parse_fn ;
Attrib.thm : thm cgatlp ;
Attrib.thms : thm list cgatlp ;
Attrib.multi_thm : thm list cgatlp ;

(* parse_cgatl_str : 'a cgatlp -> (string -> 'a * string) ;
   given a (Context.generic * Args.T list) parser, to get a string parser *)
fun parse_cgatl_str cgatlp str =
  let
    (* use the current generic context *)
    val generic = Context.Theory thy ;
    val (ats, rem_str) = parse_str P.arguments str ;
    (* ignore any change to the generic context *)
    val (res, (_, rem_ats)) = cgatlp (generic, ats) ;
  in (res, String.concat (Library.separate " "
    (List.map Args.string_of rem_ats @ [rem_str]))) end ;
\end{verbatim}
*}
|
section {* Attributes, and the \texttt{Attrib} structure *}

text {*
The type \texttt{attribute} is declared in \texttt{src/Pure/thm.ML}.
The source file for the \texttt{Attrib} structure is
\texttt{src/Pure/Isar/attrib.ML}.
Most attributes use a theorem to change a generic context (for example,
by declaring that the theorem should be used, by default, in simplification),
or change a theorem (which most often involves referring to the current
theory).
The functions \texttt{Thm.rule\_attribute} and
\texttt{Thm.declaration\_attribute} create attributes of these kinds.
|
\begin{verbatim}
type attribute = Context.generic * thm -> Context.generic * thm;
type 'a trf = 'a -> 'a ; (* transformer of a given type *)
Thm.rule_attribute : (Context.generic -> thm -> thm) -> attribute ;
Thm.declaration_attribute : (thm -> Context.generic trf) -> attribute ;

Attrib.print_attributes : theory -> unit ;
Attrib.pretty_attribs : Proof.context -> src list -> Pretty.T list ;

List.app Pretty.writeln (Attrib.pretty_attribs ctxt ass) ;
\end{verbatim}
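
As an illustration of the first kind, here is a minimal sketch
(the name \texttt{sym\_att} is invented here for illustration, and it
assumes the theorem \texttt{sym} is in scope):
\begin{verbatim}
(* sketch only: an attribute that ignores the context and
   transforms the theorem by resolution with sym *)
val sym_att : attribute =
  Thm.rule_attribute (fn _ => fn th => th RS sym) ;
\end{verbatim}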
|
An attribute is stored in a theory as indicated by:
\begin{verbatim}
Attrib.add_attributes :
  (bstring * (src -> attribute) * string) list -> theory trf ;
(*
Attrib.add_attributes [("THEN", THEN_att, "resolution with rule")] ;
*)
\end{verbatim}
where the first and third arguments are the name and description of the
attribute, and the second is a function which parses the attribute input
text (excluding the attribute name, which has necessarily already been
parsed).
Here, \texttt{THEN\_att} is a function declared in the code for the
structure \texttt{Attrib}, but not published in its signature.
The source file \texttt{src/Pure/Isar/attrib.ML} shows the use of
\texttt{Attrib.add\_attributes} to add a number of attributes.
|
\begin{verbatim}
FullAttrib.THEN_att : src -> attribute ;
FullAttrib.THEN_att atsrc (generic, ML_Context.thm "sym") ;
FullAttrib.THEN_att atsrc (generic, ML_Context.thm "all_comm") ;
\end{verbatim}

\begin{verbatim}
Attrib.syntax : attribute cgatlp -> src -> attribute ;
Attrib.no_args : attribute -> src -> attribute ;
\end{verbatim}
|
When \texttt{Attrib.syntax} is called as \texttt{syntax scan src (gc, th)},
the generic context \texttt{gc} is used
(and potentially changed to \texttt{gc'})
by \texttt{scan} in parsing to obtain an attribute \texttt{attr}, which is
then applied to \texttt{(gc', th)}.
The source for parsing the attribute is the arguments part of \texttt{src},
which must all be consumed by the parse.

For example, for \texttt{Attrib.no\_args attr src}, the attribute parser
simply returns \texttt{attr}, requiring that the arguments part of
\texttt{src} be empty.
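
Such a no-argument attribute could be obtained as follows (a sketch;
the names \texttt{rot1\_att} and \texttt{rot1\_att\_src} are invented here
for illustration):
\begin{verbatim}
(* sketch only: an attribute rotating the premises once,
   wrapped so that it accepts no arguments *)
val rot1_att : attribute =
  Thm.rule_attribute (fn _ => rotate_prems 1) ;
val rot1_att_src : src -> attribute = Attrib.no_args rot1_att ;
\end{verbatim}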
|
Some examples from \texttt{src/Pure/Isar/attrib.ML}, modified:
\begin{verbatim}
fun rot_att_n n (gc, th) = (gc, rotate_prems n th) ;
rot_att_n : int -> attribute ;
val rot_arg = Scan.lift (Scan.optional Args.int 1 : int atlp) : int cgatlp ;
val rotated_att : src -> attribute =
  Attrib.syntax (rot_arg >> rot_att_n : attribute cgatlp) ;

val THEN_arg : int cgatlp = Scan.lift
  (Scan.optional (Args.bracks Args.nat : int atlp) 1 : int atlp) ;

Attrib.thm : thm cgatlp ;

THEN_arg -- Attrib.thm : (int * thm) cgatlp ;

fun THEN_att_n (n, tht) (gc, th) = (gc, th RSN (n, tht)) ;
THEN_att_n : int * thm -> attribute ;

val THEN_att : src -> attribute = Attrib.syntax
  (THEN_arg -- Attrib.thm >> THEN_att_n : attribute cgatlp);
\end{verbatim}
|
The functions I have called \texttt{rot\_arg} and \texttt{THEN\_arg}
read an optional argument, which for \texttt{rotated} is an integer,
and for \texttt{THEN} is a natural number enclosed in square brackets;
the default, if the argument is absent, is 1 in each case.
The functions \texttt{rot\_att\_n} and \texttt{THEN\_att\_n} turn these into
attributes, where \texttt{THEN\_att\_n} also requires a theorem, which is
parsed by \texttt{Attrib.thm}.
The infix operators \texttt{--} and \texttt{>>} are in the structure
\texttt{Scan}.
*}
|
section {* Methods, and the \texttt{Method} structure *}

text {*
The source file is \texttt{src/Pure/Isar/method.ML}.
The type \texttt{method} is defined by the datatype declaration
\begin{verbatim}
(* datatype method = Meth of thm list -> cases_tactic; *)
RuleCases.NO_CASES : tactic -> cases_tactic ;
\end{verbatim}
In fact \texttt{RAW\_METHOD\_CASES} (below) is exactly the constructor
\texttt{Meth}.
A \texttt{cases\_tactic} is an elaborated version of a tactic;
\texttt{NO\_CASES tac} is the \texttt{cases\_tactic} which consists of the
tactic \texttt{tac} without any further case information.
For further details see the description of the structure \texttt{RuleCases}
below.
The list of theorems to be passed to a method consists of the current
\emph{facts} in the proof.
|
\begin{verbatim}
RAW_METHOD : (thm list -> tactic) -> method ;
METHOD : (thm list -> tactic) -> method ;

SIMPLE_METHOD : tactic -> method ;
SIMPLE_METHOD' : (int -> tactic) -> method ;
SIMPLE_METHOD'' : ((int -> tactic) -> tactic) -> (int -> tactic) -> method ;

RAW_METHOD_CASES : (thm list -> cases_tactic) -> method ;
METHOD_CASES : (thm list -> cases_tactic) -> method ;
\end{verbatim}
|
A method is, in its simplest form, a tactic; applying the method is to apply
the tactic to the current goal state.

Applying \texttt{RAW\_METHOD tacf} creates a tactic by applying
\texttt{tacf} to the current \emph{facts}, and applying that tactic to the
goal state.

\texttt{METHOD} is similar but also first applies
\texttt{Goal.conjunction\_tac} to all subgoals.

\texttt{SIMPLE\_METHOD tac} inserts the facts into all subgoals and then
applies \texttt{tac}.

\texttt{SIMPLE\_METHOD' tacf} inserts the facts and then
applies \texttt{tacf} to subgoal 1.

\texttt{SIMPLE\_METHOD'' quant tacf} does this for the subgoal(s) selected by
\texttt{quant}, which may be, for example,
\texttt{ALLGOALS} (all subgoals),
\texttt{TRYALL} (try all subgoals, failure is OK),
\texttt{FIRSTGOAL} (try subgoals until one succeeds),
\texttt{(fn tacf => tacf 4)} (subgoal 4), etc.\
(see the structure \texttt{Tactical}).
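
For instance, a minimal method built from \texttt{SIMPLE\_METHOD'} might
look like this (a sketch; the name \texttt{assm\_meth} is invented here
for illustration):
\begin{verbatim}
(* sketch only: insert the facts, then try proof by assumption
   on subgoal 1 *)
val assm_meth : method = SIMPLE_METHOD' (fn i => assume_tac i) ;
\end{verbatim}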
|
A method is stored in a theory as indicated by:
\begin{verbatim}
Method.add_method :
  (bstring * (src -> Proof.context -> method) * string) -> theory trf ;
\end{verbatim}
where the first and third arguments are the name and description of the
method, and the second is a function which parses the method input text
(excluding the method name, which has necessarily already been parsed).

As for attributes, these parsing functions are declared in the code for the
structure \texttt{Method}, but not published in its signature.
The source file \texttt{src/Pure/Isar/method.ML} shows the use of
\texttt{Method.add\_method} to add a number of methods.
*}
|
(*>*)
end