35 section {* Building Generic Parsers *} |
35 section {* Building Generic Parsers *} |
36 |
36 |
37 text {* |
37 text {* |
38 |
38 |
39 Let us first have a look at parsing strings using generic parsing |
39 Let us first have a look at parsing strings using generic parsing |
40 combinators. The function @{ML_ind "$$"} takes a string as argument and will |
40 combinators. The function @{ML_ind "$$" in Scan} takes a string as argument and will |
41 ``consume'' this string from a given input list of strings. ``Consume'' in |
41 ``consume'' this string from a given input list of strings. ``Consume'' in |
42 this context means that it will return a pair consisting of this string and |
42 this context means that it will return a pair consisting of this string and |
43 the rest of the input list. For example: |
43 the rest of the input list. For example: |
44 |
44 |
45 @{ML_response [display,gray] |
45 @{ML_response [display,gray] |
69 \end{itemize} |
69 \end{itemize} |
70 |
70 |
71 However, note that these exceptions are private to the parser and cannot be accessed |
71 However, note that these exceptions are private to the parser and cannot be accessed |
72 by the programmer (for example to handle them). |
72 by the programmer (for example to handle them). |
73 |
73 |
74 In the examples above we use the function @{ML_ind Symbol.explode}, instead of the |
74 In the examples above we use the function @{ML_ind explode in Symbol} in the |
75 more standard library function @{ML_ind explode}, for obtaining an input list for |
75 structure @{ML_struct Symbol}, instead of the more standard library function |
76 the parser. The reason is that @{ML_ind Symbol.explode} is aware of character sequences, |
76 @{ML_ind explode}, for obtaining an input list for the parser. The reason is |
77 for example @{text "\<foo>"}, that have a special meaning in Isabelle. To see |
77 that @{ML_ind explode} in @{ML_struct Symbol} is aware of character |
78 the difference consider |
78 sequences, for example @{text "\<foo>"}, that have a special meaning in |
|
79 Isabelle. To see the difference consider |
79 |
80 |
80 @{ML_response_fake [display,gray] |
81 @{ML_response_fake [display,gray] |
81 "let |
82 "let |
82 val input = \"\<foo> bar\" |
83 val input = \"\<foo> bar\" |
83 in |
84 in |
85 end" |
86 end" |
86 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"], |
87 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"], |
87 [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"} |
88 [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"} |
88 |
89 |
89 Slightly more general than the parser @{ML "$$"} is the function |
90 Slightly more general than the parser @{ML "$$"} is the function |
90 @{ML_ind one in Scan}, in that it takes a predicate as argument and |
91 @{ML_ind one in Scan}, in that it takes a predicate as argument and |
91 then parses exactly |
92 then parses exactly |
92 one item from the input list satisfying this predicate. For example the |
93 one item from the input list satisfying this predicate. For example the |
93 following parser either consumes an @{text [quotes] "h"} or a @{text |
94 following parser either consumes an @{text [quotes] "h"} or a @{text |
94 [quotes] "w"}: |
95 [quotes] "w"}: |
95 |
96 |
101 in |
102 in |
102 (hw input1, hw input2) |
103 (hw input1, hw input2) |
103 end" |
104 end" |
104 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
105 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
105 |
106 |
106 Two parsers can be connected in sequence by using the function @{ML_ind "--"}. |
107 Two parsers can be connected in sequence by using the function @{ML_ind "--" in Scan}. |
107 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
108 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
108 order) you can achieve by: |
109 order) you can achieve by: |
109 |
110 |
110 @{ML_response [display,gray] |
111 @{ML_response [display,gray] |
111 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" |
112 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" |
119 @{ML_response [display,gray] |
120 @{ML_response [display,gray] |
120 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
121 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
121 "(\"hell\", [\"o\"])"} |
122 "(\"hell\", [\"o\"])"} |
122 |
123 |
123 Parsers that explore alternatives can be constructed using the function |
124 Parsers that explore alternatives can be constructed using the function |
124 @{ML_ind "||"}. The parser @{ML "(p || q)" for p q} returns the |
125 @{ML_ind "||" in Scan}. The parser @{ML "(p || q)" for p q} returns the |
125 result of @{text "p"}, in case it succeeds, otherwise it returns the |
126 result of @{text "p"}, in case it succeeds, otherwise it returns the |
126 result of @{text "q"}. For example: |
127 result of @{text "q"}. For example: |
127 |
128 |
128 |
129 |
129 @{ML_response [display,gray] |
130 @{ML_response [display,gray] |
134 in |
135 in |
135 (hw input1, hw input2) |
136 (hw input1, hw input2) |
136 end" |
137 end" |
137 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
138 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
138 |
139 |
139 The functions @{ML_ind "|--"} and @{ML_ind "--|"} work like the sequencing |
140 The functions @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} work like the sequencing |
140 function for parsers, except that they discard the item being parsed by the |
141 function for parsers, except that they discard the item being parsed by the |
141 first (respectively second) parser. For example: |
142 first (respectively second) parser. For example: |
142 |
143 |
143 @{ML_response [display,gray] |
144 @{ML_response [display,gray] |
144 "let |
145 "let |
162 in |
163 in |
163 (p input1, p input2) |
164 (p input1, p input2) |
164 end" |
165 end" |
165 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
166 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
166 |
167 |
167 The function @{ML_ind option in Scan} works similarly, except no default value can |
168 The function @{ML_ind option in Scan} works similarly, except no default value can |
168 be given. Instead, the result is wrapped as an @{text "option"}-type. For example: |
169 be given. Instead, the result is wrapped as an @{text "option"}-type. For example: |
169 |
170 |
170 @{ML_response [display,gray] |
171 @{ML_response [display,gray] |
171 "let |
172 "let |
172 val p = Scan.option ($$ \"h\") |
173 val p = Scan.option ($$ \"h\") |
174 val input2 = Symbol.explode \"world\" |
175 val input2 = Symbol.explode \"world\" |
175 in |
176 in |
176 (p input1, p input2) |
177 (p input1, p input2) |
177 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
178 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
178 |
179 |
179 The function @{ML_ind ahead in Scan} parses some input, but leaves the original |
180 The function @{ML_ind ahead in Scan} parses some input, but leaves the original |
180 input unchanged. For example: |
181 input unchanged. For example: |
181 |
182 |
182 @{ML_response [display,gray] |
183 @{ML_response [display,gray] |
183 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
184 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
184 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
185 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
185 |
186 |
186 The function @{ML_ind "!!"} helps with producing appropriate error messages |
187 The function @{ML_ind "!!" in Scan} helps with producing appropriate error messages |
187 during parsing. For example if you want to parse @{text p} immediately |
188 during parsing. For example if you want to parse @{text p} immediately |
188 followed by @{text q}, or start a completely different parser @{text r}, |
189 followed by @{text q}, or start a completely different parser @{text r}, |
189 you might write: |
190 you might write: |
190 |
191 |
191 @{ML [display,gray] "(p -- q) || r" for p q r} |
192 @{ML [display,gray] "(p -- q) || r" for p q r} |
217 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
218 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
218 "Exception ABORT raised"} |
219 "Exception ABORT raised"} |
219 |
220 |
220 then the parsing aborts and the error message @{text "foo"} is printed. In order to |
221 then the parsing aborts and the error message @{text "foo"} is printed. In order to |
221 see the error message properly, you need to prefix the parser with the function |
222 see the error message properly, you need to prefix the parser with the function |
222 @{ML_ind error in Scan}. For example: |
223 @{ML_ind error in Scan}. For example: |
223 |
224 |
224 @{ML_response_fake [display,gray] |
225 @{ML_response_fake [display,gray] |
225 "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" |
226 "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" |
226 "Exception Error \"foo\" raised"} |
227 "Exception Error \"foo\" raised"} |
227 |
228 |
228 This ``prefixing'' is usually done by wrappers such as @{ML_ind local_theory in OuterSyntax} |
229 This ``prefixing'' is usually done by wrappers such as @{ML_ind local_theory in OuterSyntax} |
229 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
230 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
230 |
231 |
231 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
232 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
232 r}. If you want to generate the correct error message for failure |
233 r}. If you want to generate the correct error message for failure |
233 of parsing @{text "p"}-followed-by-@{text "q"}, then you have to write: |
234 of parsing @{text "p"}-followed-by-@{text "q"}, then you have to write: |
260 often as it succeeds. For example: |
261 often as it succeeds. For example: |
261 |
262 |
262 @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" |
263 @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" |
263 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
264 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
264 |
265 |
265 Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function |
266 Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function |
266 @{ML_ind repeat1 in Scan} is similar, but requires that the parser @{text "p"} |
267 @{ML_ind repeat1 in Scan} is similar, but requires that the parser @{text "p"} |
267 succeeds at least once. |
268 succeeds at least once. |
268 |
269 |
269 Also note that the parser would have aborted with the exception @{text MORE}, if |
270 Also note that the parser would have aborted with the exception @{text MORE}, if |
270 you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using |
271 you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using |
271 the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' |
272 the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' |
272 @{ML_ind stopper in Symbol}. With them you can write: |
273 @{ML_ind stopper in Symbol}. With them you can write: |
273 |
274 |
274 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
275 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
275 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
276 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
276 |
277 |
277 The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings; |
278 The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings; |
278 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
279 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
279 manually wrapping is often already done by the surrounding infrastructure. |
280 manually wrapping is often already done by the surrounding infrastructure. |
280 |
281 |
281 The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any |
282 The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any |
282 string as in |
283 string as in |
283 |
284 |
284 @{ML_response [display,gray] |
285 @{ML_response [display,gray] |
285 "let |
286 "let |
286 val p = Scan.repeat (Scan.one Symbol.not_eof) |
287 val p = Scan.repeat (Scan.one Symbol.not_eof) |
288 in |
289 in |
289 Scan.finite Symbol.stopper p input |
290 Scan.finite Symbol.stopper p input |
290 end" |
291 end" |
291 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
292 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
292 |
293 |
293 where the function @{ML_ind not_eof in Symbol} ensures that we do not read beyond the |
294 where the function @{ML_ind not_eof in Symbol} ensures that we do not read beyond the |
294 end of the input string (i.e.~stopper symbol). |
295 end of the input string (i.e.~stopper symbol). |
295 |
296 |
296 The function @{ML_ind unless in Scan} takes two parsers: if the first one can |
297 The function @{ML_ind unless in Scan} takes two parsers: if the first one can |
297 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
298 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
298 |
299 |
299 @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" |
300 @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" |
300 "Exception FAIL raised"} |
301 "Exception FAIL raised"} |
301 |
302 |
304 @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" |
305 @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" |
305 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
306 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
306 |
307 |
307 succeeds. |
308 succeeds. |
308 |
309 |
309 The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can |
310 The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can |
310 be combined to read any input until a certain marker symbol is reached. In the |
311 be combined to read any input until a certain marker symbol is reached. In the |
311 example below the marker symbol is a @{text [quotes] "*"}. |
312 example below the marker symbol is a @{text [quotes] "*"}. |
312 |
313 |
313 @{ML_response [display,gray] |
314 @{ML_response [display,gray] |
314 "let |
315 "let |
322 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
323 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
323 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
324 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
324 |
325 |
325 |
326 |
326 After parsing is done, you almost always want to apply a function to the parsed |
327 After parsing is done, you almost always want to apply a function to the parsed |
327 items. One way to do this is the function @{ML_ind ">>"} where |
328 items. One way to do this is the function @{ML_ind ">>" in Scan} where |
328 @{ML "(p >> f)" for p f} runs |
329 @{ML "(p >> f)" for p f} runs |
329 first the parser @{text p} and upon successful completion applies the |
330 first the parser @{text p} and upon successful completion applies the |
330 function @{text f} to the result. For example |
331 function @{text f} to the result. For example |
331 |
332 |
332 @{ML_response [display,gray] |
333 @{ML_response [display,gray] |
350 "(\"foo bar foo\",[])"} |
351 "(\"foo bar foo\",[])"} |
351 |
352 |
352 where the single-character strings in the parsed output are transformed |
353 where the single-character strings in the parsed output are transformed |
353 back into one string. |
354 back into one string. |
354 |
355 |
355 The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies |
356 The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies |
356 the given parser to the second component of the pair and leaves the first component |
357 the given parser to the second component of the pair and leaves the first component |
357 untouched. For example |
358 untouched. For example |
358 |
359 |
359 @{ML_response [display,gray] |
360 @{ML_response [display,gray] |
360 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
361 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
391 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
392 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
392 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
393 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
393 \end{readmore} |
394 \end{readmore} |
394 |
395 |
395 The structure @{ML_struct OuterLex} defines several kinds of tokens (for |
396 The structure @{ML_struct OuterLex} defines several kinds of tokens (for |
396 example @{ML_ind Ident in OuterLex} for identifiers, @{ML Keyword in |
397 example @{ML_ind Ident in OuterLex} for identifiers, @{ML Keyword in |
397 OuterLex} for keywords and @{ML_ind Command in OuterLex} for commands). Some |
398 OuterLex} for keywords and @{ML_ind Command in OuterLex} for commands). Some |
398 token parsers take into account the kind of tokens. The first example shows |
399 token parsers take into account the kind of tokens. The first example shows |
399 how to generate a token list out of a string using the function |
400 how to generate a token list out of a string using the function |
400 @{ML_ind scan in OuterSyntax}. It is given the argument |
401 @{ML_ind scan in OuterSyntax}. It is given the argument |
401 @{ML "Position.none"} since, |
402 @{ML "Position.none"} since, |
402 at the moment, we are not interested in generating precise error |
403 at the moment, we are not interested in generating precise error |
403 messages. The following code\footnote{Note that because of a possible bug in |
404 messages. The following code\footnote{Note that because of a possible bug in |
404 the PolyML runtime system, the result is printed as @{text [quotes] "?"}, |
405 the PolyML runtime system, the result is printed as @{text [quotes] "?"}, |
405 instead of the tokens.} |
406 instead of the tokens.} |
413 produces three tokens where the first and the last are identifiers, since |
414 produces three tokens where the first and the last are identifiers, since |
414 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any |
415 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any |
415 other syntactic category. The second indicates a space. |
416 other syntactic category. The second indicates a space. |
416 |
417 |
417 We can easily change what is recognised as a keyword with the function |
418 We can easily change what is recognised as a keyword with the function |
418 @{ML_ind keyword in OuterKeyword}. For example calling it with |
419 @{ML_ind keyword in OuterKeyword}. For example calling it with |
419 *} |
420 *} |
420 |
421 |
421 ML{*val _ = OuterKeyword.keyword "hello"*} |
422 ML{*val _ = OuterKeyword.keyword "hello"*} |
422 |
423 |
423 text {* |
424 text {* |
428 Token (\<dots>,(Space, \" \"),\<dots>), |
429 Token (\<dots>,(Space, \" \"),\<dots>), |
429 Token (\<dots>,(Ident, \"world\"),\<dots>)]"} |
430 Token (\<dots>,(Ident, \"world\"),\<dots>)]"} |
430 |
431 |
431 Many parsing functions later on will require white space, comments and the like |
432 Many parsing functions later on will require white space, comments and the like |
432 to have already been filtered out. So from now on we are going to use the |
433 to have already been filtered out. So from now on we are going to use the |
433 functions @{ML filter} and @{ML_ind is_proper in OuterLex} to do this. |
434 functions @{ML filter} and @{ML_ind is_proper in OuterLex} to do this. |
434 For example: |
435 For example: |
435 |
436 |
436 @{ML_response_fake [display,gray] |
437 @{ML_response_fake [display,gray] |
437 "let |
438 "let |
438 val input = OuterSyntax.scan Position.none \"hello world\" |
439 val input = OuterSyntax.scan Position.none \"hello world\" |
466 in |
467 in |
467 (Scan.dest_lexicon commands, Scan.dest_lexicon keywords) |
468 (Scan.dest_lexicon commands, Scan.dest_lexicon keywords) |
468 end" |
469 end" |
469 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"} |
470 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"} |
470 |
471 |
471 You might have to adjust the @{ML_ind print_depth} in order to |
472 You might have to adjust the @{ML_ind print_depth} in order to |
472 see the complete list. |
473 see the complete list. |
473 |
474 |
474 The parser @{ML_ind "$$$" in OuterParse} parses a single keyword. For example: |
475 The parser @{ML_ind "$$$" in OuterParse} parses a single keyword. For example: |
475 |
476 |
476 @{ML_response [display,gray] |
477 @{ML_response [display,gray] |
477 "let |
478 "let |
478 val input1 = filtered_input \"where for\" |
479 val input1 = filtered_input \"where for\" |
479 val input2 = filtered_input \"| in\" |
480 val input2 = filtered_input \"| in\" |
480 in |
481 in |
481 (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2) |
482 (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2) |
482 end" |
483 end" |
483 "((\"where\",\<dots>), (\"|\",\<dots>))"} |
484 "((\"where\",\<dots>), (\"|\",\<dots>))"} |
484 |
485 |
485 Any non-keyword string can be parsed with the function @{ML_ind reserved in OuterParse}. |
486 Any non-keyword string can be parsed with the function @{ML_ind reserved in OuterParse}. |
486 For example: |
487 For example: |
487 |
488 |
488 @{ML_response [display,gray] |
489 @{ML_response [display,gray] |
489 "let |
490 "let |
490 val p = OuterParse.reserved \"bar\" |
491 val p = OuterParse.reserved \"bar\" |
556 but keyword in was found\" raised" |
557 but keyword in was found\" raised" |
557 } |
558 } |
558 |
559 |
559 \begin{exercise} (FIXME) |
560 \begin{exercise} (FIXME) |
560 A type-identifier, for example @{typ "'a"}, is a token of |
561 A type-identifier, for example @{typ "'a"}, is a token of |
561 kind @{ML_ind Keyword in OuterLex}. It can be parsed using |
562 kind @{ML_ind Keyword in OuterLex}. It can be parsed using |
562 the function @{ML type_ident in OuterParse}. |
563 the function @{ML type_ident in OuterParse}. |
563 \end{exercise} |
564 \end{exercise} |
564 |
565 |
565 (FIXME: or give parser for numbers) |
566 (FIXME: or give parser for numbers) |
566 |
567 |
737 |
738 |
738 As you see, the result is a pair consisting of a list of |
739 As you see, the result is a pair consisting of a list of |
739 variables with optional type-annotation and syntax-annotation, and a list of |
740 variables with optional type-annotation and syntax-annotation, and a list of |
740 rules where every rule has optionally a name and an attribute. |
741 rules where every rule has optionally a name and an attribute. |
741 |
742 |
742 The function @{ML_ind "fixes" in OuterParse} in Line 2 of the parser reads an |
743 The function @{ML_ind "fixes" in OuterParse} in Line 2 of the parser reads an |
743 \isacommand{and}-separated |
744 \isacommand{and}-separated |
744 list of variables that can include optional type annotations and syntax translations. |
745 list of variables that can include optional type annotations and syntax translations. |
745 For example:\footnote{Note that in the code we need to write |
746 For example:\footnote{Note that in the code we need to write |
746 @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes |
747 @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes |
747 in the compound type.} |
748 in the compound type.} |
761 text {* |
762 text {* |
762 Whenever types are given, they are stored in the @{ML SOME}s. The types are |
763 Whenever types are given, they are stored in the @{ML SOME}s. The types are |
763 not yet used to type the variables: this must be done by type-inference later |
764 not yet used to type the variables: this must be done by type-inference later |
764 on. Since types are part of the inner syntax they are strings with some |
765 on. Since types are part of the inner syntax they are strings with some |
765 encoded information (see previous section). If a mixfix-syntax is |
766 encoded information (see previous section). If a mixfix-syntax is |
766 present for a variable, then it is stored in the @{ML_ind Mixfix} data structure; |
767 present for a variable, then it is stored in the @{ML_ind Mixfix} data structure; |
767 no syntax translation is indicated by @{ML_ind NoSyn}. |
768 no syntax translation is indicated by @{ML_ind NoSyn}. |
768 |
769 |
769 \begin{readmore} |
770 \begin{readmore} |
770 The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. |
771 The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. |
771 \end{readmore} |
772 \end{readmore} |
772 |
773 |
773 Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a |
774 Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a |
774 list of introduction rules, that is propositions with theorem annotations |
775 list of introduction rules, that is propositions with theorem annotations |
775 such as rule names and attributes. The introduction rules are propositions |
776 such as rule names and attributes. The introduction rules are propositions |
776 parsed by @{ML_ind prop in OuterParse}. However, they can include an optional |
777 parsed by @{ML_ind prop in OuterParse}. However, they can include an optional |
777 theorem name plus some attributes. For example |
778 theorem name plus some attributes. For example |
778 |
779 |
779 @{ML_response [display,gray] "let |
780 @{ML_response [display,gray] "let |
780 val input = filtered_input \"foo_lemma[intro,dest!]:\" |
781 val input = filtered_input \"foo_lemma[intro,dest!]:\" |
781 val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input |
782 val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input |
782 in |
783 in |
783 (name, map Args.dest_src attrib) |
784 (name, map Args.dest_src attrib) |
784 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"} |
785 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"} |
785 |
786 |
786 The function @{ML_ind opt_thm_name in SpecParse} is the ``optional'' variant of |
787 The function @{ML_ind opt_thm_name in SpecParse} is the ``optional'' variant of |
787 @{ML_ind thm_name in SpecParse}. Theorem names can contain attributes. The name |
788 @{ML_ind thm_name in SpecParse}. Theorem names can contain attributes. The name |
788 has to end with @{text [quotes] ":"}---see the argument of |
789 has to end with @{text [quotes] ":"}---see the argument of |
789 the function @{ML SpecParse.opt_thm_name} in Line 7. |
790 the function @{ML SpecParse.opt_thm_name} in Line 7. |
790 |
791 |
791 \begin{readmore} |
792 \begin{readmore} |
792 Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} |
793 Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} |
848 in |
849 in |
849 OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing |
850 OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing |
850 end *} |
851 end *} |
851 |
852 |
852 text {* |
853 text {* |
853 The crucial function @{ML_ind local_theory in OuterSyntax} expects a name for the command, a |
854 The crucial function @{ML_ind local_theory in OuterSyntax} expects a name for the command, a |
854 short description, a kind indicator (which we will explain later more thoroughly) and a |
855 short description, a kind indicator (which we will explain later more thoroughly) and a |
855 parser producing a local theory transition (its purpose will also explained |
856 parser producing a local theory transition (its purpose will also explained |
856 later). |
857 later). |
857 |
858 |
858 While this is everything you have to do on the ML-level, you need a keyword |
859 While this is everything you have to do on the ML-level, you need a keyword |
994 refine it a bit next by letting it take a proposition as argument and |
995 refine it a bit next by letting it take a proposition as argument and |
995 printing this proposition inside the tracing buffer. |
996 printing this proposition inside the tracing buffer. |
996 |
997 |
997 The crucial part of a command is the function that determines the behaviour |
998 The crucial part of a command is the function that determines the behaviour |
998 of the command. In the code above we used a ``do-nothing''-function, which |
999 of the command. In the code above we used a ``do-nothing''-function, which |
999 because of @{ML_ind succeed in Scan} does not parse any argument, but immediately |
1000 because of @{ML_ind succeed in Scan} does not parse any argument, but immediately |
1000 returns the simple function @{ML "LocalTheory.theory I"}. We can |
1001 returns the simple function @{ML "LocalTheory.theory I"}. We can |
1001 replace this code by a function that first parses a proposition (using the |
1002 replace this code by a function that first parses a proposition (using the |
1002 parser @{ML OuterParse.prop}), then prints out the tracing |
1003 parser @{ML OuterParse.prop}), then prints out the tracing |
1003 information (using a new function @{text trace_prop}) and |
1004 information (using a new function @{text trace_prop}) and |
1004 finally does nothing. For this you can write: |
1005 finally does nothing. For this you can write: |
1020 *} |
1021 *} |
1021 |
1022 |
1022 foobar_trace "True \<and> False" |
1023 foobar_trace "True \<and> False" |
1023 |
1024 |
1024 text {* |
1025 text {* |
1025 Note that so far we used @{ML_ind thy_decl in OuterKeyword} as the kind |
1026 Note that so far we used @{ML_ind thy_decl in OuterKeyword} as the kind |
1026 indicator for the command. This means that the command finishes as soon as |
1027 indicator for the command. This means that the command finishes as soon as |
1027 the arguments are processed. Examples of this kind of commands are |
1028 the arguments are processed. Examples of this kind of commands are |
1028 \isacommand{definition} and \isacommand{declare}. In other cases, commands |
1029 \isacommand{definition} and \isacommand{declare}. In other cases, commands |
1029 are expected to parse some arguments, for example a proposition, and then |
1030 are expected to parse some arguments, for example a proposition, and then |
1030 ``open up'' a proof in order to prove the proposition (for example |
1031 ``open up'' a proof in order to prove the proposition (for example |
1031 \isacommand{lemma}) or prove some other properties (for example |
1032 \isacommand{lemma}) or prove some other properties (for example |
1032 \isacommand{function}). To achieve this kind of behaviour, you have to use |
1033 \isacommand{function}). To achieve this kind of behaviour, you have to use |
1033 the kind indicator @{ML_ind thy_goal in OuterKeyword} and the function @{ML |
1034 the kind indicator @{ML_ind thy_goal in OuterKeyword} and the function @{ML |
1034 "local_theory_to_proof" in OuterSyntax} to set up the command. Note, |
1035 "local_theory_to_proof" in OuterSyntax} to set up the command. Note, |
1035 however, once you change the ``kind'' of a command from @{ML thy_decl in |
1036 however, once you change the ``kind'' of a command from @{ML thy_decl in |
1036 OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs |
1037 OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs |
1037 to be re-created! |
1038 to be re-created! |
1038 |
1039 |
1057 end *} |
1058 end *} |
1058 |
1059 |
1059 text {* |
1060 text {* |
1060 The function @{text goal_prop} in Lines 2 to 7 takes a string (the proposition to be |
1061 The function @{text goal_prop} in Lines 2 to 7 takes a string (the proposition to be |
1061 proved) and a context as argument. The context is necessary in order to be able to use |
1062 proved) and a context as argument. The context is necessary in order to be able to use |
1062 @{ML_ind read_prop in Syntax}, which converts a string into a proper proposition. |
1063 @{ML_ind read_prop in Syntax}, which converts a string into a proper proposition. |
1063 In Line 6 the function @{ML_ind theorem_i in Proof} starts the proof for the |
1064 In Line 6 the function @{ML_ind theorem_i in Proof} starts the proof for the |
1064 proposition. Its argument @{ML NONE} stands for a locale (which we chose to |
1065 proposition. Its argument @{ML NONE} stands for a locale (which we chose to |
1065 omit); the argument @{ML "(K I)"} stands for a function that determines what |
1066 omit); the argument @{ML "(K I)"} stands for a function that determines what |
1066 should be done with the theorem once it is proved (we chose to just forget |
1067 should be done with the theorem once it is proved (we chose to just forget |
1067 about it). Line 9 contains the parser for the proposition. |
1068 about it). Line 9 contains the parser for the proposition. |
1068 |
1069 |
1264 |
1265 |
1265 text {* |
1266 text {* |
1266 It defines the method @{text foo}, which takes no arguments (therefore the |
1267 It defines the method @{text foo}, which takes no arguments (therefore the |
1267 parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which |
1268 parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which |
1268 applies @{thm [source] conjE} and then @{thm [source] conjI}. The function |
1269 applies @{thm [source] conjE} and then @{thm [source] conjI}. The function |
1269 @{ML_ind SIMPLE_METHOD} |
1270 @{ML_ind SIMPLE_METHOD in Method} |
1270 turns such a tactic into a method. The method @{text "foo"} can be used as follows |
1271 turns such a tactic into a method. The method @{text "foo"} can be used as follows |
1271 *} |
1272 *} |
1272 |
1273 |
1273 lemma shows "A \<and> B \<Longrightarrow> C \<and> D" |
1274 lemma shows "A \<and> B \<Longrightarrow> C \<and> D" |
1274 apply(foo) |
1275 apply(foo) |