35 section {* Building Generic Parsers *} |
35 section {* Building Generic Parsers *} |
36 |
36 |
37 text {* |
37 text {* |
38 |
38 |
39 Let us first have a look at parsing strings using generic parsing |
39 Let us first have a look at parsing strings using generic parsing |
40 combinators. The function @{ML [index] "$$"} takes a string as argument and will |
40 combinators. The function @{ML_ind [index] "$$"} takes a string as argument and will |
41 ``consume'' this string from a given input list of strings. ``Consume'' in |
41 ``consume'' this string from a given input list of strings. ``Consume'' in |
42 this context means that it will return a pair consisting of this string and |
42 this context means that it will return a pair consisting of this string and |
43 the rest of the input list. For example: |
43 the rest of the input list. For example: |
44 |
44 |
45 @{ML_response [display,gray] |
45 @{ML_response [display,gray] |
87 end" |
87 end" |
88 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"], |
88 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"], |
89 [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"} |
89 [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"} |
90 |
90 |
91 Slightly more general than the parser @{ML "$$"} is the function |
91 Slightly more general than the parser @{ML "$$"} is the function |
92 @{ML [index] one in Scan}, in that it takes a predicate as argument and |
92 @{ML_ind [index] one in Scan}, in that it takes a predicate as argument and |
93 then parses exactly |
93 then parses exactly |
94 one item from the input list satisfying this predicate. For example the |
94 one item from the input list satisfying this predicate. For example the |
95 following parser either consumes an @{text [quotes] "h"} or a @{text |
95 following parser either consumes an @{text [quotes] "h"} or a @{text |
96 [quotes] "w"}: |
96 [quotes] "w"}: |
97 |
97 |
103 in |
103 in |
104 (hw input1, hw input2) |
104 (hw input1, hw input2) |
105 end" |
105 end" |
106 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
106 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
107 |
107 |
108 Two parsers can be connected in sequence by using the function @{ML [index] "--"}. |
108 Two parsers can be connected in sequence by using the function @{ML_ind [index] "--"}. |
109 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
109 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
110 order) you can achieve by: |
110 order) you can achieve by: |
111 |
111 |
112 @{ML_response [display,gray] |
112 @{ML_response [display,gray] |
113 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" |
113 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" |
114 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
114 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
115 |
115 |
116 Note how the result of consumed strings builds up on the left as nested pairs. |
116 Note how the result of consumed strings builds up on the left as nested pairs. |
117 |
117 |
118 If, as in the previous example, you want to parse a particular string, |
118 If, as in the previous example, you want to parse a particular string, |
119 then you should use the function @{ML [index] this_string in Scan}: |
119 then you should use the function @{ML_ind [index] this_string in Scan}: |
120 |
120 |
121 @{ML_response [display,gray] |
121 @{ML_response [display,gray] |
122 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
122 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
123 "(\"hell\", [\"o\"])"} |
123 "(\"hell\", [\"o\"])"} |
124 |
124 |
125 Parsers that explore alternatives can be constructed using the function |
125 Parsers that explore alternatives can be constructed using the function |
126 @{ML [index] "||"}. The parser @{ML "(p || q)" for p q} returns the |
126 @{ML_ind [index] "||"}. The parser @{ML "(p || q)" for p q} returns the |
127 result of @{text "p"}, in case it succeeds, otherwise it returns the |
127 result of @{text "p"}, in case it succeeds, otherwise it returns the |
128 result of @{text "q"}. For example: |
128 result of @{text "q"}. For example: |
129 |
129 |
130 |
130 |
131 @{ML_response [display,gray] |
131 @{ML_response [display,gray] |
136 in |
136 in |
137 (hw input1, hw input2) |
137 (hw input1, hw input2) |
138 end" |
138 end" |
139 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
139 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
140 |
140 |
141 The functions @{ML [index] "|--"} and @{ML [index] "--|"} work like the sequencing function |
141 The functions @{ML_ind [index] "|--"} and @{ML_ind [index] "--|"} work like the sequencing function |
142 for parsers, except that they discard the item being parsed by the first (respectively second) |
142 for parsers, except that they discard the item being parsed by the first (respectively second) |
143 parser. For example: |
143 parser. For example: |
144 |
144 |
145 @{ML_response [display,gray] |
145 @{ML_response [display,gray] |
146 "let |
146 "let |
165 in |
165 in |
166 (p input1, p input2) |
166 (p input1, p input2) |
167 end" |
167 end" |
168 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
168 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
169 |
169 |
170 The function @{ML [index] option in Scan} works similarly, except no default value can |
170 The function @{ML_ind [index] option in Scan} works similarly, except no default value can |
171 be given. Instead, the result is wrapped as an @{text "option"}-type. For example: |
171 be given. Instead, the result is wrapped as an @{text "option"}-type. For example: |
172 |
172 |
173 @{ML_response [display,gray] |
173 @{ML_response [display,gray] |
174 "let |
174 "let |
175 val p = Scan.option ($$ \"h\") |
175 val p = Scan.option ($$ \"h\") |
177 val input2 = Symbol.explode \"world\" |
177 val input2 = Symbol.explode \"world\" |
178 in |
178 in |
179 (p input1, p input2) |
179 (p input1, p input2) |
180 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
180 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
181 |
181 |
182 The function @{ML [index] "!!"} helps to produce appropriate error messages |
182 The function @{ML_ind [index] "!!"} helps to produce appropriate error messages |
183 for parsing. For example if you want to parse @{text p} immediately |
183 for parsing. For example if you want to parse @{text p} immediately |
184 followed by @{text q}, or start a completely different parser @{text r}, |
184 followed by @{text q}, or start a completely different parser @{text r}, |
185 you might write: |
185 you might write: |
186 |
186 |
187 @{ML [display,gray] "(p -- q) || r" for p q r} |
187 @{ML [display,gray] "(p -- q) || r" for p q r} |
213 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
213 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
214 "Exception ABORT raised"} |
214 "Exception ABORT raised"} |
215 |
215 |
216 then the parsing aborts and the error message @{text "foo"} is printed. In order to |
216 then the parsing aborts and the error message @{text "foo"} is printed. In order to |
217 see the error message properly, you need to prefix the parser with the function |
217 see the error message properly, you need to prefix the parser with the function |
218 @{ML [index] error in Scan}. For example: |
218 @{ML_ind [index] error in Scan}. For example: |
219 |
219 |
220 @{ML_response_fake [display,gray] |
220 @{ML_response_fake [display,gray] |
221 "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" |
221 "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" |
222 "Exception Error \"foo\" raised"} |
222 "Exception Error \"foo\" raised"} |
223 |
223 |
224 This ``prefixing'' is usually done by wrappers such as @{ML [index] local_theory in OuterSyntax} |
224 This ``prefixing'' is usually done by wrappers such as @{ML_ind [index] local_theory in OuterSyntax} |
225 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
225 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
226 |
226 |
227 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
227 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
228 r}. If you want to generate the correct error message for |
228 r}. If you want to generate the correct error message for |
229 @{text "p"}-followed-by-@{text "q"}, then you have to write: |
229 @{text "p"}-followed-by-@{text "q"}, then you have to write: |
256 often as it succeeds. For example: |
256 often as it succeeds. For example: |
257 |
257 |
258 @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" |
258 @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" |
259 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
259 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
260 |
260 |
261 Note that @{ML [index] repeat in Scan} stores the parsed items in a list. The function |
261 Note that @{ML_ind [index] repeat in Scan} stores the parsed items in a list. The function |
262 @{ML [index] repeat1 in Scan} is similar, but requires that the parser @{text "p"} |
262 @{ML_ind [index] repeat1 in Scan} is similar, but requires that the parser @{text "p"} |
263 succeeds at least once. |
263 succeeds at least once. |
264 |
264 |
265 Also note that the parser would have aborted with the exception @{text MORE}, if |
265 Also note that the parser would have aborted with the exception @{text MORE}, if |
266 you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using |
266 you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using |
267 the wrapper @{ML [index] finite in Scan} and the ``stopper-token'' |
267 the wrapper @{ML_ind [index] finite in Scan} and the ``stopper-token'' |
268 @{ML [index] stopper in Symbol}. With them you can write: |
268 @{ML_ind [index] stopper in Symbol}. With them you can write: |
269 |
269 |
270 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
270 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
271 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
271 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
272 |
272 |
273 @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings; |
273 @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings; |
274 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
274 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
275 manually wrapping is often already done by the surrounding infrastructure. |
275 manually wrapping is often already done by the surrounding infrastructure. |
276 |
276 |
277 The function @{ML [index] repeat in Scan} can be used with @{ML [index] one in Scan} to read any |
277 The function @{ML_ind [index] repeat in Scan} can be used with @{ML_ind [index] one in Scan} to read any |
278 string as in |
278 string as in |
279 |
279 |
280 @{ML_response [display,gray] |
280 @{ML_response [display,gray] |
281 "let |
281 "let |
282 val p = Scan.repeat (Scan.one Symbol.not_eof) |
282 val p = Scan.repeat (Scan.one Symbol.not_eof) |
284 in |
284 in |
285 Scan.finite Symbol.stopper p input |
285 Scan.finite Symbol.stopper p input |
286 end" |
286 end" |
287 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
287 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
288 |
288 |
289 where the function @{ML [index] not_eof in Symbol} ensures that we do not read beyond the |
289 where the function @{ML_ind [index] not_eof in Symbol} ensures that we do not read beyond the |
290 end of the input string (i.e.~stopper symbol). |
290 end of the input string (i.e.~stopper symbol). |
291 |
291 |
292 The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can |
292 The function @{ML_ind [index] unless in Scan} takes two parsers: if the first one can |
293 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
293 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
294 |
294 |
295 @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" |
295 @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" |
296 "Exception FAIL raised"} |
296 "Exception FAIL raised"} |
297 |
297 |
300 @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" |
300 @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" |
301 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
301 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
302 |
302 |
303 succeeds. |
303 succeeds. |
304 |
304 |
305 The functions @{ML [index] repeat in Scan} and @{ML [index] unless in Scan} can |
305 The functions @{ML_ind [index] repeat in Scan} and @{ML_ind [index] unless in Scan} can |
306 be combined to read any input until a certain marker symbol is reached. In the |
306 be combined to read any input until a certain marker symbol is reached. In the |
307 example below the marker symbol is a @{text [quotes] "*"}. |
307 example below the marker symbol is a @{text [quotes] "*"}. |
308 |
308 |
309 @{ML_response [display,gray] |
309 @{ML_response [display,gray] |
310 "let |
310 "let |
318 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
318 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
319 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
319 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
320 |
320 |
321 |
321 |
322 After parsing is done, you almost always want to apply a function to the parsed |
322 After parsing is done, you almost always want to apply a function to the parsed |
323 items. One way to do this is the function @{ML [index]">>"} where |
323 items. One way to do this is the function @{ML_ind [index]">>"} where |
324 @{ML "(p >> f)" for p f} runs |
324 @{ML "(p >> f)" for p f} runs |
325 first the parser @{text p} and upon successful completion applies the |
325 first the parser @{text p} and upon successful completion applies the |
326 function @{text f} to the result. For example |
326 function @{text f} to the result. For example |
327 |
327 |
328 @{ML_response [display,gray] |
328 @{ML_response [display,gray] |
347 where the single-character strings in the parsed output are transformed |
347 where the single-character strings in the parsed output are transformed |
348 back into one string. |
348 back into one string. |
349 |
349 |
350 (FIXME: move to an earlier place) |
350 (FIXME: move to an earlier place) |
351 |
351 |
352 The function @{ML [index] ahead in Scan} parses some input, but leaves the original |
352 The function @{ML_ind [index] ahead in Scan} parses some input, but leaves the original |
353 input unchanged. For example: |
353 input unchanged. For example: |
354 |
354 |
355 @{ML_response [display,gray] |
355 @{ML_response [display,gray] |
356 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
356 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
357 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
357 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
358 |
358 |
359 The function @{ML [index] lift in Scan} takes a parser and a pair as arguments. This function applies |
359 The function @{ML_ind [index] lift in Scan} takes a parser and a pair as arguments. This function applies |
360 the given parser to the second component of the pair and leaves the first component |
360 the given parser to the second component of the pair and leaves the first component |
361 untouched. For example |
361 untouched. For example |
362 |
362 |
363 @{ML_response [display,gray] |
363 @{ML_response [display,gray] |
364 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
364 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
395 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
395 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
396 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
396 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
397 \end{readmore} |
397 \end{readmore} |
398 |
398 |
399 The structure @{ML_struct [index] OuterLex} defines several kinds of tokens (for |
399 The structure @{ML_struct [index] OuterLex} defines several kinds of tokens (for |
400 example @{ML [index] Ident in OuterLex} for identifiers, @{ML Keyword in |
400 example @{ML_ind [index] Ident in OuterLex} for identifiers, @{ML Keyword in |
401 OuterLex} for keywords and @{ML [index] Command in OuterLex} for commands). Some |
401 OuterLex} for keywords and @{ML_ind [index] Command in OuterLex} for commands). Some |
402 token parsers take into account the kind of tokens. The first example shows |
402 token parsers take into account the kind of tokens. The first example shows |
403 how to generate a token list out of a string using the function |
403 how to generate a token list out of a string using the function |
404 @{ML [index] scan in OuterSyntax}. It is given the argument |
404 @{ML_ind [index] scan in OuterSyntax}. It is given the argument |
405 @{ML "Position.none"} since, |
405 @{ML "Position.none"} since, |
406 at the moment, we are not interested in generating precise error |
406 at the moment, we are not interested in generating precise error |
407 messages. The following code\footnote{Note that because of a possible bug in |
407 messages. The following code\footnote{Note that because of a possible bug in |
408 the PolyML runtime system, the result is printed as @{text [quotes] "?"}, |
408 the PolyML runtime system, the result is printed as @{text [quotes] "?"}, |
409 instead of the tokens.} |
409 instead of the tokens.} |
417 produces three tokens where the first and the last are identifiers, since |
417 produces three tokens where the first and the last are identifiers, since |
418 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any |
418 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any |
419 other syntactic category. The second indicates a space. |
419 other syntactic category. The second indicates a space. |
420 |
420 |
421 We can easily change what is recognised as a keyword with |
421 We can easily change what is recognised as a keyword with |
422 @{ML [index] keyword in OuterKeyword}. For example calling this function |
422 @{ML_ind [index] keyword in OuterKeyword}. For example calling this function |
423 *} |
423 *} |
424 |
424 |
425 ML{*val _ = OuterKeyword.keyword "hello"*} |
425 ML{*val _ = OuterKeyword.keyword "hello"*} |
426 |
426 |
427 text {* |
427 text {* |
432 Token (\<dots>,(Space, \" \"),\<dots>), |
432 Token (\<dots>,(Space, \" \"),\<dots>), |
433 Token (\<dots>,(Ident, \"world\"),\<dots>)]"} |
433 Token (\<dots>,(Ident, \"world\"),\<dots>)]"} |
434 |
434 |
435 Many parsing functions later on will require white space, comments and the like |
435 Many parsing functions later on will require white space, comments and the like |
436 to have already been filtered out. So from now on we are going to use the |
436 to have already been filtered out. So from now on we are going to use the |
437 functions @{ML filter} and @{ML [index] is_proper in OuterLex} to do this. |
437 functions @{ML filter} and @{ML_ind [index] is_proper in OuterLex} to do this. |
438 For example: |
438 For example: |
439 |
439 |
440 @{ML_response_fake [display,gray] |
440 @{ML_response_fake [display,gray] |
441 "let |
441 "let |
442 val input = OuterSyntax.scan Position.none \"hello world\" |
442 val input = OuterSyntax.scan Position.none \"hello world\" |
470 in |
470 in |
471 (Scan.dest_lexicon commands, Scan.dest_lexicon keywords) |
471 (Scan.dest_lexicon commands, Scan.dest_lexicon keywords) |
472 end" |
472 end" |
473 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"} |
473 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"} |
474 |
474 |
475 You might have to adjust the @{ML [index] print_depth} in order to |
475 You might have to adjust the @{ML_ind [index] print_depth} in order to |
476 see the complete list. |
476 see the complete list. |
477 |
477 |
478 The parser @{ML [index] "$$$" in OuterParse} parses a single keyword. For example: |
478 The parser @{ML_ind [index] "$$$" in OuterParse} parses a single keyword. For example: |
479 |
479 |
480 @{ML_response [display,gray] |
480 @{ML_response [display,gray] |
481 "let |
481 "let |
482 val input1 = filtered_input \"where for\" |
482 val input1 = filtered_input \"where for\" |
483 val input2 = filtered_input \"| in\" |
483 val input2 = filtered_input \"| in\" |
484 in |
484 in |
485 (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2) |
485 (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2) |
486 end" |
486 end" |
487 "((\"where\",\<dots>), (\"|\",\<dots>))"} |
487 "((\"where\",\<dots>), (\"|\",\<dots>))"} |
488 |
488 |
489 Any non-keyword string can be parsed with the function @{ML [index] reserved in OuterParse}. |
489 Any non-keyword string can be parsed with the function @{ML_ind [index] reserved in OuterParse}. |
490 For example: |
490 For example: |
491 |
491 |
492 @{ML_response [display,gray] |
492 @{ML_response [display,gray] |
493 "let |
493 "let |
494 val p = OuterParse.reserved \"bar\" |
494 val p = OuterParse.reserved \"bar\" |
519 in |
519 in |
520 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
520 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
521 end" |
521 end" |
522 "([\"in\", \"in\", \"in\"], [\<dots>])"} |
522 "([\"in\", \"in\", \"in\"], [\<dots>])"} |
523 |
523 |
524 @{ML [index] enum1 in OuterParse} works similarly, except that the parsed list must |
524 @{ML_ind [index] enum1 in OuterParse} works similarly, except that the parsed list must |
525 be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the |
525 be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the |
526 end of the parsed string, otherwise the parser would have consumed all |
526 end of the parsed string, otherwise the parser would have consumed all |
527 tokens and then failed with the exception @{text "MORE"}. Like in the |
527 tokens and then failed with the exception @{text "MORE"}. Like in the |
528 previous section, we can avoid this exception using the wrapper @{ML |
528 previous section, we can avoid this exception using the wrapper @{ML |
529 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML |
529 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML |
543 |
543 |
544 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *} |
544 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *} |
545 |
545 |
546 text {* |
546 text {* |
547 |
547 |
548 The function @{ML [index] "!!!" in OuterParse} can be used to force termination of the |
548 The function @{ML_ind [index] "!!!" in OuterParse} can be used to force termination of the |
549 parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). |
549 parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). |
550 Except that the error message of @{ML "OuterParse.!!!"} is fixed to be |
550 Except that the error message of @{ML "OuterParse.!!!"} is fixed to be |
551 @{text [quotes] "Outer syntax error"} |
551 @{text [quotes] "Outer syntax error"} |
552 together with a relatively precise description of the failure. For example: |
552 together with a relatively precise description of the failure. For example: |
553 |
553 |
562 but keyword in was found\" raised" |
562 but keyword in was found\" raised" |
563 } |
563 } |
564 |
564 |
565 \begin{exercise} (FIXME) |
565 \begin{exercise} (FIXME) |
566 A type-identifier, for example @{typ "'a"}, is a token of |
566 A type-identifier, for example @{typ "'a"}, is a token of |
567 kind @{ML [index] Keyword in OuterLex}. It can be parsed using |
567 kind @{ML_ind [index] Keyword in OuterLex}. It can be parsed using |
568 the function @{ML type_ident in OuterParse}. |
568 the function @{ML type_ident in OuterParse}. |
569 \end{exercise} |
569 \end{exercise} |
570 |
570 |
571 (FIXME: or give parser for numbers) |
571 (FIXME: or give parser for numbers) |
572 |
572 |
624 section {* Parsing Inner Syntax *} |
624 section {* Parsing Inner Syntax *} |
625 |
625 |
626 text {* |
626 text {* |
627 There is usually no need to write your own parser for parsing inner syntax, that is |
627 There is usually no need to write your own parser for parsing inner syntax, that is |
628 for terms and types: you can just call the predefined parsers. Terms can |
628 for terms and types: you can just call the predefined parsers. Terms can |
629 be parsed using the function @{ML [index] term in OuterParse}. For example: |
629 be parsed using the function @{ML_ind [index] term in OuterParse}. For example: |
630 |
630 |
631 @{ML_response [display,gray] |
631 @{ML_response [display,gray] |
632 "let |
632 "let |
633 val input = OuterSyntax.scan Position.none \"foo\" |
633 val input = OuterSyntax.scan Position.none \"foo\" |
634 in |
634 in |
635 OuterParse.term input |
635 OuterParse.term input |
636 end" |
636 end" |
637 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"} |
637 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"} |
638 |
638 |
639 The function @{ML [index] prop in OuterParse} is similar, except that it gives a different |
639 The function @{ML_ind [index] prop in OuterParse} is similar, except that it gives a different |
640 error message, when parsing fails. As you can see, the parser not just returns |
640 error message, when parsing fails. As you can see, the parser not just returns |
641 the parsed string, but also some encoded information. You can decode the |
641 the parsed string, but also some encoded information. You can decode the |
642 information with the function @{ML [index] parse in YXML}. For example |
642 information with the function @{ML_ind [index] parse in YXML}. For example |
643 |
643 |
644 @{ML_response [display,gray] |
644 @{ML_response [display,gray] |
645 "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\"" |
645 "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\"" |
646 "XML.Elem (\"token\", [], [XML.Text \"foo\"])"} |
646 "XML.Elem (\"token\", [], [XML.Text \"foo\"])"} |
647 |
647 |
721 |
721 |
722 As you see, the result is a pair consisting of a list of |
722 As you see, the result is a pair consisting of a list of |
723 variables with optional type-annotation and syntax-annotation, and a list of |
723 variables with optional type-annotation and syntax-annotation, and a list of |
724 rules where every rule has optionally a name and an attribute. |
724 rules where every rule has optionally a name and an attribute. |
725 |
725 |
726 The function @{ML [index] "fixes" in OuterParse} in Line 2 of the parser reads an |
726 The function @{ML_ind [index] "fixes" in OuterParse} in Line 2 of the parser reads an |
727 \isacommand{and}-separated |
727 \isacommand{and}-separated |
728 list of variables that can include optional type annotations and syntax translations. |
728 list of variables that can include optional type annotations and syntax translations. |
729 For example:\footnote{Note that in the code we need to write |
729 For example:\footnote{Note that in the code we need to write |
730 @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes |
730 @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes |
731 in the compound type.} |
731 in the compound type.} |
745 text {* |
745 text {* |
746 Whenever types are given, they are stored in the @{ML SOME}s. The types are |
746 Whenever types are given, they are stored in the @{ML SOME}s. The types are |
747 not yet used to type the variables: this must be done by type-inference later |
747 not yet used to type the variables: this must be done by type-inference later |
748 on. Since types are part of the inner syntax they are strings with some |
748 on. Since types are part of the inner syntax they are strings with some |
749 encoded information (see previous section). If a mixfix-syntax is |
749 encoded information (see previous section). If a mixfix-syntax is |
750 present for a variable, then it is stored in the @{ML [index] Mixfix} data structure; |
750 present for a variable, then it is stored in the @{ML_ind [index] Mixfix} data structure; |
751 no syntax translation is indicated by @{ML [index] NoSyn}. |
751 no syntax translation is indicated by @{ML_ind [index] NoSyn}. |
752 |
752 |
753 \begin{readmore} |
753 \begin{readmore} |
754 The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. |
754 The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. |
755 \end{readmore} |
755 \end{readmore} |
756 |
756 |
757 Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a |
757 Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a |
758 list of introduction rules, that is propositions with theorem annotations |
758 list of introduction rules, that is propositions with theorem annotations |
759 such as rule names and attributes. The introduction rules are propositions |
759 such as rule names and attributes. The introduction rules are propositions |
760 parsed by @{ML [index] prop in OuterParse}. However, they can include an optional |
760 parsed by @{ML_ind [index] prop in OuterParse}. However, they can include an optional |
761 theorem name plus some attributes. For example |
761 theorem name plus some attributes. For example |
762 |
762 |
763 @{ML_response [display,gray] "let |
763 @{ML_response [display,gray] "let |
764 val input = filtered_input \"foo_lemma[intro,dest!]:\" |
764 val input = filtered_input \"foo_lemma[intro,dest!]:\" |
765 val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input |
765 val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input |
766 in |
766 in |
767 (name, map Args.dest_src attrib) |
767 (name, map Args.dest_src attrib) |
768 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"} |
768 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"} |
769 |
769 |
770 The function @{ML [index] opt_thm_name in SpecParse} is the ``optional'' variant of |
770 The function @{ML_ind [index] opt_thm_name in SpecParse} is the ``optional'' variant of |
771 @{ML [index] thm_name in SpecParse}. Theorem names can contain attributes. The name |
771 @{ML_ind [index] thm_name in SpecParse}. Theorem names can contain attributes. The name |
772 has to end with @{text [quotes] ":"}---see the argument of |
772 has to end with @{text [quotes] ":"}---see the argument of |
773 the function @{ML SpecParse.opt_thm_name} in Line 7. |
773 the function @{ML SpecParse.opt_thm_name} in Line 7. |
774 |
774 |
775 \begin{readmore} |
775 \begin{readmore} |
776 Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} |
776 Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} |
832 in |
832 in |
833 OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing |
833 OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing |
834 end *} |
834 end *} |
835 |
835 |
836 text {* |
836 text {* |
837 The crucial function @{ML [index] local_theory in OuterSyntax} expects a name for the command, a |
837 The crucial function @{ML_ind [index] local_theory in OuterSyntax} expects a name for the command, a |
838 short description, a kind indicator (which we will explain later more thoroughly) and a |
838 short description, a kind indicator (which we will explain later more thoroughly) and a |
839 parser producing a local theory transition (its purpose will also explained |
839 parser producing a local theory transition (its purpose will also explained |
840 later). |
840 later). |
841 |
841 |
842 While this is everything you have to do on the ML-level, you need a keyword |
842 While this is everything you have to do on the ML-level, you need a keyword |
972 next by letting it take a proposition as argument and printing this proposition |
972 next by letting it take a proposition as argument and printing this proposition |
973 inside the tracing buffer. |
973 inside the tracing buffer. |
974 |
974 |
975 The crucial part of a command is the function that determines the behaviour |
975 The crucial part of a command is the function that determines the behaviour |
976 of the command. In the code above we used a ``do-nothing''-function, which |
976 of the command. In the code above we used a ``do-nothing''-function, which |
977 because of @{ML [index] succeed in Scan} does not parse any argument, but immediately |
977 because of @{ML_ind [index] succeed in Scan} does not parse any argument, but immediately |
978 returns the simple function @{ML "LocalTheory.theory I"}. We can |
978 returns the simple function @{ML "LocalTheory.theory I"}. We can |
979 replace this code by a function that first parses a proposition (using the |
979 replace this code by a function that first parses a proposition (using the |
980 parser @{ML OuterParse.prop}), then prints out the tracing |
980 parser @{ML OuterParse.prop}), then prints out the tracing |
981 information (using a new function @{text trace_prop}) and |
981 information (using a new function @{text trace_prop}) and |
982 finally does nothing. For this you can write: |
982 finally does nothing. For this you can write: |
1002 @{text "> \"True \<and> False\""} |
1002 @{text "> \"True \<and> False\""} |
1003 \end{isabelle} |
1003 \end{isabelle} |
1004 |
1004 |
1005 and see the proposition in the tracing buffer. |
1005 and see the proposition in the tracing buffer. |
1006 |
1006 |
1007 Note that so far we used @{ML [index] thy_decl in OuterKeyword} as the kind |
1007 Note that so far we used @{ML_ind [index] thy_decl in OuterKeyword} as the kind |
1008 indicator for the command. This means that the command finishes as soon as |
1008 indicator for the command. This means that the command finishes as soon as |
1009 the arguments are processed. Examples of this kind of commands are |
1009 the arguments are processed. Examples of this kind of commands are |
1010 \isacommand{definition} and \isacommand{declare}. In other cases, commands |
1010 \isacommand{definition} and \isacommand{declare}. In other cases, commands |
1011 are expected to parse some arguments, for example a proposition, and then |
1011 are expected to parse some arguments, for example a proposition, and then |
1012 ``open up'' a proof in order to prove the proposition (for example |
1012 ``open up'' a proof in order to prove the proposition (for example |
1013 \isacommand{lemma}) or prove some other properties (for example |
1013 \isacommand{lemma}) or prove some other properties (for example |
1014 \isacommand{function}). To achieve this kind of behaviour, you have to use |
1014 \isacommand{function}). To achieve this kind of behaviour, you have to use |
1015 the kind indicator @{ML [index] thy_goal in OuterKeyword} and the function @{ML |
1015 the kind indicator @{ML_ind [index] thy_goal in OuterKeyword} and the function @{ML |
1016 "local_theory_to_proof" in OuterSyntax} to set up the command. Note, |
1016 "local_theory_to_proof" in OuterSyntax} to set up the command. Note, |
1017 however, once you change the ``kind'' of a command from @{ML thy_decl in |
1017 however, once you change the ``kind'' of a command from @{ML thy_decl in |
1018 OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs |
1018 OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs |
1019 to be re-created! |
1019 to be re-created! |
1020 |
1020 |
1039 end *} |
1039 end *} |
1040 |
1040 |
1041 text {* |
1041 text {* |
1042 The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be |
1042 The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be |
1043 proved) and a context as argument. The context is necessary in order to be able to use |
1043 proved) and a context as argument. The context is necessary in order to be able to use |
1044 @{ML [index] read_prop in Syntax}, which converts a string into a proper proposition. |
1044 @{ML_ind [index] read_prop in Syntax}, which converts a string into a proper proposition. |
1045 In Line 6 the function @{ML [index] theorem_i in Proof} starts the proof for the |
1045 In Line 6 the function @{ML_ind [index] theorem_i in Proof} starts the proof for the |
1046 proposition. Its argument @{ML NONE} stands for a locale (which we chose to |
1046 proposition. Its argument @{ML NONE} stands for a locale (which we chose to |
1047 omit); the argument @{ML "(K I)"} stands for a function that determines what |
1047 omit); the argument @{ML "(K I)"} stands for a function that determines what |
1048 should be done with the theorem once it is proved (we chose to just forget |
1048 should be done with the theorem once it is proved (we chose to just forget |
1049 about it). Line 9 contains the parser for the proposition. |
1049 about it). Line 9 contains the parser for the proposition. |
1050 |
1050 |
1097 |
1097 |
1098 text {* |
1098 text {* |
1099 It defines the method @{text foo}, which takes no arguments (therefore the |
1099 It defines the method @{text foo}, which takes no arguments (therefore the |
1100 parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which |
1100 parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which |
1101 applies @{thm [source] conjE} and then @{thm [source] conjI}. The function |
1101 applies @{thm [source] conjE} and then @{thm [source] conjI}. The function |
1102 @{ML [index] SIMPLE_METHOD} |
1102 @{ML_ind [index] SIMPLE_METHOD} |
1103 turns such a tactic into a method. The method @{text "foo"} can be used as follows |
1103 turns such a tactic into a method. The method @{text "foo"} can be used as follows |
1104 *} |
1104 *} |
1105 |
1105 |
1106 lemma shows "A \<and> B \<Longrightarrow> C \<and> D" |
1106 lemma shows "A \<and> B \<Longrightarrow> C \<and> D" |
1107 apply(foo) |
1107 apply(foo) |