35 section {* Building Generic Parsers *} |
35 section {* Building Generic Parsers *} |
36 |
36 |
37 text {* |
37 text {* |
38 |
38 |
39 Let us first have a look at parsing strings using generic parsing |
39 Let us first have a look at parsing strings using generic parsing |
40 combinators. The function @{ML_ind [index] "$$"} takes a string as argument and will |
40 combinators. The function @{ML_ind "$$"} takes a string as argument and will |
41 ``consume'' this string from a given input list of strings. ``Consume'' in |
41 ``consume'' this string from a given input list of strings. ``Consume'' in |
42 this context means that it will return a pair consisting of this string and |
42 this context means that it will return a pair consisting of this string and |
43 the rest of the input list. For example: |
43 the rest of the input list. For example: |
44 |
44 |
45 @{ML_response [display,gray] |
45 @{ML_response [display,gray] |
69 \end{itemize} |
69 \end{itemize} |
70 |
70 |
71 However, note that these exceptions are private to the parser and cannot be accessed |
71 However, note that these exceptions are private to the parser and cannot be accessed |
72 by the programmer (for example to handle them). |
72 by the programmer (for example to handle them). |
73 |
73 |
74 \indexdef{explode@ {\tt\slshape{}explode}}{in {\tt\slshape Symbol}} |
74 In the examples above we use the function @{ML_ind Symbol.explode}, instead of the |
75 \index{explode@ {\tt\slshape{}explode}} |
75 more standard library function @{ML_ind explode}, for obtaining an input list for |
76 In the examples above we use the function @{ML Symbol.explode}, instead of the |
76 the parser. The reason is that @{ML_ind Symbol.explode} is aware of character sequences, |
77 more standard library function @{ML explode}, for obtaining an input list for |
|
78 the parser. The reason is that @{ML Symbol.explode} is aware of character sequences, |
|
79 for example @{text "\<foo>"}, that have a special meaning in Isabelle. To see |
77 for example @{text "\<foo>"}, that have a special meaning in Isabelle. To see |
80 the difference consider |
78 the difference consider |
81 |
79 |
82 @{ML_response_fake [display,gray] |
80 @{ML_response_fake [display,gray] |
83 "let |
81 "let |
87 end" |
85 end" |
88 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"], |
86 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"], |
89 [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"} |
87 [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"} |
90 |
88 |
91 Slightly more general than the parser @{ML "$$"} is the function |
89 Slightly more general than the parser @{ML "$$"} is the function |
92 @{ML_ind [index] one in Scan}, in that it takes a predicate as argument and |
90 @{ML_ind one in Scan}, in that it takes a predicate as argument and |
93 then parses exactly |
91 then parses exactly |
94 one item from the input list satisfying this predicate. For example the |
92 one item from the input list satisfying this predicate. For example the |
95 following parser either consumes an @{text [quotes] "h"} or a @{text |
93 following parser either consumes an @{text [quotes] "h"} or a @{text |
96 [quotes] "w"}: |
94 [quotes] "w"}: |
97 |
95 |
103 in |
101 in |
104 (hw input1, hw input2) |
102 (hw input1, hw input2) |
105 end" |
103 end" |
106 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
104 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
107 |
105 |
108 Two parsers can be connected in sequence by using the function @{ML_ind [index] "--"}. |
106 Two parsers can be connected in sequence by using the function @{ML_ind "--"}. |
109 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
107 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
110 order) you can achieve by: |
108 order) you can achieve by: |
111 |
109 |
112 @{ML_response [display,gray] |
110 @{ML_response [display,gray] |
113 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" |
111 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" |
114 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
112 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
115 |
113 |
116 Note how the result of consumed strings builds up on the left as nested pairs. |
114 Note how the result of consumed strings builds up on the left as nested pairs. |
117 |
115 |
118 If, as in the previous example, you want to parse a particular string, |
116 If, as in the previous example, you want to parse a particular string, |
119 then you should use the function @{ML_ind [index] this_string in Scan}: |
117 then you should use the function @{ML_ind this_string in Scan}: |
120 |
118 |
121 @{ML_response [display,gray] |
119 @{ML_response [display,gray] |
122 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
120 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
123 "(\"hell\", [\"o\"])"} |
121 "(\"hell\", [\"o\"])"} |
124 |
122 |
125 Parsers that explore alternatives can be constructed using the function |
123 Parsers that explore alternatives can be constructed using the function |
126 @{ML_ind [index] "||"}. The parser @{ML "(p || q)" for p q} returns the |
124 @{ML_ind "||"}. The parser @{ML "(p || q)" for p q} returns the |
127 result of @{text "p"}, in case it succeeds, otherwise it returns the |
125 result of @{text "p"}, in case it succeeds, otherwise it returns the |
128 result of @{text "q"}. For example: |
126 result of @{text "q"}. For example: |
129 |
127 |
130 |
128 |
131 @{ML_response [display,gray] |
129 @{ML_response [display,gray] |
136 in |
134 in |
137 (hw input1, hw input2) |
135 (hw input1, hw input2) |
138 end" |
136 end" |
139 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
137 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
140 |
138 |
141 The functions @{ML_ind [index] "|--"} and @{ML_ind [index] "--|"} work like the sequencing function |
139 The functions @{ML_ind "|--"} and @{ML_ind "--|"} work like the sequencing function |
142 for parsers, except that they discard the item being parsed by the first (respectively second) |
140 for parsers, except that they discard the item being parsed by the first (respectively second) |
143 parser. For example: |
141 parser. For example: |
144 |
142 |
145 @{ML_response [display,gray] |
143 @{ML_response [display,gray] |
146 "let |
144 "let |
150 in |
148 in |
151 (just_e input, just_h input) |
149 (just_e input, just_h input) |
152 end" |
150 end" |
153 "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"} |
151 "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"} |
154 |
152 |
155 \indexdef{optional@ {\tt\slshape{optional}}}{in {\tt\slshape Scan}} |
|
156 The parser @{ML "Scan.optional p x" for p x} returns the result of the parser |
153 The parser @{ML "Scan.optional p x" for p x} returns the result of the parser |
157 @{text "p"}, if it succeeds; otherwise it returns |
154 @{text "p"}, if it succeeds; otherwise it returns |
158 the default value @{text "x"}. For example: |
155 the default value @{text "x"}. For example: |
159 |
156 |
160 @{ML_response [display,gray] |
157 @{ML_response [display,gray] |
165 in |
162 in |
166 (p input1, p input2) |
163 (p input1, p input2) |
167 end" |
164 end" |
168 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
165 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
169 |
166 |
170 The function @{ML_ind [index] option in Scan} works similarly, except no default value can |
167 The function @{ML_ind option in Scan} works similarly, except no default value can |
171 be given. Instead, the result is wrapped as an @{text "option"}-type. For example: |
168 be given. Instead, the result is wrapped as an @{text "option"}-type. For example: |
172 |
169 |
173 @{ML_response [display,gray] |
170 @{ML_response [display,gray] |
174 "let |
171 "let |
175 val p = Scan.option ($$ \"h\") |
172 val p = Scan.option ($$ \"h\") |
177 val input2 = Symbol.explode \"world\" |
174 val input2 = Symbol.explode \"world\" |
178 in |
175 in |
179 (p input1, p input2) |
176 (p input1, p input2) |
180 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
177 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
181 |
178 |
182 The function @{ML_ind [index] "!!"} helps to produce appropriate error messages |
179 The function @{ML_ind "!!"} helps to produce appropriate error messages |
183 for parsing. For example if you want to parse @{text p} immediately |
180 for parsing. For example if you want to parse @{text p} immediately |
184 followed by @{text q}, or start a completely different parser @{text r}, |
181 followed by @{text q}, or start a completely different parser @{text r}, |
185 you might write: |
182 you might write: |
186 |
183 |
187 @{ML [display,gray] "(p -- q) || r" for p q r} |
184 @{ML [display,gray] "(p -- q) || r" for p q r} |
213 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
210 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
214 "Exception ABORT raised"} |
211 "Exception ABORT raised"} |
215 |
212 |
216 then the parsing aborts and the error message @{text "foo"} is printed. In order to |
213 then the parsing aborts and the error message @{text "foo"} is printed. In order to |
217 see the error message properly, you need to prefix the parser with the function |
214 see the error message properly, you need to prefix the parser with the function |
218 @{ML_ind [index] error in Scan}. For example: |
215 @{ML_ind error in Scan}. For example: |
219 |
216 |
220 @{ML_response_fake [display,gray] |
217 @{ML_response_fake [display,gray] |
221 "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" |
218 "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" |
222 "Exception Error \"foo\" raised"} |
219 "Exception Error \"foo\" raised"} |
223 |
220 |
224 This ``prefixing'' is usually done by wrappers such as @{ML_ind [index] local_theory in OuterSyntax} |
221 This ``prefixing'' is usually done by wrappers such as @{ML_ind local_theory in OuterSyntax} |
225 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
222 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
226 |
223 |
227 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
224 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
228 r}. If you want to generate the correct error message for |
225 r}. If you want to generate the correct error message for |
229 @{text "p"}-followed-by-@{text "q"}, then you have to write: |
226 @{text "p"}-followed-by-@{text "q"}, then you have to write: |
256 often as it succeeds. For example: |
253 often as it succeeds. For example: |
257 |
254 |
258 @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" |
255 @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" |
259 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
256 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
260 |
257 |
261 Note that @{ML_ind [index] repeat in Scan} stores the parsed items in a list. The function |
258 Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function |
262 @{ML_ind [index] repeat1 in Scan} is similar, but requires that the parser @{text "p"} |
259 @{ML_ind repeat1 in Scan} is similar, but requires that the parser @{text "p"} |
263 succeeds at least once. |
260 succeeds at least once. |
264 |
261 |
265 Also note that the parser would have aborted with the exception @{text MORE}, if |
262 Also note that the parser would have aborted with the exception @{text MORE}, if |
266 you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using |
263 you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using |
267 the wrapper @{ML_ind [index] finite in Scan} and the ``stopper-token'' |
264 the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' |
268 @{ML_ind [index] stopper in Symbol}. With them you can write: |
265 @{ML_ind stopper in Symbol}. With them you can write: |
269 |
266 |
270 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
267 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
271 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
268 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
272 |
269 |
273 @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings; |
270 @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings; |
274 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
271 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
275 manually wrapping is often already done by the surrounding infrastructure. |
272 manually wrapping is often already done by the surrounding infrastructure. |
276 |
273 |
277 The function @{ML_ind [index] repeat in Scan} can be used with @{ML_ind [index] one in Scan} to read any |
274 The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any |
278 string as in |
275 string as in |
279 |
276 |
280 @{ML_response [display,gray] |
277 @{ML_response [display,gray] |
281 "let |
278 "let |
282 val p = Scan.repeat (Scan.one Symbol.not_eof) |
279 val p = Scan.repeat (Scan.one Symbol.not_eof) |
284 in |
281 in |
285 Scan.finite Symbol.stopper p input |
282 Scan.finite Symbol.stopper p input |
286 end" |
283 end" |
287 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
284 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
288 |
285 |
289 where the function @{ML_ind [index] not_eof in Symbol} ensures that we do not read beyond the |
286 where the function @{ML_ind not_eof in Symbol} ensures that we do not read beyond the |
290 end of the input string (i.e.~stopper symbol). |
287 end of the input string (i.e.~stopper symbol). |
291 |
288 |
292 The function @{ML_ind [index] unless in Scan} takes two parsers: if the first one can |
289 The function @{ML_ind unless in Scan} takes two parsers: if the first one can |
293 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
290 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
294 |
291 |
295 @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" |
292 @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" |
296 "Exception FAIL raised"} |
293 "Exception FAIL raised"} |
297 |
294 |
300 @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" |
297 @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" |
301 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
298 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
302 |
299 |
303 succeeds. |
300 succeeds. |
304 |
301 |
305 The functions @{ML_ind [index] repeat in Scan} and @{ML_ind [index] unless in Scan} can |
302 The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can |
306 be combined to read any input until a certain marker symbol is reached. In the |
303 be combined to read any input until a certain marker symbol is reached. In the |
307 example below the marker symbol is a @{text [quotes] "*"}. |
304 example below the marker symbol is a @{text [quotes] "*"}. |
308 |
305 |
309 @{ML_response [display,gray] |
306 @{ML_response [display,gray] |
310 "let |
307 "let |
318 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
315 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
319 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
316 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
320 |
317 |
321 |
318 |
322 After parsing is done, you almost always want to apply a function to the parsed |
319 After parsing is done, you almost always want to apply a function to the parsed |
323 items. One way to do this is the function @{ML_ind [index]">>"} where |
320 items. One way to do this is the function @{ML_ind ">>"} where |
324 @{ML "(p >> f)" for p f} runs |
321 @{ML "(p >> f)" for p f} runs |
325 first the parser @{text p} and upon successful completion applies the |
322 first the parser @{text p} and upon successful completion applies the |
326 function @{text f} to the result. For example |
323 function @{text f} to the result. For example |
327 |
324 |
328 @{ML_response [display,gray] |
325 @{ML_response [display,gray] |
347 where the single-character strings in the parsed output are transformed |
344 where the single-character strings in the parsed output are transformed |
348 back into one string. |
345 back into one string. |
349 |
346 |
350 (FIXME: move to an earlier place) |
347 (FIXME: move to an earlier place) |
351 |
348 |
352 The function @{ML_ind [index] ahead in Scan} parses some input, but leaves the original |
349 The function @{ML_ind ahead in Scan} parses some input, but leaves the original |
353 input unchanged. For example: |
350 input unchanged. For example: |
354 |
351 |
355 @{ML_response [display,gray] |
352 @{ML_response [display,gray] |
356 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
353 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
357 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
354 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
358 |
355 |
359 The function @{ML_ind [index] lift in Scan} takes a parser and a pair as arguments. This function applies |
356 The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies |
360 the given parser to the second component of the pair and leaves the first component |
357 the given parser to the second component of the pair and leaves the first component |
361 untouched. For example |
358 untouched. For example |
362 |
359 |
363 @{ML_response [display,gray] |
360 @{ML_response [display,gray] |
364 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
361 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
394 The parser functions for the theory syntax are contained in the structure |
391 The parser functions for the theory syntax are contained in the structure |
395 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
392 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
396 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
393 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
397 \end{readmore} |
394 \end{readmore} |
398 |
395 |
399 The structure @{ML_struct [index] OuterLex} defines several kinds of tokens (for |
396 The structure @{ML_struct OuterLex} defines several kinds of tokens (for |
400 example @{ML_ind [index] Ident in OuterLex} for identifiers, @{ML Keyword in |
397 example @{ML_ind Ident in OuterLex} for identifiers, @{ML Keyword in |
401 OuterLex} for keywords and @{ML_ind [index] Command in OuterLex} for commands). Some |
398 OuterLex} for keywords and @{ML_ind Command in OuterLex} for commands). Some |
402 token parsers take into account the kind of tokens. The first example shows |
399 token parsers take into account the kind of tokens. The first example shows |
403 how to generate a token list out of a string using the function |
400 how to generate a token list out of a string using the function |
404 @{ML_ind [index] scan in OuterSyntax}. It is given the argument |
401 @{ML_ind scan in OuterSyntax}. It is given the argument |
405 @{ML "Position.none"} since, |
402 @{ML "Position.none"} since, |
406 at the moment, we are not interested in generating precise error |
403 at the moment, we are not interested in generating precise error |
407 messages. The following code\footnote{Note that because of a possible bug in |
404 messages. The following code\footnote{Note that because of a possible bug in |
408 the PolyML runtime system, the result is printed as @{text [quotes] "?"}, |
405 the PolyML runtime system, the result is printed as @{text [quotes] "?"}, |
409 instead of the tokens.} |
406 instead of the tokens.} |
417 produces three tokens where the first and the last are identifiers, since |
414 produces three tokens where the first and the last are identifiers, since |
418 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any |
415 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any |
419 other syntactic category. The second indicates a space. |
416 other syntactic category. The second indicates a space. |
420 |
417 |
421 We can easily change what is recognised as a keyword with |
418 We can easily change what is recognised as a keyword with |
422 @{ML_ind [index] keyword in OuterKeyword}. For example calling this function |
419 @{ML_ind keyword in OuterKeyword}. For example calling this function |
423 *} |
420 *} |
424 |
421 |
425 ML{*val _ = OuterKeyword.keyword "hello"*} |
422 ML{*val _ = OuterKeyword.keyword "hello"*} |
426 |
423 |
427 text {* |
424 text {* |
432 Token (\<dots>,(Space, \" \"),\<dots>), |
429 Token (\<dots>,(Space, \" \"),\<dots>), |
433 Token (\<dots>,(Ident, \"world\"),\<dots>)]"} |
430 Token (\<dots>,(Ident, \"world\"),\<dots>)]"} |
434 |
431 |
435 Many parsing functions later on will require white space, comments and the like |
432 Many parsing functions later on will require white space, comments and the like |
436 to have already been filtered out. So from now on we are going to use the |
433 to have already been filtered out. So from now on we are going to use the |
437 functions @{ML filter} and @{ML_ind [index] is_proper in OuterLex} to do this. |
434 functions @{ML filter} and @{ML_ind is_proper in OuterLex} to do this. |
438 For example: |
435 For example: |
439 |
436 |
440 @{ML_response_fake [display,gray] |
437 @{ML_response_fake [display,gray] |
441 "let |
438 "let |
442 val input = OuterSyntax.scan Position.none \"hello world\" |
439 val input = OuterSyntax.scan Position.none \"hello world\" |
470 in |
467 in |
471 (Scan.dest_lexicon commands, Scan.dest_lexicon keywords) |
468 (Scan.dest_lexicon commands, Scan.dest_lexicon keywords) |
472 end" |
469 end" |
473 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"} |
470 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"} |
474 |
471 |
475 You might have to adjust the @{ML_ind [index] print_depth} in order to |
472 You might have to adjust the @{ML_ind print_depth} in order to |
476 see the complete list. |
473 see the complete list. |
477 |
474 |
478 The parser @{ML_ind [index] "$$$" in OuterParse} parses a single keyword. For example: |
475 The parser @{ML_ind "$$$" in OuterParse} parses a single keyword. For example: |
479 |
476 |
480 @{ML_response [display,gray] |
477 @{ML_response [display,gray] |
481 "let |
478 "let |
482 val input1 = filtered_input \"where for\" |
479 val input1 = filtered_input \"where for\" |
483 val input2 = filtered_input \"| in\" |
480 val input2 = filtered_input \"| in\" |
484 in |
481 in |
485 (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2) |
482 (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2) |
486 end" |
483 end" |
487 "((\"where\",\<dots>), (\"|\",\<dots>))"} |
484 "((\"where\",\<dots>), (\"|\",\<dots>))"} |
488 |
485 |
489 Any non-keyword string can be parsed with the function @{ML_ind [index] reserved in OuterParse}. |
486 Any non-keyword string can be parsed with the function @{ML_ind reserved in OuterParse}. |
490 For example: |
487 For example: |
491 |
488 |
492 @{ML_response [display,gray] |
489 @{ML_response [display,gray] |
493 "let |
490 "let |
494 val p = OuterParse.reserved \"bar\" |
491 val p = OuterParse.reserved \"bar\" |
496 in |
493 in |
497 p input |
494 p input |
498 end" |
495 end" |
499 "(\"bar\",[])"} |
496 "(\"bar\",[])"} |
500 |
497 |
501 Like before, you can sequentially connect parsers with @{ML_ind [index] "--"}. For example: |
498 Like before, you can sequentially connect parsers with @{ML_ind "--"}. For example: |
502 |
499 |
503 @{ML_response [display,gray] |
500 @{ML_response [display,gray] |
504 "let |
501 "let |
505 val input = filtered_input \"| in\" |
502 val input = filtered_input \"| in\" |
506 in |
503 in |
507 (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input |
504 (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input |
508 end" |
505 end" |
509 "((\"|\", \"in\"), [])"} |
506 "((\"|\", \"in\"), [])"} |
510 |
507 |
511 \indexdef{enum@ {\tt\slshape{enum}}}{in {\tt\slshape OuterParse}} |
|
512 The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty |
508 The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty |
513 list of items recognised by the parser @{text p}, where the items being parsed |
509 list of items recognised by the parser @{text p}, where the items being parsed |
514 are separated by the string @{text s}. For example: |
510 are separated by the string @{text s}. For example: |
515 |
511 |
516 @{ML_response [display,gray] |
512 @{ML_response [display,gray] |
519 in |
515 in |
520 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
516 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
521 end" |
517 end" |
522 "([\"in\", \"in\", \"in\"], [\<dots>])"} |
518 "([\"in\", \"in\", \"in\"], [\<dots>])"} |
523 |
519 |
524 @{ML_ind [index] enum1 in OuterParse} works similarly, except that the parsed list must |
520 @{ML_ind enum1 in OuterParse} works similarly, except that the parsed list must |
525 be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the |
521 be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the |
526 end of the parsed string, otherwise the parser would have consumed all |
522 end of the parsed string, otherwise the parser would have consumed all |
527 tokens and then failed with the exception @{text "MORE"}. Like in the |
523 tokens and then failed with the exception @{text "MORE"}. Like in the |
528 previous section, we can avoid this exception using the wrapper @{ML |
524 previous section, we can avoid this exception using the wrapper @{ML |
529 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML |
525 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML |
543 |
539 |
544 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *} |
540 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *} |
545 |
541 |
546 text {* |
542 text {* |
547 |
543 |
548 The function @{ML_ind [index] "!!!" in OuterParse} can be used to force termination of the |
544 The function @{ML_ind "!!!" in OuterParse} can be used to force termination of the |
549 parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). |
545 parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). |
550 Except that the error message of @{ML "OuterParse.!!!"} is fixed to be |
546 Except that the error message of @{ML "OuterParse.!!!"} is fixed to be |
551 @{text [quotes] "Outer syntax error"} |
547 @{text [quotes] "Outer syntax error"} |
552 together with a relatively precise description of the failure. For example: |
548 together with a relatively precise description of the failure. For example: |
553 |
549 |
562 but keyword in was found\" raised" |
558 but keyword in was found\" raised" |
563 } |
559 } |
564 |
560 |
565 \begin{exercise} (FIXME) |
561 \begin{exercise} (FIXME) |
566 A type-identifier, for example @{typ "'a"}, is a token of |
562 A type-identifier, for example @{typ "'a"}, is a token of |
567 kind @{ML_ind [index] Keyword in OuterLex}. It can be parsed using |
563 kind @{ML_ind Keyword in OuterLex}. It can be parsed using |
568 the function @{ML type_ident in OuterParse}. |
564 the function @{ML type_ident in OuterParse}. |
569 \end{exercise} |
565 \end{exercise} |
570 |
566 |
571 (FIXME: or give parser for numbers) |
567 (FIXME: or give parser for numbers) |
572 |
568 |
624 section {* Parsing Inner Syntax *} |
620 section {* Parsing Inner Syntax *} |
625 |
621 |
626 text {* |
622 text {* |
627 There is usually no need to write your own parser for parsing inner syntax, that is |
623 There is usually no need to write your own parser for parsing inner syntax, that is |
628 for terms and types: you can just call the predefined parsers. Terms can |
624 for terms and types: you can just call the predefined parsers. Terms can |
629 be parsed using the function @{ML_ind [index] term in OuterParse}. For example: |
625 be parsed using the function @{ML_ind term in OuterParse}. For example: |
630 |
626 |
631 @{ML_response [display,gray] |
627 @{ML_response [display,gray] |
632 "let |
628 "let |
633 val input = OuterSyntax.scan Position.none \"foo\" |
629 val input = OuterSyntax.scan Position.none \"foo\" |
634 in |
630 in |
635 OuterParse.term input |
631 OuterParse.term input |
636 end" |
632 end" |
637 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"} |
633 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"} |
638 |
634 |
639 The function @{ML_ind [index] prop in OuterParse} is similar, except that it gives a different |
635 The function @{ML_ind prop in OuterParse} is similar, except that it gives a different |
640 error message, when parsing fails. As you can see, the parser not just returns |
636 error message, when parsing fails. As you can see, the parser not just returns |
641 the parsed string, but also some encoded information. You can decode the |
637 the parsed string, but also some encoded information. You can decode the |
642 information with the function @{ML_ind [index] parse in YXML}. For example |
638 information with the function @{ML_ind parse in YXML}. For example |
643 |
639 |
644 @{ML_response [display,gray] |
640 @{ML_response [display,gray] |
645 "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\"" |
641 "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\"" |
646 "XML.Elem (\"token\", [], [XML.Text \"foo\"])"} |
642 "XML.Elem (\"token\", [], [XML.Text \"foo\"])"} |
647 |
643 |
721 |
717 |
722 As you see, the result is a pair consisting of a list of |
718 As you see, the result is a pair consisting of a list of |
723 variables with optional type-annotation and syntax-annotation, and a list of |
719 variables with optional type-annotation and syntax-annotation, and a list of |
724 rules where every rule has optionally a name and an attribute. |
720 rules where every rule has optionally a name and an attribute. |
725 |
721 |
726 The function @{ML_ind [index] "fixes" in OuterParse} in Line 2 of the parser reads an |
722 The function @{ML_ind "fixes" in OuterParse} in Line 2 of the parser reads an |
727 \isacommand{and}-separated |
723 \isacommand{and}-separated |
728 list of variables that can include optional type annotations and syntax translations. |
724 list of variables that can include optional type annotations and syntax translations. |
729 For example:\footnote{Note that in the code we need to write |
725 For example:\footnote{Note that in the code we need to write |
730 @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes |
726 @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes |
731 in the compound type.} |
727 in the compound type.} |
745 text {* |
741 text {* |
746 Whenever types are given, they are stored in the @{ML SOME}s. The types are |
742 Whenever types are given, they are stored in the @{ML SOME}s. The types are |
747 not yet used to type the variables: this must be done by type-inference later |
743 not yet used to type the variables: this must be done by type-inference later |
748 on. Since types are part of the inner syntax they are strings with some |
744 on. Since types are part of the inner syntax they are strings with some |
749 encoded information (see previous section). If a mixfix-syntax is |
745 encoded information (see previous section). If a mixfix-syntax is |
750 present for a variable, then it is stored in the @{ML_ind [index] Mixfix} data structure; |
746 present for a variable, then it is stored in the @{ML_ind Mixfix} data structure; |
751 no syntax translation is indicated by @{ML_ind [index] NoSyn}. |
747 no syntax translation is indicated by @{ML_ind NoSyn}. |
752 |
748 |
753 \begin{readmore} |
749 \begin{readmore} |
754 The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. |
750 The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. |
755 \end{readmore} |
751 \end{readmore} |
756 |
752 |
757 Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a |
753 Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a |
758 list of introduction rules, that is propositions with theorem annotations |
754 list of introduction rules, that is propositions with theorem annotations |
759 such as rule names and attributes. The introduction rules are propositions |
755 such as rule names and attributes. The introduction rules are propositions |
760 parsed by @{ML_ind [index] prop in OuterParse}. However, they can include an optional |
756 parsed by @{ML_ind prop in OuterParse}. However, they can include an optional |
761 theorem name plus some attributes. For example |
757 theorem name plus some attributes. For example |
762 |
758 |
763 @{ML_response [display,gray] "let |
759 @{ML_response [display,gray] "let |
764 val input = filtered_input \"foo_lemma[intro,dest!]:\" |
760 val input = filtered_input \"foo_lemma[intro,dest!]:\" |
765 val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input |
761 val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input |
766 in |
762 in |
767 (name, map Args.dest_src attrib) |
763 (name, map Args.dest_src attrib) |
768 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"} |
764 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"} |
769 |
765 |
770 The function @{ML_ind [index] opt_thm_name in SpecParse} is the ``optional'' variant of |
766 The function @{ML_ind opt_thm_name in SpecParse} is the ``optional'' variant of |
771 @{ML_ind [index] thm_name in SpecParse}. Theorem names can contain attributes. The name |
767 @{ML_ind thm_name in SpecParse}. Theorem names can contain attributes. The name |
772 has to end with @{text [quotes] ":"}---see the argument of |
768 has to end with @{text [quotes] ":"}---see the argument of |
773 the function @{ML SpecParse.opt_thm_name} in Line 7. |
769 the function @{ML SpecParse.opt_thm_name} in Line 7. |
774 |
770 |
775 \begin{readmore} |
771 \begin{readmore} |
776 Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} |
772 Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} |
832 in |
828 in |
833 OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing |
829 OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing |
834 end *} |
830 end *} |
835 |
831 |
836 text {* |
832 text {* |
837 The crucial function @{ML_ind [index] local_theory in OuterSyntax} expects a name for the command, a |
833 The crucial function @{ML_ind local_theory in OuterSyntax} expects a name for the command, a |
838 short description, a kind indicator (which we will explain later more thoroughly) and a |
834 short description, a kind indicator (which we will explain later more thoroughly) and a |
839 parser producing a local theory transition (its purpose will also explained |
835 parser producing a local theory transition (its purpose will also explained |
840 later). |
836 later). |
841 |
837 |
842 While this is everything you have to do on the ML-level, you need a keyword |
838 While this is everything you have to do on the ML-level, you need a keyword |
972 next by letting it take a proposition as argument and printing this proposition |
968 next by letting it take a proposition as argument and printing this proposition |
973 inside the tracing buffer. |
969 inside the tracing buffer. |
974 |
970 |
975 The crucial part of a command is the function that determines the behaviour |
971 The crucial part of a command is the function that determines the behaviour |
976 of the command. In the code above we used a ``do-nothing''-function, which |
972 of the command. In the code above we used a ``do-nothing''-function, which |
977 because of @{ML_ind [index] succeed in Scan} does not parse any argument, but immediately |
973 because of @{ML_ind succeed in Scan} does not parse any argument, but immediately |
978 returns the simple function @{ML "LocalTheory.theory I"}. We can |
974 returns the simple function @{ML "LocalTheory.theory I"}. We can |
979 replace this code by a function that first parses a proposition (using the |
975 replace this code by a function that first parses a proposition (using the |
980 parser @{ML OuterParse.prop}), then prints out the tracing |
976 parser @{ML OuterParse.prop}), then prints out the tracing |
981 information (using a new function @{text trace_prop}) and |
977 information (using a new function @{text trace_prop}) and |
982 finally does nothing. For this you can write: |
978 finally does nothing. For this you can write: |
1002 @{text "> \"True \<and> False\""} |
998 @{text "> \"True \<and> False\""} |
1003 \end{isabelle} |
999 \end{isabelle} |
1004 |
1000 |
1005 and see the proposition in the tracing buffer. |
1001 and see the proposition in the tracing buffer. |
1006 |
1002 |
1007 Note that so far we used @{ML_ind [index] thy_decl in OuterKeyword} as the kind |
1003 Note that so far we used @{ML_ind thy_decl in OuterKeyword} as the kind |
1008 indicator for the command. This means that the command finishes as soon as |
1004 indicator for the command. This means that the command finishes as soon as |
1009 the arguments are processed. Examples of this kind of commands are |
1005 the arguments are processed. Examples of this kind of commands are |
1010 \isacommand{definition} and \isacommand{declare}. In other cases, commands |
1006 \isacommand{definition} and \isacommand{declare}. In other cases, commands |
1011 are expected to parse some arguments, for example a proposition, and then |
1007 are expected to parse some arguments, for example a proposition, and then |
1012 ``open up'' a proof in order to prove the proposition (for example |
1008 ``open up'' a proof in order to prove the proposition (for example |
1013 \isacommand{lemma}) or prove some other properties (for example |
1009 \isacommand{lemma}) or prove some other properties (for example |
1014 \isacommand{function}). To achieve this kind of behaviour, you have to use |
1010 \isacommand{function}). To achieve this kind of behaviour, you have to use |
1015 the kind indicator @{ML_ind [index] thy_goal in OuterKeyword} and the function @{ML |
1011 the kind indicator @{ML_ind thy_goal in OuterKeyword} and the function @{ML |
1016 "local_theory_to_proof" in OuterSyntax} to set up the command. Note, |
1012 "local_theory_to_proof" in OuterSyntax} to set up the command. Note, |
1017 however, once you change the ``kind'' of a command from @{ML thy_decl in |
1013 however, once you change the ``kind'' of a command from @{ML thy_decl in |
1018 OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs |
1014 OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs |
1019 to be re-created! |
1015 to be re-created! |
1020 |
1016 |
1039 end *} |
1035 end *} |
1040 |
1036 |
1041 text {* |
1037 text {* |
1042 The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be |
1038 The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be |
1043 proved) and a context as argument. The context is necessary in order to be able to use |
1039 proved) and a context as argument. The context is necessary in order to be able to use |
1044 @{ML_ind [index] read_prop in Syntax}, which converts a string into a proper proposition. |
1040 @{ML_ind read_prop in Syntax}, which converts a string into a proper proposition. |
1045 In Line 6 the function @{ML_ind [index] theorem_i in Proof} starts the proof for the |
1041 In Line 6 the function @{ML_ind theorem_i in Proof} starts the proof for the |
1046 proposition. Its argument @{ML NONE} stands for a locale (which we chose to |
1042 proposition. Its argument @{ML NONE} stands for a locale (which we chose to |
1047 omit); the argument @{ML "(K I)"} stands for a function that determines what |
1043 omit); the argument @{ML "(K I)"} stands for a function that determines what |
1048 should be done with the theorem once it is proved (we chose to just forget |
1044 should be done with the theorem once it is proved (we chose to just forget |
1049 about it). Line 9 contains the parser for the proposition. |
1045 about it). Line 9 contains the parser for the proposition. |
1050 |
1046 |
1097 |
1093 |
1098 text {* |
1094 text {* |
1099 It defines the method @{text foo}, which takes no arguments (therefore the |
1095 It defines the method @{text foo}, which takes no arguments (therefore the |
1100 parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which |
1096 parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which |
1101 applies @{thm [source] conjE} and then @{thm [source] conjI}. The function |
1097 applies @{thm [source] conjE} and then @{thm [source] conjI}. The function |
1102 @{ML_ind [index] SIMPLE_METHOD} |
1098 @{ML_ind SIMPLE_METHOD} |
1103 turns such a tactic into a method. The method @{text "foo"} can be used as follows |
1099 turns such a tactic into a method. The method @{text "foo"} can be used as follows |
1104 *} |
1100 *} |
1105 |
1101 |
1106 lemma shows "A \<and> B \<Longrightarrow> C \<and> D" |
1102 lemma shows "A \<and> B \<Longrightarrow> C \<and> D" |
1107 apply(foo) |
1103 apply(foo) |