34 section {* Building Generic Parsers *} |
34 section {* Building Generic Parsers *} |
35 |
35 |
36 text {* |
36 text {* |
37 |
37 |
38 Let us first have a look at parsing strings using generic parsing |
38 Let us first have a look at parsing strings using generic parsing |
39 combinators. The function @{ML "$$"} takes a string as argument and will |
39 combinators. The function @{ML [index] "$$"} takes a string as argument and will |
40 ``consume'' this string from a given input list of strings. ``Consume'' in |
40 ``consume'' this string from a given input list of strings. ``Consume'' in |
41 this context means that it will return a pair consisting of this string and |
41 this context means that it will return a pair consisting of this string and |
42 the rest of the input list. For example: |
42 the rest of the input list. For example: |
43 |
43 |
44 @{ML_response [display,gray] |
44 @{ML_response [display,gray] |
68 \end{itemize} |
68 \end{itemize} |
69 |
69 |
70 However, note that these exceptions are private to the parser and cannot be accessed |
70 However, note that these exceptions are private to the parser and cannot be accessed |
71 by the programmer (for example to handle them). |
71 by the programmer (for example to handle them). |
72 |
72 |
|
73 \indexdef{explode@ {\tt\slshape{}explode}}{in {\tt\slshape Symbol}} |
|
74 \index{explode@ {\tt\slshape{}explode}} |
73 In the examples above we use the function @{ML Symbol.explode}, instead of the |
75 In the examples above we use the function @{ML Symbol.explode}, instead of the |
74 more standard library function @{ML explode}, for obtaining an input list for |
76 more standard library function @{ML explode}, for obtaining an input list for |
75 the parser. The reason is that @{ML Symbol.explode} is aware of character sequences, |
77 the parser. The reason is that @{ML Symbol.explode} is aware of character sequences, |
76 for example @{text "\\<foo>"}, that have a special meaning in Isabelle. To see |
78 for example @{text "\\<foo>"}, that have a special meaning in Isabelle. To see |
77 the difference consider |
79 the difference consider |
83 (explode input, Symbol.explode input) |
85 (explode input, Symbol.explode input) |
84 end" |
86 end" |
85 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"], |
87 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"], |
86 [\"\\<foo>\", \" \", \"b\", \"a\", \"r\"])"} |
88 [\"\\<foo>\", \" \", \"b\", \"a\", \"r\"])"} |
87 |
89 |
88 Slightly more general than the parser @{ML "$$"} is the function @{ML |
90 Slightly more general than the parser @{ML "$$"} is the function |
89 Scan.one}, in that it takes a predicate as argument and then parses exactly |
91 @{ML [index] one in Scan}, in that it takes a predicate as argument and |
|
92 then parses exactly |
90 one item from the input list satisfying this predicate. For example the |
93 one item from the input list satisfying this predicate. For example the |
91 following parser either consumes an @{text [quotes] "h"} or a @{text |
94 following parser either consumes an @{text [quotes] "h"} or a @{text |
92 [quotes] "w"}: |
95 [quotes] "w"}: |
93 |
96 |
94 @{ML_response [display,gray] |
97 @{ML_response [display,gray] |
99 in |
102 in |
100 (hw input1, hw input2) |
103 (hw input1, hw input2) |
101 end" |
104 end" |
102 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
105 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
103 |
106 |
104 Two parsers can be connected in sequence by using the function @{ML "--"}. |
107 Two parsers can be connected in sequence by using the function @{ML [index] "--"}. |
105 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
108 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
106 order) you can achieve by: |
109 order) you can achieve by: |
107 |
110 |
108 @{ML_response [display,gray] |
111 @{ML_response [display,gray] |
109 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" |
112 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" |
110 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
113 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
111 |
114 |
112 Note how the result of consumed strings builds up on the left as nested pairs. |
115 Note how the result of consumed strings builds up on the left as nested pairs. |
113 |
116 |
114 If, as in the previous example, you want to parse a particular string, |
117 If, as in the previous example, you want to parse a particular string, |
115 then you should use the function @{ML Scan.this_string}: |
118 then you should use the function @{ML [index] this_string in Scan}: |
116 |
119 |
117 @{ML_response [display,gray] |
120 @{ML_response [display,gray] |
118 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
121 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
119 "(\"hell\", [\"o\"])"} |
122 "(\"hell\", [\"o\"])"} |
120 |
123 |
121 Parsers that explore alternatives can be constructed using the function @{ML |
124 Parsers that explore alternatives can be constructed using the function |
122 "||"}. The parser @{ML "(p || q)" for p q} returns the |
125 @{ML [index] "||"}. The parser @{ML "(p || q)" for p q} returns the |
123 result of @{text "p"}, in case it succeeds, otherwise it returns the |
126 result of @{text "p"}, in case it succeeds, otherwise it returns the |
124 result of @{text "q"}. For example: |
127 result of @{text "q"}. For example: |
125 |
128 |
126 |
129 |
127 @{ML_response [display,gray] |
130 @{ML_response [display,gray] |
132 in |
135 in |
133 (hw input1, hw input2) |
136 (hw input1, hw input2) |
134 end" |
137 end" |
135 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
138 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
136 |
139 |
137 The functions @{ML "|--"} and @{ML "--|"} work like the sequencing function |
140 The functions @{ML [index] "|--"} and @{ML [index] "--|"} work like the sequencing function |
138 for parsers, except that they discard the item being parsed by the first (respectively second) |
141 for parsers, except that they discard the item being parsed by the first (respectively second) |
139 parser. For example: |
142 parser. For example: |
140 |
143 |
141 @{ML_response [display,gray] |
144 @{ML_response [display,gray] |
142 "let |
145 "let |
146 in |
149 in |
147 (just_e input, just_h input) |
150 (just_e input, just_h input) |
148 end" |
151 end" |
149 "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"} |
152 "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"} |
150 |
153 |
|
154 \indexdef{optional@ {\tt\slshape{optional}}}{in {\tt\slshape Scan}} |
151 The parser @{ML "Scan.optional p x" for p x} returns the result of the parser |
155 The parser @{ML "Scan.optional p x" for p x} returns the result of the parser |
152 @{text "p"}, if it succeeds; otherwise it returns |
156 @{text "p"}, if it succeeds; otherwise it returns |
153 the default value @{text "x"}. For example: |
157 the default value @{text "x"}. For example: |
154 |
158 |
155 @{ML_response [display,gray] |
159 @{ML_response [display,gray] |
160 in |
164 in |
161 (p input1, p input2) |
165 (p input1, p input2) |
162 end" |
166 end" |
163 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
167 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
164 |
168 |
165 The function @{ML Scan.option} works similarly, except no default value can |
169 The function @{ML [index] option in Scan} works similarly, except no default value can |
166 be given. Instead, the result is wrapped as an @{text "option"}-type. For example: |
170 be given. Instead, the result is wrapped as an @{text "option"}-type. For example: |
167 |
171 |
168 @{ML_response [display,gray] |
172 @{ML_response [display,gray] |
169 "let |
173 "let |
170 val p = Scan.option ($$ \"h\") |
174 val p = Scan.option ($$ \"h\") |
172 val input2 = Symbol.explode \"world\" |
176 val input2 = Symbol.explode \"world\" |
173 in |
177 in |
174 (p input1, p input2) |
178 (p input1, p input2) |
175 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
179 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
176 |
180 |
177 The function @{ML "!!"} helps to produce appropriate error messages |
181 The function @{ML [index] "!!"} helps to produce appropriate error messages |
178 for parsing. For example if you want to parse @{text p} immediately |
182 for parsing. For example if you want to parse @{text p} immediately |
179 followed by @{text q}, or start a completely different parser @{text r}, |
183 followed by @{text q}, or start a completely different parser @{text r}, |
180 you might write: |
184 you might write: |
181 |
185 |
182 @{ML [display,gray] "(p -- q) || r" for p q r} |
186 @{ML [display,gray] "(p -- q) || r" for p q r} |
208 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
212 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
209 "Exception ABORT raised"} |
213 "Exception ABORT raised"} |
210 |
214 |
211 then the parsing aborts and the error message @{text "foo"} is printed. In order to |
215 then the parsing aborts and the error message @{text "foo"} is printed. In order to |
212 see the error message properly, you need to prefix the parser with the function |
216 see the error message properly, you need to prefix the parser with the function |
213 @{ML "Scan.error"}. For example: |
217 @{ML [index] error in Scan}. For example: |
214 |
218 |
215 @{ML_response_fake [display,gray] |
219 @{ML_response_fake [display,gray] |
216 "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" |
220 "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" |
217 "Exception Error \"foo\" raised"} |
221 "Exception Error \"foo\" raised"} |
218 |
222 |
219 This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.local_theory"} |
223 This ``prefixing'' is usually done by wrappers such as @{ML [index] local_theory in OuterSyntax} |
220 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
224 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
221 |
225 |
222 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
226 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
223 r}. If you want to generate the correct error message for |
227 r}. If you want to generate the correct error message for |
224 @{text "p"}-followed-by-@{text "q"}, then you have to write: |
228 @{text "p"}-followed-by-@{text "q"}, then you have to write: |
251 often as it succeeds. For example: |
255 often as it succeeds. For example: |
252 |
256 |
253 @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" |
257 @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" |
254 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
258 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
255 |
259 |
256 Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function |
260 Note that @{ML [index] repeat in Scan} stores the parsed items in a list. The function |
257 @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} |
261 @{ML [index] repeat1 in Scan} is similar, but requires that the parser @{text "p"} |
258 succeeds at least once. |
262 succeeds at least once. |
259 |
263 |
260 Also note that the parser would have aborted with the exception @{text MORE}, if |
264 Also note that the parser would have aborted with the exception @{text MORE}, if |
261 you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using |
265 you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using |
262 the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With |
266 the wrapper @{ML [index] finite in Scan} and the ``stopper-token'' |
263 them you can write: |
267 @{ML [index] stopper in Symbol}. With them you can write: |
264 |
268 |
265 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
269 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
266 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
270 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
267 |
271 |
268 @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings; |
272 @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings; |
269 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
273 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
270 manually wrapping is often already done by the surrounding infrastructure. |
274 manually wrapping is often already done by the surrounding infrastructure. |
271 |
275 |
272 The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any |
276 The function @{ML [index] repeat in Scan} can be used with @{ML [index] one in Scan} to read any |
273 string as in |
277 string as in |
274 |
278 |
275 @{ML_response [display,gray] |
279 @{ML_response [display,gray] |
276 "let |
280 "let |
277 val p = Scan.repeat (Scan.one Symbol.not_eof) |
281 val p = Scan.repeat (Scan.one Symbol.not_eof) |
279 in |
283 in |
280 Scan.finite Symbol.stopper p input |
284 Scan.finite Symbol.stopper p input |
281 end" |
285 end" |
282 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
286 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
283 |
287 |
284 where the function @{ML Symbol.not_eof} ensures that we do not read beyond the |
288 where the function @{ML [index] not_eof in Symbol} ensures that we do not read beyond the |
285 end of the input string (i.e.~stopper symbol). |
289 end of the input string (i.e.~stopper symbol). |
286 |
290 |
|
291 \indexdef{unless@ {\tt\slshape{unless}}}{in {\tt\slshape Scan}} |
287 The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can |
292 The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can |
288 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
293 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
289 |
294 |
290 @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" |
295 @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" |
291 "Exception FAIL raised"} |
296 "Exception FAIL raised"} |
295 @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" |
300 @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" |
296 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
301 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
297 |
302 |
298 succeeds. |
303 succeeds. |
299 |
304 |
300 The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any |
305 The functions @{ML [index] repeat in Scan} and @{ML [index] unless in Scan} can |
301 input until a certain marker symbol is reached. In the example below the marker |
306 be combined to read any input until a certain marker symbol is reached. In the |
302 symbol is a @{text [quotes] "*"}. |
307 example below the marker symbol is a @{text [quotes] "*"}. |
303 |
308 |
304 @{ML_response [display,gray] |
309 @{ML_response [display,gray] |
305 "let |
310 "let |
306 val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof)) |
311 val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof)) |
307 val input1 = Symbol.explode \"fooooo\" |
312 val input1 = Symbol.explode \"fooooo\" |
311 Scan.finite Symbol.stopper p input2) |
316 Scan.finite Symbol.stopper p input2) |
312 end" |
317 end" |
313 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
318 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
314 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
319 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
315 |
320 |
|
321 |
316 After parsing is done, you almost always want to apply a function to the parsed |
322 After parsing is done, you almost always want to apply a function to the parsed |
317 items. One way to do this is the function @{ML "(p >> f)" for p f}, which runs |
323 items. One way to do this is the function @{ML [index]">>"} where |
|
324 @{ML "(p >> f)" for p f} runs |
318 first the parser @{text p} and upon successful completion applies the |
325 first the parser @{text p} and upon successful completion applies the |
319 function @{text f} to the result. For example |
326 function @{text f} to the result. For example |
320 |
327 |
321 @{ML_response [display,gray] |
328 @{ML_response [display,gray] |
322 "let |
329 "let |
340 where the single-character strings in the parsed output are transformed |
347 where the single-character strings in the parsed output are transformed |
341 back into one string. |
348 back into one string. |
342 |
349 |
343 (FIXME: move to an earlier place) |
350 (FIXME: move to an earlier place) |
344 |
351 |
345 The function @{ML Scan.ahead} parses some input, but leaves the original |
352 The function @{ML [index] ahead in Scan} parses some input, but leaves the original |
346 input unchanged. For example: |
353 input unchanged. For example: |
347 |
354 |
348 @{ML_response [display,gray] |
355 @{ML_response [display,gray] |
349 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
356 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
350 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
357 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
351 |
358 |
352 The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies |
359 The function @{ML [index] lift in Scan} takes a parser and a pair as arguments. This function applies |
353 the given parser to the second component of the pair and leaves the first component |
360 the given parser to the second component of the pair and leaves the first component |
354 untouched. For example |
361 untouched. For example |
355 |
362 |
356 @{ML_response [display,gray] |
363 @{ML_response [display,gray] |
357 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
364 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
387 The parser functions for the theory syntax are contained in the structure |
394 The parser functions for the theory syntax are contained in the structure |
388 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
395 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
389 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
396 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
390 \end{readmore} |
397 \end{readmore} |
391 |
398 |
392 The structure @{ML_struct OuterLex} defines several kinds of tokens (for |
399 The structure @{ML_struct [index] OuterLex} defines several kinds of tokens (for |
393 example @{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in |
400 example @{ML [index] Ident in OuterLex} for identifiers, @{ML Keyword in |
394 OuterLex} for keywords and @{ML "Command" in OuterLex} for commands). Some |
401 OuterLex} for keywords and @{ML [index] Command in OuterLex} for commands). Some |
395 token parsers take into account the kind of tokens. The first example shows |
402 token parsers take into account the kind of tokens. The first example shows |
396 how to generate a token list out of a string using the function @{ML |
403 how to generate a token list out of a string using the function |
397 "OuterSyntax.scan"}. It is given the argument @{ML "Position.none"} since, |
404 @{ML [index] scan in OuterSyntax}. It is given the argument |
|
405 @{ML "Position.none"} since, |
398 at the moment, we are not interested in generating precise error |
406 at the moment, we are not interested in generating precise error |
399 messages. The following code\footnote{Note that because of a possible bug in |
407 messages. The following code\footnote{Note that because of a possible bug in |
400 the PolyML runtime system, the result is printed as @{text [quotes] "?"}, |
408 the PolyML runtime system, the result is printed as @{text [quotes] "?"}, |
401 instead of the tokens.} |
409 instead of the tokens.} |
402 |
410 |
409 produces three tokens where the first and the last are identifiers, since |
417 produces three tokens where the first and the last are identifiers, since |
410 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any |
418 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any |
411 other syntactic category. The second indicates a space. |
419 other syntactic category. The second indicates a space. |
412 |
420 |
413 We can easily change what is recognised as a keyword with |
421 We can easily change what is recognised as a keyword with |
414 @{ML OuterKeyword.keyword}. For example calling this function |
422 @{ML [index] keyword in OuterKeyword}. For example calling this function |
415 *} |
423 *} |
416 |
424 |
417 ML{*val _ = OuterKeyword.keyword "hello"*} |
425 ML{*val _ = OuterKeyword.keyword "hello"*} |
418 |
426 |
419 text {* |
427 text {* |
424 Token (\<dots>,(Space, \" \"),\<dots>), |
432 Token (\<dots>,(Space, \" \"),\<dots>), |
425 Token (\<dots>,(Ident, \"world\"),\<dots>)]"} |
433 Token (\<dots>,(Ident, \"world\"),\<dots>)]"} |
426 |
434 |
427 Many parsing functions later on will require white space, comments and the like |
435 Many parsing functions later on will require white space, comments and the like |
428 to have already been filtered out. So from now on we are going to use the |
436 to have already been filtered out. So from now on we are going to use the |
429 functions @{ML filter} and @{ML OuterLex.is_proper} to do this. For example: |
437 functions @{ML filter} and @{ML [index] is_proper in OuterLex} to do this. |
|
438 For example: |
430 |
439 |
431 @{ML_response_fake [display,gray] |
440 @{ML_response_fake [display,gray] |
432 "let |
441 "let |
433 val input = OuterSyntax.scan Position.none \"hello world\" |
442 val input = OuterSyntax.scan Position.none \"hello world\" |
434 in |
443 in |
461 in |
470 in |
462 (Scan.dest_lexicon commands, Scan.dest_lexicon keywords) |
471 (Scan.dest_lexicon commands, Scan.dest_lexicon keywords) |
463 end" |
472 end" |
464 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"} |
473 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"} |
465 |
474 |
466 You might have to adjust the @{ML print_depth} in order to |
475 You might have to adjust the @{ML [index] print_depth} in order to |
467 see the complete list. |
476 see the complete list. |
468 |
477 |
469 The parser @{ML "OuterParse.$$$"} parses a single keyword. For example: |
478 The parser @{ML [index] "$$$" in OuterParse} parses a single keyword. For example: |
470 |
479 |
471 @{ML_response [display,gray] |
480 @{ML_response [display,gray] |
472 "let |
481 "let |
473 val input1 = filtered_input \"where for\" |
482 val input1 = filtered_input \"where for\" |
474 val input2 = filtered_input \"| in\" |
483 val input2 = filtered_input \"| in\" |
475 in |
484 in |
476 (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2) |
485 (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2) |
477 end" |
486 end" |
478 "((\"where\",\<dots>), (\"|\",\<dots>))"} |
487 "((\"where\",\<dots>), (\"|\",\<dots>))"} |
479 |
488 |
480 Any non-keyword string can be parsed with the function @{ML OuterParse.reserved}. |
489 Any non-keyword string can be parsed with the function @{ML [index] reserved in OuterParse}. |
481 For example: |
490 For example: |
482 |
491 |
483 @{ML_response [display,gray] |
492 @{ML_response [display,gray] |
484 "let |
493 "let |
485 val p = OuterParse.reserved \"bar\" |
494 val p = OuterParse.reserved \"bar\" |
487 in |
496 in |
488 p input |
497 p input |
489 end" |
498 end" |
490 "(\"bar\",[])"} |
499 "(\"bar\",[])"} |
491 |
500 |
492 Like before, you can sequentially connect parsers with @{ML "--"}. For example: |
501 Like before, you can sequentially connect parsers with @{ML [index] "--"}. For example: |
493 |
502 |
494 @{ML_response [display,gray] |
503 @{ML_response [display,gray] |
495 "let |
504 "let |
496 val input = filtered_input \"| in\" |
505 val input = filtered_input \"| in\" |
497 in |
506 in |
498 (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input |
507 (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input |
499 end" |
508 end" |
500 "((\"|\", \"in\"), [])"} |
509 "((\"|\", \"in\"), [])"} |
501 |
510 |
|
511 \indexdef{enum@ {\tt\slshape{enum}}}{in {\tt\slshape OuterParse}} |
502 The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty |
512 The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty |
503 list of items recognised by the parser @{text p}, where the items being parsed |
513 list of items recognised by the parser @{text p}, where the items being parsed |
504 are separated by the string @{text s}. For example: |
514 are separated by the string @{text s}. For example: |
505 |
515 |
506 @{ML_response [display,gray] |
516 @{ML_response [display,gray] |
509 in |
519 in |
510 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
520 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
511 end" |
521 end" |
512 "([\"in\", \"in\", \"in\"], [\<dots>])"} |
522 "([\"in\", \"in\", \"in\"], [\<dots>])"} |
513 |
523 |
514 @{ML "OuterParse.enum1"} works similarly, except that the parsed list must |
524 @{ML [index] enum1 in OuterParse} works similarly, except that the parsed list must |
515 be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the |
525 be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the |
516 end of the parsed string, otherwise the parser would have consumed all |
526 end of the parsed string, otherwise the parser would have consumed all |
517 tokens and then failed with the exception @{text "MORE"}. Like in the |
527 tokens and then failed with the exception @{text "MORE"}. Like in the |
518 previous section, we can avoid this exception using the wrapper @{ML |
528 previous section, we can avoid this exception using the wrapper @{ML |
519 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML |
529 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML |
533 |
543 |
534 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *} |
544 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *} |
535 |
545 |
536 text {* |
546 text {* |
537 |
547 |
538 The function @{ML "OuterParse.!!!"} can be used to force termination of the |
548 The function @{ML [index] "!!!" in OuterParse} can be used to force termination of the |
539 parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). |
549 parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). |
540 Except that the error message of @{ML "OuterParse.!!!"} is fixed to be |
550 Except that the error message of @{ML "OuterParse.!!!"} is fixed to be |
541 @{text [quotes] "Outer syntax error"} |
551 @{text [quotes] "Outer syntax error"} |
542 together with a relatively precise description of the failure. For example: |
552 together with a relatively precise description of the failure. For example: |
543 |
553 |
552 but keyword in was found\" raised" |
562 but keyword in was found\" raised" |
553 } |
563 } |
554 |
564 |
555 \begin{exercise} (FIXME) |
565 \begin{exercise} (FIXME) |
556 A type-identifier, for example @{typ "'a"}, is a token of |
566 A type-identifier, for example @{typ "'a"}, is a token of |
557 kind @{ML "Keyword" in OuterLex}. It can be parsed using |
567 kind @{ML [index] Keyword in OuterLex}. It can be parsed using |
558 the function @{ML OuterParse.type_ident}. |
568 the function @{ML type_ident in OuterParse}. |
559 \end{exercise} |
569 \end{exercise} |
560 |
570 |
561 (FIXME: or give parser for numbers) |
571 (FIXME: or give parser for numbers) |
562 |
572 |
563 Whenever there is a possibility that the processing of user input can fail, |
573 Whenever there is a possibility that the processing of user input can fail, |
711 |
721 |
712 As you see, the result is a pair consisting of a list of |
722 As you see, the result is a pair consisting of a list of |
713 variables with optional type-annotation and syntax-annotation, and a list of |
723 variables with optional type-annotation and syntax-annotation, and a list of |
714 rules where every rule has optionally a name and an attribute. |
724 rules where every rule has optionally a name and an attribute. |
715 |
725 |
716 The function @{ML OuterParse.fixes} in Line 2 of the parser reads an |
726 The function @{ML [index] "fixes" in OuterParse} in Line 2 of the parser reads an |
717 \isacommand{and}-separated |
727 \isacommand{and}-separated |
718 list of variables that can include optional type annotations and syntax translations. |
728 list of variables that can include optional type annotations and syntax translations. |
719 For example:\footnote{Note that in the code we need to write |
729 For example:\footnote{Note that in the code we need to write |
720 @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes |
730 @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes |
721 in the compound type.} |
731 in the compound type.} |
735 text {* |
745 text {* |
736 Whenever types are given, they are stored in the @{ML SOME}s. The types are |
746 Whenever types are given, they are stored in the @{ML SOME}s. The types are |
737 not yet used to type the variables: this must be done by type-inference later |
747 not yet used to type the variables: this must be done by type-inference later |
738 on. Since types are part of the inner syntax they are strings with some |
748 on. Since types are part of the inner syntax they are strings with some |
739 encoded information (see previous section). If a mixfix-syntax is |
749 encoded information (see previous section). If a mixfix-syntax is |
740 present for a variable, then it is stored in the @{ML Mixfix} data structure; |
750 present for a variable, then it is stored in the @{ML [index] Mixfix} data structure; |
741 no syntax translation is indicated by @{ML NoSyn}. |
751 no syntax translation is indicated by @{ML [index] NoSyn}. |
742 |
752 |
743 \begin{readmore} |
753 \begin{readmore} |
744 The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. |
754 The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. |
745 \end{readmore} |
755 \end{readmore} |
746 |
756 |
747 Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a |
757 Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a |
748 list of introduction rules, that is propositions with theorem annotations |
758 list of introduction rules, that is propositions with theorem annotations |
749 such as rule names and attributes. The introduction rules are propositions |
759 such as rule names and attributes. The introduction rules are propositions |
750 parsed by @{ML OuterParse.prop}. However, they can include an optional |
760 parsed by @{ML [index] prop in OuterParse}. However, they can include an optional |
751 theorem name plus some attributes. For example |
761 theorem name plus some attributes. For example |
752 |
762 |
753 @{ML_response [display,gray] "let |
763 @{ML_response [display,gray] "let |
754 val input = filtered_input \"foo_lemma[intro,dest!]:\" |
764 val input = filtered_input \"foo_lemma[intro,dest!]:\" |
755 val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input |
765 val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input |
756 in |
766 in |
757 (name, map Args.dest_src attrib) |
767 (name, map Args.dest_src attrib) |
758 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"} |
768 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"} |
759 |
769 |
760 The function @{ML opt_thm_name in SpecParse} is the ``optional'' variant of |
770 The function @{ML [index] opt_thm_name in SpecParse} is the ``optional'' variant of |
761 @{ML thm_name in SpecParse}. Theorem names can contain attributes. The name |
771 @{ML [index] thm_name in SpecParse}. Theorem names can contain attributes. The name |
762 has to end with @{text [quotes] ":"}---see the argument of |
772 has to end with @{text [quotes] ":"}---see the argument of |
763 the function @{ML SpecParse.opt_thm_name} in Line 7. |
773 the function @{ML SpecParse.opt_thm_name} in Line 7. |
764 |
774 |
765 \begin{readmore} |
775 \begin{readmore} |
766 Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} |
776 Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} |
820 in |
830 in |
821 OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing |
831 OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing |
822 end *} |
832 end *} |
823 |
833 |
824 text {* |
834 text {* |
825 The crucial function @{ML OuterSyntax.local_theory} expects a name for the command, a |
835 The crucial function @{ML [index] local_theory in OuterSyntax} expects a name for the command, a |
826 short description, a kind indicator (which we will explain later more thoroughly) and a |
836 short description, a kind indicator (which we will explain later more thoroughly) and a |
827 parser producing a local theory transition (its purpose will also explained |
837 parser producing a local theory transition (its purpose will also explained |
828 later). |
838 later). |
829 |
839 |
830 While this is everything you have to do on the ML-level, you need a keyword |
840 While this is everything you have to do on the ML-level, you need a keyword |
960 next by letting it take a proposition as argument and printing this proposition |
970 next by letting it take a proposition as argument and printing this proposition |
961 inside the tracing buffer. |
971 inside the tracing buffer. |
962 |
972 |
963 The crucial part of a command is the function that determines the behaviour |
973 The crucial part of a command is the function that determines the behaviour |
964 of the command. In the code above we used a ``do-nothing''-function, which |
974 of the command. In the code above we used a ``do-nothing''-function, which |
965 because of @{ML Scan.succeed} does not parse any argument, but immediately |
975 because of @{ML [index] succeed in Scan} does not parse any argument, but immediately |
966 returns the simple function @{ML "LocalTheory.theory I"}. We can |
976 returns the simple function @{ML "LocalTheory.theory I"}. We can |
967 replace this code by a function that first parses a proposition (using the |
977 replace this code by a function that first parses a proposition (using the |
968 parser @{ML OuterParse.prop}), then prints out the tracing |
978 parser @{ML OuterParse.prop}), then prints out the tracing |
969 information (using a new function @{text trace_prop}) and |
979 information (using a new function @{text trace_prop}) and |
970 finally does nothing. For this you can write: |
980 finally does nothing. For this you can write: |
990 @{text "> \"True \<and> False\""} |
1000 @{text "> \"True \<and> False\""} |
991 \end{isabelle} |
1001 \end{isabelle} |
992 |
1002 |
993 and see the proposition in the tracing buffer. |
1003 and see the proposition in the tracing buffer. |
994 |
1004 |
995 Note that so far we used @{ML thy_decl in OuterKeyword} as the kind |
1005 Note that so far we used @{ML [index] thy_decl in OuterKeyword} as the kind |
996 indicator for the command. This means that the command finishes as soon as |
1006 indicator for the command. This means that the command finishes as soon as |
997 the arguments are processed. Examples of this kind of commands are |
1007 the arguments are processed. Examples of this kind of commands are |
998 \isacommand{definition} and \isacommand{declare}. In other cases, commands |
1008 \isacommand{definition} and \isacommand{declare}. In other cases, commands |
999 are expected to parse some arguments, for example a proposition, and then |
1009 are expected to parse some arguments, for example a proposition, and then |
1000 ``open up'' a proof in order to prove the proposition (for example |
1010 ``open up'' a proof in order to prove the proposition (for example |
1001 \isacommand{lemma}) or prove some other properties (for example |
1011 \isacommand{lemma}) or prove some other properties (for example |
1002 \isacommand{function}). To achieve this kind of behaviour, you have to use |
1012 \isacommand{function}). To achieve this kind of behaviour, you have to use |
1003 the kind indicator @{ML thy_goal in OuterKeyword} and the function @{ML |
1013 the kind indicator @{ML [index] thy_goal in OuterKeyword} and the function @{ML |
1004 "local_theory_to_proof" in OuterSyntax} to set up the command. Note, |
1014 "local_theory_to_proof" in OuterSyntax} to set up the command. Note, |
1005 however, once you change the ``kind'' of a command from @{ML thy_decl in |
1015 however, once you change the ``kind'' of a command from @{ML thy_decl in |
1006 OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs |
1016 OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs |
1007 to be re-created! |
1017 to be re-created! |
1008 |
1018 |
1027 end *} |
1037 end *} |
1028 |
1038 |
1029 text {* |
1039 text {* |
1030 The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be |
1040 The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be |
1031 proved) and a context as argument. The context is necessary in order to be able to use |
1041 proved) and a context as argument. The context is necessary in order to be able to use |
1032 @{ML Syntax.read_prop}, which converts a string into a proper proposition. |
1042 @{ML [index] read_prop in Syntax}, which converts a string into a proper proposition. |
1033 In Line 6 the function @{ML Proof.theorem_i} starts the proof for the |
1043 In Line 6 the function @{ML [index] theorem_i in Proof} starts the proof for the |
1034 proposition. Its argument @{ML NONE} stands for a locale (which we chose to |
1044 proposition. Its argument @{ML NONE} stands for a locale (which we chose to |
1035 omit); the argument @{ML "(K I)"} stands for a function that determines what |
1045 omit); the argument @{ML "(K I)"} stands for a function that determines what |
1036 should be done with the theorem once it is proved (we chose to just forget |
1046 should be done with the theorem once it is proved (we chose to just forget |
1037 about it). Line 9 contains the parser for the proposition. |
1047 about it). Line 9 contains the parser for the proposition. |
1038 |
1048 |
1084 "foo method for conjE and conjI" |
1094 "foo method for conjE and conjI" |
1085 |
1095 |
1086 text {* |
1096 text {* |
1087 It defines the method @{text foobar}, which takes no arguments (therefore the |
1097 It defines the method @{text foobar}, which takes no arguments (therefore the |
1088 parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which |
1098 parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which |
1089 applies @{thm [source] conjE} and then @{thm [source] conjI}. The function @{ML SIMPLE_METHOD} |
1099 applies @{thm [source] conjE} and then @{thm [source] conjI}. The function |
|
1100 @{ML [index] SIMPLE_METHOD} |
1090 turns such a tactic into a method. The method @{text "foobar"} can be used as follows |
1101 turns such a tactic into a method. The method @{text "foobar"} can be used as follows |
1091 *} |
1102 *} |
1092 |
1103 |
1093 lemma shows "A \<and> B \<Longrightarrow> C \<and> D" |
1104 lemma shows "A \<and> B \<Longrightarrow> C \<and> D" |
1094 apply(foo) |
1105 apply(foo) |