36 calling methods with specific arguments. |
36 calling methods with specific arguments. |
37 |
37 |
38 \begin{readmore} |
38 \begin{readmore} |
39 The library for writing parser combinators is split up, roughly, into two |
39 The library for writing parser combinators is split up, roughly, into two |
40 parts: The first part consists of a collection of generic parser combinators |
40 parts: The first part consists of a collection of generic parser combinators |
41 defined in the structure @{ML_struct Scan} in the file @{ML_file |
41 defined in the structure @{ML_structure Scan} in the file @{ML_file |
42 "Pure/General/scan.ML"}. The second part of the library consists of |
42 "Pure/General/scan.ML"}. The second part of the library consists of |
43 combinators for dealing with specific token types, which are defined in the |
43 combinators for dealing with specific token types, which are defined in the |
44 structure @{ML_struct Parse} in the file @{ML_file |
44 structure @{ML_structure Parse} in the file @{ML_file |
45 "Pure/Isar/parse.ML"}. In addition specific parsers for packages are |
45 "Pure/Isar/parse.ML"}. In addition specific parsers for packages are |
46 defined in @{ML_file "Pure/Isar/parse_spec.ML"}. Parsers for method arguments |
46 defined in @{ML_file "Pure/Isar/parse_spec.ML"}. Parsers for method arguments |
47 are defined in @{ML_file "Pure/Isar/args.ML"}. |
47 are defined in @{ML_file "Pure/Isar/args.ML"}. |
48 \end{readmore} |
48 \end{readmore} |
49 |
49 |
57 combinators. The function @{ML_ind "$$" in Scan} takes a string as argument and will |
57 combinators. The function @{ML_ind "$$" in Scan} takes a string as argument and will |
58 ``consume'' this string from a given input list of strings. ``Consume'' in |
58 ``consume'' this string from a given input list of strings. ``Consume'' in |
59 this context means that it will return a pair consisting of this string and |
59 this context means that it will return a pair consisting of this string and |
60 the rest of the input list. For example: |
60 the rest of the input list. For example: |
61 |
61 |
62 @{ML_response [display,gray] |
62 @{ML_matchresult [display,gray] |
63 "($$ \"h\") (Symbol.explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
63 "($$ \"h\") (Symbol.explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
64 |
64 |
65 @{ML_response [display,gray] |
65 @{ML_matchresult [display,gray] |
66 "($$ \"w\") (Symbol.explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} |
66 "($$ \"w\") (Symbol.explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} |
67 |
67 |
68 The function @{ML "$$"} will either succeed (as in the two examples above) |
68 The function @{ML "$$"} will either succeed (as in the two examples above) |
69 or raise the exception \<open>FAIL\<close> if no string can be consumed. For |
69 or raise the exception \<open>FAIL\<close> if no string can be consumed. For |
70 example trying to parse |
70 example trying to parse |
71 |
71 |
72 @{ML_response_fake [display,gray] |
72 @{ML_matchresult_fake [display,gray] |
73 "($$ \"x\") (Symbol.explode \"world\")" |
73 "($$ \"x\") (Symbol.explode \"world\")" |
74 "Exception FAIL raised"} |
74 "Exception FAIL raised"} |
75 |
75 |
76 will raise the exception \<open>FAIL\<close>. The function @{ML_ind "$$" in Scan} will also |
76 will raise the exception \<open>FAIL\<close>. The function @{ML_ind "$$" in Scan} will also |
77 fail if you try to consume more than a single character. |
77 fail if you try to consume more than a single character. |
89 |
89 |
90 However, note that these exceptions are private to the parser and cannot be accessed |
90 However, note that these exceptions are private to the parser and cannot be accessed |
91 by the programmer (for example to handle them). |
91 by the programmer (for example to handle them). |
92 |
92 |
93 In the examples above we use the function @{ML_ind explode in Symbol} from the |
93 In the examples above we use the function @{ML_ind explode in Symbol} from the |
94 structure @{ML_struct Symbol}, instead of the more standard library function |
94 structure @{ML_structure Symbol}, instead of the more standard library function |
95 @{ML_ind explode in String}, for obtaining an input list for the parser. The reason is |
95 @{ML_ind explode in String}, for obtaining an input list for the parser. The reason is |
96 that @{ML explode in Symbol} is aware of character |
96 that @{ML explode in Symbol} is aware of character |
97 sequences, for example \<open>\<foo>\<close>, that have a special meaning in |
97 sequences, for example \<open>\<foo>\<close>, that have a special meaning in |
98 Isabelle. To see the difference consider |
98 Isabelle. To see the difference consider |
99 |
99 |
100 @{ML_response_fake [display,gray] |
100 @{ML_matchresult_fake [display,gray] |
101 "let |
101 "let |
102 val input = \"\<foo> bar\" |
102 val input = \"\<foo> bar\" |
103 in |
103 in |
104 (String.explode input, Symbol.explode input) |
104 (String.explode input, Symbol.explode input) |
105 end" |
105 end" |
111 then parses exactly |
111 then parses exactly |
112 one item from the input list satisfying this predicate. For example the |
112 one item from the input list satisfying this predicate. For example the |
113 following parser either consumes an @{text [quotes] "h"} or a @{text |
113 following parser either consumes an @{text [quotes] "h"} or a @{text |
114 [quotes] "w"}: |
114 [quotes] "w"}: |
115 |
115 |
116 @{ML_response [display,gray] |
116 @{ML_matchresult [display,gray] |
117 "let |
117 "let |
118 val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\") |
118 val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\") |
119 val input1 = Symbol.explode \"hello\" |
119 val input1 = Symbol.explode \"hello\" |
120 val input2 = Symbol.explode \"world\" |
120 val input2 = Symbol.explode \"world\" |
121 in |
121 in |
125 |
125 |
126 Two parsers can be connected in sequence by using the function @{ML_ind "--" in Scan}. |
126 Two parsers can be connected in sequence by using the function @{ML_ind "--" in Scan}. |
127 For example parsing \<open>h\<close>, \<open>e\<close> and \<open>l\<close> (in this |
127 For example parsing \<open>h\<close>, \<open>e\<close> and \<open>l\<close> (in this |
128 order) you can achieve by: |
128 order) you can achieve by: |
129 |
129 |
130 @{ML_response [display,gray] |
130 @{ML_matchresult [display,gray] |
131 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" |
131 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" |
132 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
132 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
133 |
133 |
134 Note how the result of consumed strings builds up on the left as nested pairs. |
134 Note how the result of consumed strings builds up on the left as nested pairs. |
135 |
135 |
136 If, as in the previous example, you want to parse a particular string, |
136 If, as in the previous example, you want to parse a particular string, |
137 then you can use the function @{ML_ind this_string in Scan}. |
137 then you can use the function @{ML_ind this_string in Scan}. |
138 |
138 |
139 @{ML_response [display,gray] |
139 @{ML_matchresult [display,gray] |
140 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
140 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
141 "(\"hell\", [\"o\"])"} |
141 "(\"hell\", [\"o\"])"} |
142 |
142 |
143 Parsers that explore alternatives can be constructed using the function |
143 Parsers that explore alternatives can be constructed using the function |
144 @{ML_ind "||" in Scan}. The parser @{ML "(p || q)" for p q} returns the |
144 @{ML_ind "||" in Scan}. The parser @{ML "(p || q)" for p q} returns the |
145 result of \<open>p\<close>, in case it succeeds; otherwise it returns the |
145 result of \<open>p\<close>, in case it succeeds; otherwise it returns the |
146 result of \<open>q\<close>. For example: |
146 result of \<open>q\<close>. For example: |
147 |
147 |
148 |
148 |
149 @{ML_response [display,gray] |
149 @{ML_matchresult [display,gray] |
150 "let |
150 "let |
151 val hw = $$ \"h\" || $$ \"w\" |
151 val hw = $$ \"h\" || $$ \"w\" |
152 val input1 = Symbol.explode \"hello\" |
152 val input1 = Symbol.explode \"hello\" |
153 val input2 = Symbol.explode \"world\" |
153 val input2 = Symbol.explode \"world\" |
154 in |
154 in |
160 function for parsers, except that they discard the item being parsed by the |
160 function for parsers, except that they discard the item being parsed by the |
161 first (respectively second) parser. That means the item being dropped is the |
161 first (respectively second) parser. That means the item being dropped is the |
162 one that @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} ``point'' away. |
162 one that @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} ``point'' away. |
163 For example: |
163 For example: |
164 |
164 |
165 @{ML_response [display,gray] |
165 @{ML_matchresult [display,gray] |
166 "let |
166 "let |
167 val just_e = $$ \"h\" |-- $$ \"e\" |
167 val just_e = $$ \"h\" |-- $$ \"e\" |
168 val just_h = $$ \"h\" --| $$ \"e\" |
168 val just_h = $$ \"h\" --| $$ \"e\" |
169 val input = Symbol.explode \"hello\" |
169 val input = Symbol.explode \"hello\" |
170 in |
170 in |
174 |
174 |
175 The parser @{ML "Scan.optional p x" for p x} returns the result of the parser |
175 The parser @{ML "Scan.optional p x" for p x} returns the result of the parser |
176 \<open>p\<close>, in case it succeeds; otherwise it returns |
176 \<open>p\<close>, in case it succeeds; otherwise it returns |
177 the default value \<open>x\<close>. For example: |
177 the default value \<open>x\<close>. For example: |
178 |
178 |
179 @{ML_response [display,gray] |
179 @{ML_matchresult [display,gray] |
180 "let |
180 "let |
181 val p = Scan.optional ($$ \"h\") \"x\" |
181 val p = Scan.optional ($$ \"h\") \"x\" |
182 val input1 = Symbol.explode \"hello\" |
182 val input1 = Symbol.explode \"hello\" |
183 val input2 = Symbol.explode \"world\" |
183 val input2 = Symbol.explode \"world\" |
184 in |
184 in |
187 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
187 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
188 |
188 |
189 The function @{ML_ind option in Scan} works similarly, except no default value can |
189 The function @{ML_ind option in Scan} works similarly, except no default value can |
190 be given. Instead, the result is wrapped as an \<open>option\<close>-type. For example: |
190 be given. Instead, the result is wrapped as an \<open>option\<close>-type. For example: |
191 |
191 |
192 @{ML_response [display,gray] |
192 @{ML_matchresult [display,gray] |
193 "let |
193 "let |
194 val p = Scan.option ($$ \"h\") |
194 val p = Scan.option ($$ \"h\") |
195 val input1 = Symbol.explode \"hello\" |
195 val input1 = Symbol.explode \"hello\" |
196 val input2 = Symbol.explode \"world\" |
196 val input2 = Symbol.explode \"world\" |
197 in |
197 in |
199 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
199 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
200 |
200 |
201 The function @{ML_ind ahead in Scan} parses some input, but leaves the original |
201 The function @{ML_ind ahead in Scan} parses some input, but leaves the original |
202 input unchanged. For example: |
202 input unchanged. For example: |
203 |
203 |
204 @{ML_response [display,gray] |
204 @{ML_matchresult [display,gray] |
205 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
205 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
206 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
206 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
207 |
207 |
208 The function @{ML_ind "!!" in Scan} helps with producing appropriate error messages |
208 The function @{ML_ind "!!" in Scan} helps with producing appropriate error messages |
209 during parsing. For example if you want to parse \<open>p\<close> immediately |
209 during parsing. For example if you want to parse \<open>p\<close> immediately |
228 |
228 |
229 @{ML [display,gray] "!! (fn _ => fn _ =>\"foo\") ($$ \"h\")"} |
229 @{ML [display,gray] "!! (fn _ => fn _ =>\"foo\") ($$ \"h\")"} |
230 |
230 |
231 on @{text [quotes] "hello"}, the parsing succeeds |
231 on @{text [quotes] "hello"}, the parsing succeeds |
232 |
232 |
233 @{ML_response [display,gray] |
233 @{ML_matchresult [display,gray] |
234 "(!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"hello\")" |
234 "(!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"hello\")" |
235 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
235 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
236 |
236 |
237 but if you invoke it on @{text [quotes] "world"} |
237 but if you invoke it on @{text [quotes] "world"} |
238 |
238 |
239 @{ML_response_fake [display,gray] "(!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
239 @{ML_matchresult_fake [display,gray] "(!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
240 "Exception ABORT raised"} |
240 "Exception ABORT raised"} |
241 |
241 |
242 then the parsing aborts and the error message \<open>foo\<close> is printed. In order to |
242 then the parsing aborts and the error message \<open>foo\<close> is printed. In order to |
243 see the error message properly, you need to prefix the parser with the function |
243 see the error message properly, you need to prefix the parser with the function |
244 @{ML_ind error in Scan}. For example: |
244 @{ML_ind error in Scan}. For example: |
245 |
245 |
246 @{ML_response_fake [display,gray] |
246 @{ML_matchresult_fake [display,gray] |
247 "Scan.error (!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
247 "Scan.error (!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
248 "Exception Error \"foo\" raised"} |
248 "Exception Error \"foo\" raised"} |
249 |
249 |
250 This kind of ``prefixing'' to see the correct error message is usually done by wrappers |
250 This kind of ``prefixing'' to see the correct error message is usually done by wrappers |
251 such as @{ML_ind local_theory in Outer_Syntax} |
251 such as @{ML_ind local_theory in Outer_Syntax} |
267 text \<open> |
267 text \<open> |
268 Running this parser with the arguments |
268 Running this parser with the arguments |
269 @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and |
269 @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and |
270 the input @{text [quotes] "holle"} |
270 the input @{text [quotes] "holle"} |
271 |
271 |
272 @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"holle\")" |
272 @{ML_matchresult_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"holle\")" |
273 "Exception ERROR \"h is not followed by e\" raised"} |
273 "Exception ERROR \"h is not followed by e\" raised"} |
274 |
274 |
275 produces the correct error message. Running it with |
275 produces the correct error message. Running it with |
276 |
276 |
277 @{ML_response [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"wworld\")" |
277 @{ML_matchresult [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"wworld\")" |
278 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
278 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
279 |
279 |
280 yields the expected parsing. |
280 yields the expected parsing. |
281 |
281 |
282 The function @{ML "Scan.repeat p" for p} will apply a parser \<open>p\<close> as |
282 The function @{ML "Scan.repeat p" for p} will apply a parser \<open>p\<close> as |
283 long as it succeeds. For example: |
283 long as it succeeds. For example: |
284 |
284 |
285 @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" |
285 @{ML_matchresult [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" |
286 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
286 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
287 |
287 |
288 Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function |
288 Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function |
289 @{ML_ind repeat1 in Scan} is similar, but requires that the parser \<open>p\<close> |
289 @{ML_ind repeat1 in Scan} is similar, but requires that the parser \<open>p\<close> |
290 succeeds at least once. |
290 succeeds at least once. |
292 Also note that the parser would have aborted with the exception \<open>MORE\<close>, if |
292 Also note that the parser would have aborted with the exception \<open>MORE\<close>, if |
293 you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using |
293 you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using |
294 the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' |
294 the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' |
295 @{ML_ind stopper in Symbol}. With them you can write: |
295 @{ML_ind stopper in Symbol}. With them you can write: |
296 |
296 |
297 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
297 @{ML_matchresult [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
298 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
298 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
299 |
299 |
300 The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings; |
300 The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings; |
301 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
301 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
302 manually wrapping is often already done by the surrounding infrastructure. |
302 manually wrapping is often already done by the surrounding infrastructure. |
303 |
303 |
304 The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any |
304 The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any |
305 string as in |
305 string as in |
306 |
306 |
307 @{ML_response [display,gray] |
307 @{ML_matchresult [display,gray] |
308 "let |
308 "let |
309 val p = Scan.repeat (Scan.one Symbol.not_eof) |
309 val p = Scan.repeat (Scan.one Symbol.not_eof) |
310 val input = Symbol.explode \"foo bar foo\" |
310 val input = Symbol.explode \"foo bar foo\" |
311 in |
311 in |
312 Scan.finite Symbol.stopper p input |
312 Scan.finite Symbol.stopper p input |
317 end of the input string (i.e.~stopper symbol). |
317 end of the input string (i.e.~stopper symbol). |
318 |
318 |
319 The function @{ML_ind unless in Scan} takes two parsers: if the first one can |
319 The function @{ML_ind unless in Scan} takes two parsers: if the first one can |
320 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
320 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
321 |
321 |
322 @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" |
322 @{ML_matchresult_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" |
323 "Exception FAIL raised"} |
323 "Exception FAIL raised"} |
324 |
324 |
325 fails, while |
325 fails, while |
326 |
326 |
327 @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" |
327 @{ML_matchresult [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" |
328 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
328 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
329 |
329 |
330 succeeds. |
330 succeeds. |
331 |
331 |
332 The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can |
332 The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can |
333 be combined to read any input until a certain marker symbol is reached. In the |
333 be combined to read any input until a certain marker symbol is reached. In the |
334 example below the marker symbol is a @{text [quotes] "*"}. |
334 example below the marker symbol is a @{text [quotes] "*"}. |
335 |
335 |
336 @{ML_response [display,gray] |
336 @{ML_matchresult [display,gray] |
337 "let |
337 "let |
338 val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof)) |
338 val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof)) |
339 val input1 = Symbol.explode \"fooooo\" |
339 val input1 = Symbol.explode \"fooooo\" |
340 val input2 = Symbol.explode \"foo*ooo\" |
340 val input2 = Symbol.explode \"foo*ooo\" |
341 in |
341 in |
350 items. One way to do this is the function @{ML_ind ">>" in Scan} where |
350 items. One way to do this is the function @{ML_ind ">>" in Scan} where |
351 @{ML "(p >> f)" for p f} runs |
351 @{ML "(p >> f)" for p f} runs |
352 first the parser \<open>p\<close> and upon successful completion applies the |
352 first the parser \<open>p\<close> and upon successful completion applies the |
353 function \<open>f\<close> to the result. For example |
353 function \<open>f\<close> to the result. For example |
354 |
354 |
355 @{ML_response [display,gray] |
355 @{ML_matchresult [display,gray] |
356 "let |
356 "let |
357 fun double (x, y) = (x ^ x, y ^ y) |
357 fun double (x, y) = (x ^ x, y ^ y) |
358 val parser = $$ \"h\" -- $$ \"e\" |
358 val parser = $$ \"h\" -- $$ \"e\" |
359 in |
359 in |
360 (parser >> double) (Symbol.explode \"hello\") |
360 (parser >> double) (Symbol.explode \"hello\") |
361 end" |
361 end" |
362 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} |
362 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} |
363 |
363 |
364 doubles the two parsed input strings; or |
364 doubles the two parsed input strings; or |
365 |
365 |
366 @{ML_response [display,gray] |
366 @{ML_matchresult [display,gray] |
367 "let |
367 "let |
368 val p = Scan.repeat (Scan.one Symbol.not_eof) |
368 val p = Scan.repeat (Scan.one Symbol.not_eof) |
369 val input = Symbol.explode \"foo bar foo\" |
369 val input = Symbol.explode \"foo bar foo\" |
370 in |
370 in |
371 Scan.finite Symbol.stopper (p >> implode) input |
371 Scan.finite Symbol.stopper (p >> implode) input |
377 |
377 |
378 The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies |
378 The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies |
379 the given parser to the second component of the pair and leaves the first component |
379 the given parser to the second component of the pair and leaves the first component |
380 untouched. For example |
380 untouched. For example |
381 |
381 |
382 @{ML_response [display,gray] |
382 @{ML_matchresult [display,gray] |
383 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
383 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
384 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} |
384 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} |
385 |
385 |
386 \footnote{\bf FIXME: In which situations is \<open>lift\<close> useful? Give examples.} |
386 \footnote{\bf FIXME: In which situations is \<open>lift\<close> useful? Give examples.} |
387 |
387 |
423 parentheses. The parser @{ML parse_tree} reads either a pair of trees |
423 parentheses. The parser @{ML parse_tree} reads either a pair of trees |
424 separated by a comma, or acts like @{ML parse_basic}. Unfortunately, |
424 separated by a comma, or acts like @{ML parse_basic}. Unfortunately, |
425 because of the mutual recursion, this parser will immediately run into a |
425 because of the mutual recursion, this parser will immediately run into a |
426 loop, even if it is called without any input. For example |
426 loop, even if it is called without any input. For example |
427 |
427 |
428 @{ML_response_fake_both [display, gray] |
428 @{ML_matchresult_fake_both [display, gray] |
429 "parse_tree \"A\"" |
429 "parse_tree \"A\"" |
430 "*** Exception- TOPLEVEL_ERROR raised"} |
430 "*** Exception- TOPLEVEL_ERROR raised"} |
431 |
431 |
432 raises an exception indicating that the stack limit is reached. Such |
432 raises an exception indicating that the stack limit is reached. Such |
433 looping parser are not useful, because of ML's strict evaluation of |
433 looping parser are not useful, because of ML's strict evaluation of |
447 While the type of the parser is unchanged by the addition, its behaviour |
447 While the type of the parser is unchanged by the addition, its behaviour |
448 changed: with this version of the parser the execution is delayed until |
448 changed: with this version of the parser the execution is delayed until |
449 some string is applied for the argument \<open>xs\<close>. This gives us |
449 some string is applied for the argument \<open>xs\<close>. This gives us |
450 exactly the parser what we wanted. An example is as follows: |
450 exactly the parser what we wanted. An example is as follows: |
451 |
451 |
452 @{ML_response [display, gray] |
452 @{ML_matchresult [display, gray] |
453 "let |
453 "let |
454 val input = Symbol.explode \"(A,((A))),A\" |
454 val input = Symbol.explode \"(A,((A))),A\" |
455 in |
455 in |
456 Scan.finite Symbol.stopper (parse_tree \"A\") input |
456 Scan.finite Symbol.stopper (parse_tree \"A\") input |
457 end" |
457 end" |
485 parsers for the arguments of proof methods, use the type @{ML_type |
485 parsers for the arguments of proof methods, use the type @{ML_type |
486 Token.T}. |
486 Token.T}. |
487 |
487 |
488 \begin{readmore} |
488 \begin{readmore} |
489 The parser functions for the theory syntax are contained in the structure |
489 The parser functions for the theory syntax are contained in the structure |
490 @{ML_struct Parse} defined in the file @{ML_file "Pure/Isar/parse.ML"}. |
490 @{ML_structure Parse} defined in the file @{ML_file "Pure/Isar/parse.ML"}. |
491 The definition for tokens is in the file @{ML_file "Pure/Isar/token.ML"}. |
491 The definition for tokens is in the file @{ML_file "Pure/Isar/token.ML"}. |
492 \end{readmore} |
492 \end{readmore} |
493 |
493 |
494 The structure @{ML_struct Token} defines several kinds of tokens (for |
494 The structure @{ML_structure Token} defines several kinds of tokens (for |
495 example @{ML_ind Ident in Token} for identifiers, @{ML Keyword in |
495 example @{ML_ind Ident in Token} for identifiers, @{ML Keyword in |
496 Token} for keywords and @{ML_ind Command in Token} for commands). Some |
496 Token} for keywords and @{ML_ind Command in Token} for commands). Some |
497 token parsers take into account the kind of tokens. The first example shows |
497 token parsers take into account the kind of tokens. The first example shows |
498 how to generate a token list out of a string using the function |
498 how to generate a token list out of a string using the function |
499 @{ML_ind explode in Token}. It is given the argument |
499 @{ML_ind explode in Token}. It is given the argument |
500 @{ML "Position.none"} since, |
500 @{ML "Position.none"} since, |
501 at the moment, we are not interested in generating precise error |
501 at the moment, we are not interested in generating precise error |
502 messages. The following code |
502 messages. The following code |
503 |
503 |
504 |
504 |
505 @{ML_response_fake [display,gray] \<open>Token.explode |
505 @{ML_matchresult_fake [display,gray] \<open>Token.explode |
506 (Thy_Header.get_keywords' @{context}) Position.none "hello world"\<close> |
506 (Thy_Header.get_keywords' @{context}) Position.none "hello world"\<close> |
507 \<open>[Token (_,(Ident, "hello"),_), |
507 \<open>[Token (_,(Ident, "hello"),_), |
508 Token (_,(Space, " "),_), |
508 Token (_,(Space, " "),_), |
509 Token (_,(Ident, "world"),_)]\<close>} |
509 Token (_,(Ident, "world"),_)]\<close>} |
510 |
510 |
522 |
522 |
523 |
523 |
524 text \<open> |
524 text \<open> |
525 then lexing @{text [quotes] "hello world"} will produce |
525 then lexing @{text [quotes] "hello world"} will produce |
526 |
526 |
527 @{ML_response_fake [display,gray] "Token.explode |
527 @{ML_matchresult_fake [display,gray] "Token.explode |
528 (Thy_Header.get_keywords' @{context}) Position.none \"hello world\"" |
528 (Thy_Header.get_keywords' @{context}) Position.none \"hello world\"" |
529 "[Token (_,(Keyword, \"hello\"),_), |
529 "[Token (_,(Keyword, \"hello\"),_), |
530 Token (_,(Space, \" \"),_), |
530 Token (_,(Space, \" \"),_), |
531 Token (_,(Ident, \"world\"),_)]"} |
531 Token (_,(Ident, \"world\"),_)]"} |
532 |
532 |
533 Many parsing functions later on will require white space, comments and the like |
533 Many parsing functions later on will require white space, comments and the like |
534 to have already been filtered out. So from now on we are going to use the |
534 to have already been filtered out. So from now on we are going to use the |
535 functions @{ML filter} and @{ML_ind is_proper in Token} to do this. |
535 functions @{ML filter} and @{ML_ind is_proper in Token} to do this. |
536 For example: |
536 For example: |
537 |
537 |
538 @{ML_response_fake [display,gray] |
538 @{ML_matchresult_fake [display,gray] |
539 "let |
539 "let |
540 val input = Token.explode (Thy_Header.get_keywords' @{context}) Position.none \"hello world\" |
540 val input = Token.explode (Thy_Header.get_keywords' @{context}) Position.none \"hello world\" |
541 in |
541 in |
542 filter Token.is_proper input |
542 filter Token.is_proper input |
543 end" |
543 end" |
551 |
551 |
552 ML \<open>filtered_input "inductive | for"\<close> |
552 ML \<open>filtered_input "inductive | for"\<close> |
553 text \<open> |
553 text \<open> |
554 If you now parse |
554 If you now parse |
555 |
555 |
556 @{ML_response_fake [display,gray] |
556 @{ML_matchresult_fake [display,gray] |
557 "filtered_input \"inductive | for\"" |
557 "filtered_input \"inductive | for\"" |
558 "[Token (_,(Command, \"inductive\"),_), |
558 "[Token (_,(Command, \"inductive\"),_), |
559 Token (_,(Keyword, \"|\"),_), |
559 Token (_,(Keyword, \"|\"),_), |
560 Token (_,(Keyword, \"for\"),_)]"} |
560 Token (_,(Keyword, \"for\"),_)]"} |
561 |
561 |
566 You might have to adjust the \<open>ML_print_depth\<close> in order to |
566 You might have to adjust the \<open>ML_print_depth\<close> in order to |
567 see the complete list. |
567 see the complete list. |
568 |
568 |
569 The parser @{ML_ind "$$$" in Parse} parses a single keyword. For example: |
569 The parser @{ML_ind "$$$" in Parse} parses a single keyword. For example: |
570 |
570 |
571 @{ML_response [display,gray] |
571 @{ML_matchresult [display,gray] |
572 "let |
572 "let |
573 val input1 = filtered_input \"where for\" |
573 val input1 = filtered_input \"where for\" |
574 val input2 = filtered_input \"| in\" |
574 val input2 = filtered_input \"| in\" |
575 in |
575 in |
576 (Parse.$$$ \"where\" input1, Parse.$$$ \"|\" input2) |
576 (Parse.$$$ \"where\" input1, Parse.$$$ \"|\" input2) |
578 "((\"where\",_), (\"|\",_))"} |
578 "((\"where\",_), (\"|\",_))"} |
579 |
579 |
580 Any non-keyword string can be parsed with the function @{ML_ind reserved in Parse}. |
580 Any non-keyword string can be parsed with the function @{ML_ind reserved in Parse}. |
581 For example: |
581 For example: |
582 |
582 |
583 @{ML_response [display,gray] |
583 @{ML_matchresult [display,gray] |
584 "let |
584 "let |
585 val p = Parse.reserved \"bar\" |
585 val p = Parse.reserved \"bar\" |
586 val input = filtered_input \"bar\" |
586 val input = filtered_input \"bar\" |
587 in |
587 in |
588 p input |
588 p input |
589 end" |
589 end" |
590 "(\"bar\",[])"} |
590 "(\"bar\",[])"} |
591 |
591 |
592 Like before, you can sequentially connect parsers with @{ML "--"}. For example: |
592 Like before, you can sequentially connect parsers with @{ML "--"}. For example: |
593 |
593 |
594 @{ML_response [display,gray] |
594 @{ML_matchresult [display,gray] |
595 "let |
595 "let |
596 val input = filtered_input \"| in\" |
596 val input = filtered_input \"| in\" |
597 in |
597 in |
598 (Parse.$$$ \"|\" -- Parse.$$$ \"in\") input |
598 (Parse.$$$ \"|\" -- Parse.$$$ \"in\") input |
599 end" |
599 end" |
601 |
601 |
602 The parser @{ML "Parse.enum s p" for s p} parses a possibly empty |
602 The parser @{ML "Parse.enum s p" for s p} parses a possibly empty |
603 list of items recognised by the parser \<open>p\<close>, where the items being parsed |
603 list of items recognised by the parser \<open>p\<close>, where the items being parsed |
604 are separated by the string \<open>s\<close>. For example: |
604 are separated by the string \<open>s\<close>. For example: |
605 |
605 |
606 @{ML_response [display,gray] |
606 @{ML_matchresult [display,gray] |
607 "let |
607 "let |
608 val input = filtered_input \"in | in | in foo\" |
608 val input = filtered_input \"in | in | in foo\" |
609 in |
609 in |
610 (Parse.enum \"|\" (Parse.$$$ \"in\")) input |
610 (Parse.enum \"|\" (Parse.$$$ \"in\")) input |
611 end" |
611 end" |
616 [quotes] "foo"} at the end of the parsed string, otherwise the parser would |
616 [quotes] "foo"} at the end of the parsed string, otherwise the parser would |
617 have consumed all tokens and then failed with the exception \<open>MORE\<close>. Like in the previous section, we can avoid this exception using the |
617 have consumed all tokens and then failed with the exception \<open>MORE\<close>. Like in the previous section, we can avoid this exception using the |
618 wrapper @{ML Scan.finite}. This time, however, we have to use the |
618 wrapper @{ML Scan.finite}. This time, however, we have to use the |
619 ``stopper-token'' @{ML Token.stopper}. We can write: |
619 ``stopper-token'' @{ML Token.stopper}. We can write: |
620 |
620 |
621 @{ML_response [display,gray] |
621 @{ML_matchresult [display,gray] |
622 "let |
622 "let |
623 val input = filtered_input \"in | in | in\" |
623 val input = filtered_input \"in | in | in\" |
624 val p = Parse.enum \"|\" (Parse.$$$ \"in\") |
624 val p = Parse.enum \"|\" (Parse.$$$ \"in\") |
625 in |
625 in |
626 Scan.finite Token.stopper p input |
626 Scan.finite Token.stopper p input |
637 of the parser in case of a dead end, just like @{ML "Scan.!!"} (see previous |
637 of the parser in case of a dead end, just like @{ML "Scan.!!"} (see previous |
638 section). A difference, however, is that the error message of @{ML |
638 section). A difference, however, is that the error message of @{ML |
639 "Parse.!!!"} is fixed to be @{text [quotes] "Outer syntax error"} |
639 "Parse.!!!"} is fixed to be @{text [quotes] "Outer syntax error"} |
640 together with a relatively precise description of the failure. For example: |
640 together with a relatively precise description of the failure. For example: |
641 |
641 |
642 @{ML_response_fake [display,gray] |
642 @{ML_matchresult_fake [display,gray] |
643 "let |
643 "let |
644 val input = filtered_input \"in |\" |
644 val input = filtered_input \"in |\" |
645 val parse_bar_then_in = Parse.$$$ \"|\" -- Parse.$$$ \"in\" |
645 val parse_bar_then_in = Parse.$$$ \"|\" -- Parse.$$$ \"in\" |
646 in |
646 in |
647 parse (Parse.!!! parse_bar_then_in) input |
647 parse (Parse.!!! parse_bar_then_in) input |
669 filter Token.is_proper (Token.explode (Thy_Header.get_keywords' @{context}) (Position.line 7) str)\<close> |
669 filter Token.is_proper (Token.explode (Thy_Header.get_keywords' @{context}) (Position.line 7) str)\<close> |
670 |
670 |
671 text \<open> |
671 text \<open> |
672 where we pretend the parsed string starts on line 7. An example is |
672 where we pretend the parsed string starts on line 7. An example is |
673 |
673 |
674 @{ML_response_fake [display,gray] |
674 @{ML_matchresult_fake [display,gray] |
675 "filtered_input' \"foo \\n bar\"" |
675 "filtered_input' \"foo \\n bar\"" |
676 "[Token ((\"foo\", ({line=7, end_line=7}, {line=7})), (Ident, \"foo\"), _), |
676 "[Token ((\"foo\", ({line=7, end_line=7}, {line=7})), (Ident, \"foo\"), _), |
677 Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), _)]"} |
677 Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), _)]"} |
678 |
678 |
679 in which the @{text [quotes] "\\n"} causes the second token to be in |
679 in which the @{text [quotes] "\\n"} causes the second token to be in |
680 line 8. |
680 line 8. |
681 |
681 |
682 By using the parser @{ML position in Parse} you can access the token |
682 By using the parser @{ML position in Parse} you can access the token |
683 position and return it as part of the parser result. For example |
683 position and return it as part of the parser result. For example |
684 |
684 |
685 @{ML_response_fake [display,gray] |
685 @{ML_matchresult_fake [display,gray] |
686 "let |
686 "let |
687 val input = filtered_input' \"where\" |
687 val input = filtered_input' \"where\" |
688 in |
688 in |
689 parse (Parse.position (Parse.$$$ \"where\")) input |
689 parse (Parse.position (Parse.$$$ \"where\")) input |
690 end" |
690 end" |
751 text \<open> |
751 text \<open> |
752 There is usually no need to write your own parser for parsing inner syntax, that is |
752 There is usually no need to write your own parser for parsing inner syntax, that is |
753 for terms and types: you can just call the predefined parsers. Terms can |
753 for terms and types: you can just call the predefined parsers. Terms can |
754 be parsed using the function @{ML_ind term in Parse}. For example: |
754 be parsed using the function @{ML_ind term in Parse}. For example: |
755 |
755 |
756 @{ML_response_fake [display,gray] |
756 @{ML_matchresult_fake [display,gray] |
757 "let |
757 "let |
758 val input = Token.explode (Thy_Header.get_keywords' @{context}) Position.none \"foo\" |
758 val input = Token.explode (Thy_Header.get_keywords' @{context}) Position.none \"foo\" |
759 in |
759 in |
760 Parse.term input |
760 Parse.term input |
761 end" |
761 end" |
763 |
763 |
764 |
764 |
765 The function @{ML_ind prop in Parse} is similar, except that it gives a different |
765 The function @{ML_ind prop in Parse} is similar, except that it gives a different |
766 error message, when parsing fails. As you can see, the parser not just returns |
766 error message, when parsing fails. As you can see, the parser not just returns |
767 the parsed string, but also some markup information. You can decode the |
767 the parsed string, but also some markup information. You can decode the |
768 information with the function @{ML_ind parse in YXML} in @{ML_struct YXML}. |
768 information with the function @{ML_ind parse in YXML} in @{ML_structure YXML}. |
769 The result of the decoding is an XML-tree. You can see better what is going on if |
769 The result of the decoding is an XML-tree. You can see better what is going on if |
770 you replace @{ML Position.none} by @{ML "Position.line 42"}, say: |
770 you replace @{ML Position.none} by @{ML "Position.line 42"}, say: |
771 |
771 |
772 @{ML_response_fake [display,gray] |
772 @{ML_matchresult_fake [display,gray] |
773 "let |
773 "let |
774 val input = Token.explode (Thy_Header.get_keywords' @{context}) (Position.line 42) \"foo\" |
774 val input = Token.explode (Thy_Header.get_keywords' @{context}) (Position.line 42) \"foo\" |
775 in |
775 in |
776 YXML.parse (fst (Parse.term input)) |
776 YXML.parse (fst (Parse.term input)) |
777 end" |
777 end" |
830 |
830 |
831 |
831 |
832 To see what the parser returns, let us parse the string corresponding to the |
832 To see what the parser returns, let us parse the string corresponding to the |
833 definition of @{term even} and @{term odd}: |
833 definition of @{term even} and @{term odd}: |
834 |
834 |
835 @{ML_response [display,gray] |
835 @{ML_matchresult [display,gray] |
836 "let |
836 "let |
837 val input = filtered_input |
837 val input = filtered_input |
838 (\"even and odd \" ^ |
838 (\"even and odd \" ^ |
839 \"where \" ^ |
839 \"where \" ^ |
840 \" even0[intro]: \\\"even 0\\\" \" ^ |
840 \" even0[intro]: \\\"even 0\\\" \" ^ |
866 list of variables that can include optional type annotations and syntax translations. |
866 list of variables that can include optional type annotations and syntax translations. |
867 For example:\footnote{Note that in the code we need to write |
867 For example:\footnote{Note that in the code we need to write |
868 \<open>\"int \<Rightarrow> bool\"\<close> in order to properly escape the double quotes |
868 \<open>\"int \<Rightarrow> bool\"\<close> in order to properly escape the double quotes |
869 in the compound type.} |
869 in the compound type.} |
870 |
870 |
871 @{ML_response_fake [display,gray] |
871 @{ML_matchresult_fake [display,gray] |
872 "let |
872 "let |
873 val input = filtered_input |
873 val input = filtered_input |
874 \"foo::\\\"int \<Rightarrow> bool\\\" and bar::nat (\\\"BAR\\\" 100) and blonk\" |
874 \"foo::\\\"int \<Rightarrow> bool\\\" and bar::nat (\\\"BAR\\\" 100) and blonk\" |
875 in |
875 in |
876 parse Parse.vars input |
876 parse Parse.vars input |
897 list of introduction rules, that is propositions with theorem annotations |
897 list of introduction rules, that is propositions with theorem annotations |
898 such as rule names and attributes. The introduction rules are propositions |
898 such as rule names and attributes. The introduction rules are propositions |
899 parsed by @{ML_ind prop in Parse}. However, they can include an optional |
899 parsed by @{ML_ind prop in Parse}. However, they can include an optional |
900 theorem name plus some attributes. For example |
900 theorem name plus some attributes. For example |
901 |
901 |
902 @{ML_response [display,gray] "let |
902 @{ML_matchresult [display,gray] "let |
903 val input = filtered_input \"foo_lemma[intro,dest!]:\" |
903 val input = filtered_input \"foo_lemma[intro,dest!]:\" |
904 val ((name, attrib), _) = parse (Parse_Spec.thm_name \":\") input |
904 val ((name, attrib), _) = parse (Parse_Spec.thm_name \":\") input |
905 in |
905 in |
906 (name, map Token.name_of_src attrib) |
906 (name, map Token.name_of_src attrib) |
907 end" "(foo_lemma, [(\"intro\", _), (\"dest\", _)])"} |
907 end" "(foo_lemma, [(\"intro\", _), (\"dest\", _)])"} |
1116 proposition to have a name that can be referenced later on. |
1116 proposition to have a name that can be referenced later on. |
1117 |
1117 |
1118 The first problem for \isacommand{foobar\_proof} is to parse some |
1118 The first problem for \isacommand{foobar\_proof} is to parse some |
1119 text as ML-source and then interpret it as an Isabelle term using |
1119 text as ML-source and then interpret it as an Isabelle term using |
1120 the ML-runtime. For the parsing part, we can use the function |
1120 the ML-runtime. For the parsing part, we can use the function |
1121 @{ML_ind "ML_source" in Parse} in the structure @{ML_struct |
1121 @{ML_ind "ML_source" in Parse} in the structure @{ML_structure |
1122 Parse}. For running the ML-interpreter we need the following |
1122 Parse}. For running the ML-interpreter we need the following |
1123 scaffolding code. |
1123 scaffolding code. |
1124 \<close> |
1124 \<close> |
1125 |
1125 |
1126 ML %grayML\<open> |
1126 ML %grayML\<open> |
1156 |
1156 |
1157 text \<open> |
1157 text \<open> |
1158 In Line 12, we implement a parser that first reads in an optional lemma name (terminated |
1158 In Line 12, we implement a parser that first reads in an optional lemma name (terminated |
1159 by ``:'') and then some ML-code. The function in Lines 5 to 10 takes the ML-text |
1159 by ``:'') and then some ML-code. The function in Lines 5 to 10 takes the ML-text |
1160 and lets the ML-runtime evaluate it using the function @{ML_ind value in Code_Runtime} |
1160 and lets the ML-runtime evaluate it using the function @{ML_ind value in Code_Runtime} |
1161 in the structure @{ML_struct Code_Runtime}. Once the ML-text has been turned into a term, |
1161 in the structure @{ML_structure Code_Runtime}. Once the ML-text has been turned into a term, |
1162 the function @{ML theorem in Proof} opens a corresponding proof-state. This function takes the |
1162 the function @{ML theorem in Proof} opens a corresponding proof-state. This function takes the |
1163 function \<open>after_qed\<close> as argument, whose purpose is to store the theorem |
1163 function \<open>after_qed\<close> as argument, whose purpose is to store the theorem |
1164 (once it is proven) under the given name \<open>thm_name\<close>. |
1164 (once it is proven) under the given name \<open>thm_name\<close>. |
1165 |
1165 |
1166 You can now define a term, for example |
1166 You can now define a term, for example |