Isabelle developers are usually concerned with writing these outer syntax parsers,
either for new definitional packages or for calling tactics with specific arguments.

\begin{readmore}
  The library
  for writing parser combinators is split up, roughly, into two parts.
  The first part consists of a collection of generic parser combinators defined
  in the structure @{ML_struct Scan} in the file
  @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of
  combinators for dealing with specific token types, which are defined in the
  structure @{ML_struct OuterParse} in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
in @{ML_text "($$ \"h\") []"}.
\item @{ML_text "ABORT"} is the exception which is raised when a dead end is reached.
It is used for example in the function @{ML "(op !!)"} (see below).
\end{itemize}

However, note that these exceptions are private to the parser and cannot be accessed
by the programmer (for example to handle them).

Slightly more general than the parser @{ML "(op $$)"} is the function @{ML
Scan.one}, in that it takes a predicate as argument and then parses exactly
one item from the input list satisfying this predicate. For example the
following parser accepts either an @{text "h"} or a @{text "w"}:

@{ML_response [display]
"let 
  val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
  val input1 = (explode \"hello\")
  val input2 = (explode \"world\")
in 
  (hw input1, hw input2)
end"
  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing function
for parsers, except that they discard the item parsed by the first (respectively second)
parser. For example

@{ML_response [display]
"let 
  val just_e = ($$ \"h\") |-- ($$ \"e\")
  val just_h = ($$ \"h\") --| ($$ \"e\")
  val input = (explode \"hello\")
in 
  (just_e input, just_h input)
end"
  "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"}

The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser
@{ML_text p} if it succeeds; otherwise it returns the default value @{ML_text x}.
For example

@{ML_response [display]
"let 
  val p = Scan.optional ($$ \"h\") \"x\"
  val input1 = (explode \"hello\")
  val input2 = (explode \"world\")
in 
  (p input1, p input2)
end"
  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}

The function @{ML Scan.option} works similarly, except no default value can
be given. Instead, the result is wrapped as an @{text "option"}-type. For example:

@{ML_response [display]
"let 
  val p = Scan.option ($$ \"h\")
  val input1 = (explode \"hello\")
  val input2 = (explode \"world\")
in 
  (p input1, p input2)
end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}

The function @{ML "(op !!)"} helps to produce appropriate error messages
during parsing. For example, if one wants to parse that @{ML_text p} is immediately
followed by @{ML_text q}, or else to start a completely different parser @{ML_text r},
one might write something like the following.
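Here the concrete single-character parsers and the error message are only
illustrative; they stand in for @{ML_text p}, @{ML_text q} and @{ML_text r}.

@{ML_response [display]
"let 
  val err = fn _ => \"h is not followed by e\"
  val p_or_r = (($$ \"h\") -- (!! err ($$ \"e\"))) || (($$ \"w\") -- ($$ \"o\"))
in 
  p_or_r (explode \"world\")
end"
  "((\"w\", \"o\"), [\"r\", \"l\", \"d\"])"}

On the input @{ML_text [quotes] "world"} the first alternative fails before anything
is consumed, so the @{text "w"}-alternative is tried. Had we given
@{ML_text [quotes] "hallo"} as input instead, the @{text "h"} would have been consumed,
@{ML "(op !!)"} would have turned the subsequent failure into a hard error with the
given message, and the second alternative would not have been tried.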

Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
@{ML "Scan.repeat1"} is similar, but requires that the parser @{ML_text "p"}
succeeds at least once.

Also note that the parser would have aborted with the exception @{ML_text MORE}, if
we had run it on just @{ML_text [quotes] "hhhh"}. This can be avoided using
the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
them we can write

@{ML_response [display] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")"
"([\"h\", \"h\", \"h\", \"h\"], [])"}

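As a small variation, the non-empty version @{ML "Scan.repeat1"} behaves in the same
way on this input, since at least one @{text "h"} can be parsed:

@{ML_response [display] "Scan.finite Symbol.stopper (Scan.repeat1 ($$ \"h\")) (explode \"hhhh\")"
"([\"h\", \"h\", \"h\", \"h\"], [])"}
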
However, this kind of manual wrapping needs to be done only very rarely
in practice, because it is already done by the infrastructure for you.

After parsing has succeeded, one nearly always wants to apply a function to the parsed
items. This is done using the function @{ML_open "(p >> f)" for p f}, which runs
first the parser @{ML_text p} and, upon successful completion, applies the
function @{ML_text f} to the result.
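For example, the following parser, given here only as an illustration of the idiom,
concatenates the two parsed characters into a single string:

@{ML_response [display]
"(($$ \"h\") -- ($$ \"e\") >> (fn (x, y) => x ^ y)) (explode \"hello\")"
"(\"he\", [\"l\", \"l\", \"o\"])"}
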
The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
\end{readmore}

The structure @{ML_struct OuterLex} defines several kinds of token (for example
@{ML "OuterLex.Ident"} for identifiers, @{ML "OuterLex.Keyword"} for keywords and
@{ML "OuterLex.Command"} for commands). Some token parsers take into account the
kind of token.

We can generate a token list out of a string using the function @{ML
"OuterSyntax.scan"}, to which we give @{ML "Position.none"} as argument below,
since at the moment we are not interested in generating precise error
messages. The following

@{ML_response_fake [display] "OuterSyntax.scan Position.none \"hello world\""
"[Token (\<dots>,(Ident, \"hello\"),\<dots>),
 Token (\<dots>,(Space, \" \"),\<dots>),
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

produces three tokens where the first and the last are identifiers, since
@{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match any
other syntactic category.\footnote{Note that because of a possible bug in
the PolyML runtime system the result is printed as @{text "?"} instead of
the token.} The second indicates a space.

Many parsing functions later on will require spaces, comments and the like
to have been filtered out. In what follows, we are going to use the
functions @{ML filter} and @{ML OuterLex.is_proper} to do this. For example

@{ML_response_fake [display]
"let
  val input = OuterSyntax.scan Position.none \"hello world\"
in
  filter OuterLex.is_proper input
end"
"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}

For convenience we are going to use the function

*}

ML {*
fun filtered_input str =
  filter OuterLex.is_proper (OuterSyntax.scan Position.none str)
*}

text {*

If we parse

@{ML_response_fake [display]
"filtered_input \"inductive | for\""
"[Token (\<dots>,(Command, \"inductive\"),\<dots>),
 Token (\<dots>,(Keyword, \"|\"),\<dots>),
 Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}

we obtain a list consisting of only a command and two keyword tokens.
If you want to see which keywords and commands are currently known, use
the following (you might have to adjust the @{ML print_depth} in order to
see the complete list):

@{ML_response_fake [display]
"let
  val (keywords, commands) = OuterKeyword.get_lexicons ()
in
  (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
end"
"([\"}\",\"{\",\<dots>],[\"\<rightleftharpoons>\",\"\<leftharpoondown>\",\<dots>])"}

Now the parser @{ML "OuterParse.$$$"} parses a single keyword. For example

@{ML_response [display]
"let
  val input1 = filtered_input \"where for\"
  val input2 = filtered_input \"| in\"
in
  (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
end"
"((\"where\",\<dots>),(\"|\",\<dots>))"}

Like before, we can sequentially connect parsers with @{ML "(op --)"}. For example

@{ML_response [display]
"let
  val input = filtered_input \"| in\"
in
  (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
end"
"((\"|\",\"in\"),[])"}

The parser @{ML_open "OuterParse.enum s p" for s p} parses a possibly empty
list of items recognised by the parser @{ML_text p}, where the items are
separated by the string @{ML_text s}. For example

@{ML_response [display]
"let
  val input = filtered_input \"in | in | in end\"
in
  (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end"
"([\"in\",\"in\",\"in\"],[\<dots>])"}

@{ML "OuterParse.enum1"} works similarly, except that the parsed list must
be non-empty. Note that we had to add a @{ML_text [quotes] "end"} at the end
of the parsed string, otherwise the parser would have consumed all tokens
and then failed with the exception @{ML_text "MORE"}. Like in the previous
section, we can avoid this exception using the wrapper @{ML
Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
OuterLex.stopper}. We can write

@{ML_response [display]
"let
  val input = filtered_input \"in | in | in\"
in
  Scan.finite OuterLex.stopper (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end"
"([\"in\",\"in\",\"in\"],[])"}

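The same wrapping works for @{ML "OuterParse.enum1"}; on this input it produces
the same result, since the list of parsed items is non-empty:

@{ML_response [display]
"let
  val input = filtered_input \"in | in | in\"
in
  Scan.finite OuterLex.stopper (OuterParse.enum1 \"|\" (OuterParse.$$$ \"in\")) input
end"
"([\"in\",\"in\",\"in\"],[])"}
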
The function @{ML "OuterParse.!!!"} can be used to force termination of the parser
in case of a dead end, just like @{ML "(op !!)"} from the previous section,
except that the error message is fixed to be @{text [quotes] "Outer syntax error"}
with a relatively precise description of the failure. For example:

@{ML_response_fake [display]
"let
  val input = filtered_input \"in |\"
  val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"
in
  Scan.error (OuterParse.!!! parse_bar_then_in) input
end"
"Exception ERROR \"Outer syntax error: keyword \"|\" expected,
but keyword in was found\" raised"
}

*}

section {* Positional Information *}

text {*

@{ML OuterParse.position}
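
Presumably, @{ML OuterParse.position} attaches positional information to the result
of a token parser, that is, it pairs the parsed item with the position at which it
occurred in the input. The following sketch rests on that assumption; the printed
form of the position is elided.

@{ML_response_fake [display]
"let
  val input = filtered_input \"in | in\"
in
  OuterParse.position (OuterParse.$$$ \"in\") input
end"
"((\"in\", \<dots>), [\<dots>])"}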