92 |
92 |
93 Note how the result of consumed strings builds up on the left as nested pairs. |
93 Note how the result of consumed strings builds up on the left as nested pairs. |
94 |
94 |
95 Parsers that explore |
95 Parsers that explore |
96 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the |
96 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the |
97 parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds, |
97 parser @{ML "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds, |
98 otherwise it returns the result of @{ML_text "q"}. For example |
98 otherwise it returns the result of @{ML_text "q"}. For example |
99 |
99 |
100 @{ML_response [display] |
100 @{ML_response [display] |
101 "let |
101 "let |
102 val hw = ($$ \"h\") || ($$ \"w\") |
102 val hw = ($$ \"h\") || ($$ \"w\") |
119 in |
119 in |
120 (just_e input, just_h input) |
120 (just_e input, just_h input) |
121 end" |
121 end" |
122 "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"} |
122 "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"} |
123 |
123 |
124 The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser |
124 The parser @{ML "Scan.optional p x" for p x} returns the result of the parser |
125 @{ML_text "p"}, if it succeeds; otherwise it returns |
125 @{ML_text "p"}, if it succeeds; otherwise it returns |
126 the default value @{ML_text "x"}. For example |
126 the default value @{ML_text "x"}. For example |
127 |
127 |
128 @{ML_response [display] |
128 @{ML_response [display] |
129 "let |
129 "let |
150 The function @{ML "(op !!)"} helps to produce appropriate error messages |
150 The function @{ML "(op !!)"} helps to produce appropriate error messages |
151 during parsing. For example if one wants to parse that @{ML_text p} is immediately |
151 during parsing. For example if one wants to parse that @{ML_text p} is immediately |
152 followed by @{ML_text q}, or start a completely different parser @{ML_text r}, |
152 followed by @{ML_text q}, or start a completely different parser @{ML_text r}, |
153 one might write |
153 one might write |
154 |
154 |
155 @{ML_open [display] "(p -- q) || r" for p q r} |
155 @{ML [display] "(p -- q) || r" for p q r} |
156 |
156 |
157 However, this parser is problematic for producing an appropriate error message, in case |
157 However, this parser is problematic for producing an appropriate error message, in case |
158 the parsing of @{ML_open "(p -- q)" for p q} fails. Because in that case one loses with the parser |
158 the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses with the parser |
159 above the information |
159 above the information |
160 that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in |
160 that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in |
161 which @{ML_text p} |
161 which @{ML_text p} |
162 is present in the input, but not @{ML_text q}. That means @{ML_open "(p -- q)" for p q} will fail |
162 is present in the input, but not @{ML_text q}. That means @{ML "(p -- q)" for p q} will fail |
163 and the |
163 and the |
164 alternative parser @{ML_text r} will be tried. However in many circumstance this will be the wrong |
164 alternative parser @{ML_text r} will be tried. However in many circumstance this will be the wrong |
165 parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then |
165 parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then |
166 caused by the |
166 caused by the |
167 failure of @{ML_text r}, not by the absence of @{ML_text q} in the input. This kind of situation |
167 failure of @{ML_text r}, not by the absence of @{ML_text q} in the input. This kind of situation |
217 @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")" |
217 @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")" |
218 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
218 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
219 |
219 |
220 yields the expected parsing. |
220 yields the expected parsing. |
221 |
221 |
222 The function @{ML_open "Scan.repeat p" for p} will apply a parser @{ML_text p} as |
222 The function @{ML "Scan.repeat p" for p} will apply a parser @{ML_text p} as |
223 often as it succeeds. For example |
223 often as it succeeds. For example |
224 |
224 |
225 @{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" |
225 @{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" |
226 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
226 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
227 |
227 |
239 |
239 |
240 However, this kind of manually wrapping needs to be done only very rarely |
240 However, this kind of manually wrapping needs to be done only very rarely |
241 in practise, because it is already done by the infrastructure for you. |
241 in practise, because it is already done by the infrastructure for you. |
242 |
242 |
243 After parsing succeeded, one nearly always wants to apply a function on the parsed |
243 After parsing succeeded, one nearly always wants to apply a function on the parsed |
244 items. This is done using the function @{ML_open "(p >> f)" for p f} which runs |
244 items. This is done using the function @{ML "(p >> f)" for p f} which runs |
245 first the parser @{ML_text p} and upon successful completion applies the |
245 first the parser @{ML_text p} and upon successful completion applies the |
246 function @{ML_text f} to the result. For example |
246 function @{ML_text f} to the result. For example |
247 |
247 |
248 @{ML_response [display] |
248 @{ML_response [display] |
249 "let |
249 "let |
279 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
279 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
280 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
280 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
281 \end{readmore} |
281 \end{readmore} |
282 |
282 |
283 The structure @{ML_struct OuterLex} defines several kinds of token (for example |
283 The structure @{ML_struct OuterLex} defines several kinds of token (for example |
284 @{ML "OuterLex.Ident"} for identifiers, @{ML "OuterLex.Keyword"} for keywords and |
284 @{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in OuterLex} for keywords and |
285 @{ML "OuterLex.Command"} for commands). Some token parsers take into account the |
285 @{ML "Command" in OuterLex} for commands). Some token parsers take into account the |
286 kind of token. |
286 kind of token. |
287 |
287 *} |
288 We can generate a token list out of a string using the function @{ML |
288 |
289 "OuterSyntax.scan"}, which we give below @{ML "Position.none"} as argument |
289 text {* |
290 since, at the moment, we are not interested in generating precise error |
290 For the examples below, we can generate a token list out of a string using |
291 messages. The following |
291 the function @{ML "OuterSyntax.scan"}, which we give below @{ML |
|
292 "Position.none"} as argument since, at the moment, we are not interested in |
|
293 generating precise error messages. The following |
|
294 |
292 |
295 |
293 @{ML_response_fake [display] "OuterSyntax.scan Position.none \"hello world\"" |
296 @{ML_response_fake [display] "OuterSyntax.scan Position.none \"hello world\"" |
294 "[Token (\<dots>,(Ident, \"hello\"),\<dots>), |
297 "[Token (\<dots>,(Ident, \"hello\"),\<dots>), |
295 Token (\<dots>,(Space, \" \"),\<dots>), |
298 Token (\<dots>,(Space, \" \"),\<dots>), |
296 Token (\<dots>,(Ident, \"world\"),\<dots>)]"} |
299 Token (\<dots>,(Ident, \"world\"),\<dots>)]"} |
300 other syntactic category.\footnote{Note that because of a possible a bug in |
303 other syntactic category.\footnote{Note that because of a possible a bug in |
301 the PolyML runtime system the result is printed as @{text "?"}, instead of |
304 the PolyML runtime system the result is printed as @{text "?"}, instead of |
302 the token.} The second indicates a space. |
305 the token.} The second indicates a space. |
303 |
306 |
304 Many parsing functions later on will require spaces, comments and the like |
307 Many parsing functions later on will require spaces, comments and the like |
305 to have been filtered out. In what follows below, we are going to use the |
308 to have already been filtered out. So from now on we are going to use the |
306 functions @{ML filter} and @{ML OuterLex.is_proper} do this. For example |
309 functions @{ML filter} and @{ML OuterLex.is_proper} do this. For example |
307 |
310 |
308 |
311 |
309 @{ML_response_fake [display] |
312 @{ML_response_fake [display] |
310 "let |
313 "let |
365 in |
368 in |
366 (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input |
369 (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input |
367 end" |
370 end" |
368 "((\"|\",\"in\"),[])"} |
371 "((\"|\",\"in\"),[])"} |
369 |
372 |
370 The parser @{ML_open "OuterParse.enum s p" for s p} parses a possibly empty |
373 The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty |
371 list of items recognised by the parser @{ML_text p}, where the items are |
374 list of items recognised by the parser @{ML_text p}, where the items being parsed |
372 separated by the string @{ML_text s}. For example |
375 are separated by the string @{ML_text s}. For example |
373 |
376 |
374 @{ML_response [display] |
377 @{ML_response [display] |
375 "let |
378 "let |
376 val input = filtered_input \"in | in | in end\" |
379 val input = filtered_input \"in | in | in foo\" |
377 in |
380 in |
378 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
381 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
379 end" |
382 end" |
380 "([\"in\",\"in\",\"in\"],[\<dots>])"} |
383 "([\"in\",\"in\",\"in\"],[\<dots>])"} |
381 |
384 |
382 @{ML "OuterParse.enum1"} works similarly, except that the parsed list must |
385 @{ML "OuterParse.enum1"} works similarly, except that the parsed list must |
383 be non-empty. Note that we had to add a @{ML_text [quotes] "end"} at the end |
386 be non-empty. Note that we had to add an @{ML_text [quotes] "foo"} at the end |
384 of the parsed string, otherwise the parser would have consumed all tokens |
387 of the parsed string, otherwise the parser would have consumed all tokens |
385 and then failed with the exception @{ML_text "MORE"}. Like in the previous |
388 and then failed with the exception @{ML_text "MORE"}. Like in the previous |
386 section, we can avoid this exception using the wrapper @{ML |
389 section, we can avoid this exception using the wrapper @{ML |
387 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML |
390 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML |
388 OuterLex.stopper}. We can write |
391 OuterLex.stopper}. We can write |
389 |
392 |
390 @{ML_response [display] |
393 @{ML_response [display] |
391 "let |
394 "let |
392 val input = filtered_input \"in | in | in\" |
395 val input = filtered_input \"in | in | in\" |
393 in |
396 in |
394 Scan.finite OuterLex.stopper (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
397 Scan.finite OuterLex.stopper |
|
398 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
395 end" |
399 end" |
396 "([\"in\",\"in\",\"in\"],[])"} |
400 "([\"in\",\"in\",\"in\"],[])"} |
|
401 |
|
402 The following function will help us to run examples |
|
403 |
|
404 *} |
|
405 |
|
406 ML {* |
|
407 fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input |
|
408 *} |
|
409 |
|
410 text {* |
397 |
411 |
398 The function @{ML "OuterParse.!!!"} can be used to force termination of the |
412 The function @{ML "OuterParse.!!!"} can be used to force termination of the |
399 parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section), |
413 parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section), |
400 except that the error message is fixed to be @{text [quotes] "Outer syntax error"} |
414 except that the error message is fixed to be @{text [quotes] "Outer syntax error"} |
401 with a relatively precise description of the failure. For example: |
415 with a relatively precise description of the failure. For example: |
403 @{ML_response_fake [display] |
417 @{ML_response_fake [display] |
404 "let |
418 "let |
405 val input = filtered_input \"in |\" |
419 val input = filtered_input \"in |\" |
406 val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\" |
420 val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\" |
407 in |
421 in |
408 Scan.error (OuterParse.!!! parse_bar_then_in) input |
422 parse (OuterParse.!!! parse_bar_then_in) input |
409 end" |
423 end" |
410 "Exception ERROR \"Outer syntax error: keyword \"|\" expected, |
424 "Exception ERROR \"Outer syntax error: keyword \"|\" expected, |
411 but keyword in was found\" raised" |
425 but keyword in was found\" raised" |
412 } |
426 } |
413 |
427 |
414 *} |
428 \begin{exercise} |
|
429 A type-identifier, for example @{typ "'a"}, is a token of |
|
430 kind @{ML "Keyword" in OuterLex} can be parsed |
|
431 |
|
432 \end{exercise} |
|
433 |
|
434 |
|
435 *} |
|
436 |
415 |
437 |
416 section {* Positional Information *} |
438 section {* Positional Information *} |
417 |
439 |
418 text {* |
440 text {* |
419 |
441 |