18 these outer syntax parsers, either for new definitional packages or for |
18 these outer syntax parsers, either for new definitional packages or for |
19 calling methods with specific arguments. |
19 calling methods with specific arguments. |
20 |
20 |
21 \begin{readmore} |
21 \begin{readmore} |
22 The library for writing parser combinators is split up, roughly, into two |
22 The library for writing parser combinators is split up, roughly, into two |
23 parts. The first part consists of a collection of generic parser combinators |
23 parts: The first part consists of a collection of generic parser combinators |
24 defined in the structure @{ML_struct Scan} in the file @{ML_file |
24 defined in the structure @{ML_struct Scan} in the file @{ML_file |
25 "Pure/General/scan.ML"}. The second part of the library consists of |
25 "Pure/General/scan.ML"}. The second part of the library consists of |
26 combinators for dealing with specific token types, which are defined in the |
26 combinators for dealing with specific token types, which are defined in the |
27 structure @{ML_struct OuterParse} in the file @{ML_file |
27 structure @{ML_struct OuterParse} in the file @{ML_file |
28 "Pure/Isar/outer_parse.ML"}. Specific parsers for packages are defined in |
28 "Pure/Isar/outer_parse.ML"}. In addition specific parsers for packages are |
29 @{ML_file "Pure/Isar/spec_parse.ML"}. Parsers for method arguments are |
29 defined in @{ML_file "Pure/Isar/spec_parse.ML"}. Parsers for method arguments |
30 defined in @{ML_file "Pure/Isar/args.ML"}. |
30 are defined in @{ML_file "Pure/Isar/args.ML"}. |
31 \end{readmore} |
31 \end{readmore} |
32 |
32 |
33 *} |
33 *} |
34 |
34 |
35 section {* Building Generic Parsers *} |
35 section {* Building Generic Parsers *} |
71 However, note that these exceptions are private to the parser and cannot be accessed |
71 However, note that these exceptions are private to the parser and cannot be accessed |
72 by the programmer (for example to handle them). |
72 by the programmer (for example to handle them). |
73 |
73 |
74 In the examples above we use the function @{ML_ind Symbol.explode}, instead of the |
74 In the examples above we use the function @{ML_ind Symbol.explode}, instead of the |
75 more standard library function @{ML_ind explode}, for obtaining an input list for |
75 more standard library function @{ML_ind explode}, for obtaining an input list for |
76 the parser. The reason is that @{ML_ind Symbol.explode} is aware of character sequences, |
76 the parser. The reason is that @{ML_ind Symbol.explode} is aware of character sequences, |
77 for example @{text "\<foo>"}, that have a special meaning in Isabelle. To see |
77 for example @{text "\<foo>"}, that have a special meaning in Isabelle. To see |
78 the difference consider |
78 the difference consider |
79 |
79 |
80 @{ML_response_fake [display,gray] |
80 @{ML_response_fake [display,gray] |
81 "let |
81 "let |
112 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
112 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
113 |
113 |
114 Note how the result of consumed strings builds up on the left as nested pairs. |
114 Note how the result of consumed strings builds up on the left as nested pairs. |
115 |
115 |
116 If, as in the previous example, you want to parse a particular string, |
116 If, as in the previous example, you want to parse a particular string, |
117 then you should use the function @{ML_ind this_string in Scan}: |
117 then you can use the function @{ML_ind this_string in Scan}. |
118 |
118 |
119 @{ML_response [display,gray] |
119 @{ML_response [display,gray] |
120 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
120 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
121 "(\"hell\", [\"o\"])"} |
121 "(\"hell\", [\"o\"])"} |
122 |
122 |
174 val input2 = Symbol.explode \"world\" |
174 val input2 = Symbol.explode \"world\" |
175 in |
175 in |
176 (p input1, p input2) |
176 (p input1, p input2) |
177 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
177 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
178 |
178 |
179 The function @{ML_ind "!!"} helps to produce appropriate error messages |
179 The function @{ML_ind ahead in Scan} parses some input, but leaves the original |
180 for parsing. For example if you want to parse @{text p} immediately |
180 input unchanged. For example: |
|
181 |
|
182 @{ML_response [display,gray] |
|
183 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
|
184 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
|
185 |
|
186 The function @{ML_ind "!!"} helps with producing appropriate error messages |
|
187 during parsing. For example if you want to parse @{text p} immediately |
181 followed by @{text q}, or start a completely different parser @{text r}, |
188 followed by @{text q}, or start a completely different parser @{text r}, |
182 you might write: |
189 you might write: |
183 |
190 |
184 @{ML [display,gray] "(p -- q) || r" for p q r} |
191 @{ML [display,gray] "(p -- q) || r" for p q r} |
185 |
192 |
186 However, this parser is problematic for producing an appropriate error |
193 However, this parser is problematic for producing a useful error |
187 message, if the parsing of @{ML "(p -- q)" for p q} fails. Because in that |
194 message, if the parsing of @{ML "(p -- q)" for p q} fails. Because with the |
188 case you lose the information that @{text p} should be followed by @{text q}. |
195 parser above you lose the information that @{text p} should be followed by @{text q}. |
189 To see this assume that @{text p} is present in the input, but it is not |
196 To see this assume that @{text p} is present in the input, but it is not |
190 followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and |
197 followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and |
191 hence the alternative parser @{text r} will be tried. However, in many |
198 hence the alternative parser @{text r} will be tried. However, in many |
192 circumstances this will be the wrong parser for the input ``@{text "p"}-followed-by-something'' |
199 circumstances this will be the wrong parser for the input ``@{text "p"}-followed-by-something'' |
193 and therefore will also fail. The error message is then caused by the failure |
200 and therefore will also fail. The error message is then caused by the failure |
220 |
227 |
221 This ``prefixing'' is usually done by wrappers such as @{ML_ind local_theory in OuterSyntax} |
228 This ``prefixing'' is usually done by wrappers such as @{ML_ind local_theory in OuterSyntax} |
222 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
229 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
223 |
230 |
224 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
231 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
225 r}. If you want to generate the correct error message for |
232 r}. If you want to generate the correct error message for failure |
226 @{text "p"}-followed-by-@{text "q"}, then you have to write: |
233 of parsing @{text "p"}-followed-by-@{text "q"}, then you have to write: |
227 *} |
234 *} |
228 |
235 |
229 ML{*fun p_followed_by_q p q r = |
236 ML{*fun p_followed_by_q p q r = |
230 let |
237 let |
231 val err_msg = fn _ => p ^ " is not followed by " ^ q |
238 val err_msg = fn _ => p ^ " is not followed by " ^ q |
258 Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function |
265 Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function |
259 @{ML_ind repeat1 in Scan} is similar, but requires that the parser @{text "p"} |
266 @{ML_ind repeat1 in Scan} is similar, but requires that the parser @{text "p"} |
260 succeeds at least once. |
267 succeeds at least once. |
261 |
268 |
262 Also note that the parser would have aborted with the exception @{text MORE}, if |
269 Also note that the parser would have aborted with the exception @{text MORE}, if |
263 you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using |
270 you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using |
264 the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' |
271 the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' |
265 @{ML_ind stopper in Symbol}. With them you can write: |
272 @{ML_ind stopper in Symbol}. With them you can write: |
266 |
273 |
267 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
274 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
268 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
275 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
269 |
276 |
270 @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings; |
277 The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings; |
271 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
278 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
272 manually wrapping is often already done by the surrounding infrastructure. |
279 manually wrapping is often already done by the surrounding infrastructure. |
273 |
280 |
274 The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any |
281 The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any |
275 string as in |
282 string as in |
323 function @{text f} to the result. For example |
330 function @{text f} to the result. For example |
324 |
331 |
325 @{ML_response [display,gray] |
332 @{ML_response [display,gray] |
326 "let |
333 "let |
327 fun double (x, y) = (x ^ x, y ^ y) |
334 fun double (x, y) = (x ^ x, y ^ y) |
|
335 val parser = $$ \"h\" -- $$ \"e\" |
328 in |
336 in |
329 (($$ \"h\") -- ($$ \"e\") >> double) (Symbol.explode \"hello\") |
337 (parser >> double) (Symbol.explode \"hello\") |
330 end" |
338 end" |
331 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} |
339 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} |
332 |
340 |
333 doubles the two parsed input strings; or |
341 doubles the two parsed input strings; or |
334 |
342 |
341 end" |
349 end" |
342 "(\"foo bar foo\",[])"} |
350 "(\"foo bar foo\",[])"} |
343 |
351 |
344 where the single-character strings in the parsed output are transformed |
352 where the single-character strings in the parsed output are transformed |
345 back into one string. |
353 back into one string. |
346 |
|
347 (FIXME: move to an earlier place) |
|
348 |
|
349 The function @{ML_ind ahead in Scan} parses some input, but leaves the original |
|
350 input unchanged. For example: |
|
351 |
|
352 @{ML_response [display,gray] |
|
353 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
|
354 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
|
355 |
354 |
356 The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies |
355 The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies |
357 the given parser to the second component of the pair and leaves the first component |
356 the given parser to the second component of the pair and leaves the first component |
358 untouched. For example |
357 untouched. For example |
359 |
358 |
413 |
412 |
414 produces three tokens where the first and the last are identifiers, since |
413 produces three tokens where the first and the last are identifiers, since |
415 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any |
414 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any |
416 other syntactic category. The second indicates a space. |
415 other syntactic category. The second indicates a space. |
417 |
416 |
418 We can easily change what is recognised as a keyword with |
417 We can easily change what is recognised as a keyword with the function |
419 @{ML_ind keyword in OuterKeyword}. For example calling this function |
418 @{ML_ind keyword in OuterKeyword}. For example calling it with |
420 *} |
419 *} |
421 |
420 |
422 ML{*val _ = OuterKeyword.keyword "hello"*} |
421 ML{*val _ = OuterKeyword.keyword "hello"*} |
423 |
422 |
424 text {* |
423 text {* |
515 in |
514 in |
516 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
515 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
517 end" |
516 end" |
518 "([\"in\", \"in\", \"in\"], [\<dots>])"} |
517 "([\"in\", \"in\", \"in\"], [\<dots>])"} |
519 |
518 |
520 @{ML_ind enum1 in OuterParse} works similarly, except that the parsed list must |
519 The function @{ML_ind enum1 in OuterParse} works similarly, except that the |
521 be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the |
520 parsed list must be non-empty. Note that we had to add a string @{text |
522 end of the parsed string, otherwise the parser would have consumed all |
521 [quotes] "foo"} at the end of the parsed string, otherwise the parser would |
523 tokens and then failed with the exception @{text "MORE"}. Like in the |
522 have consumed all tokens and then failed with the exception @{text |
524 previous section, we can avoid this exception using the wrapper @{ML |
523 "MORE"}. Like in the previous section, we can avoid this exception using the |
525 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML |
524 wrapper @{ML Scan.finite}. This time, however, we have to use the |
526 OuterLex.stopper}. We can write: |
525 ``stopper-token'' @{ML OuterLex.stopper}. We can write: |
527 |
526 |
528 @{ML_response [display,gray] |
527 @{ML_response [display,gray] |
529 "let |
528 "let |
530 val input = filtered_input \"in | in | in\" |
529 val input = filtered_input \"in | in | in\" |
|
530 val p = OuterParse.enum \"|\" (OuterParse.$$$ \"in\") |
531 in |
531 in |
532 Scan.finite OuterLex.stopper |
532 Scan.finite OuterLex.stopper p input |
533 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
|
534 end" |
533 end" |
535 "([\"in\", \"in\", \"in\"], [])"} |
534 "([\"in\", \"in\", \"in\"], [])"} |
536 |
535 |
537 The following function will help to run examples. |
536 The following function will help to run examples. |
538 *} |
537 *} |
539 |
538 |
540 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *} |
539 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *} |
541 |
540 |
542 text {* |
541 text {* |
543 |
542 The function @{ML_ind "!!!" in OuterParse} can be used to force termination |
544 The function @{ML_ind "!!!" in OuterParse} can be used to force termination of the |
543 of the parser in case of a dead end, just like @{ML "Scan.!!"} (see previous |
545 parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). |
544 section). A difference, however, is that the error message of @{ML |
546 Except that the error message of @{ML "OuterParse.!!!"} is fixed to be |
545 "OuterParse.!!!"} is fixed to be @{text [quotes] "Outer syntax error"} |
547 @{text [quotes] "Outer syntax error"} |
|
548 together with a relatively precise description of the failure. For example: |
546 together with a relatively precise description of the failure. For example: |
549 |
547 |
550 @{ML_response_fake [display,gray] |
548 @{ML_response_fake [display,gray] |
551 "let |
549 "let |
552 val input = filtered_input \"in |\" |
550 val input = filtered_input \"in |\" |
567 (FIXME: or give parser for numbers) |
565 (FIXME: or give parser for numbers) |
568 |
566 |
569 Whenever there is a possibility that the processing of user input can fail, |
567 Whenever there is a possibility that the processing of user input can fail, |
570 it is a good idea to give all available information about where the error |
568 it is a good idea to give all available information about where the error |
571 occurred. For this Isabelle can attach positional information to tokens |
569 occurred. For this Isabelle can attach positional information to tokens |
572 and then thread this information up the processing chain. To see this, |
570 and then thread this information up the ``processing chain''. To see this, |
573 modify the function @{ML filtered_input} described earlier to |
571 modify the function @{ML filtered_input}, described earlier, as follows |
574 *} |
572 *} |
575 |
573 |
576 ML{*fun filtered_input' str = |
574 ML{*fun filtered_input' str = |
577 filter OuterLex.is_proper (OuterSyntax.scan (Position.line 7) str) *} |
575 filter OuterLex.is_proper (OuterSyntax.scan (Position.line 7) str) *} |
578 |
576 |
585 Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), \<dots>)]"} |
583 Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), \<dots>)]"} |
586 |
584 |
587 in which the @{text [quotes] "\\n"} causes the second token to be in |
585 in which the @{text [quotes] "\\n"} causes the second token to be in |
588 line 8. |
586 line 8. |
589 |
587 |
590 By using the parser @{ML OuterParse.position} you can access the token position |
588 By using the parser @{ML position in OuterParse} you can access the token |
591 and return it as part of the parser result. For example |
589 position and return it as part of the parser result. For example |
592 |
590 |
593 @{ML_response_fake [display,gray] |
591 @{ML_response_fake [display,gray] |
594 "let |
592 "let |
595 val input = filtered_input' \"where\" |
593 val input = filtered_input' \"where\" |
596 in |
594 in |
603 @{ML_file "Pure/General/position.ML"}. |
601 @{ML_file "Pure/General/position.ML"}. |
604 \end{readmore} |
602 \end{readmore} |
605 |
603 |
606 *} |
604 *} |
607 |
605 |
608 text {* |
606 section {* Parsers for ML-Code (TBD) *} |
609 (FIXME: there are also handy parsers for ML-expressions and ML-files) |
607 |
|
608 text {* |
|
609 @{ML_ind ML_source in OuterParse} |
610 *} |
610 *} |
611 |
611 |
612 section {* Context Parser (TBD) *} |
612 section {* Context Parser (TBD) *} |
613 |
613 |
614 text {* |
614 text {* |
|
615 @{ML_ind Args.context} |
|
616 *} |
|
617 (* |
|
618 ML {* |
|
619 let |
|
620 val parser = Args.context -- Scan.lift Args.name_source |
|
621 |
|
622 fun term_pat (ctxt, str) = |
|
623 str |> Syntax.read_prop ctxt |
|
624 in |
|
625 (parser >> term_pat) (Context.Proof @{context}, filtered_input "f (a::nat)") |
|
626 |> fst |
|
627 end |
|
628 *} |
|
629 *) |
|
630 |
|
631 text {* |
|
632 @{ML_ind Args.context} |
|
633 |
615 Used for example in \isacommand{attribute\_setup} and \isacommand{method\_setup}. |
634 Used for example in \isacommand{attribute\_setup} and \isacommand{method\_setup}. |
616 *} |
635 *} |
617 |
636 |
618 section {* Argument and Attribute Parsers (TBD) *} |
637 section {* Argument and Attribute Parsers (TBD) *} |
619 |
638 |
620 section {* Parsing Inner Syntax *} |
639 section {* Parsing Inner Syntax *} |
621 |
640 |
622 text {* |
641 text {* |
623 There is usually no need to write your own parser for parsing inner syntax, that is |
642 There is usually no need to write your own parser for parsing inner syntax, that is |
624 for terms and types: you can just call the predefined parsers. Terms can |
643 for terms and types: you can just call the predefined parsers. Terms can |
625 be parsed using the function @{ML_ind term in OuterParse}. For example: |
644 be parsed using the function @{ML_ind term in OuterParse}. For example: |
626 |
645 |
627 @{ML_response [display,gray] |
646 @{ML_response [display,gray] |
628 "let |
647 "let |
629 val input = OuterSyntax.scan Position.none \"foo\" |
648 val input = OuterSyntax.scan Position.none \"foo\" |
630 in |
649 in |
631 OuterParse.term input |
650 OuterParse.term input |
632 end" |
651 end" |
633 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"} |
652 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"} |
634 |
653 |
635 The function @{ML_ind prop in OuterParse} is similar, except that it gives a different |
654 The function @{ML_ind prop in OuterParse} is similar, except that it gives a different |
636 error message, when parsing fails. As you can see, the parser not just returns |
655 error message, when parsing fails. As you can see, the parser not just returns |
637 the parsed string, but also some encoded information. You can decode the |
656 the parsed string, but also some encoded information. You can decode the |
638 information with the function @{ML_ind parse in YXML}. For example |
657 information with the function @{ML_ind parse in YXML} in @{ML_struct YXML}. For example |
639 |
658 |
640 @{ML_response [display,gray] |
659 @{ML_response [display,gray] |
641 "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\"" |
660 "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\"" |
642 "XML.Elem (\"token\", [], [XML.Text \"foo\"])"} |
661 "XML.Elem (\"token\", [], [XML.Text \"foo\"])"} |
643 |
662 |
952 |
971 |
953 |
972 |
954 @{text [display] "$ isabelle emacs -k foobar a_theory_file"} |
973 @{text [display] "$ isabelle emacs -k foobar a_theory_file"} |
955 |
974 |
956 If you now build a theory on top of @{text "Command.thy"}, |
975 If you now build a theory on top of @{text "Command.thy"}, |
957 then the command \isacommand{foobar} can be used. You can just write |
976 then you can use the command \isacommand{foobar}. You can just write |
958 *} |
977 *} |
959 |
978 |
960 foobar |
979 foobar |
961 |
980 |
962 text {* |
981 text {* |
963 but you will not see any action as we chose to implement this command to do |
982 but you will not see any action as we chose to implement this command to do |
964 nothing. A similarly procedure has to be done with any other new command, and |
983 nothing. The point of this command is to only show the procedure of how |
965 also any new keyword that is introduced with @{ML_ind OuterKeyword.keyword}. |
984 to interact with ProofGeneral. A similar procedure has to be done with any |
966 |
985 other new command, and also any new keyword that is introduced with |
|
986 @{ML_ind OuterKeyword.keyword}. |
967 *} |
987 *} |
968 |
988 |
969 ML{*val _ = OuterKeyword.keyword "blink" *} |
989 ML{*val _ = OuterKeyword.keyword "blink" *} |
970 |
990 |
971 text {* |
991 text {* |