93 Note how the result of consumed strings builds up on the left as nested pairs. |
93 Note how the result of consumed strings builds up on the left as nested pairs. |
94 |
94 |
95 If, as in the previous example, one wants to parse a particular string, |
95 If, as in the previous example, one wants to parse a particular string, |
96 then one should use the function @{ML Scan.this_string}: |
96 then one should use the function @{ML Scan.this_string}: |
97 |
97 |
98 @{ML_response [display] "Scan.this_string \"hel\" (explode \"hello\")" |
98 @{ML_response [display] "Scan.this_string \"hell\" (explode \"hello\")" |
99 "(\"hel\", [\"l\", \"o\"])"} |
99 "(\"hell\", [\"o\"])"} |
100 |
100 |
101 Parsers that explore alternatives can be constructed using the function @{ML |
101 Parsers that explore alternatives can be constructed using the function @{ML |
102 "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the |
102 "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the |
103 result of @{text "p"}, in case it succeeds, otherwise it returns the |
103 result of @{text "p"}, in case it succeeds, otherwise it returns the |
104 result of @{text "q"}. For example |
104 result of @{text "q"}. For example |
160 one might write |
160 one might write |
161 |
161 |
162 @{ML [display] "(p -- q) || r" for p q r} |
162 @{ML [display] "(p -- q) || r" for p q r} |
163 |
163 |
164 However, this parser is problematic for producing an appropriate error message, in case |
164 However, this parser is problematic for producing an appropriate error message, in case |
165 the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses with the parser |
165 the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses the information |
166 above the information |
|
167 that @{text p} should be followed by @{text q}. To see this consider the case in |
166 that @{text p} should be followed by @{text q}. To see this consider the case in |
168 which @{text p} |
167 which @{text p} |
169 is present in the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail |
168 is present in the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail |
170 and the |
169 and the |
171 alternative parser @{text r} will be tried. However in many circumstance this will be the wrong |
170 alternative parser @{text r} will be tried. However in many circumstance this will be the wrong |
172 parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then |
171 parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then |
173 caused by the |
172 caused by the |
174 failure of @{text r}, not by the absence of @{text q} in the input. This kind of situation |
173 failure of @{text r}, not by the absence of @{text q} in the input. This kind of situation |
175 can be avoided by using the function @{ML "(op !!)"}. This function aborts the whole process of |
174 can be avoided when using the function @{ML "(op !!)"}. This function aborts the whole process of |
176 parsing in case of a failure and invokes an error message. For example if we invoke the parser |
175 parsing in case of a failure and prints an error message. For example if we invoke the parser |
177 |
176 |
178 @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} |
177 @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} |
179 |
178 |
180 on @{text [quotes] "hello"}, the parsing succeeds |
179 on @{text [quotes] "hello"}, the parsing succeeds |
181 |
180 |
186 but if we invoke it on @{text [quotes] "world"} |
185 but if we invoke it on @{text [quotes] "world"} |
187 |
186 |
188 @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" |
187 @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" |
189 "Exception ABORT raised"} |
188 "Exception ABORT raised"} |
190 |
189 |
191 the parsing aborts and the error message @{text "foo"} is printed out. In order to |
190 then the parsing aborts and the error message @{text "foo"} is printed out. In order to |
192 see the error message properly, we need to prefix the parser with the function |
191 see the error message properly, we need to prefix the parser with the function |
193 @{ML "Scan.error"}. For example |
192 @{ML "Scan.error"}. For example |
194 |
193 |
195 @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))" |
194 @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))" |
196 "Exception Error \"foo\" raised"} |
195 "Exception Error \"foo\" raised"} |
197 |
196 |
198 This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} |
197 This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} |
199 (FIXME: give reference to later place). |
198 (FIXME: give reference to later place). |
200 |
199 |
201 Returning to our example of parsing @{ML "(p -- q) || r" for p q r}. If we want |
200 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q r}. If we want |
202 to generate the correct error message for p-followed-by-q, then |
201 to generate the correct error message for p-followed-by-q, then |
203 we have to write: |
202 we have to write: |
204 *} |
203 *} |
205 |
204 |
206 ML {* |
205 ML {* |
207 fun p_followed_by_q p q r = |
206 fun p_followed_by_q p q r = |
208 let |
207 let |
209 val err = (fn _ => p ^ " is not followed by " ^ q) |
208 val err_msg = (fn _ => p ^ " is not followed by " ^ q) |
210 in |
209 in |
211 (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r)) |
210 ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r) |
212 end |
211 end |
213 *} |
212 *} |
214 |
213 |
215 |
214 |
216 text {* |
215 text {* |
217 Running this parser with |
216 Running this parser with the @{text [quotes] "h"} and @{text [quotes] "e"}, and |
|
217 the input @{text [quotes] "holle"} |
218 |
218 |
219 @{ML_response_fake [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")" |
219 @{ML_response_fake [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")" |
220 "Exception ERROR \"h is not followed by e\" raised"} |
220 "Exception ERROR \"h is not followed by e\" raised"} |
221 |
221 |
222 gives the correct error message. Running it with |
222 produces the correct error message. Running it with |
223 |
223 |
224 @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")" |
224 @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")" |
225 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
225 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
226 |
226 |
227 yields the expected parsing. |
227 yields the expected parsing. |
235 Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function |
235 Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function |
236 @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} |
236 @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} |
237 succeeds at least once. |
237 succeeds at least once. |
238 |
238 |
239 Also note that the parser would have aborted with the exception @{text MORE}, if |
239 Also note that the parser would have aborted with the exception @{text MORE}, if |
240 we had run it only on just @{text [quotes] "hhhh"}. This can be avoided using |
240 we had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using |
241 the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With |
241 the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With |
242 them we can write |
242 them we can write |
243 |
243 |
244 @{ML_response [display] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" |
244 @{ML_response [display] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" |
245 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
245 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
246 |
246 |
247 However, this kind of manually wrapping needs to be done only very rarely |
247 @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings; |
248 in practise, because it is already done by the infrastructure for you. |
248 other stoppers need to be used when parsing token, for example. However, this kind of |
|
249 manually wrapping is often already done by the surrounding infrastructure. |
249 |
250 |
250 The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any |
251 The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any |
251 string as in |
252 string as in |
252 |
253 |
253 @{ML_response [display] |
254 @{ML_response [display] |
258 Scan.finite Symbol.stopper p input |
259 Scan.finite Symbol.stopper p input |
259 end" |
260 end" |
260 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
261 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
261 |
262 |
262 where the function @{ML Symbol.not_eof} ensures that we do not read beyond the |
263 where the function @{ML Symbol.not_eof} ensures that we do not read beyond the |
263 end of the input string. |
264 end of the input string (i.e.~stopper symbol). |
264 |
265 |
265 The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can |
266 The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can |
266 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
267 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
267 |
268 |
268 @{ML_response_fake_both [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")" |
269 @{ML_response_fake_both [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")" |
289 Scan.finite Symbol.stopper p input2) |
290 Scan.finite Symbol.stopper p input2) |
290 end" |
291 end" |
291 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
292 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
292 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
293 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
293 |
294 |
294 After parsing succeeded, one nearly always wants to apply a function on the parsed |
295 After parsing is done, one nearly always wants to apply a function on the parsed |
295 items. This is done using the function @{ML "(p >> f)" for p f} which runs |
296 items. To do this the function @{ML "(p >> f)" for p f} can be employed, which runs |
296 first the parser @{text p} and upon successful completion applies the |
297 first the parser @{text p} and upon successful completion applies the |
297 function @{text f} to the result. For example |
298 function @{text f} to the result. For example |
298 |
299 |
299 @{ML_response [display] |
300 @{ML_response [display] |
300 "let |
301 "let |
343 text {* |
344 text {* |
344 Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings. |
345 Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings. |
345 This is because the parsers for the theory syntax, as well as the parsers for the |
346 This is because the parsers for the theory syntax, as well as the parsers for the |
346 argument syntax of proof methods and attributes use the type |
347 argument syntax of proof methods and attributes use the type |
347 @{ML_type OuterLex.token} (which is identical to the type @{ML_type OuterParse.token}). |
348 @{ML_type OuterLex.token} (which is identical to the type @{ML_type OuterParse.token}). |
|
349 There are also parsers for ML-expressions and ML-files. |
348 |
350 |
349 \begin{readmore} |
351 \begin{readmore} |
350 The parser functions for the theory syntax are contained in the structure |
352 The parser functions for the theory syntax are contained in the structure |
351 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
353 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
352 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
354 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
407 "[Token (\<dots>,(Command, \"inductive\"),\<dots>), |
409 "[Token (\<dots>,(Command, \"inductive\"),\<dots>), |
408 Token (\<dots>,(Keyword, \"|\"),\<dots>), |
410 Token (\<dots>,(Keyword, \"|\"),\<dots>), |
409 Token (\<dots>,(Keyword, \"for\"),\<dots>)]"} |
411 Token (\<dots>,(Keyword, \"for\"),\<dots>)]"} |
410 |
412 |
411 we obtain a list consisting of only a command and two keyword tokens. |
413 we obtain a list consisting of only a command and two keyword tokens. |
412 If you want to see which keywords and commands are currently known, use |
414 If you want to see which keywords and commands are currently known, type in |
413 the following (you might have to adjust the @{ML print_depth} in order to |
415 the following (you might have to adjust the @{ML print_depth} in order to |
414 see the complete list): |
416 see the complete list): |
415 |
417 |
416 @{ML_response_fake [display] |
418 @{ML_response_fake [display] |
417 "let |
419 "let |
533 |
538 |
534 ML {* |
539 ML {* |
535 OuterParse.opt_target |
540 OuterParse.opt_target |
536 *} |
541 *} |
537 |
542 |
|
543 text {* (FIXME funny output for a proposition) *} |
|
544 |
538 ML {* |
545 ML {* |
539 OuterParse.opt_target -- |
546 OuterParse.opt_target -- |
540 OuterParse.fixes -- |
547 OuterParse.fixes -- |
541 OuterParse.for_fixes -- |
548 OuterParse.for_fixes -- |
542 Scan.optional (OuterParse.$$$ "where" |-- |
549 Scan.optional (OuterParse.$$$ "where" |-- |
543 OuterParse.!!! (OuterParse.enum1 "|" (SpecParse.opt_thm_name ":" -- OuterParse.prop))) [] |
550 OuterParse.!!! (OuterParse.enum1 "|" (SpecParse.opt_thm_name ":" -- OuterParse.prop))) [] |
544 |
551 |
545 *} |
552 *} |
546 |
553 |
547 text {* (FIXME funny output for a proposition) *} |
554 ML {* OuterSyntax.command *} |
|
555 |
|
556 section {* New Commands and Keyword Files *} |
|
557 |
|
558 text {* |
|
559 |
|
560 Often new commands, for example for providing a new definitional principle, |
|
561 need to be programmed. While this is not too difficult to do on the |
|
562 ML-level, new commands, in order to be useful, need to be recognised by |
|
563 ProofGeneral. This results in some subtle configuration issues, which we |
|
564 will explain in this section. |
|
565 |
|
566 |
|
567 Let us start with a silly command, named \isacommand{foo}, which does nothing at all. |
|
568 It can be defined as |
|
569 |
|
570 @{ML [display] |
|
571 "let |
|
572 val do_nothing = Scan.succeed (Toplevel.theory (fn x => x)) |
|
573 val flag = OuterKeyword.thy_decl |
|
574 in |
|
575 OuterSyntax.command \"foo\" \"description of foo\" flag do_nothing |
|
576 end"} |
|
577 |
|
578 The crucial function that makes the command \isacommand{foo} known to |
|
579 Isabelle is @{ML OuterSyntax.command}. It expects a name of a command, a |
|
580 short description, a flag (we will explain it later more thoroughly) and a |
|
581 parser for a top-level transition function (its purpose will also explained |
|
582 later on). Lets also assume we packaged this function into a separate |
|
583 theory named @{text "Command"}, say, and a file named @{text "Command.thy"}. |
|
584 |
|
585 We now need a keyword file, that can be loaded by ProofGeneral in |
|
586 order to recognise \isacommand{foo} as command. Such a keyword file |
|
587 can be generated with the command-line |
|
588 |
|
589 @{text [display] "$ isabelle keywords -k foo some-log-files"} |
|
590 |
|
591 The option @{text "-k foo"} indicates which postfix the keyword file will obtain. In |
|
592 the case above the generated file is named @{text "isar-keywords-foo.el"}. This command |
|
593 requires log files to be present (in order to extract the keywords from them). For |
|
594 our purposes it is sufficient to use the log files for the theories @{text "Pure"}, |
|
595 @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as the theory @{text "Command.thy"} |
|
596 containing the new command. @{text Pure} and @{text HOL} are usually compiled during the |
|
597 installation of Isabelle, if not this can be done conveniently using build script from |
|
598 the distribution |
|
599 |
|
600 @{text [display] |
|
601 "$ ./build -m \"Pure\" |
|
602 $ ./build -m \"HOL\""} |
|
603 |
|
604 The @{text "Pure-ProofGeneral"} theory needs to be compiled with |
|
605 |
|
606 @{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""} |
|
607 |
|
608 For the theory containing the new command \isacommand{foo}, we first |
|
609 create a ``managed'' subdirectory by |
|
610 |
|
611 |
|
612 This creates files @{text "IsaMakefile"} and @{text "./FooCommand/ROOT.ML"}. We |
|
613 need to copy the file @{text "Command.thy"} into the directory @{text "FooCommand"} |
|
614 and add @{text "use_thy \"Command\";"} to @{text "./FooCommand/ROOT.ML"}. |
|
615 Now we can compile the theory by just typing |
|
616 |
|
617 @{text [display] "$ isabelle make"} |
|
618 |
|
619 All these compilations created the log files that we are after. They are typically stored |
|
620 under the directory |
|
621 |
|
622 @{text [display] "~/.isabelle/heaps/\<dots>\<dots>/\<dots>\<dots>/log"} |
|
623 |
|
624 where the dots stand for names of the current Isabelle distribution and |
|
625 hardware architecture. In it are the files |
|
626 |
|
627 @{text [display] |
|
628 "Pure.gz |
|
629 HOL.gz |
|
630 Pure-ProofGeneral.gz |
|
631 HOL-FooCommand.gz"} |
|
632 |
|
633 Let us assume the directory with these files is stored in the shell variable |
|
634 @{text "ISABELLE_LOGS"}, by typing for example |
|
635 |
|
636 @{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"} |
|
637 |
|
638 We can now create the keyword file with |
|
639 |
|
640 @{text [display] |
|
641 "$ isabelle keywords -k foo |
|
642 `$ISABELLE_LOGS/{Pure.gz,HOL.gz,Pure-ProofGeneral,HOL-FooCommand.gz}`"} |
|
643 |
|
644 The result is the file @{text "isar-keywords-foo.el"}, which needs to be |
|
645 copied in the directory @{text "~/.isabelle/etc"}. To make Isabelle use |
|
646 this keyword file, we need to start it with the option @{text "-k foo"}: |
|
647 |
|
648 @{text [display] "isabelle -k foo a-theory-file"} |
|
649 |
|
650 If we now run the original code creating the command |
|
651 |
|
652 @{ML [display] |
|
653 "let |
|
654 val do_nothing = Scan.succeed (Toplevel.theory (fn x => x)) |
|
655 val flag = OuterKeyword.thy_decl |
|
656 in |
|
657 OuterSyntax.command \"foo\" \"description of foo\" flag do_nothing |
|
658 end"} |
|
659 |
|
660 then we can finally make use of \isacommand{foo}! Similarly |
|
661 with any other command you want to implement. |
|
662 |
|
663 In the example above, we built the theories on top of the HOL-logic. If you |
|
664 target other logics, such as Nominal or ZF, then you need to change the |
|
665 log files appropriately. |
|
666 |
|
667 (FIXME Explain @{text "OuterKeyword.thy_decl"} and parser) |
|
668 |
|
669 |
|
670 *} |
|
671 |
548 |
672 |
549 |
673 |
550 |
674 |
551 chapter {* Parsing *} |
675 chapter {* Parsing *} |
552 |
676 |