text {*
  Most of the time, however, Isabelle developers have to deal with parsing
  tokens, not strings. These token parsers have the type:
*}

ML{*type 'a parser = Token.T list -> 'a * Token.T list*}

text {*
  The reason for using token parsers is that theory syntax, as well as the
  parsers for the arguments of proof methods, use the type @{ML_type
  Token.T}.

  \begin{readmore}
  The parser functions for the theory syntax are contained in the structure
  @{ML_struct Parse} defined in the file @{ML_file "Pure/Isar/parse.ML"}.
  The definition for tokens is in the file @{ML_file "Pure/Isar/token.ML"}.
  \end{readmore}

  The structure @{ML_struct Token} defines several kinds of tokens (for
  example @{ML_ind Ident in Token} for identifiers, @{ML Keyword in
  Token} for keywords and @{ML_ind Command in Token} for commands). Some
  token parsers take into account the kind of tokens. The first example shows
  how to generate a token list out of a string using the function 
  @{ML_ind scan in Outer_Syntax}. It is given the argument 
  @{ML "Position.none"} since, at the moment, we are not interested in 
  generating precise error messages. The following code

  @{ML_response_fake [display,gray] "Outer_Syntax.scan Position.none \"hello world\"" 
"[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
 Token (\<dots>,(Space, \" \"),\<dots>), 
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

  produces three tokens where the first and the last are identifiers, since
  @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
  other syntactic category. The second indicates a space.

  We can easily change what is recognised as a keyword with the function
  @{ML_ind keyword in Keyword}. For example, calling it with
*}

ML{*val _ = Keyword.keyword "hello"*}

text {*
  then lexing @{text [quotes] "hello world"} will produce

  @{ML_response_fake [display,gray] "Outer_Syntax.scan Position.none \"hello world\"" 
"[Token (\<dots>,(Keyword, \"hello\"),\<dots>), 
 Token (\<dots>,(Space, \" \"),\<dots>), 
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

  Many parsing functions later on will require white space, comments and the
  like to have already been filtered out. So from now on we are going to use
  the functions @{ML filter} and @{ML_ind is_proper in Token} to do this.
  For example:

  @{ML_response_fake [display,gray]
"let
  val input = Outer_Syntax.scan Position.none \"hello world\"
in
  filter Token.is_proper input
end" 
"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}

  For convenience we define the function:
*}

ML{*fun filtered_input str = 
  filter Token.is_proper (Outer_Syntax.scan Position.none str) *}

text {*
  If you now parse

  @{ML_response_fake [display,gray] 

  @{ML_response [display,gray] 
"let 
  val input = filtered_input \"| in\"
in 
  (Parse.$$$ \"|\" -- Parse.$$$ \"in\") input
end"
"((\"|\", \"in\"), [])"}

  The parser @{ML "Parse.enum s p" for s p} parses a possibly empty 
  list of items recognised by the parser @{text p}, where the items being parsed
  are separated by the string @{text s}. For example:

  @{ML_response [display,gray]
"let 
  val input = filtered_input \"in | in | in foo\"
in 
  (Parse.enum \"|\" (Parse.$$$ \"in\")) input
end" 
"([\"in\", \"in\", \"in\"], [\<dots>])"}

  The function @{ML_ind enum1 in Parse} works similarly, except that the
  parsed list must be non-empty. Note that we had to add a string @{text
  [quotes] "foo"} at the end of the parsed string, otherwise the parser would
  have consumed all tokens and then failed with the exception @{text
  "MORE"}. Like in the previous section, we can avoid this exception using the
  wrapper @{ML Scan.finite}. This time, however, we have to use the
  ``stopper-token'' @{ML Token.stopper}. We can write:

  @{ML_response [display,gray]
"let 
  val input = filtered_input \"in | in | in\"
  val p = Parse.enum \"|\" (Parse.$$$ \"in\")
in 
  Scan.finite Token.stopper p input
end" 
"([\"in\", \"in\", \"in\"], [])"}

  The following function will help to run examples.
*}

ML{*fun parse p input = Scan.finite Token.stopper (Scan.error p) input *}
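
text {*
  A minimal, unchecked test of this helper (the name @{text test_result} is 
  only for illustration) is:
*}

ML{*(* should evaluate to ("in", []): the keyword "in" is consumed completely *)
val test_result = parse (Parse.$$$ "in") (filtered_input "in")*}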

text {*
  The function @{ML_ind "!!!" in Parse} can be used to force termination
  of the parser in case of a dead end, just like @{ML "Scan.!!"} (see previous
  section). A difference, however, is that the error message of @{ML
  "Parse.!!!"} is fixed to be @{text [quotes] "Outer syntax error"}
  together with a relatively precise description of the failure. For example:

  @{ML_response_fake [display,gray]
"let 
  val input = filtered_input \"in |\"
  val parse_bar_then_in = Parse.$$$ \"|\" -- Parse.$$$ \"in\"
in 
  parse (Parse.!!! parse_bar_then_in) input 
end"
"Exception ERROR \"Outer syntax error: keyword \"|\" expected,
but keyword in was found\" raised"}

  \begin{exercise} (FIXME)
  A type-identifier, for example @{typ "'a"}, is a token of 
  kind @{ML_ind TypeIdent in Token}. It can be parsed using 
  the function @{ML type_ident in Parse}.
  \end{exercise}

  (FIXME: or give parser for numbers)
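
  As an unchecked sketch (the output is not verified here), a type-identifier
  and a number can be parsed with the helpers defined above:

  @{ML_response_fake [display,gray]
"(parse Parse.type_ident (filtered_input \"'a\"), 
  parse Parse.nat (filtered_input \"42\"))" 
"((\"'a\", []), (42, []))"}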

  Whenever there is a possibility that the processing of user input can fail, 

text {* 
  (********************************************************)
  (FIXME: explain a version of rule-tac)
*}

end