CookBook/Parsing.thy
changeset 128 693711a0c702
parent 127 74846cb0fff9
child 131 8db9195bb3e9
equal deleted inserted replaced
127:74846cb0fff9 128:693711a0c702
    43 
    43 
    44   @{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
    44   @{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
    45 
    45 
    46   @{ML_response [display,gray] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}
    46   @{ML_response [display,gray] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}
    47 
    47 
       
    48   The type of a parser is defined as
       
    49 
       
    50   
       
    51 
       
    52 
    48   This function will either succeed (as in the two examples above) or raise the exception 
    53   This function will either succeed (as in the two examples above) or raise the exception 
    49   @{text "FAIL"} if no string can be consumed. For example trying to parse
    54   @{text "FAIL"} if no string can be consumed. For example trying to parse
    50 
    55 
    51   @{ML_response_fake [display,gray] "($$ \"x\") (explode \"world\")" 
    56   @{ML_response_fake [display,gray] "($$ \"x\") (explode \"world\")" 
    52                                "Exception FAIL raised"}
    57                                "Exception FAIL raised"}
    91                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    96                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    92 
    97 
    93   Note how the result of consumed strings builds up on the left as nested pairs.  
    98   Note how the result of consumed strings builds up on the left as nested pairs.  
    94 
    99 
    95   If, as in the previous example, you want to parse a particular string, 
   100   If, as in the previous example, you want to parse a particular string, 
    96   then one should use the function @{ML Scan.this_string}:
   101   then you should use the function @{ML Scan.this_string}:
    97 
   102 
    98   @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
   103   @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
    99                           "(\"hell\", [\"o\"])"}
   104                           "(\"hell\", [\"o\"])"}
   100 
   105 
   101   Parsers that explore alternatives can be constructed using the function @{ML
   106   Parsers that explore alternatives can be constructed using the function @{ML
   242   
   247   
   243   @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" 
   248   @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" 
   244                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   249                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   245 
   250 
   246   @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
   251   @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
   247   other stoppers need to be used when parsing tokens, for example. However, this kind of 
   252   other stoppers need to be used when parsing, for example, tokens. However, this kind of 
   248   manually wrapping is often already done by the surrounding infrastructure. 
   253   manually wrapping is often already done by the surrounding infrastructure. 
   249 
   254 
   250   The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any 
   255   The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any 
   251   string as in
   256   string as in
   252 
   257 
   346 
   351 
   347 section {* Parsing Theory Syntax *}
   352 section {* Parsing Theory Syntax *}
   348 
   353 
   349 text {*
   354 text {*
   350   Most of the time, however, Isabelle developers have to deal with parsing
   355   Most of the time, however, Isabelle developers have to deal with parsing
   351   tokens, not strings.  This is because the parsers for the theory syntax, as
   356   tokens, not strings.  These token parsers will have the type
   352   well as the parsers for the arguments of proof methods the type @{ML_type OuterLex.token} 
   357 *}
   353   (which is identical to the type @{ML_type OuterParse.token}).  There are also handy 
   358   
   354   parsers for ML-expressions and ML-files.
   359 ML{*type 'a parser = OuterLex.token list -> 'a * OuterLex.token list*}
       
   360 
       
   361 text {*
       
   362   This reason for using token parsers is that theory syntax, as well as the
       
   363   parsers for the arguments of proof methods, use the type @{ML_type
       
   364   OuterLex.token} (which is identical to the type @{ML_type
       
   365   OuterParse.token}).  However, there are also handy parsers for
       
   366   ML-expressions and ML-files.
   355 
   367 
   356   \begin{readmore}
   368   \begin{readmore}
   357   The parser functions for the theory syntax are contained in the structure
   369   The parser functions for the theory syntax are contained in the structure
   358   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   370   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   359   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   371   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   365   kind of tokens.
   377   kind of tokens.
   366 *}  
   378 *}  
   367 
   379 
   368 text {*
   380 text {*
   369   The first example shows how to generate a token list out of a string using
   381   The first example shows how to generate a token list out of a string using
   370   the function @{ML "OuterSyntax.scan"}. It is given below @{ML "Position.none"}
   382   the function @{ML "OuterSyntax.scan"}. It is given the argument @{ML "Position.none"}
   371   as argument since, at the moment, we are not interested in generating
   383   since, at the moment, we are not interested in generating
   372   precise error messages. The following code
   384   precise error messages. The following code
   373 
   385 
   374 @{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" 
   386 @{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" 
   375 "[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
   387 "[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
   376  Token (\<dots>,(Space, \" \"),\<dots>), 
   388  Token (\<dots>,(Space, \" \"),\<dots>), 
   420 "let 
   432 "let 
   421   val (keywords, commands) = OuterKeyword.get_lexicons ()
   433   val (keywords, commands) = OuterKeyword.get_lexicons ()
   422 in 
   434 in 
   423   (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
   435   (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
   424 end" 
   436 end" 
   425 "([\"}\",\"{\",\<dots>],[\"\<rightleftharpoons>\",\"\<leftharpoondown>\",\<dots>])"}
   437 "([\"}\", \"{\", \<dots>],[\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}
   426 
   438 
   427   The parser @{ML "OuterParse.$$$"} parses a single keyword. For example:
   439   The parser @{ML "OuterParse.$$$"} parses a single keyword. For example:
   428 
   440 
   429 @{ML_response [display,gray]
   441 @{ML_response [display,gray]
   430 "let 
   442 "let 
   431   val input1 = filtered_input \"where for\"
   443   val input1 = filtered_input \"where for\"
   432   val input2 = filtered_input \"| in\"
   444   val input2 = filtered_input \"| in\"
   433 in 
   445 in 
   434   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
   446   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
   435 end"
   447 end"
   436 "((\"where\",\<dots>),(\"|\",\<dots>))"}
   448 "((\"where\",\<dots>), (\"|\",\<dots>))"}
   437 
   449 
   438   Like before, you can sequentially connect parsers with @{ML "--"}. For example: 
   450   Like before, you can sequentially connect parsers with @{ML "--"}. For example: 
   439 
   451 
   440 @{ML_response [display,gray]
   452 @{ML_response [display,gray]
   441 "let 
   453 "let 
   442   val input = filtered_input \"| in\"
   454   val input = filtered_input \"| in\"
   443 in 
   455 in 
   444   (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
   456   (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
   445 end"
   457 end"
   446 "((\"|\",\"in\"),[])"}
   458 "((\"|\", \"in\"),[])"}
   447 
   459 
   448   The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty 
   460   The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty 
   449   list of items recognised by the parser @{text p}, where the items being parsed
   461   list of items recognised by the parser @{text p}, where the items being parsed
   450   are separated by the string @{text s}. For example:
   462   are separated by the string @{text s}. For example:
   451 
   463 
   453 "let 
   465 "let 
   454   val input = filtered_input \"in | in | in foo\"
   466   val input = filtered_input \"in | in | in foo\"
   455 in 
   467 in 
   456   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
   468   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
   457 end" 
   469 end" 
   458 "([\"in\",\"in\",\"in\"],[\<dots>])"}
   470 "([\"in\", \"in\", \"in\"],[\<dots>])"}
   459 
   471 
   460   @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
   472   @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
   461   be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
   473   be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
   462   end of the parsed string, otherwise the parser would have consumed all
   474   end of the parsed string, otherwise the parser would have consumed all
   463   tokens and then failed with the exception @{text "MORE"}. Like in the
   475   tokens and then failed with the exception @{text "MORE"}. Like in the
   470   val input = filtered_input \"in | in | in\"
   482   val input = filtered_input \"in | in | in\"
   471 in 
   483 in 
   472   Scan.finite OuterLex.stopper 
   484   Scan.finite OuterLex.stopper 
   473          (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
   485          (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
   474 end" 
   486 end" 
   475 "([\"in\",\"in\",\"in\"],[])"}
   487 "([\"in\", \"in\", \"in\"],[])"}
   476 
   488 
   477   The following function will help to run examples.
   489   The following function will help to run examples.
   478 
   490 
   479 *}
   491 *}
   480 
   492 
   508 
   520 
   509   Whenever there is a possibility that the processing of user input can fail, 
   521   Whenever there is a possibility that the processing of user input can fail, 
   510   it is a good idea to give as much information about where the error 
   522   it is a good idea to give as much information about where the error 
   511   occured. For this Isabelle can attach positional information to tokens
   523   occured. For this Isabelle can attach positional information to tokens
   512   and then thread this information up the processing chain. To see this,
   524   and then thread this information up the processing chain. To see this,
   513   modify the function @{ML filtered_input} described earlier to be
   525   modify the function @{ML filtered_input} described earlier to 
   514 *}
   526 *}
   515 
   527 
   516 ML{*fun filtered_input' str = 
   528 ML{*fun filtered_input' str = 
   517        filter OuterLex.is_proper (OuterSyntax.scan (Position.line 7) str) *}
   529        filter OuterLex.is_proper (OuterSyntax.scan (Position.line 7) str) *}
   518 
   530 
   582   
   594   
   583   The positional information is stored so that code called later on will be
   595   The positional information is stored so that code called later on will be
   584   able to give more precise error messages. 
   596   able to give more precise error messages. 
   585 
   597 
   586   \begin{readmore}
   598   \begin{readmore}
   587   The functions to do with input and outout of XML and YXML are defined 
   599   The functions to do with input and output of XML and YXML are defined 
   588   in @{ML_file "Pure/General/xml.ML"} and @{ML_file "Pure/General/yxml.ML"}.
   600   in @{ML_file "Pure/General/xml.ML"} and @{ML_file "Pure/General/yxml.ML"}.
   589   \end{readmore}
   601   \end{readmore}
   590 *}
   602 *}
   591 
   603 
   592 
   604 
   627                (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []*}
   639                (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []*}
   628 
   640 
   629 text {*
   641 text {*
   630   Note that the parser does not parse the keyword \simpleinductive, even if it is
   642   Note that the parser does not parse the keyword \simpleinductive, even if it is
   631   meant to process definitions as shown above. The parser of the keyword 
   643   meant to process definitions as shown above. The parser of the keyword 
   632   will be given by the infrastructure that will eventually calls @{ML spec_parser}.
   644   will be given by the infrastructure that will eventually call @{ML spec_parser}.
   633   
   645   
   634 
   646 
   635   To see what the parser returns, let us parse the string corresponding to the 
   647   To see what the parser returns, let us parse the string corresponding to the 
   636   definition of @{term even} and @{term odd}:
   648   definition of @{term even} and @{term odd}:
   637 
   649 
   716   (name, map Args.dest_src attrib)
   728   (name, map Args.dest_src attrib)
   717 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
   729 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
   718  
   730  
   719   The function @{ML opt_thm_name in SpecParse} is the ``optional'' variant of
   731   The function @{ML opt_thm_name in SpecParse} is the ``optional'' variant of
   720   @{ML thm_name in SpecParse}. As can be seen each theorem name can contain some 
   732   @{ML thm_name in SpecParse}. As can be seen each theorem name can contain some 
   721   attributes. 
   733   attributes. The name has to end with @{text [quotes] ":"}---see argument of 
   722 
   734   @{ML SpecParse.opt_thm_name} in Line 9.
   723   For the inductive definitions described above only the attibutes @{text "[intro]"} and
       
   724   @{text "[simp]"} make sense.
       
   725 
   735 
   726   \begin{readmore}
   736   \begin{readmore}
   727   Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
   737   Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
   728   and @{ML_file "Pure/Isar/args.ML"}.
   738   and @{ML_file "Pure/Isar/args.ML"}.
   729   \end{readmore}
   739   \end{readmore}
   845   or something similar depending on your Isabelle distribution and architecture.
   855   or something similar depending on your Isabelle distribution and architecture.
   846   One quick way to assign a shell variable to this directory is by typing
   856   One quick way to assign a shell variable to this directory is by typing
   847 
   857 
   848   @{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"}
   858   @{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"}
   849  
   859  
   850   on the Unix prompt. The directory should include the files:
   860   on the Unix prompt. If you now also type @{text "ls $ISABELLE_LOGS"}, then the 
       
   861   directory should include the files:
   851 
   862 
   852   @{text [display] 
   863   @{text [display] 
   853 "Pure.gz
   864 "Pure.gz
   854 HOL.gz
   865 HOL.gz
   855 Pure-ProofGeneral.gz
   866 Pure-ProofGeneral.gz
   954   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
   965   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
   955   omit); the argument @{ML "(K I)"} stands for a function that determines what
   966   omit); the argument @{ML "(K I)"} stands for a function that determines what
   956   should be done with the theorem once it is proved (we chose to just forget
   967   should be done with the theorem once it is proved (we chose to just forget
   957   about it). Lines 9 to 11 contain the parser for the proposition.
   968   about it). Lines 9 to 11 contain the parser for the proposition.
   958 
   969 
   959   (FIXME: explain @{ML Toplevel.print} etc)
       
   960 
       
   961   If you now type \isacommand{foobar}~@{text [quotes] "True \<and> True"}, you obtain the following 
   970   If you now type \isacommand{foobar}~@{text [quotes] "True \<and> True"}, you obtain the following 
   962   proof state:
   971   proof state:
   963 
   972 
   964   \begin{isabelle}
   973   \begin{isabelle}
   965   \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
   974   \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
   974   \isacommand{apply}@{text "(rule conjI)"}\\
   983   \isacommand{apply}@{text "(rule conjI)"}\\
   975   \isacommand{apply}@{text "(rule TrueI)+"}\\
   984   \isacommand{apply}@{text "(rule TrueI)+"}\\
   976   \isacommand{done}
   985   \isacommand{done}
   977   \end{isabelle}
   986   \end{isabelle}
   978 
   987 
       
   988   However, once you change the ``kind'' of a command from @{ML thy_decl in OuterKeyword}
       
   989   to @{ML thy_goal in OuterKeyword} then the keyword file needs to be re-created.
   979   
   990   
   980   (FIXME What do @{ML "Toplevel.theory"} 
   991   (FIXME What do @{ML "Toplevel.theory"} 
   981   @{ML "Toplevel.print"} 
   992   @{ML "Toplevel.print"} 
   982   @{ML Toplevel.local_theory}?)
   993   @{ML Toplevel.local_theory} do?)
   983 
   994 
   984   (FIXME read a name and show how to store theorems)
   995   (FIXME read a name and show how to store theorems)
   985 
   996 
   986 *}
   997 *}
   987 
   998