CookBook/Parsing.thy
changeset 65 c8e9a4f97916
parent 60 5b9c6010897b
child 66 d563f8ff6aa0
equal deleted inserted replaced
64:9a6e5e0c4906 65:c8e9a4f97916
    93   Note how the result of consumed strings builds up on the left as nested pairs.  
    93   Note how the result of consumed strings builds up on the left as nested pairs.  
    94 
    94 
    95   If, as in the previous example, one wants to parse a particular string, 
    95   If, as in the previous example, one wants to parse a particular string, 
    96   then one should use the function @{ML Scan.this_string}:
    96   then one should use the function @{ML Scan.this_string}:
    97 
    97 
    98   @{ML_response [display] "Scan.this_string \"hel\" (explode \"hello\")"
    98   @{ML_response [display] "Scan.this_string \"hell\" (explode \"hello\")"
    99                           "(\"hel\", [\"l\", \"o\"])"}
    99                           "(\"hell\", [\"o\"])"}
   100 
   100 
   101   Parsers that explore alternatives can be constructed using the function @{ML
   101   Parsers that explore alternatives can be constructed using the function @{ML
   102   "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the
   102   "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the
   103   result of @{text "p"}, in case it succeeds, otherwise it returns the
   103   result of @{text "p"}, in case it succeeds, otherwise it returns the
   104   result of @{text "q"}. For example
   104   result of @{text "q"}. For example
   160   one might write
   160   one might write
   161 
   161 
   162   @{ML [display] "(p -- q) || r" for p q r}
   162   @{ML [display] "(p -- q) || r" for p q r}
   163 
   163 
   164   However, this parser is problematic for producing an appropriate error message, in case
   164   However, this parser is problematic for producing an appropriate error message, in case
   165   the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses with the parser
   165   the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses the information 
   166   above the information 
       
   167   that @{text p} should be followed by @{text q}. To see this consider the case in
   166   that @{text p} should be followed by @{text q}. To see this consider the case in
   168   which @{text p} 
   167   which @{text p} 
   169   is present in the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail 
   168   is present in the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail 
   170   and the 
   169   and the 
   171   alternative parser @{text r} will be tried. However in many circumstance this will be the wrong
   170   alternative parser @{text r} will be tried. However in many circumstance this will be the wrong
   172   parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then 
   171   parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then 
   173   caused by the
   172   caused by the
   174   failure of @{text r}, not by the absence of @{text q} in the input. This kind of situation
   173   failure of @{text r}, not by the absence of @{text q} in the input. This kind of situation
   175   can be avoided by using the function @{ML "(op !!)"}. This function aborts the whole process of
   174   can be avoided when using the function @{ML "(op !!)"}. This function aborts the whole process of
   176   parsing in case of a failure and invokes an error message. For example if we invoke the parser
   175   parsing in case of a failure and prints an error message. For example if we invoke the parser
   177   
   176   
   178   @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
   177   @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
   179 
   178 
   180   on @{text [quotes] "hello"}, the parsing succeeds
   179   on @{text [quotes] "hello"}, the parsing succeeds
   181 
   180 
   186   but if we invoke it on @{text [quotes] "world"}
   185   but if we invoke it on @{text [quotes] "world"}
   187   
   186   
   188   @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
   187   @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
   189                                "Exception ABORT raised"}
   188                                "Exception ABORT raised"}
   190 
   189 
   191   the parsing aborts and the error message @{text "foo"} is printed out. In order to
   190   then the parsing aborts and the error message @{text "foo"} is printed out. In order to
   192   see the error message properly, we need to prefix the parser with the function 
   191   see the error message properly, we need to prefix the parser with the function 
   193   @{ML "Scan.error"}. For example
   192   @{ML "Scan.error"}. For example
   194 
   193 
   195   @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"
   194   @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"
   196                                "Exception Error \"foo\" raised"}
   195                                "Exception Error \"foo\" raised"}
   197 
   196 
   198   This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
   197   This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
   199   (FIXME: give reference to later place). 
   198   (FIXME: give reference to later place). 
   200   
   199   
   201   Returning to our example of parsing @{ML "(p -- q) || r" for p q r}. If we want
   200   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q r}. If we want
   202   to generate the correct error message for p-followed-by-q, then
   201   to generate the correct error message for p-followed-by-q, then
   203   we have to write:
   202   we have to write:
   204 *}
   203 *}
   205 
   204 
   206 ML {* 
   205 ML {* 
   207 fun p_followed_by_q p q r =
   206 fun p_followed_by_q p q r =
   208   let 
   207   let 
   209      val err = (fn _ => p ^ " is not followed by " ^ q)
   208      val err_msg = (fn _ => p ^ " is not followed by " ^ q)
   210   in
   209   in
   211     (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r))
   210     ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r)
   212   end
   211   end
   213 *}
   212 *}
   214 
   213 
   215 
   214 
   216 text {*
   215 text {*
   217   Running this parser with
   216   Running this parser with the @{text [quotes] "h"} and @{text [quotes] "e"}, and 
       
   217   the input @{text [quotes] "holle"} 
   218 
   218 
   219   @{ML_response_fake [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"
   219   @{ML_response_fake [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"
   220                                "Exception ERROR \"h is not followed by e\" raised"} 
   220                                "Exception ERROR \"h is not followed by e\" raised"} 
   221 
   221 
   222   gives the correct error message. Running it with
   222   produces the correct error message. Running it with
   223  
   223  
   224   @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
   224   @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
   225                           "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
   225                           "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
   226   
   226   
   227   yields the expected parsing. 
   227   yields the expected parsing. 
   235   Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
   235   Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
   236   @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} 
   236   @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} 
   237   succeeds at least once.
   237   succeeds at least once.
   238 
   238 
   239   Also note that the parser would have aborted with the exception @{text MORE}, if
   239   Also note that the parser would have aborted with the exception @{text MORE}, if
   240   we had run it only on just @{text [quotes] "hhhh"}. This can be avoided using
   240   we had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using
   241   the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
   241   the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
   242   them we can write
   242   them we can write
   243   
   243   
   244   @{ML_response [display] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" 
   244   @{ML_response [display] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" 
   245                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   245                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   246 
   246 
   247   However, this kind of manually wrapping needs to be done only very rarely
   247   @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
   248   in practise, because it is already done by the infrastructure for you. 
   248   other stoppers need to be used when parsing token, for example. However, this kind of 
       
   249   manually wrapping is often already done by the surrounding infrastructure. 
   249 
   250 
   250   The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any 
   251   The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any 
   251   string as in
   252   string as in
   252 
   253 
   253   @{ML_response [display] 
   254   @{ML_response [display] 
   258    Scan.finite Symbol.stopper p input
   259    Scan.finite Symbol.stopper p input
   259 end" 
   260 end" 
   260 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
   261 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
   261 
   262 
   262   where the function @{ML Symbol.not_eof} ensures that we do not read beyond the 
   263   where the function @{ML Symbol.not_eof} ensures that we do not read beyond the 
   263   end of the input string.
   264   end of the input string (i.e.~stopper symbol).
   264 
   265 
   265   The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
   266   The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
   266   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   267   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   267   
   268   
   268   @{ML_response_fake_both [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"
   269   @{ML_response_fake_both [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"
   289    Scan.finite Symbol.stopper p input2)
   290    Scan.finite Symbol.stopper p input2)
   290 end"
   291 end"
   291 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   292 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   292  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   293  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   293 
   294 
   294   After parsing succeeded, one nearly always wants to apply a function on the parsed 
   295   After parsing is done, one nearly always wants to apply a function on the parsed 
   295   items. This is done using the function @{ML "(p >> f)" for p f} which runs 
   296   items. To do this the function @{ML "(p >> f)" for p f} can be employed, which runs 
   296   first the parser @{text p} and upon successful completion applies the 
   297   first the parser @{text p} and upon successful completion applies the 
   297   function @{text f} to the result. For example
   298   function @{text f} to the result. For example
   298 
   299 
   299 @{ML_response [display]
   300 @{ML_response [display]
   300 "let 
   301 "let 
   343 text {*
   344 text {*
   344   Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings.
   345   Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings.
   345   This is because the parsers for the theory syntax, as well as the parsers for the 
   346   This is because the parsers for the theory syntax, as well as the parsers for the 
   346   argument syntax of proof methods and attributes use the type 
   347   argument syntax of proof methods and attributes use the type 
   347   @{ML_type OuterLex.token} (which is identical to the type @{ML_type OuterParse.token}).
   348   @{ML_type OuterLex.token} (which is identical to the type @{ML_type OuterParse.token}).
       
   349   There are also parsers for ML-expressions and ML-files.
   348 
   350 
   349   \begin{readmore}
   351   \begin{readmore}
   350   The parser functions for the theory syntax are contained in the structure
   352   The parser functions for the theory syntax are contained in the structure
   351   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   353   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   352   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   354   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   387 in
   389 in
   388    filter OuterLex.is_proper input
   390    filter OuterLex.is_proper input
   389 end" 
   391 end" 
   390 "[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}
   392 "[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}
   391 
   393 
   392   For convenience we are going to use the function
   394   For convenience we define the function
   393 
   395 
   394 *}
   396 *}
   395 
   397 
   396 ML {* 
   398 ML {* 
   397 fun filtered_input str = 
   399 fun filtered_input str = 
   407 "[Token (\<dots>,(Command, \"inductive\"),\<dots>), 
   409 "[Token (\<dots>,(Command, \"inductive\"),\<dots>), 
   408  Token (\<dots>,(Keyword, \"|\"),\<dots>), 
   410  Token (\<dots>,(Keyword, \"|\"),\<dots>), 
   409  Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}
   411  Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}
   410 
   412 
   411   we obtain a list consisting of only a command and two keyword tokens.
   413   we obtain a list consisting of only a command and two keyword tokens.
   412   If you want to see which keywords and commands are currently known, use
   414   If you want to see which keywords and commands are currently known, type in
   413   the following (you might have to adjust the @{ML print_depth} in order to
   415   the following (you might have to adjust the @{ML print_depth} in order to
   414   see the complete list):
   416   see the complete list):
   415 
   417 
   416 @{ML_response_fake [display] 
   418 @{ML_response_fake [display] 
   417 "let 
   419 "let 
   495 end"
   497 end"
   496 "Exception ERROR \"Outer syntax error: keyword \"|\" expected, 
   498 "Exception ERROR \"Outer syntax error: keyword \"|\" expected, 
   497 but keyword in was found\" raised"
   499 but keyword in was found\" raised"
   498 }
   500 }
   499 
   501 
   500   \begin{exercise}
   502   \begin{exercise} (FIXME)
   501   A type-identifier, for example @{typ "'a"}, is a token of 
   503   A type-identifier, for example @{typ "'a"}, is a token of 
   502   kind @{ML "Keyword" in OuterLex}. It can be parsed using 
   504   kind @{ML "Keyword" in OuterLex}. It can be parsed using 
   503   the function @{ML OuterParse.type_ident}.
   505   the function @{ML OuterParse.type_ident}.
   504   \end{exercise}
   506   \end{exercise}
   505 
   507 
   506 
   508 
   507 *}
   509 *}
       
   510 
       
   511 
       
   512 
   508 
   513 
   509 
   514 
   510 section {* Positional Information *}
   515 section {* Positional Information *}
   511 
   516 
   512 text {*
   517 text {*
   533 
   538 
   534 ML {*
   539 ML {*
   535   OuterParse.opt_target
   540   OuterParse.opt_target
   536 *}
   541 *}
   537 
   542 
       
   543 text {* (FIXME funny output for a proposition) *}
       
   544 
   538 ML {*
   545 ML {*
   539   OuterParse.opt_target --
   546   OuterParse.opt_target --
   540   OuterParse.fixes -- 
   547   OuterParse.fixes -- 
   541   OuterParse.for_fixes --
   548   OuterParse.for_fixes --
   542   Scan.optional (OuterParse.$$$ "where" |--
   549   Scan.optional (OuterParse.$$$ "where" |--
   543     OuterParse.!!! (OuterParse.enum1 "|" (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []
   550     OuterParse.!!! (OuterParse.enum1 "|" (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []
   544 
   551 
   545 *}
   552 *}
   546 
   553 
   547 text {* (FIXME funny output for a proposition) *}
   554 ML {* OuterSyntax.command *}
       
   555 
       
   556 section {* New Commands and Keyword Files *}
       
   557 
       
   558 text {*
       
   559 
       
   560   Often new commands, for example for providing a new definitional principle,
       
   561   need to be programmed. While this is not too difficult to do on the
       
   562   ML-level, new commands, in order to be useful, need to be recognised by
       
   563   ProofGeneral. This results in some subtle configuration issues, which we
       
   564   will explain in this section.
       
   565 
       
   566 
       
   567   Let us start with a silly command, named \isacommand{foo}, which does nothing at all. 
       
   568   It can be defined as
       
   569 
       
   570 @{ML [display]
       
   571 "let
       
   572   val do_nothing = Scan.succeed (Toplevel.theory (fn x => x))
       
   573   val flag = OuterKeyword.thy_decl
       
   574 in
       
   575   OuterSyntax.command \"foo\" \"description of foo\" flag do_nothing
       
   576 end"}
       
   577 
       
   578   The crucial function that makes the command \isacommand{foo} known to
       
   579   Isabelle is @{ML OuterSyntax.command}.  It expects a name of a command, a
       
   580   short description, a flag (we will explain it later more thoroughly) and a
       
   581   parser for a top-level transition function (its purpose will also explained
       
   582   later on). Lets also assume we packaged this function into a separate
       
   583   theory named @{text "Command"}, say, and a file named @{text "Command.thy"}. 
       
   584 
       
   585   We now need a keyword file, that can be loaded by ProofGeneral in
       
   586   order to recognise \isacommand{foo} as command. Such a keyword file
       
   587   can be generated with the command-line
       
   588 
       
   589   @{text [display] "$ isabelle keywords -k foo some-log-files"}
       
   590 
       
   591   The option @{text "-k foo"} indicates which postfix the keyword file will obtain. In
       
   592   the case above the generated file is named @{text "isar-keywords-foo.el"}. This command
       
   593   requires log files to be present (in order to extract the keywords from them). For 
       
   594   our purposes it is sufficient to use the log files for the theories @{text "Pure"},
       
   595   @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as the theory @{text "Command.thy"}
       
   596   containing the new command. @{text Pure} and @{text HOL} are usually compiled during the 
       
   597   installation of Isabelle, if not this can be done conveniently using build script from 
       
   598   the distribution
       
   599 
       
   600   @{text [display] 
       
   601 "$ ./build -m \"Pure\"
       
   602 $ ./build -m \"HOL\""}
       
   603   
       
   604   The @{text "Pure-ProofGeneral"} theory needs to be compiled with
       
   605 
       
   606   @{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""}
       
   607 
       
   608   For the theory containing the new command \isacommand{foo}, we first
       
   609   create a ``managed'' subdirectory by
       
   610 
       
   611 
       
   612   This creates files @{text "IsaMakefile"} and @{text "./FooCommand/ROOT.ML"}. We
       
   613   need to copy the file @{text "Command.thy"} into the directory @{text "FooCommand"}
       
   614   and add @{text "use_thy \"Command\";"} to @{text "./FooCommand/ROOT.ML"}.
       
   615   Now we can compile the theory by just typing
       
   616 
       
   617   @{text [display] "$ isabelle make"}
       
   618 
       
   619   All these compilations created the log files that we are after. They are typically stored 
       
   620   under the directory 
       
   621   
       
   622   @{text [display]  "~/.isabelle/heaps/\<dots>\<dots>/\<dots>\<dots>/log"}
       
   623 
       
   624   where the dots stand for names of the current Isabelle distribution and 
       
   625   hardware architecture. In it are the files
       
   626 
       
   627   @{text [display] 
       
   628 "Pure.gz
       
   629 HOL.gz
       
   630 Pure-ProofGeneral.gz
       
   631 HOL-FooCommand.gz"} 
       
   632 
       
   633   Let us assume the directory with these files is stored in the shell variable 
       
   634   @{text "ISABELLE_LOGS"}, by typing for example
       
   635 
       
   636   @{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"}
       
   637  
       
   638   We can now create the keyword file with
       
   639 
       
   640 @{text [display]
       
   641 "$ isabelle keywords -k foo 
       
   642        `$ISABELLE_LOGS/{Pure.gz,HOL.gz,Pure-ProofGeneral,HOL-FooCommand.gz}`"}
       
   643 
       
   644   The result is the file @{text "isar-keywords-foo.el"}, which needs to be 
       
   645   copied in the directory @{text "~/.isabelle/etc"}. To make Isabelle use
       
   646   this keyword file, we need to start it with the option @{text "-k foo"}:
       
   647 
       
   648   @{text [display] "isabelle -k foo a-theory-file"}
       
   649 
       
   650   If we now run the original code creating the command
       
   651 
       
   652   @{ML [display]
       
   653 "let
       
   654   val do_nothing = Scan.succeed (Toplevel.theory (fn x => x))
       
   655   val flag = OuterKeyword.thy_decl
       
   656 in
       
   657   OuterSyntax.command \"foo\" \"description of foo\" flag do_nothing
       
   658 end"}
       
   659 
       
   660   then we can finally make use of \isacommand{foo}! Similarly
       
   661   with any other command you want to implement. 
       
   662 
       
   663   In the example above, we built the theories on top of the HOL-logic. If you
       
   664   target other logics, such as Nominal or ZF, then you need to change the
       
   665   log files appropriately.
       
   666 
       
   667   (FIXME Explain @{text "OuterKeyword.thy_decl"} and parser)
       
   668   
       
   669 
       
   670 *}
       
   671 
   548 
   672 
   549 
   673 
   550 
   674 
   551 chapter {* Parsing *}
   675 chapter {* Parsing *}
   552 
   676 
  1215 
  1339 
  1216 
  1340 
  1217 *}
  1341 *}
  1218 
  1342 
  1219 
  1343 
       
  1344 
  1220 end 
  1345 end