CookBook/Parsing.thy
changeset 104 5dcad9348e4d
parent 102 5e309df58557
child 105 f49dc7e96235
equal deleted inserted replaced
103:fe10da5354a3 104:5dcad9348e4d
   129 end"
   129 end"
   130   "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}
   130   "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}
   131 
   131 
   132   The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
   132   The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
   133   @{text "p"}, if it succeeds; otherwise it returns 
   133   @{text "p"}, if it succeeds; otherwise it returns 
   134   the default value @{text "x"}. For example
   134   the default value @{text "x"}. For example:
   135 
   135 
   136 @{ML_response [display,gray]
   136 @{ML_response [display,gray]
   137 "let 
   137 "let 
   138   val p = Scan.optional ($$ \"h\") \"x\"
   138   val p = Scan.optional ($$ \"h\") \"x\"
   139   val input1 = (explode \"hello\")
   139   val input1 = (explode \"hello\")
   156 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   156 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   157 
   157 
   158   The function @{ML "(op !!)"} helps to produce appropriate error messages
   158   The function @{ML "(op !!)"} helps to produce appropriate error messages
   159   during parsing. For example if you want to parse that @{text p} is immediately 
   159   during parsing. For example if you want to parse that @{text p} is immediately 
   160   followed by @{text q}, or start a completely different parser @{text r},
   160   followed by @{text q}, or start a completely different parser @{text r},
   161   you might write
   161   you might write:
   162 
   162 
   163   @{ML [display,gray] "(p -- q) || r" for p q r}
   163   @{ML [display,gray] "(p -- q) || r" for p q r}
   164 
   164 
   165   However, this parser is problematic for producing an appropriate error
   165   However, this parser is problematic for producing an appropriate error
   166   message, in case the parsing of @{ML "(p -- q)" for p q} fails. Because in
   166   message, in case the parsing of @{ML "(p -- q)" for p q} fails. Because in
   169   the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail
   169   the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail
   170   and the alternative parser @{text r} will be tried. However in many
   170   and the alternative parser @{text r} will be tried. However in many
   171   circumstance this will be the wrong parser for the input ``p-followed-by-q''
   171   circumstance this will be the wrong parser for the input ``p-followed-by-q''
   172   and therefore will also fail. The error message is then caused by the
   172   and therefore will also fail. The error message is then caused by the
   173   failure of @{text r}, not by the absence of @{text q} in the input. This
   173   failure of @{text r}, not by the absence of @{text q} in the input. This
   174   kind of situation can be avoided when using the function @{ML "(op
   174   kind of situation can be avoided when using the function @{ML "(op !!)"}. 
   175   !!)"}. This function aborts the whole process of parsing in case of a
   175   This function aborts the whole process of parsing in case of a
   176   failure and prints an error message. For example if you invoke the parser
   176   failure and prints an error message. For example if you invoke the parser
   177 
   177 
   178   
   178   
   179   @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
   179   @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
   180 
   180 
   188   
   188   
   189   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
   189   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
   190                                "Exception ABORT raised"}
   190                                "Exception ABORT raised"}
   191 
   191 
   192   then the parsing aborts and the error message @{text "foo"} is printed out. In order to
   192   then the parsing aborts and the error message @{text "foo"} is printed out. In order to
   193   see the error message, we need to prefix the parser with the function 
   193   see the error message properly, we need to prefix the parser with the function 
   194   @{ML "Scan.error"}. For example:
   194   @{ML "Scan.error"}. For example:
   195 
   195 
   196   @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
   196   @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
   197                                "Exception Error \"foo\" raised"}
   197                                "Exception Error \"foo\" raised"}
   198 
   198 
   291 end"
   291 end"
   292 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   292 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   293  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   293  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   294 
   294 
   295   After parsing is done, you nearly always want to apply a function on the parsed 
   295   After parsing is done, you nearly always want to apply a function on the parsed 
   296   items. To do this the function @{ML "(p >> f)" for p f} can be employed, which runs 
   296   items. One way to do this is the function @{ML "(p >> f)" for p f}, which runs 
   297   first the parser @{text p} and upon successful completion applies the 
   297   first the parser @{text p} and upon successful completion applies the 
   298   function @{text f} to the result. For example
   298   function @{text f} to the result. For example
   299 
   299 
   300 @{ML_response [display,gray]
   300 @{ML_response [display,gray]
   301 "let 
   301 "let 
   303 in
   303 in
   304   (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
   304   (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
   305 end"
   305 end"
   306 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
   306 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
   307 
   307 
   308   doubles the two parsed input strings. Or
   308   doubles the two parsed input strings; or
   309 
   309 
   310   @{ML_response [display,gray] 
   310   @{ML_response [display,gray] 
   311 "let 
   311 "let 
   312    val p = Scan.repeat (Scan.one Symbol.not_eof) >> implode
   312    val p = Scan.repeat (Scan.one Symbol.not_eof)
   313    val input = (explode \"foo bar foo\") 
   313    val input = (explode \"foo bar foo\") 
   314 in
   314 in
   315    Scan.finite Symbol.stopper p input
   315    Scan.finite Symbol.stopper (p >> implode) input
   316 end" 
   316 end" 
   317 "(\"foo bar foo\",[])"}
   317 "(\"foo bar foo\",[])"}
   318 
   318 
   319   where the single-character strings in the parsed output are transformed
   319   where the single-character strings in the parsed output are transformed
   320   back into one string.
   320   back into one string.
   342 section {* Parsing Theory Syntax *}
   342 section {* Parsing Theory Syntax *}
   343 
   343 
   344 text {*
   344 text {*
   345   Most of the time, however, Isabelle developers have to deal with parsing
   345   Most of the time, however, Isabelle developers have to deal with parsing
   346   tokens, not strings.  This is because the parsers for the theory syntax, as
   346   tokens, not strings.  This is because the parsers for the theory syntax, as
   347   well as the parsers for the argument syntax of proof methods and attributes
   347   well as the parsers for the arguments of proof methods the type @{ML_type OuterLex.token} 
   348   use the type @{ML_type OuterLex.token} (which is identical to the type
   348   (which is identical to the type @{ML_type OuterParse.token}).  There are also handy 
   349   @{ML_type OuterParse.token}).  There are also parsers for ML-expressions and
   349   parsers for ML-expressions and ML-files.
   350   ML-files, which can be sometimes handy.
       
   351 
   350 
   352   \begin{readmore}
   351   \begin{readmore}
   353   The parser functions for the theory syntax are contained in the structure
   352   The parser functions for the theory syntax are contained in the structure
   354   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   353   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   355   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   354   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   360   @{ML "Command" in OuterLex} for commands). Some token parsers take into account the 
   359   @{ML "Command" in OuterLex} for commands). Some token parsers take into account the 
   361   kind of tokens.
   360   kind of tokens.
   362 *}  
   361 *}  
   363 
   362 
   364 text {*
   363 text {*
   365   With the examples below, you can generate a token list out of a string using
   364   The first example shows how to generate a token list out of a string using
   366   the function @{ML "OuterSyntax.scan"}, which is given @{ML "Position.none"}
   365   the function @{ML "OuterSyntax.scan"}. It is given below @{ML "Position.none"}
   367   as argument since, at the moment, we are not interested in generating
   366   as argument since, at the moment, we are not interested in generating
   368   precise error messages. The following code
   367   precise error messages. The following code
   369 
   368 
   370 @{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" 
   369 @{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" 
   371 "[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
   370 "[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
   406 "[Token (\<dots>,(Command, \"inductive\"),\<dots>), 
   405 "[Token (\<dots>,(Command, \"inductive\"),\<dots>), 
   407  Token (\<dots>,(Keyword, \"|\"),\<dots>), 
   406  Token (\<dots>,(Keyword, \"|\"),\<dots>), 
   408  Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}
   407  Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}
   409 
   408 
   410   you obtain a list consisting of only a command and two keyword tokens.
   409   you obtain a list consisting of only a command and two keyword tokens.
   411   If you want to see which keywords and commands are currently known, type in
   410   If you want to see which keywords and commands are currently known to Isabelle, type in
   412   the following code (you might have to adjust the @{ML print_depth} in order to
   411   the following code (you might have to adjust the @{ML print_depth} in order to
   413   see the complete list):
   412   see the complete list):
   414 
   413 
   415 @{ML_response_fake [display,gray] 
   414 @{ML_response_fake [display,gray] 
   416 "let 
   415 "let 
   498   A type-identifier, for example @{typ "'a"}, is a token of 
   497   A type-identifier, for example @{typ "'a"}, is a token of 
   499   kind @{ML "Keyword" in OuterLex}. It can be parsed using 
   498   kind @{ML "Keyword" in OuterLex}. It can be parsed using 
   500   the function @{ML OuterParse.type_ident}.
   499   the function @{ML OuterParse.type_ident}.
   501   \end{exercise}
   500   \end{exercise}
   502 
   501 
       
   502   (FIXME: or give parser for numbers)
   503 
   503 
   504 *}
   504 *}
   505 
   505 
   506 
   506 
   507 
   507 
   698   For our purposes it is sufficient to use the log files of the theories
   698   For our purposes it is sufficient to use the log files of the theories
   699   @{text "Pure"}, @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as
   699   @{text "Pure"}, @{text "HOL"} and @{text "Pure-ProofGeneral"}, as well as
   700   the log file for the theory @{text "Command.thy"}, which contains the new
   700   the log file for the theory @{text "Command.thy"}, which contains the new
   701   \isacommand{foobar}-command. If you target other logics besides HOL, such
   701   \isacommand{foobar}-command. If you target other logics besides HOL, such
   702   as Nominal or ZF, then you need to adapt the log files appropriately.
   702   as Nominal or ZF, then you need to adapt the log files appropriately.
       
   703   
   703   @{text Pure} and @{text HOL} are usually compiled during the installation of
   704   @{text Pure} and @{text HOL} are usually compiled during the installation of
   704   Isabelle. So log files for them should be already available. If not, then
   705   Isabelle. So log files for them should be already available. If not, then
   705   they can be conveniently compiled with the help of the build-script from the Isabelle
   706   they can be conveniently compiled with the help of the build-script from the Isabelle
   706   distribution.
   707   distribution.
   707 
   708 
   818   arguments are processed. Examples of this kind of commands are
   819   arguments are processed. Examples of this kind of commands are
   819   \isacommand{definition} and \isacommand{declare}.  In other cases,
   820   \isacommand{definition} and \isacommand{declare}.  In other cases,
   820   commands are expected to parse some arguments, for example a proposition,
   821   commands are expected to parse some arguments, for example a proposition,
   821   and then ``open up'' a proof in order to prove the proposition (for example
   822   and then ``open up'' a proof in order to prove the proposition (for example
   822   \isacommand{lemma}) or prove some other properties (for example
   823   \isacommand{lemma}) or prove some other properties (for example
   823   \isacommand{function}). To achieve this kind of behaviour, we have to use the kind
   824   \isacommand{function}). To achieve this kind of behaviour, you have to use the kind
   824   indicator @{ML thy_goal in OuterKeyword}.
   825   indicator @{ML thy_goal in OuterKeyword}.
   825 
   826 
   826   Below we change \isacommand{foobar} so that it takes a proposition as
   827   Below we change \isacommand{foobar} so that it takes a proposition as
   827   argument and then starts a proof in order to prove it. Therefore in Line 13, 
   828   argument and then starts a proof in order to prove it. Therefore in Line 13, 
   828   we set the kind indicator to @{ML thy_goal in OuterKeyword}.
   829   we set the kind indicator to @{ML thy_goal in OuterKeyword}.
   872   \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
   873   \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
   873   \isacommand{apply}@{text "(rule conjI)"}\\
   874   \isacommand{apply}@{text "(rule conjI)"}\\
   874   \isacommand{apply}@{text "(rule TrueI)+"}\\
   875   \isacommand{apply}@{text "(rule TrueI)+"}\\
   875   \isacommand{done}
   876   \isacommand{done}
   876   \end{isabelle}
   877   \end{isabelle}
   877 
       
   878   Similarly for the other function composition combinators.
       
   879 
   878 
   880   
   879   
   881   (FIXME What do @{ML "Toplevel.theory"} 
   880   (FIXME What do @{ML "Toplevel.theory"} 
   882   @{ML "Toplevel.print"} 
   881   @{ML "Toplevel.print"} 
   883   @{ML Toplevel.local_theory}?)
   882   @{ML Toplevel.local_theory}?)