CookBook/Parsing.thy (changeset 102:5e309df58557)

end"
    "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

  Two parsers can be connected in sequence by using the function @{ML "(op --)"}. 
  For example, you can parse @{text "h"}, @{text "e"} and @{text "l"} in this 
  sequence by writing:

  @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
                          "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}

  Note how the result of the consumed strings builds up on the left as nested pairs.  

  If, as in the previous example, you want to parse a particular string, 
  then you should use the function @{ML Scan.this_string}:

  @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
                          "(\"hell\", [\"o\"])"}

  Parsers that explore alternatives can be constructed using the function @{ML
  "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the
  result of @{text "p"}, in case it succeeds, otherwise it returns the
  result of @{text "q"}. For example:

@{ML_response [display,gray] 
"let 
  val hw = ($$ \"h\") || ($$ \"w\")
  \<dots>
end"
  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

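  Spelled out in full, this alternative parser can be run on two different
  inputs. The following is only a sketch, assuming the inputs are produced
  with @{ML explode} as in the other examples; the expected result is stated
  here from the description of @{ML "(op ||)"}, not copied from the sources:

  @{ML_response [display,gray]
"let 
  val hw = ($$ \"h\") || ($$ \"w\")
  val input1 = explode \"hello\"
  val input2 = explode \"world\"
in
  (hw input1, hw input2)
end"
  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
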
  The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing function 
  for parsers, except that they discard the item parsed by the first (respectively second)
  parser. For example:
  
@{ML_response [display,gray]
"let 
  val just_e = ($$ \"h\") |-- ($$ \"e\") 
  val just_h = ($$ \"h\") --| ($$ \"e\") 
  \<dots>
in 
  (p input1, p input2)
end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
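
  A small, self-contained sketch of how @{ML "(op |--)"} and @{ML "(op --|)"}
  behave (the inputs are again assumed to be produced with @{ML explode}; the
  result shown is what the description above leads one to expect, not a
  checked output):

  @{ML_response [display,gray]
"let 
  val just_e = ($$ \"h\") |-- ($$ \"e\") 
  val just_h = ($$ \"h\") --| ($$ \"e\") 
in 
  (just_e (explode \"hello\"), just_h (explode \"hello\"))
end"
  "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"}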

  The function @{ML "(op !!)"} helps to produce appropriate error messages
  during parsing. For example, if you want to parse @{text p} immediately 
  followed by @{text q}, or else start a completely different parser @{text r},
  you might write:

  @{ML [display,gray] "(p -- q) || r" for p q r}

  However, this parser is problematic for producing an appropriate error
  message in case the parsing of @{ML "(p -- q)" for p q} fails, because then
  you lose the information that @{text p} should be followed by @{text q}.
  To see this, consider the case in which @{text p} is present in the input,
  but not @{text q}. That means @{ML "(p -- q)" for p q} will fail and the
  alternative parser @{text r} will be tried. However, in many circumstances
  this will be the wrong parser for the input ``p-followed-by-q'' and
  therefore will also fail. The error message is then caused by the failure
  of @{text r}, not by the absence of @{text q} in the input. This kind of
  situation can be avoided by using the function @{ML "(op !!)"}. This
  function aborts the whole process of parsing in case of a failure and
  prints an error message. For example, if you invoke the parser

  @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}

  on @{text [quotes] "hello"}, the parsing succeeds

  @{ML_response [display,gray] 
                "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" 
                "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}

  but if you invoke it on @{text [quotes] "world"}
  
  @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
                               "Exception ABORT raised"}

  then the parsing aborts and the error message @{text "foo"} is printed. In order to
  see the error message, we need to prefix the parser with the function 
  @{ML "Scan.error"}. For example:

  @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
                               "Exception Error \"foo\" raised"}

  This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
  (see Section~\ref{sec:newcommand}, which explains this function in more detail). 
  
  Let us now return to our example of parsing @{ML "(p -- q) || r" for p q r}.
  If you want to generate the correct error message for ``p-followed-by-q'',
  then you have to write:
*}

ML{*fun p_followed_by_q p q r =
  let 
     val err_msg = (fn _ => p ^ " is not followed by " ^ q)
  \<dots>
                          "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
  
  yields the expected parsing. 
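
  The definition of @{text p_followed_by_q} is only partly shown above. One
  way to complete it along the lines just described, combining @{ML "(op !!)"}
  with @{ML "(op ||)"}, is the following sketch (not checked against the
  original sources):

  @{ML [display,gray]
"fun p_followed_by_q p q r =
  let 
     val err_msg = (fn _ => p ^ \" is not followed by \" ^ q)
  in
    ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r)
  end"}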

  The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as 
  often as it succeeds. For example:
  
  @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" 
                "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
  
  Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
  @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} 
  succeeds at least once.

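  A small illustration of the difference (the results shown follow from the
  description above; they are not taken from the original text): on an input
  without a leading @{text [quotes] "h"}, the parser @{ML "Scan.repeat"}
  succeeds with the empty list, whereas @{ML "Scan.repeat1"} fails.

  @{ML_response [display,gray] 
"(Scan.repeat ($$ \"h\") (explode \"ello\"), Scan.repeat1 ($$ \"h\") (explode \"hello\"))"
"(([], [\"e\", \"l\", \"l\", \"o\"]), ([\"h\"], [\"e\", \"l\", \"l\", \"o\"]))"}
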
  Also note that the parser would have aborted with the exception @{text MORE}, if
  you had run it on just @{text [quotes] "hhhh"}. This can be avoided by using
  the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
  them you can write:
  
  @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" 
                "([\"h\", \"h\", \"h\", \"h\"], [])"}

  @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
  other stoppers need to be used when parsing tokens, for example. However, this kind of 
  manual wrapping is often already done by the surrounding infrastructure. 

  The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any 
  string, as in

  \<dots>
   Scan.finite Symbol.stopper p input2)
end"
"(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}

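  A simpler, self-contained sketch of this combination uses a hand-written
  predicate for @{ML Scan.one} (the predicate chosen here is an assumption for
  illustration; the original example may use a different one):

  @{ML_response [display,gray] 
"Scan.repeat (Scan.one (fn s => s <> \"*\")) (explode \"foo*ooo\")"
"([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"])"}
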
  After parsing is done, you nearly always want to apply a function to the parsed 
  items. To do this, the function @{ML "(p >> f)" for p f} can be employed, which runs 
  first the parser @{text p} and upon successful completion applies the 
  function @{text f} to the result. For example:

@{ML_response [display,gray]
"let 
  fun double (x,y) = (x ^ x, y ^ y) 
in
  (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
end"
"((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}

  \begin{exercise}\label{ex:scancmts}
  Write a parser that parses an input string so that any comment enclosed
  inside @{text "(*\<dots>*)"} is replaced by the same comment but enclosed inside
  @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
  function @{ML "enclose s1 s2 s" for s1 s2 s}, which produces the string @{ML
  "s1 ^ s ^ s2" for s1 s2 s}.
  \end{exercise}

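  For instance, following this description, @{ML "enclose \"(**\" \"**)\" \"foo\"} 
  produces the string @{text [quotes] "(**foo**)"}.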

  The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
  the given parser to the second component of the pair and leaves the first component 
  untouched.
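
  A small sketch of this behaviour (the pair used here, with an integer as
  its first component, is only an assumption for illustration; the result is
  what the description leads one to expect, not a checked output):

  @{ML_response [display,gray] 
"Scan.lift ($$ \"h\") (1, explode \"hello\")"
"(\"h\", (1, [\"e\", \"l\", \"l\", \"o\"]))"}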
*}

section {* Parsing Theory Syntax *}

text {*
  Most of the time, however, Isabelle developers have to deal with parsing
  tokens, not strings. This is because the parsers for the theory syntax, as
  well as the parsers for the argument syntax of proof methods and attributes,
  use the type @{ML_type OuterLex.token} (which is identical to the type
  @{ML_type OuterParse.token}). There are also parsers for ML-expressions and
  ML-files, which can sometimes be handy.

  \begin{readmore}
  The parser functions for the theory syntax are contained in the structure
  @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
  The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
  \<dots>
  @{ML "Command" in OuterLex} for commands). Some token parsers take into account the 
  kind of tokens.
*}  

text {*
  For the examples below, you can generate a token list out of a string using
  the function @{ML "OuterSyntax.scan"}, which is given @{ML "Position.none"}
  as argument since, at the moment, we are not interested in generating
  precise error messages. The following code

@{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" 
"[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
 Token (\<dots>,(Space, \" \"),\<dots>), 
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

  produces three tokens where the first and the last are identifiers, since
  @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
  other syntactic category.\footnote{Note that because of a possible bug in
  the PolyML runtime system the result is printed as @{text [quotes] "?"}, instead of
  the tokens.} The second token indicates a space.

  Many parsing functions later on will require spaces, comments and the like
  to have already been filtered out. So from now on we are going to use the 
  functions @{ML filter} and @{ML OuterLex.is_proper} to do this. For example:

@{ML_response_fake [display,gray]
"let
   val input = OuterSyntax.scan Position.none \"hello world\"
in
   filter OuterLex.is_proper input
end" 
"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}

  For convenience we define the function:

*}

ML{*fun filtered_input str = 
       filter OuterLex.is_proper (OuterSyntax.scan Position.none str) *}

text {*

  If you now parse

@{ML_response_fake [display,gray] 
"filtered_input \"inductive | for\"" 
"[Token (\<dots>,(Command, \"inductive\"),\<dots>), 
 Token (\<dots>,(Keyword, \"|\"),\<dots>), 
 Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}

  you obtain a list consisting of only a command and two keyword tokens.
  If you want to see which keywords and commands are currently known, type in
  the following code (you might have to adjust the @{ML print_depth} in order to
  see the complete list):

@{ML_response_fake [display,gray] 
  \<dots>
in 
  (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
end" 
"([\"}\",\"{\",\<dots>],[\"\<rightleftharpoons>\",\"\<leftharpoondown>\",\<dots>])"}

  The parser @{ML "OuterParse.$$$"} parses a single keyword. For example:

@{ML_response [display,gray]
"let 
  val input1 = filtered_input \"where for\"
  val input2 = filtered_input \"| in\"
in 
  (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
end"
"((\"where\",\<dots>),(\"|\",\<dots>))"}

  Like before, you can sequentially connect parsers with @{ML "(op --)"}. For example: 

@{ML_response [display,gray]
"let 
  val input = filtered_input \"| in\"
in 
  \<dots>
end"
"((\"|\",\"in\"),[])"}

  The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty 
  list of items recognised by the parser @{text p}, where the items being parsed
  are separated by the string @{text s}. For example:

@{ML_response [display,gray]
"let 
  val input = filtered_input \"in | in | in foo\"
in 
  (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end" 
"([\"in\",\"in\",\"in\"],[\<dots>])"}

  @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
  be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
  end of the parsed string, otherwise the parser would have consumed all
  tokens and then failed with the exception @{text "MORE"}. As in the
  previous section, we can avoid this exception using the wrapper @{ML
  Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
  OuterLex.stopper}. We can write:

@{ML_response [display,gray]
"let 
  val input = filtered_input \"in | in | in\"
in 
  \<dots>

text {* (FIXME funny output for a proposition) *}

section {* Parsing Specifications *}

text {*
  There are a number of special-purpose parsers that help with parsing
  specifications of functions, inductive definitions and so on. For example,
  the function @{ML OuterParse.target} reads a target in order to indicate
  a locale.

@{ML_response [display,gray]
"let 
  val input = filtered_input \"(in test)\"
in 
  parse OuterParse.target input 
end" "(\"test\",[])"}
  
  The function @{ML OuterParse.opt_target} makes this parser ``optional'': it
  wraps the result into an option type and returns @{ML NONE} if no target is
  present.
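
  A sketch of what this means in practice (the printed result is what the
  description above leads one to expect, not taken from the sources):

  @{ML_response [display,gray]
"parse OuterParse.opt_target (filtered_input \"(in test)\")" 
"(SOME \"test\",[])"}

  On an input that does not start with @{text [quotes] "(in \<dots>)"}, the
  parser is expected to return @{ML NONE} without consuming any tokens.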

  The function @{ML OuterParse.fixes} reads a list of constants
  that can include type annotations and syntax translations.
  The list is separated by \isacommand{and}. For example:
*}

text {*

@{ML_response [display,gray]
  \<dots>
end"
"([(foo, SOME \<dots>, Mixfix (\"FOO\",[100],100)), 
  (bar, SOME \<dots>, NoSyn), 
  (blonk, NONE, NoSyn)],[])"}  

  Whenever types are given, they are stored in the @{ML SOME}s. 
  Note that we had to write @{text "\"int \<Rightarrow> bool\""} to properly escape 
  the double quotes in the compound type.

  @{ML OuterParse.for_fixes} is an ``optional'' version of @{ML OuterParse.fixes},
  prefixed by the command \isacommand{for}.
  (FIXME give an example and explain more)

@{ML_response [display,gray]
"let
  val input = filtered_input 
        \"for foo::\\\"int \<Rightarrow> bool\\\" (\\\"FOO\\\" [100] 100) and bar::nat and blonk\"
in
   parse OuterParse.for_fixes input
end"
"([(foo, SOME \<dots>, Mixfix (\"FOO\",[100],100)), 
  (bar, SOME \<dots>, NoSyn), 
  (blonk, NONE, NoSyn)],[])"}  

*}

ML{*let 
  val input = filtered_input "test_lemma[intro,foo,elim,dest!,bar]:"
in 
  parse (SpecParse.thm_name ":") input 
  \<dots>
  a list of predicate constants with optional type-annotation and 
  optional syntax-annotation; a list of for-fixes (fixed parameters); 
  and a list of rules where each rule optionally has a name and an attribute.
*}

section {* New Commands and Keyword Files\label{sec:newcommand} *}

text {*
  Often new commands, for example for providing new definitional principles,
  need to be implemented. While this is not difficult on the ML-level,
  new commands, in order to be useful, need to be recognised by
  ProofGeneral. This results in some subtle configuration issues, which we
  will explain in this section.

  To keep things simple, let us start with a ``silly'' command that does nothing 
  at all. We shall name this command \isacommand{foobar}. On the ML-level it can be 
  defined as:
*}

ML{*let
  val do_nothing = Scan.succeed (Toplevel.theory I)
  val kind = OuterKeyword.thy_decl
  \<dots>
  \isacommand{foobar}-command. If you target other logics besides HOL, such
  as Nominal or ZF, then you need to adapt the log files appropriately.
  @{text Pure} and @{text HOL} are usually compiled during the installation of
  Isabelle. So log files for them should already be available. If not, then
  they can be conveniently compiled with the help of the build-script from the Isabelle
  distribution.

  @{text [display] 
"$ ./build -m \"Pure\"
$ ./build -m \"HOL\""}
  
  The @{text "Pure-ProofGeneral"} theory needs to be compiled with:

  @{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""}

  For the theory @{text "Command.thy"}, you first need to create a ``managed'' subdirectory 
  with:

  @{text [display] "$ isabelle mkdir FoobarCommand"}

  This generates a directory containing the files: 

  @{text [display] 
"./IsaMakefile
./FoobarCommand/ROOT.ML
./FoobarCommand/document
  \<dots>

  You need to copy the file @{text "Command.thy"} into the directory @{text "FoobarCommand"}
  and add the line 

  @{text [display] "use_thy \"Command\";"} 
  
  to the file @{text "./FoobarCommand/ROOT.ML"}. You can now compile the theory by just typing:

  @{text [display] "$ isabelle make"}

  If the compilation succeeds, you have finally created all the necessary log files. 
  They are stored in the directory 
  \<dots>
  or something similar depending on your Isabelle distribution and architecture.
  One quick way to assign a shell variable to this directory is by typing

  @{text [display] "$ ISABELLE_LOGS=\"$(isabelle getenv -b ISABELLE_OUTPUT)\"/log"}
 
  on the Unix prompt. The directory should include the files:

  @{text [display] 
"Pure.gz
HOL.gz
Pure-ProofGeneral.gz
  \<dots>
  the string @{text "foobar"} twice.\footnote{To see whether things are fine, check
  that @{text "grep foobar"} on this file returns something
  non-empty.} This keyword file needs to
  be copied into the directory @{text "~/.isabelle/etc"}. To make Isabelle
  aware of this keyword file, you have to start Isabelle with the option @{text
  "-k foobar"}, that is:

  @{text [display] "$ isabelle emacs -k foobar a_theory_file"}

  If you now build a theory on top of @{text "Command.thy"}, 
  then the command \isacommand{foobar} can be used. 
  Similarly with any other new command. 

  At the moment \isacommand{foobar} is not very useful. Let us refine it a bit 
  next by letting it take a proposition as argument and printing this proposition 
  inside the tracing buffer. 

  The crucial part of a command is the function that determines the behaviour
  of the command. In the code above we used a ``do-nothing''-function, which
  because of @{ML Scan.succeed} does not parse any argument, but immediately
  returns the simple toplevel function @{ML "Toplevel.theory I"}. We can
  replace this code by a function that first parses a proposition (using the
  parser @{ML OuterParse.prop}), then prints out the tracing
  information (using a new top-level function @{text trace_top_lvl}) and 
  finally does nothing. For this you can write:
*}

ML{*let
  fun trace_top_lvl str = 
     Toplevel.theory (fn thy => (tracing str; thy))
  \<dots>
in
  OuterSyntax.command "foobar" "traces a proposition" kind trace_prop
end *}

text {*
  Now you can type

  \begin{isabelle}
  \isacommand{foobar}~@{text [quotes] "True \<and> False"}\\
  @{text "> \"True \<and> False\""}
  \end{isabelle}
  \<dots>
  \isacommand{lemma}) or prove some other properties (for example
  \isacommand{function}). To achieve this kind of behaviour, we have to use the kind
  indicator @{ML thy_goal in OuterKeyword}.

  Below we change \isacommand{foobar} so that it takes a proposition as
  argument and then starts a proof in order to prove it. Therefore in Line 13, 
  we set the kind indicator to @{ML thy_goal in OuterKeyword}.
*}

ML%linenumbers{*let
  fun set_up_thm str ctxt =
    let
  \<dots>
  OuterSyntax.command "foobar" "proving a proposition" kind prove_prop
end *}

text {*
  The function @{text set_up_thm} in Lines 2 to 7 takes a string (the proposition to be
  proved) and a context as arguments. The context is necessary in order to be able to use
  @{ML Syntax.read_prop}, which converts a string into a proper proposition.
  In Line 6 the function @{ML Proof.theorem_i} starts the proof for the
  proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  omit); the argument @{ML "(K I)"} stands for a function that determines what
  should be done with the theorem once it is proved (we chose to just forget
  about it). Lines 9 to 11 contain the parser for the proposition.

  (FIXME: explain @{ML Toplevel.print} etc)

  If you now type \isacommand{foobar}~@{text [quotes] "True \<and> True"}, you obtain the following 
  proof state:

  \begin{isabelle}
  \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
  @{text "goal (1 subgoal):"}\\
  @{text "1. True \<and> True"}
  \end{isabelle}

  and you can build the proof

  \begin{isabelle}
  \isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
  \isacommand{apply}@{text "(rule conjI)"}\\
  \isacommand{apply}@{text "(rule TrueI)+"}\\
  \<dots>

  (FIXME What do @{ML "Toplevel.theory"} 
  @{ML "Toplevel.print"} 
  @{ML Toplevel.local_theory}?)

  (FIXME read a name and show how to store theorems)
*}

(*<*)

chapter {* Parsing *}