ProgTutorial/Parsing.thy
changeset 344 83d5bca38bec
parent 328 c0cae24b9d46
child 346 0fea8b7a14a1
equal deleted inserted replaced
343:8f73e80c8c6f 344:83d5bca38bec
    35 section {* Building Generic Parsers *}
    35 section {* Building Generic Parsers *}
    36 
    36 
    37 text {*
    37 text {*
    38 
    38 
    39   Let us first have a look at parsing strings using generic parsing
    39   Let us first have a look at parsing strings using generic parsing
    40   combinators. The function @{ML_ind  "$$"} takes a string as argument and will
    40   combinators. The function @{ML_ind "$$" in Scan} takes a string as argument and will
    41   ``consume'' this string from a given input list of strings. ``Consume'' in
    41   ``consume'' this string from a given input list of strings. ``Consume'' in
    42   this context means that it will return a pair consisting of this string and
    42   this context means that it will return a pair consisting of this string and
    43   the rest of the input list. For example:
    43   the rest of the input list. For example:
    44 
    44 
    45   @{ML_response [display,gray] 
    45   @{ML_response [display,gray] 
    69   \end{itemize}
    69   \end{itemize}
    70 
    70 
    71   However, note that these exceptions are private to the parser and cannot be accessed
    71   However, note that these exceptions are private to the parser and cannot be accessed
    72   by the programmer (for example to handle them).
    72   by the programmer (for example to handle them).
    73 
    73 
    74   In the examples above we use the function @{ML_ind Symbol.explode}, instead of the 
    74   In the examples above we use the function @{ML_ind explode in Symbol} in the
    75   more standard library function @{ML_ind explode}, for obtaining an input list for 
    75   structure @{ML_struct Symbol}, instead of the more standard library function
    76   the parser. The reason is that @{ML_ind Symbol.explode} is aware of character sequences, 
    76   @{ML_ind explode}, for obtaining an input list for the parser. The reason is
    77   for example @{text "\<foo>"}, that have a special meaning in Isabelle. To see
    77   that @{ML_ind explode} in @{ML_struct Symbol} is aware of character
    78   the difference consider
    78   sequences, for example @{text "\<foo>"}, that have a special meaning in
       
    79   Isabelle. To see the difference consider
    79 
    80 
    80 @{ML_response_fake [display,gray]
    81 @{ML_response_fake [display,gray]
    81 "let 
    82 "let 
    82   val input = \"\<foo> bar\"
    83   val input = \"\<foo> bar\"
    83 in
    84 in
    85 end"
    86 end"
    86 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"],
    87 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"],
    87  [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"}
    88  [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"}
    88 
    89 
    89   Slightly more general than the parser @{ML "$$"} is the function 
    90   Slightly more general than the parser @{ML "$$"} is the function 
    90   @{ML_ind  one in Scan}, in that it takes a predicate as argument and 
    91   @{ML_ind one in Scan}, in that it takes a predicate as argument and 
    91   then parses exactly
    92   then parses exactly
    92   one item from the input list satisfying this predicate. For example the
    93   one item from the input list satisfying this predicate. For example the
    93   following parser either consumes an @{text [quotes] "h"} or a @{text
    94   following parser either consumes an @{text [quotes] "h"} or a @{text
    94   [quotes] "w"}:
    95   [quotes] "w"}:
    95 
    96 
   101 in
   102 in
   102   (hw input1, hw input2)
   103   (hw input1, hw input2)
   103 end"
   104 end"
   104     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   105     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   105 
   106 
   106   Two parsers can be connected in sequence by using the function @{ML_ind  "--"}. 
   107   Two parsers can be connected in sequence by using the function @{ML_ind "--" in Scan}. 
   107   For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this 
   108   For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this 
   108   order) you can achieve by:
   109   order) you can achieve by:
   109 
   110 
   110   @{ML_response [display,gray] 
   111   @{ML_response [display,gray] 
   111   "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")"
   112   "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")"
   119   @{ML_response [display,gray] 
   120   @{ML_response [display,gray] 
   120   "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
   121   "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
   121   "(\"hell\", [\"o\"])"}
   122   "(\"hell\", [\"o\"])"}
   122 
   123 
   123   Parsers that explore alternatives can be constructed using the function 
   124   Parsers that explore alternatives can be constructed using the function 
   124   @{ML_ind  "||"}. The parser @{ML "(p || q)" for p q} returns the
   125   @{ML_ind  "||" in Scan}. The parser @{ML "(p || q)" for p q} returns the
   125   result of @{text "p"}, in case it succeeds, otherwise it returns the
   126   result of @{text "p"}, in case it succeeds, otherwise it returns the
   126   result of @{text "q"}. For example:
   127   result of @{text "q"}. For example:
   127 
   128 
   128 
   129 
   129 @{ML_response [display,gray] 
   130 @{ML_response [display,gray] 
   134 in
   135 in
   135   (hw input1, hw input2)
   136   (hw input1, hw input2)
   136 end"
   137 end"
   137   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   138   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   138 
   139 
   139   The functions @{ML_ind "|--"} and @{ML_ind "--|"} work like the sequencing
   140   The functions @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} work like the sequencing
   140   function for parsers, except that they discard the item being parsed by the
   141   function for parsers, except that they discard the item being parsed by the
   141   first (respectively second) parser. For example:
   142   first (respectively second) parser. For example:
   142   
   143   
   143 @{ML_response [display,gray]
   144 @{ML_response [display,gray]
   144 "let 
   145 "let 
   162 in 
   163 in 
   163   (p input1, p input2)
   164   (p input1, p input2)
   164 end" 
   165 end" 
   165  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   166  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   166 
   167 
   167   The function @{ML_ind  option in Scan} works similarly, except no default value can
   168   The function @{ML_ind option in Scan} works similarly, except no default value can
   168   be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
   169   be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
   169 
   170 
   170 @{ML_response [display,gray]
   171 @{ML_response [display,gray]
   171 "let 
   172 "let 
   172   val p = Scan.option ($$ \"h\")
   173   val p = Scan.option ($$ \"h\")
   174   val input2 = Symbol.explode \"world\"  
   175   val input2 = Symbol.explode \"world\"  
   175 in 
   176 in 
   176   (p input1, p input2)
   177   (p input1, p input2)
   177 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   178 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   178 
   179 
   179   The function @{ML_ind  ahead in Scan} parses some input, but leaves the original
   180   The function @{ML_ind ahead in Scan} parses some input, but leaves the original
   180   input unchanged. For example:
   181   input unchanged. For example:
   181 
   182 
   182   @{ML_response [display,gray]
   183   @{ML_response [display,gray]
   183   "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" 
   184   "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" 
   184   "(\"foo\", [\"f\", \"o\", \"o\"])"} 
   185   "(\"foo\", [\"f\", \"o\", \"o\"])"} 
   185 
   186 
   186   The function @{ML_ind  "!!"} helps with producing appropriate error messages
   187   The function @{ML_ind "!!" in Scan} helps with producing appropriate error messages
   187   during parsing. For example if you want to parse @{text p} immediately 
   188   during parsing. For example if you want to parse @{text p} immediately 
   188   followed by @{text q}, or start a completely different parser @{text r},
   189   followed by @{text q}, or start a completely different parser @{text r},
   189   you might write:
   190   you might write:
   190 
   191 
   191   @{ML [display,gray] "(p -- q) || r" for p q r}
   192   @{ML [display,gray] "(p -- q) || r" for p q r}
   217   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
   218   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
   218                                "Exception ABORT raised"}
   219                                "Exception ABORT raised"}
   219 
   220 
   220   then the parsing aborts and the error message @{text "foo"} is printed. In order to
   221   then the parsing aborts and the error message @{text "foo"} is printed. In order to
   221   see the error message properly, you need to prefix the parser with the function 
   222   see the error message properly, you need to prefix the parser with the function 
   222   @{ML_ind  error in Scan}. For example:
   223   @{ML_ind error in Scan}. For example:
   223 
   224 
   224   @{ML_response_fake [display,gray] 
   225   @{ML_response_fake [display,gray] 
   225   "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
   226   "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
   226   "Exception Error \"foo\" raised"}
   227   "Exception Error \"foo\" raised"}
   227 
   228 
   228   This ``prefixing'' is usually done by wrappers such as @{ML_ind  local_theory in OuterSyntax} 
   229   This ``prefixing'' is usually done by wrappers such as @{ML_ind local_theory in OuterSyntax} 
   229   (see Section~\ref{sec:newcommand} which explains this function in more detail). 
   230   (see Section~\ref{sec:newcommand} which explains this function in more detail). 
   230   
   231   
   231   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
   232   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
   232   r}. If you want to generate the correct error message for failure
   233   r}. If you want to generate the correct error message for failure
   233   of parsing @{text "p"}-followed-by-@{text "q"}, then you have to write:
   234   of parsing @{text "p"}-followed-by-@{text "q"}, then you have to write:
   260   often as it succeeds. For example:
   261   often as it succeeds. For example:
   261   
   262   
   262   @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" 
   263   @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" 
   263                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
   264                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
   264   
   265   
   265   Note that @{ML_ind  repeat in Scan} stores the parsed items in a list. The function
   266   Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function
   266   @{ML_ind  repeat1 in Scan} is similar, but requires that the parser @{text "p"} 
   267   @{ML_ind repeat1 in Scan} is similar, but requires that the parser @{text "p"} 
   267   succeeds at least once.
   268   succeeds at least once.
   268 
   269 
   269   Also note that the parser would have aborted with the exception @{text MORE}, if
   270   Also note that the parser would have aborted with the exception @{text MORE}, if
   270   you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using
   271   you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using
   271   the wrapper @{ML_ind  finite in Scan} and the ``stopper-token'' 
   272   the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' 
   272   @{ML_ind  stopper in Symbol}. With them you can write:
   273   @{ML_ind stopper in Symbol}. With them you can write:
   273   
   274   
   274   @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" 
   275   @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" 
   275                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   276                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   276 
   277 
   277   The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings;
   278   The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings;
   278   other stoppers need to be used when parsing, for example, tokens. However, this kind of 
   279   other stoppers need to be used when parsing, for example, tokens. However, this kind of 
   279   manually wrapping is often already done by the surrounding infrastructure. 
   280   manually wrapping is often already done by the surrounding infrastructure. 
   280 
   281 
   281   The function @{ML_ind  repeat in Scan} can be used with @{ML_ind  one in Scan} to read any 
   282   The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any 
   282   string as in
   283   string as in
   283 
   284 
   284   @{ML_response [display,gray] 
   285   @{ML_response [display,gray] 
   285 "let 
   286 "let 
   286    val p = Scan.repeat (Scan.one Symbol.not_eof)
   287    val p = Scan.repeat (Scan.one Symbol.not_eof)
   288 in
   289 in
   289    Scan.finite Symbol.stopper p input
   290    Scan.finite Symbol.stopper p input
   290 end" 
   291 end" 
   291 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
   292 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
   292 
   293 
   293   where the function @{ML_ind  not_eof in Symbol} ensures that we do not read beyond the 
   294   where the function @{ML_ind not_eof in Symbol} ensures that we do not read beyond the 
   294   end of the input string (i.e.~stopper symbol).
   295   end of the input string (i.e.~stopper symbol).
   295 
   296 
   296   The function @{ML_ind  unless in Scan} takes two parsers: if the first one can 
   297   The function @{ML_ind unless in Scan} takes two parsers: if the first one can 
   297   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   298   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   298   
   299   
   299   @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")"
   300   @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")"
   300                                "Exception FAIL raised"}
   301                                "Exception FAIL raised"}
   301 
   302 
   304   @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")"
   305   @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")"
   305                           "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
   306                           "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
   306 
   307 
   307   succeeds. 
   308   succeeds. 
   308 
   309 
   309   The functions @{ML_ind  repeat in Scan} and @{ML_ind  unless in Scan} can 
   310   The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can 
   310   be combined to read any input until a certain marker symbol is reached. In the 
   311   be combined to read any input until a certain marker symbol is reached. In the 
   311   example below the marker symbol is a @{text [quotes] "*"}.
   312   example below the marker symbol is a @{text [quotes] "*"}.
   312 
   313 
   313   @{ML_response [display,gray]
   314   @{ML_response [display,gray]
   314 "let 
   315 "let 
   322 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   323 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   323  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   324  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   324 
   325 
   325   
   326   
   326   After parsing is done, you almost always want to apply a function to the parsed 
   327   After parsing is done, you almost always want to apply a function to the parsed 
   327   items. One way to do this is the function @{ML_ind ">>"} where 
   328   items. One way to do this is the function @{ML_ind ">>" in Scan} where 
   328   @{ML "(p >> f)" for p f} runs 
   329   @{ML "(p >> f)" for p f} runs 
   329   first the parser @{text p} and upon successful completion applies the 
   330   first the parser @{text p} and upon successful completion applies the 
   330   function @{text f} to the result. For example
   331   function @{text f} to the result. For example
   331 
   332 
   332 @{ML_response [display,gray]
   333 @{ML_response [display,gray]
   350 "(\"foo bar foo\",[])"}
   351 "(\"foo bar foo\",[])"}
   351 
   352 
   352   where the single-character strings in the parsed output are transformed
   353   where the single-character strings in the parsed output are transformed
   353   back into one string.
   354   back into one string.
   354 
   355 
   355   The function @{ML_ind  lift in Scan} takes a parser and a pair as arguments. This function applies
   356   The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies
   356   the given parser to the second component of the pair and leaves the  first component 
   357   the given parser to the second component of the pair and leaves the  first component 
   357   untouched. For example
   358   untouched. For example
   358 
   359 
   359 @{ML_response [display,gray]
   360 @{ML_response [display,gray]
   360 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   361 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   391   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   392   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   392   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   393   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   393   \end{readmore}
   394   \end{readmore}
   394 
   395 
   395   The structure @{ML_struct  OuterLex} defines several kinds of tokens (for
   396   The structure @{ML_struct  OuterLex} defines several kinds of tokens (for
   396   example @{ML_ind  Ident in OuterLex} for identifiers, @{ML Keyword in
   397   example @{ML_ind Ident in OuterLex} for identifiers, @{ML Keyword in
   397   OuterLex} for keywords and @{ML_ind  Command in OuterLex} for commands). Some
   398   OuterLex} for keywords and @{ML_ind Command in OuterLex} for commands). Some
   398   token parsers take into account the kind of tokens. The first example shows
   399   token parsers take into account the kind of tokens. The first example shows
   399   how to generate a token list out of a string using the function 
   400   how to generate a token list out of a string using the function 
   400   @{ML_ind  scan in OuterSyntax}. It is given the argument 
   401   @{ML_ind scan in OuterSyntax}. It is given the argument 
   401   @{ML "Position.none"} since,
   402   @{ML "Position.none"} since,
   402   at the moment, we are not interested in generating precise error
   403   at the moment, we are not interested in generating precise error
   403   messages. The following code\footnote{Note that because of a possible bug in
   404   messages. The following code\footnote{Note that because of a possible bug in
   404   the PolyML runtime system, the result is printed as @{text [quotes] "?"},
   405   the PolyML runtime system, the result is printed as @{text [quotes] "?"},
   405   instead of the tokens.}
   406   instead of the tokens.}
   413   produces three tokens where the first and the last are identifiers, since
   414   produces three tokens where the first and the last are identifiers, since
   414   @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
   415   @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
   415   other syntactic category. The second indicates a space.
   416   other syntactic category. The second indicates a space.
   416 
   417 
   417   We can easily change what is recognised as a keyword with the function
   418   We can easily change what is recognised as a keyword with the function
   418   @{ML_ind  keyword in OuterKeyword}. For example calling it with 
   419   @{ML_ind keyword in OuterKeyword}. For example calling it with 
   419 *}
   420 *}
   420 
   421 
   421 ML{*val _ = OuterKeyword.keyword "hello"*}
   422 ML{*val _ = OuterKeyword.keyword "hello"*}
   422 
   423 
   423 text {*
   424 text {*
   428  Token (\<dots>,(Space, \" \"),\<dots>), 
   429  Token (\<dots>,(Space, \" \"),\<dots>), 
   429  Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
   430  Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
   430 
   431 
   431   Many parsing functions later on will require white space, comments and the like
   432   Many parsing functions later on will require white space, comments and the like
   432   to have already been filtered out.  So from now on we are going to use the 
   433   to have already been filtered out.  So from now on we are going to use the 
   433   functions @{ML filter} and @{ML_ind  is_proper in OuterLex} to do this. 
   434   functions @{ML filter} and @{ML_ind is_proper in OuterLex} to do this. 
   434   For example:
   435   For example:
   435 
   436 
   436 @{ML_response_fake [display,gray]
   437 @{ML_response_fake [display,gray]
   437 "let
   438 "let
   438    val input = OuterSyntax.scan Position.none \"hello world\"
   439    val input = OuterSyntax.scan Position.none \"hello world\"
   466 in 
   467 in 
   467   (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
   468   (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
   468 end" 
   469 end" 
   469 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}
   470 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}
   470 
   471 
   471   You might have to adjust the @{ML_ind  print_depth} in order to
   472   You might have to adjust the @{ML_ind print_depth} in order to
   472   see the complete list.
   473   see the complete list.
   473 
   474 
   474   The parser @{ML_ind  "$$$" in OuterParse} parses a single keyword. For example:
   475   The parser @{ML_ind "$$$" in OuterParse} parses a single keyword. For example:
   475 
   476 
   476 @{ML_response [display,gray]
   477 @{ML_response [display,gray]
   477 "let 
   478 "let 
   478   val input1 = filtered_input \"where for\"
   479   val input1 = filtered_input \"where for\"
   479   val input2 = filtered_input \"| in\"
   480   val input2 = filtered_input \"| in\"
   480 in 
   481 in 
   481   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
   482   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
   482 end"
   483 end"
   483 "((\"where\",\<dots>), (\"|\",\<dots>))"}
   484 "((\"where\",\<dots>), (\"|\",\<dots>))"}
   484 
   485 
   485   Any non-keyword string can be parsed with the function @{ML_ind  reserved in OuterParse}.
   486   Any non-keyword string can be parsed with the function @{ML_ind reserved in OuterParse}.
   486   For example:
   487   For example:
   487 
   488 
   488   @{ML_response [display,gray]
   489   @{ML_response [display,gray]
   489 "let 
   490 "let 
   490   val p = OuterParse.reserved \"bar\"
   491   val p = OuterParse.reserved \"bar\"
   492 in
   493 in
   493   p input
   494   p input
   494 end"
   495 end"
   495   "(\"bar\",[])"}
   496   "(\"bar\",[])"}
   496 
   497 
   497   Like before, you can sequentially connect parsers with @{ML_ind  "--"}. For example: 
   498   Like before, you can sequentially connect parsers with @{ML "--"}. For example: 
   498 
   499 
   499 @{ML_response [display,gray]
   500 @{ML_response [display,gray]
   500 "let 
   501 "let 
   501   val input = filtered_input \"| in\"
   502   val input = filtered_input \"| in\"
   502 in 
   503 in 
   556 but keyword in was found\" raised"
   557 but keyword in was found\" raised"
   557 }
   558 }
   558 
   559 
   559   \begin{exercise} (FIXME)
   560   \begin{exercise} (FIXME)
   560   A type-identifier, for example @{typ "'a"}, is a token of 
   561   A type-identifier, for example @{typ "'a"}, is a token of 
   561   kind @{ML_ind  Keyword in OuterLex}. It can be parsed using 
   562   kind @{ML_ind Keyword in OuterLex}. It can be parsed using 
   562   the function @{ML type_ident in OuterParse}.
   563   the function @{ML type_ident in OuterParse}.
   563   \end{exercise}
   564   \end{exercise}
   564 
   565 
   565   (FIXME: or give parser for numbers)
   566   (FIXME: or give parser for numbers)
   566 
   567 
   737 
   738 
   738   As you see, the result is a pair consisting of a list of
   739   As you see, the result is a pair consisting of a list of
   739   variables with optional type-annotation and syntax-annotation, and a list of
   740   variables with optional type-annotation and syntax-annotation, and a list of
   740   rules where every rule has optionally a name and an attribute.
   741   rules where every rule has optionally a name and an attribute.
   741 
   742 
   742   The function @{ML_ind  "fixes" in OuterParse} in Line 2 of the parser reads an 
   743   The function @{ML_ind "fixes" in OuterParse} in Line 2 of the parser reads an 
   743   \isacommand{and}-separated 
   744   \isacommand{and}-separated 
   744   list of variables that can include optional type annotations and syntax translations. 
   745   list of variables that can include optional type annotations and syntax translations. 
   745   For example:\footnote{Note that in the code we need to write 
   746   For example:\footnote{Note that in the code we need to write 
   746   @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
   747   @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
   747   in the compound type.}
   748   in the compound type.}
   761 text {*
   762 text {*
   762   Whenever types are given, they are stored in the @{ML SOME}s. The types are
   763   Whenever types are given, they are stored in the @{ML SOME}s. The types are
   763   not yet used to type the variables: this must be done by type-inference later
   764   not yet used to type the variables: this must be done by type-inference later
   764   on. Since types are part of the inner syntax they are strings with some
   765   on. Since types are part of the inner syntax they are strings with some
   765   encoded information (see previous section). If a mixfix-syntax is
   766   encoded information (see previous section). If a mixfix-syntax is
   766   present for a variable, then it is stored in the @{ML_ind  Mixfix} data structure;
   767   present for a variable, then it is stored in the @{ML_ind Mixfix} data structure;
   767   no syntax translation is indicated by @{ML_ind  NoSyn}.
   768   no syntax translation is indicated by @{ML_ind NoSyn}.
   768 
   769 
   769   \begin{readmore}
   770   \begin{readmore}
   770   The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
   771   The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
   771   \end{readmore}
   772   \end{readmore}
   772 
   773 
   773   Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
   774   Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
   774   list of introduction rules, that is propositions with theorem annotations
   775   list of introduction rules, that is propositions with theorem annotations
   775   such as rule names and attributes. The introduction rules are propositions
   776   such as rule names and attributes. The introduction rules are propositions
   776   parsed by @{ML_ind  prop  in OuterParse}. However, they can include an optional
   777   parsed by @{ML_ind prop in OuterParse}. However, they can include an optional
   777   theorem name plus some attributes. For example
   778   theorem name plus some attributes. For example
   778 
   779 
   779 @{ML_response [display,gray] "let 
   780 @{ML_response [display,gray] "let 
   780   val input = filtered_input \"foo_lemma[intro,dest!]:\"
   781   val input = filtered_input \"foo_lemma[intro,dest!]:\"
   781   val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input 
   782   val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input 
   782 in 
   783 in 
   783   (name, map Args.dest_src attrib)
   784   (name, map Args.dest_src attrib)
   784 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
   785 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
   785  
   786  
   786   The function @{ML_ind  opt_thm_name in SpecParse} is the ``optional'' variant of
   787   The function @{ML_ind opt_thm_name in SpecParse} is the ``optional'' variant of
   787   @{ML_ind  thm_name in SpecParse}. Theorem names can contain attributes. The name 
   788   @{ML_ind thm_name in SpecParse}. Theorem names can contain attributes. The name 
   788   has to end with @{text [quotes] ":"}---see the argument of 
   789   has to end with @{text [quotes] ":"}---see the argument of 
   789   the function @{ML SpecParse.opt_thm_name} in Line 7.
   790   the function @{ML SpecParse.opt_thm_name} in Line 7.
   790 
   791 
   791   \begin{readmore}
   792   \begin{readmore}
   792   Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
   793   Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
   797 text_raw {*
   798 text_raw {*
   798   \begin{exercise}
   799   \begin{exercise}
   799   Have a look at how the parser @{ML SpecParse.where_alt_specs} is implemented
   800   Have a look at how the parser @{ML SpecParse.where_alt_specs} is implemented
   800   in file @{ML_file "Pure/Isar/spec_parse.ML"}. This parser corresponds
   801   in file @{ML_file "Pure/Isar/spec_parse.ML"}. This parser corresponds
   801   to the ``where-part'' of the introduction rules given above. Below
   802   to the ``where-part'' of the introduction rules given above. Below
   802   we paraphrase the code of @{ML_ind  where_alt_specs in SpecParse} adapted to our
   803   we paraphrase the code of @{ML_ind where_alt_specs in SpecParse} adapted to our
   803   purposes. 
   804   purposes. 
   804   \begin{isabelle}
   805   \begin{isabelle}
   805 *}
   806 *}
   806 ML %linenosgray{*val spec_parser' = 
   807 ML %linenosgray{*val spec_parser' = 
   807      OuterParse.fixes -- 
   808      OuterParse.fixes -- 
   848 in
   849 in
   849   OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing
   850   OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing
   850 end *}
   851 end *}
   851 
   852 
   852 text {*
   853 text {*
   853   The crucial function @{ML_ind  local_theory in OuterSyntax} expects a name for the command, a
   854   The crucial function @{ML_ind local_theory in OuterSyntax} expects a name for the command, a
   854   short description, a kind indicator (which we will explain later more thoroughly) and a
   855   short description, a kind indicator (which we will explain later more thoroughly) and a
   855   parser producing a local theory transition (its purpose will also explained
   856   parser producing a local theory transition (its purpose will also explained
   856   later). 
   857   later). 
   857 
   858 
   858   While this is everything you have to do on the ML-level, you need a keyword
   859   While this is everything you have to do on the ML-level, you need a keyword
   994   refine it a bit next by letting it take a proposition as argument and
   995   refine it a bit next by letting it take a proposition as argument and
   995   printing this proposition inside the tracing buffer.
   996   printing this proposition inside the tracing buffer.
   996 
   997 
   997   The crucial part of a command is the function that determines the behaviour
   998   The crucial part of a command is the function that determines the behaviour
   998   of the command. In the code above we used a ``do-nothing''-function, which
   999   of the command. In the code above we used a ``do-nothing''-function, which
   999   because of @{ML_ind  succeed in Scan} does not parse any argument, but immediately
  1000   because of @{ML_ind succeed in Scan} does not parse any argument, but immediately
  1000   returns the simple function @{ML "LocalTheory.theory I"}. We can
  1001   returns the simple function @{ML "LocalTheory.theory I"}. We can
  1001   replace this code by a function that first parses a proposition (using the
  1002   replace this code by a function that first parses a proposition (using the
  1002   parser @{ML OuterParse.prop}), then prints out the tracing
  1003   parser @{ML OuterParse.prop}), then prints out the tracing
  1003   information (using a new function @{text trace_prop}) and 
  1004   information (using a new function @{text trace_prop}) and 
  1004   finally does nothing. For this you can write:
  1005   finally does nothing. For this you can write:
  1020 *}
  1021 *}
  1021 
  1022 
  1022 foobar_trace "True \<and> False"
  1023 foobar_trace "True \<and> False"
  1023 
  1024 
  1024 text {*
  1025 text {*
  1025   Note that so far we used @{ML_ind  thy_decl in OuterKeyword} as the kind
  1026   Note that so far we used @{ML_ind thy_decl in OuterKeyword} as the kind
  1026   indicator for the command.  This means that the command finishes as soon as
  1027   indicator for the command.  This means that the command finishes as soon as
  1027   the arguments are processed. Examples of this kind of commands are
  1028   the arguments are processed. Examples of this kind of commands are
  1028   \isacommand{definition} and \isacommand{declare}.  In other cases, commands
  1029   \isacommand{definition} and \isacommand{declare}.  In other cases, commands
  1029   are expected to parse some arguments, for example a proposition, and then
  1030   are expected to parse some arguments, for example a proposition, and then
  1030   ``open up'' a proof in order to prove the proposition (for example
  1031   ``open up'' a proof in order to prove the proposition (for example
  1031   \isacommand{lemma}) or prove some other properties (for example
  1032   \isacommand{lemma}) or prove some other properties (for example
  1032   \isacommand{function}). To achieve this kind of behaviour, you have to use
  1033   \isacommand{function}). To achieve this kind of behaviour, you have to use
  1033   the kind indicator @{ML_ind  thy_goal in OuterKeyword} and the function @{ML
  1034   the kind indicator @{ML_ind thy_goal in OuterKeyword} and the function @{ML
  1034   "local_theory_to_proof" in OuterSyntax} to set up the command.  Note,
  1035   "local_theory_to_proof" in OuterSyntax} to set up the command.  Note,
  1035   however, once you change the ``kind'' of a command from @{ML thy_decl in
  1036   however, once you change the ``kind'' of a command from @{ML thy_decl in
  1036   OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs
  1037   OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs
  1037   to be re-created!
  1038   to be re-created!
  1038 
  1039 
  1057 end *}
  1058 end *}
  1058 
  1059 
  1059 text {*
  1060 text {*
  1060   The function @{text goal_prop} in Lines 2 to 7 takes a string (the proposition to be
  1061   The function @{text goal_prop} in Lines 2 to 7 takes a string (the proposition to be
  1061   proved) and a context as argument.  The context is necessary in order to be able to use
  1062   proved) and a context as argument.  The context is necessary in order to be able to use
  1062   @{ML_ind  read_prop in Syntax}, which converts a string into a proper proposition.
  1063   @{ML_ind read_prop in Syntax}, which converts a string into a proper proposition.
  1063   In Line 6 the function @{ML_ind  theorem_i in Proof} starts the proof for the
  1064   In Line 6 the function @{ML_ind theorem_i in Proof} starts the proof for the
  1064   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  1065   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  1065   omit); the argument @{ML "(K I)"} stands for a function that determines what
  1066   omit); the argument @{ML "(K I)"} stands for a function that determines what
  1066   should be done with the theorem once it is proved (we chose to just forget
  1067   should be done with the theorem once it is proved (we chose to just forget
  1067   about it). Line 9 contains the parser for the proposition.
  1068   about it). Line 9 contains the parser for the proposition.
  1068 
  1069 
  1083 done
  1084 done
  1084 
  1085 
  1085 text {*
  1086 text {*
  1086   {\bf TBD below}
  1087   {\bf TBD below}
  1087 
  1088 
  1088   (FIXME: read a name and show how to store theorems; see @{ML_ind  note in LocalTheory})
  1089   (FIXME: read a name and show how to store theorems; see @{ML_ind note in LocalTheory})
  1089   
  1090   
  1090 *}
  1091 *}
  1091 
  1092 
  1092 ML_val{*val r = Unsynchronized.ref (NONE:(unit -> term) option)*}
  1093 ML_val{*val r = Unsynchronized.ref (NONE:(unit -> term) option)*}
  1093 ML{*let
  1094 ML{*let
  1264 
  1265 
  1265 text {*
  1266 text {*
  1266   It defines the method @{text foo}, which takes no arguments (therefore the
  1267   It defines the method @{text foo}, which takes no arguments (therefore the
  1267   parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which 
  1268   parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which 
  1268   applies @{thm [source] conjE} and then @{thm [source] conjI}. The function 
  1269   applies @{thm [source] conjE} and then @{thm [source] conjI}. The function 
  1269   @{ML_ind  SIMPLE_METHOD}
  1270   @{ML_ind SIMPLE_METHOD in Method}
  1270   turns such a tactic into a method. The method @{text "foo"} can be used as follows
  1271   turns such a tactic into a method. The method @{text "foo"} can be used as follows
  1271 *}
  1272 *}
  1272 
  1273 
  1273 lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
  1274 lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
  1274 apply(foo)
  1275 apply(foo)