ProgTutorial/Parsing.thy
changeset 315 de49d5780f57
parent 310 007922777ff1
child 316 74f0a06f751f
equal deleted inserted replaced
314:79202e2eab6a 315:de49d5780f57
    35 section {* Building Generic Parsers *}
    35 section {* Building Generic Parsers *}
    36 
    36 
    37 text {*
    37 text {*
    38 
    38 
    39   Let us first have a look at parsing strings using generic parsing
    39   Let us first have a look at parsing strings using generic parsing
    40   combinators. The function @{ML [index] "$$"} takes a string as argument and will
    40   combinators. The function @{ML_ind [index] "$$"} takes a string as argument and will
    41   ``consume'' this string from a given input list of strings. ``Consume'' in
    41   ``consume'' this string from a given input list of strings. ``Consume'' in
    42   this context means that it will return a pair consisting of this string and
    42   this context means that it will return a pair consisting of this string and
    43   the rest of the input list. For example:
    43   the rest of the input list. For example:
    44 
    44 
    45   @{ML_response [display,gray] 
    45   @{ML_response [display,gray] 
    87 end"
    87 end"
    88 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"],
    88 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"],
    89  [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"}
    89  [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"}
    90 
    90 
    91   Slightly more general than the parser @{ML "$$"} is the function 
    91   Slightly more general than the parser @{ML "$$"} is the function 
    92   @{ML [index] one in Scan}, in that it takes a predicate as argument and 
    92   @{ML_ind [index] one in Scan}, in that it takes a predicate as argument and 
    93   then parses exactly
    93   then parses exactly
    94   one item from the input list satisfying this predicate. For example the
    94   one item from the input list satisfying this predicate. For example the
    95   following parser either consumes an @{text [quotes] "h"} or a @{text
    95   following parser either consumes an @{text [quotes] "h"} or a @{text
    96   [quotes] "w"}:
    96   [quotes] "w"}:
    97 
    97 
   103 in
   103 in
   104   (hw input1, hw input2)
   104   (hw input1, hw input2)
   105 end"
   105 end"
   106     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   106     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   107 
   107 
   108   Two parsers can be connected in sequence by using the function @{ML [index] "--"}. 
   108   Two parsers can be connected in sequence by using the function @{ML_ind [index] "--"}. 
   109   For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this 
   109   For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this 
   110   order) you can achieve by:
   110   order) you can achieve by:
   111 
   111 
   112   @{ML_response [display,gray] 
   112   @{ML_response [display,gray] 
   113   "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")"
   113   "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")"
   114   "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
   114   "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
   115 
   115 
   116   Note how the result of consumed strings builds up on the left as nested pairs.  
   116   Note how the result of consumed strings builds up on the left as nested pairs.  
   117 
   117 
   118   If, as in the previous example, you want to parse a particular string, 
   118   If, as in the previous example, you want to parse a particular string, 
   119   then you should use the function @{ML [index] this_string in Scan}:
   119   then you should use the function @{ML_ind [index] this_string in Scan}:
   120 
   120 
   121   @{ML_response [display,gray] 
   121   @{ML_response [display,gray] 
   122   "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
   122   "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
   123   "(\"hell\", [\"o\"])"}
   123   "(\"hell\", [\"o\"])"}
   124 
   124 
   125   Parsers that explore alternatives can be constructed using the function 
   125   Parsers that explore alternatives can be constructed using the function 
   126   @{ML [index] "||"}. The parser @{ML "(p || q)" for p q} returns the
   126   @{ML_ind [index] "||"}. The parser @{ML "(p || q)" for p q} returns the
   127   result of @{text "p"}, in case it succeeds, otherwise it returns the
   127   result of @{text "p"}, in case it succeeds, otherwise it returns the
   128   result of @{text "q"}. For example:
   128   result of @{text "q"}. For example:
   129 
   129 
   130 
   130 
   131 @{ML_response [display,gray] 
   131 @{ML_response [display,gray] 
   136 in
   136 in
   137   (hw input1, hw input2)
   137   (hw input1, hw input2)
   138 end"
   138 end"
   139   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   139   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   140 
   140 
   141   The functions @{ML [index] "|--"} and @{ML [index] "--|"} work like the sequencing function 
   141   The functions @{ML_ind [index] "|--"} and @{ML_ind [index] "--|"} work like the sequencing function 
   142   for parsers, except that they discard the item being parsed by the first (respectively second)
   142   for parsers, except that they discard the item being parsed by the first (respectively second)
   143   parser. For example:
   143   parser. For example:
   144   
   144   
   145 @{ML_response [display,gray]
   145 @{ML_response [display,gray]
   146 "let 
   146 "let 
   165 in 
   165 in 
   166   (p input1, p input2)
   166   (p input1, p input2)
   167 end" 
   167 end" 
   168  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   168  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   169 
   169 
   170   The function @{ML [index] option in Scan} works similarly, except no default value can
   170   The function @{ML_ind [index] option in Scan} works similarly, except no default value can
   171   be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
   171   be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
   172 
   172 
   173 @{ML_response [display,gray]
   173 @{ML_response [display,gray]
   174 "let 
   174 "let 
   175   val p = Scan.option ($$ \"h\")
   175   val p = Scan.option ($$ \"h\")
   177   val input2 = Symbol.explode \"world\"  
   177   val input2 = Symbol.explode \"world\"  
   178 in 
   178 in 
   179   (p input1, p input2)
   179   (p input1, p input2)
   180 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   180 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   181 
   181 
   182   The function @{ML [index] "!!"} helps to produce appropriate error messages
   182   The function @{ML_ind [index] "!!"} helps to produce appropriate error messages
   183   for parsing. For example if you want to parse @{text p} immediately 
   183   for parsing. For example if you want to parse @{text p} immediately 
   184   followed by @{text q}, or start a completely different parser @{text r},
   184   followed by @{text q}, or start a completely different parser @{text r},
   185   you might write:
   185   you might write:
   186 
   186 
   187   @{ML [display,gray] "(p -- q) || r" for p q r}
   187   @{ML [display,gray] "(p -- q) || r" for p q r}
   213   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
   213   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
   214                                "Exception ABORT raised"}
   214                                "Exception ABORT raised"}
   215 
   215 
   216   then the parsing aborts and the error message @{text "foo"} is printed. In order to
   216   then the parsing aborts and the error message @{text "foo"} is printed. In order to
   217   see the error message properly, you need to prefix the parser with the function 
   217   see the error message properly, you need to prefix the parser with the function 
   218   @{ML [index] error in Scan}. For example:
   218   @{ML_ind [index] error in Scan}. For example:
   219 
   219 
   220   @{ML_response_fake [display,gray] 
   220   @{ML_response_fake [display,gray] 
   221   "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
   221   "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
   222   "Exception Error \"foo\" raised"}
   222   "Exception Error \"foo\" raised"}
   223 
   223 
   224   This ``prefixing'' is usually done by wrappers such as @{ML [index] local_theory in OuterSyntax} 
   224   This ``prefixing'' is usually done by wrappers such as @{ML_ind [index] local_theory in OuterSyntax} 
   225   (see Section~\ref{sec:newcommand} which explains this function in more detail). 
   225   (see Section~\ref{sec:newcommand} which explains this function in more detail). 
   226   
   226   
   227   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
   227   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
   228   r}. If you want to generate the correct error message for 
   228   r}. If you want to generate the correct error message for 
   229   @{text "p"}-followed-by-@{text "q"}, then you have to write:
   229   @{text "p"}-followed-by-@{text "q"}, then you have to write:
   256   often as it succeeds. For example:
   256   often as it succeeds. For example:
   257   
   257   
   258   @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" 
   258   @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" 
   259                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
   259                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
   260   
   260   
   261   Note that @{ML [index] repeat in Scan} stores the parsed items in a list. The function
   261   Note that @{ML_ind [index] repeat in Scan} stores the parsed items in a list. The function
   262   @{ML [index] repeat1 in Scan} is similar, but requires that the parser @{text "p"} 
   262   @{ML_ind [index] repeat1 in Scan} is similar, but requires that the parser @{text "p"} 
   263   succeeds at least once.
   263   succeeds at least once.
   264 
   264 
   265   Also note that the parser would have aborted with the exception @{text MORE}, if
   265   Also note that the parser would have aborted with the exception @{text MORE}, if
   266   you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using
   266   you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using
   267   the wrapper @{ML [index] finite in Scan} and the ``stopper-token'' 
   267   the wrapper @{ML_ind [index] finite in Scan} and the ``stopper-token'' 
   268   @{ML [index] stopper in Symbol}. With them you can write:
   268   @{ML_ind [index] stopper in Symbol}. With them you can write:
   269   
   269   
   270   @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" 
   270   @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" 
   271                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   271                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   272 
   272 
   273   @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
   273   @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
   274   other stoppers need to be used when parsing, for example, tokens. However, this kind of 
   274   other stoppers need to be used when parsing, for example, tokens. However, this kind of 
   275   manually wrapping is often already done by the surrounding infrastructure. 
   275   manually wrapping is often already done by the surrounding infrastructure. 
   276 
   276 
   277   The function @{ML [index] repeat in Scan} can be used with @{ML [index] one in Scan} to read any 
   277   The function @{ML_ind [index] repeat in Scan} can be used with @{ML_ind [index] one in Scan} to read any 
   278   string as in
   278   string as in
   279 
   279 
   280   @{ML_response [display,gray] 
   280   @{ML_response [display,gray] 
   281 "let 
   281 "let 
   282    val p = Scan.repeat (Scan.one Symbol.not_eof)
   282    val p = Scan.repeat (Scan.one Symbol.not_eof)
   284 in
   284 in
   285    Scan.finite Symbol.stopper p input
   285    Scan.finite Symbol.stopper p input
   286 end" 
   286 end" 
   287 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
   287 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
   288 
   288 
   289   where the function @{ML [index] not_eof in Symbol} ensures that we do not read beyond the 
   289   where the function @{ML_ind [index] not_eof in Symbol} ensures that we do not read beyond the 
   290   end of the input string (i.e.~stopper symbol).
   290   end of the input string (i.e.~stopper symbol).
   291 
   291 
   292   The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
   292   The function @{ML_ind [index] unless in Scan} takes two parsers: if the first one can 
   293   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   293   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   294   
   294   
   295   @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")"
   295   @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")"
   296                                "Exception FAIL raised"}
   296                                "Exception FAIL raised"}
   297 
   297 
   300   @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")"
   300   @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")"
   301                           "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
   301                           "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
   302 
   302 
   303   succeeds. 
   303   succeeds. 
   304 
   304 
   305   The functions @{ML [index] repeat in Scan} and @{ML [index] unless in Scan} can 
   305   The functions @{ML_ind [index] repeat in Scan} and @{ML_ind [index] unless in Scan} can 
   306   be combined to read any input until a certain marker symbol is reached. In the 
   306   be combined to read any input until a certain marker symbol is reached. In the 
   307   example below the marker symbol is a @{text [quotes] "*"}.
   307   example below the marker symbol is a @{text [quotes] "*"}.
   308 
   308 
   309   @{ML_response [display,gray]
   309   @{ML_response [display,gray]
   310 "let 
   310 "let 
   318 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   318 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   319  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   319  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   320 
   320 
   321   
   321   
   322   After parsing is done, you almost always want to apply a function to the parsed 
   322   After parsing is done, you almost always want to apply a function to the parsed 
   323   items. One way to do this is the function @{ML [index]">>"} where 
   323   items. One way to do this is the function @{ML_ind [index]">>"} where 
   324   @{ML "(p >> f)" for p f} runs 
   324   @{ML "(p >> f)" for p f} runs 
   325   first the parser @{text p} and upon successful completion applies the 
   325   first the parser @{text p} and upon successful completion applies the 
   326   function @{text f} to the result. For example
   326   function @{text f} to the result. For example
   327 
   327 
   328 @{ML_response [display,gray]
   328 @{ML_response [display,gray]
   347   where the single-character strings in the parsed output are transformed
   347   where the single-character strings in the parsed output are transformed
   348   back into one string.
   348   back into one string.
   349  
   349  
   350   (FIXME:  move to an earlier place)
   350   (FIXME:  move to an earlier place)
   351 
   351 
   352   The function @{ML [index] ahead in Scan} parses some input, but leaves the original
   352   The function @{ML_ind [index] ahead in Scan} parses some input, but leaves the original
   353   input unchanged. For example:
   353   input unchanged. For example:
   354 
   354 
   355   @{ML_response [display,gray]
   355   @{ML_response [display,gray]
   356   "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" 
   356   "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" 
   357   "(\"foo\", [\"f\", \"o\", \"o\"])"} 
   357   "(\"foo\", [\"f\", \"o\", \"o\"])"} 
   358 
   358 
   359   The function @{ML [index] lift in Scan} takes a parser and a pair as arguments. This function applies
   359   The function @{ML_ind [index] lift in Scan} takes a parser and a pair as arguments. This function applies
   360   the given parser to the second component of the pair and leaves the  first component 
   360   the given parser to the second component of the pair and leaves the  first component 
   361   untouched. For example
   361   untouched. For example
   362 
   362 
   363 @{ML_response [display,gray]
   363 @{ML_response [display,gray]
   364 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   364 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   395   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   395   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   396   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   396   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   397   \end{readmore}
   397   \end{readmore}
   398 
   398 
   399   The structure @{ML_struct [index] OuterLex} defines several kinds of tokens (for
   399   The structure @{ML_struct [index] OuterLex} defines several kinds of tokens (for
   400   example @{ML [index] Ident in OuterLex} for identifiers, @{ML Keyword in
   400   example @{ML_ind [index] Ident in OuterLex} for identifiers, @{ML Keyword in
   401   OuterLex} for keywords and @{ML [index] Command in OuterLex} for commands). Some
   401   OuterLex} for keywords and @{ML_ind [index] Command in OuterLex} for commands). Some
   402   token parsers take into account the kind of tokens. The first example shows
   402   token parsers take into account the kind of tokens. The first example shows
   403   how to generate a token list out of a string using the function 
   403   how to generate a token list out of a string using the function 
   404   @{ML [index] scan in OuterSyntax}. It is given the argument 
   404   @{ML_ind [index] scan in OuterSyntax}. It is given the argument 
   405   @{ML "Position.none"} since,
   405   @{ML "Position.none"} since,
   406   at the moment, we are not interested in generating precise error
   406   at the moment, we are not interested in generating precise error
   407   messages. The following code\footnote{Note that because of a possible bug in
   407   messages. The following code\footnote{Note that because of a possible bug in
   408   the PolyML runtime system, the result is printed as @{text [quotes] "?"},
   408   the PolyML runtime system, the result is printed as @{text [quotes] "?"},
   409   instead of the tokens.}
   409   instead of the tokens.}
   417   produces three tokens where the first and the last are identifiers, since
   417   produces three tokens where the first and the last are identifiers, since
   418   @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
   418   @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
   419   other syntactic category. The second indicates a space.
   419   other syntactic category. The second indicates a space.
   420 
   420 
   421   We can easily change what is recognised as a keyword with 
   421   We can easily change what is recognised as a keyword with 
   422   @{ML [index] keyword in OuterKeyword}. For example calling this function 
   422   @{ML_ind [index] keyword in OuterKeyword}. For example calling this function 
   423 *}
   423 *}
   424 
   424 
   425 ML{*val _ = OuterKeyword.keyword "hello"*}
   425 ML{*val _ = OuterKeyword.keyword "hello"*}
   426 
   426 
   427 text {*
   427 text {*
   432  Token (\<dots>,(Space, \" \"),\<dots>), 
   432  Token (\<dots>,(Space, \" \"),\<dots>), 
   433  Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
   433  Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
   434 
   434 
   435   Many parsing functions later on will require white space, comments and the like
   435   Many parsing functions later on will require white space, comments and the like
   436   to have already been filtered out.  So from now on we are going to use the 
   436   to have already been filtered out.  So from now on we are going to use the 
   437   functions @{ML filter} and @{ML [index] is_proper in OuterLex} to do this. 
   437   functions @{ML filter} and @{ML_ind [index] is_proper in OuterLex} to do this. 
   438   For example:
   438   For example:
   439 
   439 
   440 @{ML_response_fake [display,gray]
   440 @{ML_response_fake [display,gray]
   441 "let
   441 "let
   442    val input = OuterSyntax.scan Position.none \"hello world\"
   442    val input = OuterSyntax.scan Position.none \"hello world\"
   470 in 
   470 in 
   471   (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
   471   (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
   472 end" 
   472 end" 
   473 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}
   473 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}
   474 
   474 
   475   You might have to adjust the @{ML [index] print_depth} in order to
   475   You might have to adjust the @{ML_ind [index] print_depth} in order to
   476   see the complete list.
   476   see the complete list.
   477 
   477 
   478   The parser @{ML [index] "$$$" in OuterParse} parses a single keyword. For example:
   478   The parser @{ML_ind [index] "$$$" in OuterParse} parses a single keyword. For example:
   479 
   479 
   480 @{ML_response [display,gray]
   480 @{ML_response [display,gray]
   481 "let 
   481 "let 
   482   val input1 = filtered_input \"where for\"
   482   val input1 = filtered_input \"where for\"
   483   val input2 = filtered_input \"| in\"
   483   val input2 = filtered_input \"| in\"
   484 in 
   484 in 
   485   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
   485   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
   486 end"
   486 end"
   487 "((\"where\",\<dots>), (\"|\",\<dots>))"}
   487 "((\"where\",\<dots>), (\"|\",\<dots>))"}
   488 
   488 
   489   Any non-keyword string can be parsed with the function @{ML [index] reserved in OuterParse}.
   489   Any non-keyword string can be parsed with the function @{ML_ind [index] reserved in OuterParse}.
   490   For example:
   490   For example:
   491 
   491 
   492   @{ML_response [display,gray]
   492   @{ML_response [display,gray]
   493 "let 
   493 "let 
   494   val p = OuterParse.reserved \"bar\"
   494   val p = OuterParse.reserved \"bar\"
   496 in
   496 in
   497   p input
   497   p input
   498 end"
   498 end"
   499   "(\"bar\",[])"}
   499   "(\"bar\",[])"}
   500 
   500 
   501   Like before, you can sequentially connect parsers with @{ML [index] "--"}. For example: 
   501   Like before, you can sequentially connect parsers with @{ML_ind [index] "--"}. For example: 
   502 
   502 
   503 @{ML_response [display,gray]
   503 @{ML_response [display,gray]
   504 "let 
   504 "let 
   505   val input = filtered_input \"| in\"
   505   val input = filtered_input \"| in\"
   506 in 
   506 in 
   519 in 
   519 in 
   520   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
   520   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
   521 end" 
   521 end" 
   522 "([\"in\", \"in\", \"in\"], [\<dots>])"}
   522 "([\"in\", \"in\", \"in\"], [\<dots>])"}
   523 
   523 
   524   @{ML [index] enum1 in OuterParse} works similarly, except that the parsed list must
   524   @{ML_ind [index] enum1 in OuterParse} works similarly, except that the parsed list must
   525   be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
   525   be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
   526   end of the parsed string, otherwise the parser would have consumed all
   526   end of the parsed string, otherwise the parser would have consumed all
   527   tokens and then failed with the exception @{text "MORE"}. Like in the
   527   tokens and then failed with the exception @{text "MORE"}. Like in the
   528   previous section, we can avoid this exception using the wrapper @{ML
   528   previous section, we can avoid this exception using the wrapper @{ML
   529   Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
   529   Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
   543 
   543 
   544 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}
   544 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}
   545 
   545 
   546 text {*
   546 text {*
   547 
   547 
   548   The function @{ML [index] "!!!" in OuterParse} can be used to force termination of the
   548   The function @{ML_ind [index] "!!!" in OuterParse} can be used to force termination of the
   549   parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). 
   549   parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). 
   550   Except that the error message of @{ML "OuterParse.!!!"} is fixed to be 
   550   Except that the error message of @{ML "OuterParse.!!!"} is fixed to be 
   551   @{text [quotes] "Outer syntax error"}
   551   @{text [quotes] "Outer syntax error"}
   552   together with a relatively precise description of the failure. For example:
   552   together with a relatively precise description of the failure. For example:
   553 
   553 
   562 but keyword in was found\" raised"
   562 but keyword in was found\" raised"
   563 }
   563 }
   564 
   564 
   565   \begin{exercise} (FIXME)
   565   \begin{exercise} (FIXME)
   566   A type-identifier, for example @{typ "'a"}, is a token of 
   566   A type-identifier, for example @{typ "'a"}, is a token of 
   567   kind @{ML [index] Keyword in OuterLex}. It can be parsed using 
   567   kind @{ML_ind [index] Keyword in OuterLex}. It can be parsed using 
   568   the function @{ML type_ident in OuterParse}.
   568   the function @{ML type_ident in OuterParse}.
   569   \end{exercise}
   569   \end{exercise}
   570 
   570 
   571   (FIXME: or give parser for numbers)
   571   (FIXME: or give parser for numbers)
   572 
   572 
   624 section {* Parsing Inner Syntax *}
   624 section {* Parsing Inner Syntax *}
   625 
   625 
   626 text {*
   626 text {*
   627   There is usually no need to write your own parser for parsing inner syntax, that is 
   627   There is usually no need to write your own parser for parsing inner syntax, that is 
   628   for terms and  types: you can just call the predefined parsers. Terms can 
   628   for terms and  types: you can just call the predefined parsers. Terms can 
   629   be parsed using the function @{ML [index] term in OuterParse}. For example:
   629   be parsed using the function @{ML_ind [index] term in OuterParse}. For example:
   630 
   630 
   631 @{ML_response [display,gray]
   631 @{ML_response [display,gray]
   632 "let 
   632 "let 
   633   val input = OuterSyntax.scan Position.none \"foo\"
   633   val input = OuterSyntax.scan Position.none \"foo\"
   634 in 
   634 in 
   635   OuterParse.term input
   635   OuterParse.term input
   636 end"
   636 end"
   637 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}
   637 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}
   638 
   638 
   639   The function @{ML [index] prop in OuterParse} is similar, except that it gives a different
   639   The function @{ML_ind [index] prop in OuterParse} is similar, except that it gives a different
   640   error message, when parsing fails. As you can see, the parser not just returns 
   640   error message, when parsing fails. As you can see, the parser not just returns 
   641   the parsed string, but also some encoded information. You can decode the
   641   the parsed string, but also some encoded information. You can decode the
   642   information with the function @{ML [index] parse in YXML}. For example
   642   information with the function @{ML_ind [index] parse in YXML}. For example
   643 
   643 
   644   @{ML_response [display,gray]
   644   @{ML_response [display,gray]
   645   "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
   645   "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
   646   "XML.Elem (\"token\", [], [XML.Text \"foo\"])"}
   646   "XML.Elem (\"token\", [], [XML.Text \"foo\"])"}
   647 
   647 
   721 
   721 
   722   As you see, the result is a pair consisting of a list of
   722   As you see, the result is a pair consisting of a list of
   723   variables with optional type-annotation and syntax-annotation, and a list of
   723   variables with optional type-annotation and syntax-annotation, and a list of
   724   rules where every rule has optionally a name and an attribute.
   724   rules where every rule has optionally a name and an attribute.
   725 
   725 
   726   The function @{ML [index] "fixes" in OuterParse} in Line 2 of the parser reads an 
   726   The function @{ML_ind [index] "fixes" in OuterParse} in Line 2 of the parser reads an 
   727   \isacommand{and}-separated 
   727   \isacommand{and}-separated 
   728   list of variables that can include optional type annotations and syntax translations. 
   728   list of variables that can include optional type annotations and syntax translations. 
   729   For example:\footnote{Note that in the code we need to write 
   729   For example:\footnote{Note that in the code we need to write 
   730   @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
   730   @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
   731   in the compound type.}
   731   in the compound type.}
   745 text {*
   745 text {*
   746   Whenever types are given, they are stored in the @{ML SOME}s. The types are
   746   Whenever types are given, they are stored in the @{ML SOME}s. The types are
   747   not yet used to type the variables: this must be done by type-inference later
   747   not yet used to type the variables: this must be done by type-inference later
   748   on. Since types are part of the inner syntax they are strings with some
   748   on. Since types are part of the inner syntax they are strings with some
   749   encoded information (see previous section). If a mixfix-syntax is
   749   encoded information (see previous section). If a mixfix-syntax is
   750   present for a variable, then it is stored in the @{ML [index] Mixfix} data structure;
   750   present for a variable, then it is stored in the @{ML_ind [index] Mixfix} data structure;
   751   no syntax translation is indicated by @{ML [index] NoSyn}.
   751   no syntax translation is indicated by @{ML_ind [index] NoSyn}.
   752 
   752 
   753   \begin{readmore}
   753   \begin{readmore}
   754   The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
   754   The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
   755   \end{readmore}
   755   \end{readmore}
   756 
   756 
   757   Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
   757   Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
   758   list of introduction rules, that is propositions with theorem annotations
   758   list of introduction rules, that is propositions with theorem annotations
   759   such as rule names and attributes. The introduction rules are propositions
   759   such as rule names and attributes. The introduction rules are propositions
   760   parsed by @{ML [index] prop  in OuterParse}. However, they can include an optional
   760   parsed by @{ML_ind [index] prop  in OuterParse}. However, they can include an optional
   761   theorem name plus some attributes. For example
   761   theorem name plus some attributes. For example
   762 
   762 
   763 @{ML_response [display,gray] "let 
   763 @{ML_response [display,gray] "let 
   764   val input = filtered_input \"foo_lemma[intro,dest!]:\"
   764   val input = filtered_input \"foo_lemma[intro,dest!]:\"
   765   val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input 
   765   val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input 
   766 in 
   766 in 
   767   (name, map Args.dest_src attrib)
   767   (name, map Args.dest_src attrib)
   768 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
   768 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
   769  
   769  
   770   The function @{ML [index] opt_thm_name in SpecParse} is the ``optional'' variant of
   770   The function @{ML_ind [index] opt_thm_name in SpecParse} is the ``optional'' variant of
   771   @{ML [index] thm_name in SpecParse}. Theorem names can contain attributes. The name 
   771   @{ML_ind [index] thm_name in SpecParse}. Theorem names can contain attributes. The name 
   772   has to end with @{text [quotes] ":"}---see the argument of 
   772   has to end with @{text [quotes] ":"}---see the argument of 
   773   the function @{ML SpecParse.opt_thm_name} in Line 7.
   773   the function @{ML SpecParse.opt_thm_name} in Line 7.
   774 
   774 
   775   \begin{readmore}
   775   \begin{readmore}
   776   Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
   776   Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
   781 text_raw {*
   781 text_raw {*
   782   \begin{exercise}
   782   \begin{exercise}
   783   Have a look at how the parser @{ML SpecParse.where_alt_specs} is implemented
   783   Have a look at how the parser @{ML SpecParse.where_alt_specs} is implemented
   784   in file @{ML_file "Pure/Isar/spec_parse.ML"}. This parser corresponds
   784   in file @{ML_file "Pure/Isar/spec_parse.ML"}. This parser corresponds
   785   to the ``where-part'' of the introduction rules given above. Below
   785   to the ``where-part'' of the introduction rules given above. Below
   786   we paraphrase the code of @{ML [index] where_alt_specs in SpecParse} adapted to our
   786   we paraphrase the code of @{ML_ind [index] where_alt_specs in SpecParse} adapted to our
   787   purposes. 
   787   purposes. 
   788   \begin{isabelle}
   788   \begin{isabelle}
   789 *}
   789 *}
   790 ML %linenosgray{*val spec_parser' = 
   790 ML %linenosgray{*val spec_parser' = 
   791      OuterParse.fixes -- 
   791      OuterParse.fixes -- 
   832 in
   832 in
   833   OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing
   833   OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing
   834 end *}
   834 end *}
   835 
   835 
   836 text {*
   836 text {*
   837   The crucial function @{ML [index] local_theory in OuterSyntax} expects a name for the command, a
   837   The crucial function @{ML_ind [index] local_theory in OuterSyntax} expects a name for the command, a
   838   short description, a kind indicator (which we will explain later more thoroughly) and a
   838   short description, a kind indicator (which we will explain later more thoroughly) and a
   839   parser producing a local theory transition (its purpose will also explained
   839   parser producing a local theory transition (its purpose will also explained
   840   later). 
   840   later). 
   841 
   841 
   842   While this is everything you have to do on the ML-level, you need a keyword
   842   While this is everything you have to do on the ML-level, you need a keyword
   972   next by letting it take a proposition as argument and printing this proposition 
   972   next by letting it take a proposition as argument and printing this proposition 
   973   inside the tracing buffer. 
   973   inside the tracing buffer. 
   974 
   974 
   975   The crucial part of a command is the function that determines the behaviour
   975   The crucial part of a command is the function that determines the behaviour
   976   of the command. In the code above we used a ``do-nothing''-function, which
   976   of the command. In the code above we used a ``do-nothing''-function, which
   977   because of @{ML [index] succeed in Scan} does not parse any argument, but immediately
   977   because of @{ML_ind [index] succeed in Scan} does not parse any argument, but immediately
   978   returns the simple function @{ML "LocalTheory.theory I"}. We can
   978   returns the simple function @{ML "LocalTheory.theory I"}. We can
   979   replace this code by a function that first parses a proposition (using the
   979   replace this code by a function that first parses a proposition (using the
   980   parser @{ML OuterParse.prop}), then prints out the tracing
   980   parser @{ML OuterParse.prop}), then prints out the tracing
   981   information (using a new function @{text trace_prop}) and 
   981   information (using a new function @{text trace_prop}) and 
   982   finally does nothing. For this you can write:
   982   finally does nothing. For this you can write:
  1002   @{text "> \"True \<and> False\""}
  1002   @{text "> \"True \<and> False\""}
  1003   \end{isabelle}
  1003   \end{isabelle}
  1004   
  1004   
  1005   and see the proposition in the tracing buffer.  
  1005   and see the proposition in the tracing buffer.  
  1006 
  1006 
  1007   Note that so far we used @{ML [index] thy_decl in OuterKeyword} as the kind
  1007   Note that so far we used @{ML_ind [index] thy_decl in OuterKeyword} as the kind
  1008   indicator for the command.  This means that the command finishes as soon as
  1008   indicator for the command.  This means that the command finishes as soon as
  1009   the arguments are processed. Examples of this kind of commands are
  1009   the arguments are processed. Examples of this kind of commands are
  1010   \isacommand{definition} and \isacommand{declare}.  In other cases, commands
  1010   \isacommand{definition} and \isacommand{declare}.  In other cases, commands
  1011   are expected to parse some arguments, for example a proposition, and then
  1011   are expected to parse some arguments, for example a proposition, and then
  1012   ``open up'' a proof in order to prove the proposition (for example
  1012   ``open up'' a proof in order to prove the proposition (for example
  1013   \isacommand{lemma}) or prove some other properties (for example
  1013   \isacommand{lemma}) or prove some other properties (for example
  1014   \isacommand{function}). To achieve this kind of behaviour, you have to use
  1014   \isacommand{function}). To achieve this kind of behaviour, you have to use
  1015   the kind indicator @{ML [index] thy_goal in OuterKeyword} and the function @{ML
  1015   the kind indicator @{ML_ind [index] thy_goal in OuterKeyword} and the function @{ML
  1016   "local_theory_to_proof" in OuterSyntax} to set up the command.  Note,
  1016   "local_theory_to_proof" in OuterSyntax} to set up the command.  Note,
  1017   however, once you change the ``kind'' of a command from @{ML thy_decl in
  1017   however, once you change the ``kind'' of a command from @{ML thy_decl in
  1018   OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs
  1018   OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs
  1019   to be re-created!
  1019   to be re-created!
  1020 
  1020 
  1039 end *}
  1039 end *}
  1040 
  1040 
  1041 text {*
  1041 text {*
  1042   The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be
  1042   The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be
  1043   proved) and a context as argument.  The context is necessary in order to be able to use
  1043   proved) and a context as argument.  The context is necessary in order to be able to use
  1044   @{ML [index] read_prop in Syntax}, which converts a string into a proper proposition.
  1044   @{ML_ind [index] read_prop in Syntax}, which converts a string into a proper proposition.
  1045   In Line 6 the function @{ML [index] theorem_i in Proof} starts the proof for the
  1045   In Line 6 the function @{ML_ind [index] theorem_i in Proof} starts the proof for the
  1046   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  1046   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  1047   omit); the argument @{ML "(K I)"} stands for a function that determines what
  1047   omit); the argument @{ML "(K I)"} stands for a function that determines what
  1048   should be done with the theorem once it is proved (we chose to just forget
  1048   should be done with the theorem once it is proved (we chose to just forget
  1049   about it). Line 9 contains the parser for the proposition.
  1049   about it). Line 9 contains the parser for the proposition.
  1050 
  1050 
  1064   \isacommand{apply}@{text "(rule conjI)"}\\
  1064   \isacommand{apply}@{text "(rule conjI)"}\\
  1065   \isacommand{apply}@{text "(rule TrueI)+"}\\
  1065   \isacommand{apply}@{text "(rule TrueI)+"}\\
  1066   \isacommand{done}
  1066   \isacommand{done}
  1067   \end{isabelle}
  1067   \end{isabelle}
  1068 
  1068 
  1069   (FIXME: read a name and show how to store theorems; see @{ML [index] note in LocalTheory})
  1069   (FIXME: read a name and show how to store theorems; see @{ML_ind [index] note in LocalTheory})
  1070   
  1070   
  1071 *}
  1071 *}
  1072 
  1072 
  1073 section {* Methods (TBD) *}
  1073 section {* Methods (TBD) *}
  1074 
  1074 
  1097 
  1097 
  1098 text {*
  1098 text {*
  1099   It defines the method @{text foo}, which takes no arguments (therefore the
  1099   It defines the method @{text foo}, which takes no arguments (therefore the
  1100   parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which 
  1100   parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which 
  1101   applies @{thm [source] conjE} and then @{thm [source] conjI}. The function 
  1101   applies @{thm [source] conjE} and then @{thm [source] conjI}. The function 
  1102   @{ML [index] SIMPLE_METHOD}
  1102   @{ML_ind [index] SIMPLE_METHOD}
  1103   turns such a tactic into a method. The method @{text "foo"} can be used as follows
  1103   turns such a tactic into a method. The method @{text "foo"} can be used as follows
  1104 *}
  1104 *}
  1105 
  1105 
  1106 lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
  1106 lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
  1107 apply(foo)
  1107 apply(foo)