ProgTutorial/Parsing.thy
changeset 316 74f0a06f751f
parent 315 de49d5780f57
child 319 6bce4acf7f2a
equal deleted inserted replaced
315:de49d5780f57 316:74f0a06f751f
    35 section {* Building Generic Parsers *}
    35 section {* Building Generic Parsers *}
    36 
    36 
    37 text {*
    37 text {*
    38 
    38 
    39   Let us first have a look at parsing strings using generic parsing
    39   Let us first have a look at parsing strings using generic parsing
    40   combinators. The function @{ML_ind [index] "$$"} takes a string as argument and will
    40   combinators. The function @{ML_ind  "$$"} takes a string as argument and will
    41   ``consume'' this string from a given input list of strings. ``Consume'' in
    41   ``consume'' this string from a given input list of strings. ``Consume'' in
    42   this context means that it will return a pair consisting of this string and
    42   this context means that it will return a pair consisting of this string and
    43   the rest of the input list. For example:
    43   the rest of the input list. For example:
    44 
    44 
    45   @{ML_response [display,gray] 
    45   @{ML_response [display,gray] 
    69   \end{itemize}
    69   \end{itemize}
    70 
    70 
    71   However, note that these exceptions are private to the parser and cannot be accessed
    71   However, note that these exceptions are private to the parser and cannot be accessed
    72   by the programmer (for example to handle them).
    72   by the programmer (for example to handle them).
    73 
    73 
    74   \indexdef{explode@ {\tt\slshape{}explode}}{in {\tt\slshape Symbol}}
    74   In the examples above we use the function @{ML_ind Symbol.explode}, instead of the 
    75   \index{explode@ {\tt\slshape{}explode}}
    75   more standard library function @{ML_ind explode}, for obtaining an input list for 
    76   In the examples above we use the function @{ML Symbol.explode}, instead of the 
    76   the parser. The reason is that @{ML_ind Symbol.explode} is aware of character sequences,
    77   more standard library function @{ML explode}, for obtaining an input list for 
       
    78   the parser. The reason is that @{ML Symbol.explode} is aware of character sequences,
       
    79   for example @{text "\<foo>"}, that have a special meaning in Isabelle. To see
    77   for example @{text "\<foo>"}, that have a special meaning in Isabelle. To see
    80   the difference consider
    78   the difference consider
    81 
    79 
    82 @{ML_response_fake [display,gray]
    80 @{ML_response_fake [display,gray]
    83 "let 
    81 "let 
    87 end"
    85 end"
    88 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"],
    86 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"],
    89  [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"}
    87  [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"}
    90 
    88 
    91   Slightly more general than the parser @{ML "$$"} is the function 
    89   Slightly more general than the parser @{ML "$$"} is the function 
    92   @{ML_ind [index] one in Scan}, in that it takes a predicate as argument and 
    90   @{ML_ind  one in Scan}, in that it takes a predicate as argument and 
    93   then parses exactly
    91   then parses exactly
    94   one item from the input list satisfying this predicate. For example the
    92   one item from the input list satisfying this predicate. For example the
    95   following parser either consumes an @{text [quotes] "h"} or a @{text
    93   following parser either consumes an @{text [quotes] "h"} or a @{text
    96   [quotes] "w"}:
    94   [quotes] "w"}:
    97 
    95 
   103 in
   101 in
   104   (hw input1, hw input2)
   102   (hw input1, hw input2)
   105 end"
   103 end"
   106     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   104     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   107 
   105 
   108   Two parsers can be connected in sequence by using the function @{ML_ind [index] "--"}. 
   106   Two parsers can be connected in sequence by using the function @{ML_ind  "--"}. 
   109   For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this 
   107   For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this 
   110   order) you can achieve by:
   108   order) you can achieve by:
   111 
   109 
   112   @{ML_response [display,gray] 
   110   @{ML_response [display,gray] 
   113   "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")"
   111   "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")"
   114   "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
   112   "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
   115 
   113 
   116   Note how the result of consumed strings builds up on the left as nested pairs.  
   114   Note how the result of consumed strings builds up on the left as nested pairs.  
   117 
   115 
   118   If, as in the previous example, you want to parse a particular string, 
   116   If, as in the previous example, you want to parse a particular string, 
   119   then you should use the function @{ML_ind [index] this_string in Scan}:
   117   then you should use the function @{ML_ind  this_string in Scan}:
   120 
   118 
   121   @{ML_response [display,gray] 
   119   @{ML_response [display,gray] 
   122   "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
   120   "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
   123   "(\"hell\", [\"o\"])"}
   121   "(\"hell\", [\"o\"])"}
   124 
   122 
   125   Parsers that explore alternatives can be constructed using the function 
   123   Parsers that explore alternatives can be constructed using the function 
   126   @{ML_ind [index] "||"}. The parser @{ML "(p || q)" for p q} returns the
   124   @{ML_ind  "||"}. The parser @{ML "(p || q)" for p q} returns the
   127   result of @{text "p"}, in case it succeeds, otherwise it returns the
   125   result of @{text "p"}, in case it succeeds, otherwise it returns the
   128   result of @{text "q"}. For example:
   126   result of @{text "q"}. For example:
   129 
   127 
   130 
   128 
   131 @{ML_response [display,gray] 
   129 @{ML_response [display,gray] 
   136 in
   134 in
   137   (hw input1, hw input2)
   135   (hw input1, hw input2)
   138 end"
   136 end"
   139   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   137   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   140 
   138 
   141   The functions @{ML_ind [index] "|--"} and @{ML_ind [index] "--|"} work like the sequencing function 
   139   The functions @{ML_ind  "|--"} and @{ML_ind  "--|"} work like the sequencing function 
   142   for parsers, except that they discard the item being parsed by the first (respectively second)
   140   for parsers, except that they discard the item being parsed by the first (respectively second)
   143   parser. For example:
   141   parser. For example:
   144   
   142   
   145 @{ML_response [display,gray]
   143 @{ML_response [display,gray]
   146 "let 
   144 "let 
   150 in 
   148 in 
   151   (just_e input, just_h input)
   149   (just_e input, just_h input)
   152 end"
   150 end"
   153   "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"}
   151   "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"}
   154 
   152 
   155   \indexdef{optional@ {\tt\slshape{optional}}}{in {\tt\slshape Scan}}
       
   156   The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
   153   The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
   157   @{text "p"}, if it succeeds; otherwise it returns 
   154   @{text "p"}, if it succeeds; otherwise it returns 
   158   the default value @{text "x"}. For example:
   155   the default value @{text "x"}. For example:
   159 
   156 
   160 @{ML_response [display,gray]
   157 @{ML_response [display,gray]
   165 in 
   162 in 
   166   (p input1, p input2)
   163   (p input1, p input2)
   167 end" 
   164 end" 
   168  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   165  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   169 
   166 
   170   The function @{ML_ind [index] option in Scan} works similarly, except no default value can
   167   The function @{ML_ind  option in Scan} works similarly, except no default value can
   171   be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
   168   be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
   172 
   169 
   173 @{ML_response [display,gray]
   170 @{ML_response [display,gray]
   174 "let 
   171 "let 
   175   val p = Scan.option ($$ \"h\")
   172   val p = Scan.option ($$ \"h\")
   177   val input2 = Symbol.explode \"world\"  
   174   val input2 = Symbol.explode \"world\"  
   178 in 
   175 in 
   179   (p input1, p input2)
   176   (p input1, p input2)
   180 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   177 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   181 
   178 
   182   The function @{ML_ind [index] "!!"} helps to produce appropriate error messages
   179   The function @{ML_ind  "!!"} helps to produce appropriate error messages
   183   for parsing. For example if you want to parse @{text p} immediately 
   180   for parsing. For example if you want to parse @{text p} immediately 
   184   followed by @{text q}, or start a completely different parser @{text r},
   181   followed by @{text q}, or start a completely different parser @{text r},
   185   you might write:
   182   you might write:
   186 
   183 
   187   @{ML [display,gray] "(p -- q) || r" for p q r}
   184   @{ML [display,gray] "(p -- q) || r" for p q r}
   213   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
   210   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
   214                                "Exception ABORT raised"}
   211                                "Exception ABORT raised"}
   215 
   212 
   216   then the parsing aborts and the error message @{text "foo"} is printed. In order to
   213   then the parsing aborts and the error message @{text "foo"} is printed. In order to
   217   see the error message properly, you need to prefix the parser with the function 
   214   see the error message properly, you need to prefix the parser with the function 
   218   @{ML_ind [index] error in Scan}. For example:
   215   @{ML_ind  error in Scan}. For example:
   219 
   216 
   220   @{ML_response_fake [display,gray] 
   217   @{ML_response_fake [display,gray] 
   221   "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
   218   "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
   222   "Exception Error \"foo\" raised"}
   219   "Exception Error \"foo\" raised"}
   223 
   220 
   224   This ``prefixing'' is usually done by wrappers such as @{ML_ind [index] local_theory in OuterSyntax} 
   221   This ``prefixing'' is usually done by wrappers such as @{ML_ind  local_theory in OuterSyntax} 
   225   (see Section~\ref{sec:newcommand} which explains this function in more detail). 
   222   (see Section~\ref{sec:newcommand} which explains this function in more detail). 
   226   
   223   
   227   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
   224   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
   228   r}. If you want to generate the correct error message for 
   225   r}. If you want to generate the correct error message for 
   229   @{text "p"}-followed-by-@{text "q"}, then you have to write:
   226   @{text "p"}-followed-by-@{text "q"}, then you have to write:
   256   often as it succeeds. For example:
   253   often as it succeeds. For example:
   257   
   254   
   258   @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" 
   255   @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" 
   259                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
   256                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
   260   
   257   
   261   Note that @{ML_ind [index] repeat in Scan} stores the parsed items in a list. The function
   258   Note that @{ML_ind  repeat in Scan} stores the parsed items in a list. The function
   262   @{ML_ind [index] repeat1 in Scan} is similar, but requires that the parser @{text "p"} 
   259   @{ML_ind  repeat1 in Scan} is similar, but requires that the parser @{text "p"} 
   263   succeeds at least once.
   260   succeeds at least once.
   264 
   261 
   265   Also note that the parser would have aborted with the exception @{text MORE}, if
   262   Also note that the parser would have aborted with the exception @{text MORE}, if
   266   you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using
   263   you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using
   267   the wrapper @{ML_ind [index] finite in Scan} and the ``stopper-token'' 
   264   the wrapper @{ML_ind  finite in Scan} and the ``stopper-token'' 
   268   @{ML_ind [index] stopper in Symbol}. With them you can write:
   265   @{ML_ind  stopper in Symbol}. With them you can write:
   269   
   266   
   270   @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" 
   267   @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" 
   271                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   268                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   272 
   269 
   273   @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
   270   @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
   274   other stoppers need to be used when parsing, for example, tokens. However, this kind of 
   271   other stoppers need to be used when parsing, for example, tokens. However, this kind of 
   275   manually wrapping is often already done by the surrounding infrastructure. 
   272   manually wrapping is often already done by the surrounding infrastructure. 
   276 
   273 
   277   The function @{ML_ind [index] repeat in Scan} can be used with @{ML_ind [index] one in Scan} to read any 
   274   The function @{ML_ind  repeat in Scan} can be used with @{ML_ind  one in Scan} to read any 
   278   string as in
   275   string as in
   279 
   276 
   280   @{ML_response [display,gray] 
   277   @{ML_response [display,gray] 
   281 "let 
   278 "let 
   282    val p = Scan.repeat (Scan.one Symbol.not_eof)
   279    val p = Scan.repeat (Scan.one Symbol.not_eof)
   284 in
   281 in
   285    Scan.finite Symbol.stopper p input
   282    Scan.finite Symbol.stopper p input
   286 end" 
   283 end" 
   287 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
   284 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
   288 
   285 
   289   where the function @{ML_ind [index] not_eof in Symbol} ensures that we do not read beyond the 
   286   where the function @{ML_ind  not_eof in Symbol} ensures that we do not read beyond the 
   290   end of the input string (i.e.~stopper symbol).
   287   end of the input string (i.e.~stopper symbol).
   291 
   288 
   292   The function @{ML_ind [index] unless in Scan} takes two parsers: if the first one can 
   289   The function @{ML_ind  unless in Scan} takes two parsers: if the first one can 
   293   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   290   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   294   
   291   
   295   @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")"
   292   @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")"
   296                                "Exception FAIL raised"}
   293                                "Exception FAIL raised"}
   297 
   294 
   300   @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")"
   297   @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")"
   301                           "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
   298                           "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
   302 
   299 
   303   succeeds. 
   300   succeeds. 
   304 
   301 
   305   The functions @{ML_ind [index] repeat in Scan} and @{ML_ind [index] unless in Scan} can 
   302   The functions @{ML_ind  repeat in Scan} and @{ML_ind  unless in Scan} can 
   306   be combined to read any input until a certain marker symbol is reached. In the 
   303   be combined to read any input until a certain marker symbol is reached. In the 
   307   example below the marker symbol is a @{text [quotes] "*"}.
   304   example below the marker symbol is a @{text [quotes] "*"}.
   308 
   305 
   309   @{ML_response [display,gray]
   306   @{ML_response [display,gray]
   310 "let 
   307 "let 
   318 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   315 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   319  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   316  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   320 
   317 
   321   
   318   
   322   After parsing is done, you almost always want to apply a function to the parsed 
   319   After parsing is done, you almost always want to apply a function to the parsed 
   323   items. One way to do this is the function @{ML_ind [index]">>"} where 
   320   items. One way to do this is the function @{ML_ind ">>"} where 
   324   @{ML "(p >> f)" for p f} runs 
   321   @{ML "(p >> f)" for p f} runs 
   325   first the parser @{text p} and upon successful completion applies the 
   322   first the parser @{text p} and upon successful completion applies the 
   326   function @{text f} to the result. For example
   323   function @{text f} to the result. For example
   327 
   324 
   328 @{ML_response [display,gray]
   325 @{ML_response [display,gray]
   347   where the single-character strings in the parsed output are transformed
   344   where the single-character strings in the parsed output are transformed
   348   back into one string.
   345   back into one string.
   349  
   346  
   350   (FIXME:  move to an earlier place)
   347   (FIXME:  move to an earlier place)
   351 
   348 
   352   The function @{ML_ind [index] ahead in Scan} parses some input, but leaves the original
   349   The function @{ML_ind  ahead in Scan} parses some input, but leaves the original
   353   input unchanged. For example:
   350   input unchanged. For example:
   354 
   351 
   355   @{ML_response [display,gray]
   352   @{ML_response [display,gray]
   356   "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" 
   353   "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" 
   357   "(\"foo\", [\"f\", \"o\", \"o\"])"} 
   354   "(\"foo\", [\"f\", \"o\", \"o\"])"} 
   358 
   355 
   359   The function @{ML_ind [index] lift in Scan} takes a parser and a pair as arguments. This function applies
   356   The function @{ML_ind  lift in Scan} takes a parser and a pair as arguments. This function applies
   360   the given parser to the second component of the pair and leaves the  first component 
   357   the given parser to the second component of the pair and leaves the  first component 
   361   untouched. For example
   358   untouched. For example
   362 
   359 
   363 @{ML_response [display,gray]
   360 @{ML_response [display,gray]
   364 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   361 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   394   The parser functions for the theory syntax are contained in the structure
   391   The parser functions for the theory syntax are contained in the structure
   395   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   392   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   396   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   393   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   397   \end{readmore}
   394   \end{readmore}
   398 
   395 
   399   The structure @{ML_struct [index] OuterLex} defines several kinds of tokens (for
   396   The structure @{ML_struct  OuterLex} defines several kinds of tokens (for
   400   example @{ML_ind [index] Ident in OuterLex} for identifiers, @{ML Keyword in
   397   example @{ML_ind  Ident in OuterLex} for identifiers, @{ML Keyword in
   401   OuterLex} for keywords and @{ML_ind [index] Command in OuterLex} for commands). Some
   398   OuterLex} for keywords and @{ML_ind  Command in OuterLex} for commands). Some
   402   token parsers take into account the kind of tokens. The first example shows
   399   token parsers take into account the kind of tokens. The first example shows
   403   how to generate a token list out of a string using the function 
   400   how to generate a token list out of a string using the function 
   404   @{ML_ind [index] scan in OuterSyntax}. It is given the argument 
   401   @{ML_ind  scan in OuterSyntax}. It is given the argument 
   405   @{ML "Position.none"} since,
   402   @{ML "Position.none"} since,
   406   at the moment, we are not interested in generating precise error
   403   at the moment, we are not interested in generating precise error
   407   messages. The following code\footnote{Note that because of a possible bug in
   404   messages. The following code\footnote{Note that because of a possible bug in
   408   the PolyML runtime system, the result is printed as @{text [quotes] "?"},
   405   the PolyML runtime system, the result is printed as @{text [quotes] "?"},
   409   instead of the tokens.}
   406   instead of the tokens.}
   417   produces three tokens where the first and the last are identifiers, since
   414   produces three tokens where the first and the last are identifiers, since
   418   @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
   415   @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
   419   other syntactic category. The second indicates a space.
   416   other syntactic category. The second indicates a space.
   420 
   417 
   421   We can easily change what is recognised as a keyword with 
   418   We can easily change what is recognised as a keyword with 
   422   @{ML_ind [index] keyword in OuterKeyword}. For example calling this function 
   419   @{ML_ind  keyword in OuterKeyword}. For example calling this function 
   423 *}
   420 *}
   424 
   421 
   425 ML{*val _ = OuterKeyword.keyword "hello"*}
   422 ML{*val _ = OuterKeyword.keyword "hello"*}
   426 
   423 
   427 text {*
   424 text {*
   432  Token (\<dots>,(Space, \" \"),\<dots>), 
   429  Token (\<dots>,(Space, \" \"),\<dots>), 
   433  Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
   430  Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
   434 
   431 
   435   Many parsing functions later on will require white space, comments and the like
   432   Many parsing functions later on will require white space, comments and the like
   436   to have already been filtered out.  So from now on we are going to use the 
   433   to have already been filtered out.  So from now on we are going to use the 
   437   functions @{ML filter} and @{ML_ind [index] is_proper in OuterLex} to do this. 
   434   functions @{ML filter} and @{ML_ind  is_proper in OuterLex} to do this. 
   438   For example:
   435   For example:
   439 
   436 
   440 @{ML_response_fake [display,gray]
   437 @{ML_response_fake [display,gray]
   441 "let
   438 "let
   442    val input = OuterSyntax.scan Position.none \"hello world\"
   439    val input = OuterSyntax.scan Position.none \"hello world\"
   470 in 
   467 in 
   471   (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
   468   (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
   472 end" 
   469 end" 
   473 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}
   470 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}
   474 
   471 
   475   You might have to adjust the @{ML_ind [index] print_depth} in order to
   472   You might have to adjust the @{ML_ind  print_depth} in order to
   476   see the complete list.
   473   see the complete list.
   477 
   474 
   478   The parser @{ML_ind [index] "$$$" in OuterParse} parses a single keyword. For example:
   475   The parser @{ML_ind  "$$$" in OuterParse} parses a single keyword. For example:
   479 
   476 
   480 @{ML_response [display,gray]
   477 @{ML_response [display,gray]
   481 "let 
   478 "let 
   482   val input1 = filtered_input \"where for\"
   479   val input1 = filtered_input \"where for\"
   483   val input2 = filtered_input \"| in\"
   480   val input2 = filtered_input \"| in\"
   484 in 
   481 in 
   485   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
   482   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
   486 end"
   483 end"
   487 "((\"where\",\<dots>), (\"|\",\<dots>))"}
   484 "((\"where\",\<dots>), (\"|\",\<dots>))"}
   488 
   485 
   489   Any non-keyword string can be parsed with the function @{ML_ind [index] reserved in OuterParse}.
   486   Any non-keyword string can be parsed with the function @{ML_ind  reserved in OuterParse}.
   490   For example:
   487   For example:
   491 
   488 
   492   @{ML_response [display,gray]
   489   @{ML_response [display,gray]
   493 "let 
   490 "let 
   494   val p = OuterParse.reserved \"bar\"
   491   val p = OuterParse.reserved \"bar\"
   496 in
   493 in
   497   p input
   494   p input
   498 end"
   495 end"
   499   "(\"bar\",[])"}
   496   "(\"bar\",[])"}
   500 
   497 
   501   Like before, you can sequentially connect parsers with @{ML_ind [index] "--"}. For example: 
   498   Like before, you can sequentially connect parsers with @{ML_ind  "--"}. For example: 
   502 
   499 
   503 @{ML_response [display,gray]
   500 @{ML_response [display,gray]
   504 "let 
   501 "let 
   505   val input = filtered_input \"| in\"
   502   val input = filtered_input \"| in\"
   506 in 
   503 in 
   507   (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
   504   (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
   508 end"
   505 end"
   509 "((\"|\", \"in\"), [])"}
   506 "((\"|\", \"in\"), [])"}
   510 
   507 
   511   \indexdef{enum@ {\tt\slshape{enum}}}{in {\tt\slshape OuterParse}}
       
   512   The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty 
   508   The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty 
   513   list of items recognised by the parser @{text p}, where the items being parsed
   509   list of items recognised by the parser @{text p}, where the items being parsed
   514   are separated by the string @{text s}. For example:
   510   are separated by the string @{text s}. For example:
   515 
   511 
   516 @{ML_response [display,gray]
   512 @{ML_response [display,gray]
   519 in 
   515 in 
   520   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
   516   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
   521 end" 
   517 end" 
   522 "([\"in\", \"in\", \"in\"], [\<dots>])"}
   518 "([\"in\", \"in\", \"in\"], [\<dots>])"}
   523 
   519 
   524   @{ML_ind [index] enum1 in OuterParse} works similarly, except that the parsed list must
   520   @{ML_ind  enum1 in OuterParse} works similarly, except that the parsed list must
   525   be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
   521   be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
   526   end of the parsed string, otherwise the parser would have consumed all
   522   end of the parsed string, otherwise the parser would have consumed all
   527   tokens and then failed with the exception @{text "MORE"}. Like in the
   523   tokens and then failed with the exception @{text "MORE"}. Like in the
   528   previous section, we can avoid this exception using the wrapper @{ML
   524   previous section, we can avoid this exception using the wrapper @{ML
   529   Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
   525   Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
   543 
   539 
   544 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}
   540 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}
   545 
   541 
   546 text {*
   542 text {*
   547 
   543 
   548   The function @{ML_ind [index] "!!!" in OuterParse} can be used to force termination of the
   544   The function @{ML_ind  "!!!" in OuterParse} can be used to force termination of the
   549   parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). 
   545   parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). 
   550   Except that the error message of @{ML "OuterParse.!!!"} is fixed to be 
   546   Except that the error message of @{ML "OuterParse.!!!"} is fixed to be 
   551   @{text [quotes] "Outer syntax error"}
   547   @{text [quotes] "Outer syntax error"}
   552   together with a relatively precise description of the failure. For example:
   548   together with a relatively precise description of the failure. For example:
   553 
   549 
   562 but keyword in was found\" raised"
   558 but keyword in was found\" raised"
   563 }
   559 }
   564 
   560 
   565   \begin{exercise} (FIXME)
   561   \begin{exercise} (FIXME)
   566   A type-identifier, for example @{typ "'a"}, is a token of 
   562   A type-identifier, for example @{typ "'a"}, is a token of 
   567   kind @{ML_ind [index] Keyword in OuterLex}. It can be parsed using 
   563   kind @{ML_ind  Keyword in OuterLex}. It can be parsed using 
   568   the function @{ML type_ident in OuterParse}.
   564   the function @{ML type_ident in OuterParse}.
   569   \end{exercise}
   565   \end{exercise}
   570 
   566 
   571   (FIXME: or give parser for numbers)
   567   (FIXME: or give parser for numbers)
   572 
   568 
   624 section {* Parsing Inner Syntax *}
   620 section {* Parsing Inner Syntax *}
   625 
   621 
   626 text {*
   622 text {*
   627   There is usually no need to write your own parser for parsing inner syntax, that is 
   623   There is usually no need to write your own parser for parsing inner syntax, that is 
   628   for terms and  types: you can just call the predefined parsers. Terms can 
   624   for terms and  types: you can just call the predefined parsers. Terms can 
   629   be parsed using the function @{ML_ind [index] term in OuterParse}. For example:
   625   be parsed using the function @{ML_ind  term in OuterParse}. For example:
   630 
   626 
   631 @{ML_response [display,gray]
   627 @{ML_response [display,gray]
   632 "let 
   628 "let 
   633   val input = OuterSyntax.scan Position.none \"foo\"
   629   val input = OuterSyntax.scan Position.none \"foo\"
   634 in 
   630 in 
   635   OuterParse.term input
   631   OuterParse.term input
   636 end"
   632 end"
   637 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}
   633 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}
   638 
   634 
   639   The function @{ML_ind [index] prop in OuterParse} is similar, except that it gives a different
   635   The function @{ML_ind  prop in OuterParse} is similar, except that it gives a different
   640   error message, when parsing fails. As you can see, the parser not just returns 
   636   error message, when parsing fails. As you can see, the parser not just returns 
   641   the parsed string, but also some encoded information. You can decode the
   637   the parsed string, but also some encoded information. You can decode the
   642   information with the function @{ML_ind [index] parse in YXML}. For example
   638   information with the function @{ML_ind  parse in YXML}. For example
   643 
   639 
   644   @{ML_response [display,gray]
   640   @{ML_response [display,gray]
   645   "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
   641   "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
   646   "XML.Elem (\"token\", [], [XML.Text \"foo\"])"}
   642   "XML.Elem (\"token\", [], [XML.Text \"foo\"])"}
   647 
   643 
   721 
   717 
   722   As you see, the result is a pair consisting of a list of
   718   As you see, the result is a pair consisting of a list of
   723   variables with optional type-annotation and syntax-annotation, and a list of
   719   variables with optional type-annotation and syntax-annotation, and a list of
   724   rules where every rule has optionally a name and an attribute.
   720   rules where every rule has optionally a name and an attribute.
   725 
   721 
   726   The function @{ML_ind [index] "fixes" in OuterParse} in Line 2 of the parser reads an 
   722   The function @{ML_ind  "fixes" in OuterParse} in Line 2 of the parser reads an 
   727   \isacommand{and}-separated 
   723   \isacommand{and}-separated 
   728   list of variables that can include optional type annotations and syntax translations. 
   724   list of variables that can include optional type annotations and syntax translations. 
   729   For example:\footnote{Note that in the code we need to write 
   725   For example:\footnote{Note that in the code we need to write 
   730   @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
   726   @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
   731   in the compound type.}
   727   in the compound type.}
   745 text {*
   741 text {*
   746   Whenever types are given, they are stored in the @{ML SOME}s. The types are
   742   Whenever types are given, they are stored in the @{ML SOME}s. The types are
   747   not yet used to type the variables: this must be done by type-inference later
   743   not yet used to type the variables: this must be done by type-inference later
   748   on. Since types are part of the inner syntax they are strings with some
   744   on. Since types are part of the inner syntax they are strings with some
   749   encoded information (see previous section). If a mixfix-syntax is
   745   encoded information (see previous section). If a mixfix-syntax is
   750   present for a variable, then it is stored in the @{ML_ind [index] Mixfix} data structure;
   746   present for a variable, then it is stored in the @{ML_ind  Mixfix} data structure;
   751   no syntax translation is indicated by @{ML_ind [index] NoSyn}.
   747   no syntax translation is indicated by @{ML_ind  NoSyn}.
   752 
   748 
   753   \begin{readmore}
   749   \begin{readmore}
   754   The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
   750   The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
   755   \end{readmore}
   751   \end{readmore}
   756 
   752 
   757   Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
   753   Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
   758   list of introduction rules, that is propositions with theorem annotations
   754   list of introduction rules, that is propositions with theorem annotations
   759   such as rule names and attributes. The introduction rules are propositions
   755   such as rule names and attributes. The introduction rules are propositions
   760   parsed by @{ML_ind [index] prop  in OuterParse}. However, they can include an optional
   756   parsed by @{ML_ind  prop  in OuterParse}. However, they can include an optional
   761   theorem name plus some attributes. For example
   757   theorem name plus some attributes. For example
   762 
   758 
   763 @{ML_response [display,gray] "let 
   759 @{ML_response [display,gray] "let 
   764   val input = filtered_input \"foo_lemma[intro,dest!]:\"
   760   val input = filtered_input \"foo_lemma[intro,dest!]:\"
   765   val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input 
   761   val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input 
   766 in 
   762 in 
   767   (name, map Args.dest_src attrib)
   763   (name, map Args.dest_src attrib)
   768 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
   764 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
   769  
   765  
   770   The function @{ML_ind [index] opt_thm_name in SpecParse} is the ``optional'' variant of
   766   The function @{ML_ind  opt_thm_name in SpecParse} is the ``optional'' variant of
   771   @{ML_ind [index] thm_name in SpecParse}. Theorem names can contain attributes. The name 
   767   @{ML_ind  thm_name in SpecParse}. Theorem names can contain attributes. The name 
   772   has to end with @{text [quotes] ":"}---see the argument of 
   768   has to end with @{text [quotes] ":"}---see the argument of 
   773   the function @{ML SpecParse.opt_thm_name} in Line 7.
   769   the function @{ML SpecParse.opt_thm_name} in Line 7.
   774 
   770 
   775   \begin{readmore}
   771   \begin{readmore}
   776   Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
   772   Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
   781 text_raw {*
   777 text_raw {*
   782   \begin{exercise}
   778   \begin{exercise}
   783   Have a look at how the parser @{ML SpecParse.where_alt_specs} is implemented
   779   Have a look at how the parser @{ML SpecParse.where_alt_specs} is implemented
   784   in file @{ML_file "Pure/Isar/spec_parse.ML"}. This parser corresponds
   780   in file @{ML_file "Pure/Isar/spec_parse.ML"}. This parser corresponds
   785   to the ``where-part'' of the introduction rules given above. Below
   781   to the ``where-part'' of the introduction rules given above. Below
   786   we paraphrase the code of @{ML_ind [index] where_alt_specs in SpecParse} adapted to our
   782   we paraphrase the code of @{ML_ind  where_alt_specs in SpecParse} adapted to our
   787   purposes. 
   783   purposes. 
   788   \begin{isabelle}
   784   \begin{isabelle}
   789 *}
   785 *}
   790 ML %linenosgray{*val spec_parser' = 
   786 ML %linenosgray{*val spec_parser' = 
   791      OuterParse.fixes -- 
   787      OuterParse.fixes -- 
   832 in
   828 in
   833   OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing
   829   OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing
   834 end *}
   830 end *}
   835 
   831 
   836 text {*
   832 text {*
   837   The crucial function @{ML_ind [index] local_theory in OuterSyntax} expects a name for the command, a
   833   The crucial function @{ML_ind  local_theory in OuterSyntax} expects a name for the command, a
   838   short description, a kind indicator (which we will explain later more thoroughly) and a
   834   short description, a kind indicator (which we will explain later more thoroughly) and a
   839   parser producing a local theory transition (its purpose will also explained
   835   parser producing a local theory transition (its purpose will also explained
   840   later). 
   836   later). 
   841 
   837 
   842   While this is everything you have to do on the ML-level, you need a keyword
   838   While this is everything you have to do on the ML-level, you need a keyword
   972   next by letting it take a proposition as argument and printing this proposition 
   968   next by letting it take a proposition as argument and printing this proposition 
   973   inside the tracing buffer. 
   969   inside the tracing buffer. 
   974 
   970 
   975   The crucial part of a command is the function that determines the behaviour
   971   The crucial part of a command is the function that determines the behaviour
   976   of the command. In the code above we used a ``do-nothing''-function, which
   972   of the command. In the code above we used a ``do-nothing''-function, which
   977   because of @{ML_ind [index] succeed in Scan} does not parse any argument, but immediately
   973   because of @{ML_ind  succeed in Scan} does not parse any argument, but immediately
   978   returns the simple function @{ML "LocalTheory.theory I"}. We can
   974   returns the simple function @{ML "LocalTheory.theory I"}. We can
   979   replace this code by a function that first parses a proposition (using the
   975   replace this code by a function that first parses a proposition (using the
   980   parser @{ML OuterParse.prop}), then prints out the tracing
   976   parser @{ML OuterParse.prop}), then prints out the tracing
   981   information (using a new function @{text trace_prop}) and 
   977   information (using a new function @{text trace_prop}) and 
   982   finally does nothing. For this you can write:
   978   finally does nothing. For this you can write:
  1002   @{text "> \"True \<and> False\""}
   998   @{text "> \"True \<and> False\""}
  1003   \end{isabelle}
   999   \end{isabelle}
  1004   
  1000   
  1005   and see the proposition in the tracing buffer.  
  1001   and see the proposition in the tracing buffer.  
  1006 
  1002 
  1007   Note that so far we used @{ML_ind [index] thy_decl in OuterKeyword} as the kind
  1003   Note that so far we used @{ML_ind  thy_decl in OuterKeyword} as the kind
  1008   indicator for the command.  This means that the command finishes as soon as
  1004   indicator for the command.  This means that the command finishes as soon as
  1009   the arguments are processed. Examples of this kind of commands are
  1005   the arguments are processed. Examples of this kind of commands are
  1010   \isacommand{definition} and \isacommand{declare}.  In other cases, commands
  1006   \isacommand{definition} and \isacommand{declare}.  In other cases, commands
  1011   are expected to parse some arguments, for example a proposition, and then
  1007   are expected to parse some arguments, for example a proposition, and then
  1012   ``open up'' a proof in order to prove the proposition (for example
  1008   ``open up'' a proof in order to prove the proposition (for example
  1013   \isacommand{lemma}) or prove some other properties (for example
  1009   \isacommand{lemma}) or prove some other properties (for example
  1014   \isacommand{function}). To achieve this kind of behaviour, you have to use
  1010   \isacommand{function}). To achieve this kind of behaviour, you have to use
  1015   the kind indicator @{ML_ind [index] thy_goal in OuterKeyword} and the function @{ML
  1011   the kind indicator @{ML_ind  thy_goal in OuterKeyword} and the function @{ML
  1016   "local_theory_to_proof" in OuterSyntax} to set up the command.  Note,
  1012   "local_theory_to_proof" in OuterSyntax} to set up the command.  Note,
  1017   however, once you change the ``kind'' of a command from @{ML thy_decl in
  1013   however, once you change the ``kind'' of a command from @{ML thy_decl in
  1018   OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs
  1014   OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs
  1019   to be re-created!
  1015   to be re-created!
  1020 
  1016 
  1039 end *}
  1035 end *}
  1040 
  1036 
  1041 text {*
  1037 text {*
  1042   The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be
  1038   The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be
  1043   proved) and a context as argument.  The context is necessary in order to be able to use
  1039   proved) and a context as argument.  The context is necessary in order to be able to use
  1044   @{ML_ind [index] read_prop in Syntax}, which converts a string into a proper proposition.
  1040   @{ML_ind  read_prop in Syntax}, which converts a string into a proper proposition.
  1045   In Line 6 the function @{ML_ind [index] theorem_i in Proof} starts the proof for the
  1041   In Line 6 the function @{ML_ind  theorem_i in Proof} starts the proof for the
  1046   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  1042   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  1047   omit); the argument @{ML "(K I)"} stands for a function that determines what
  1043   omit); the argument @{ML "(K I)"} stands for a function that determines what
  1048   should be done with the theorem once it is proved (we chose to just forget
  1044   should be done with the theorem once it is proved (we chose to just forget
  1049   about it). Line 9 contains the parser for the proposition.
  1045   about it). Line 9 contains the parser for the proposition.
  1050 
  1046 
  1064   \isacommand{apply}@{text "(rule conjI)"}\\
  1060   \isacommand{apply}@{text "(rule conjI)"}\\
  1065   \isacommand{apply}@{text "(rule TrueI)+"}\\
  1061   \isacommand{apply}@{text "(rule TrueI)+"}\\
  1066   \isacommand{done}
  1062   \isacommand{done}
  1067   \end{isabelle}
  1063   \end{isabelle}
  1068 
  1064 
  1069   (FIXME: read a name and show how to store theorems; see @{ML_ind [index] note in LocalTheory})
  1065   (FIXME: read a name and show how to store theorems; see @{ML_ind  note in LocalTheory})
  1070   
  1066   
  1071 *}
  1067 *}
  1072 
  1068 
  1073 section {* Methods (TBD) *}
  1069 section {* Methods (TBD) *}
  1074 
  1070 
  1097 
  1093 
  1098 text {*
  1094 text {*
  1099   It defines the method @{text foo}, which takes no arguments (therefore the
  1095   It defines the method @{text foo}, which takes no arguments (therefore the
  1100   parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which 
  1096   parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which 
  1101   applies @{thm [source] conjE} and then @{thm [source] conjI}. The function 
  1097   applies @{thm [source] conjE} and then @{thm [source] conjI}. The function 
  1102   @{ML_ind [index] SIMPLE_METHOD}
  1098   @{ML_ind  SIMPLE_METHOD}
  1103   turns such a tactic into a method. The method @{text "foo"} can be used as follows
  1099   turns such a tactic into a method. The method @{text "foo"} can be used as follows
  1104 *}
  1100 *}
  1105 
  1101 
  1106 lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
  1102 lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
  1107 apply(foo)
  1103 apply(foo)