ProgTutorial/Parsing.thy
changeset 256 1fb8d62c88a0
parent 250 ab9e09076462
child 258 03145998190b
equal deleted inserted replaced
255:ef1da1abee46 256:1fb8d62c88a0
    34 section {* Building Generic Parsers *}
    34 section {* Building Generic Parsers *}
    35 
    35 
    36 text {*
    36 text {*
    37 
    37 
    38   Let us first have a look at parsing strings using generic parsing
    38   Let us first have a look at parsing strings using generic parsing
    39   combinators. The function @{ML "$$"} takes a string as argument and will
    39   combinators. The function @{ML [index] "$$"} takes a string as argument and will
    40   ``consume'' this string from a given input list of strings. ``Consume'' in
    40   ``consume'' this string from a given input list of strings. ``Consume'' in
    41   this context means that it will return a pair consisting of this string and
    41   this context means that it will return a pair consisting of this string and
    42   the rest of the input list. For example:
    42   the rest of the input list. For example:
    43 
    43 
    44   @{ML_response [display,gray] 
    44   @{ML_response [display,gray] 
    68   \end{itemize}
    68   \end{itemize}
    69 
    69 
    70   However, note that these exceptions are private to the parser and cannot be accessed
    70   However, note that these exceptions are private to the parser and cannot be accessed
    71   by the programmer (for example to handle them).
    71   by the programmer (for example to handle them).
    72 
    72 
       
    73   \indexdef{explode@ {\tt\slshape{}explode}}{in {\tt\slshape Symbol}}
       
    74   \index{explode@ {\tt\slshape{}explode}}
    73   In the examples above we use the function @{ML Symbol.explode}, instead of the 
    75   In the examples above we use the function @{ML Symbol.explode}, instead of the 
    74   more standard library function @{ML explode}, for obtaining an input list for 
    76   more standard library function @{ML explode}, for obtaining an input list for 
    75   the parser. The reason is that @{ML Symbol.explode} is aware of character sequences,
    77   the parser. The reason is that @{ML Symbol.explode} is aware of character sequences,
    76   for example @{text "\\<foo>"}, that have a special meaning in Isabelle. To see
    78   for example @{text "\\<foo>"}, that have a special meaning in Isabelle. To see
    77   the difference consider
    79   the difference consider
    83   (explode input, Symbol.explode input)
    85   (explode input, Symbol.explode input)
    84 end"
    86 end"
    85 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"],
    87 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"],
    86  [\"\\<foo>\", \" \", \"b\", \"a\", \"r\"])"}
    88  [\"\\<foo>\", \" \", \"b\", \"a\", \"r\"])"}
    87 
    89 
    88   Slightly more general than the parser @{ML "$$"} is the function @{ML
    90   Slightly more general than the parser @{ML "$$"} is the function 
    89   Scan.one}, in that it takes a predicate as argument and then parses exactly
    91   @{ML [index] one in Scan}, in that it takes a predicate as argument and 
       
    92   then parses exactly
    90   one item from the input list satisfying this predicate. For example the
    93   one item from the input list satisfying this predicate. For example the
    91   following parser either consumes an @{text [quotes] "h"} or a @{text
    94   following parser either consumes an @{text [quotes] "h"} or a @{text
    92   [quotes] "w"}:
    95   [quotes] "w"}:
    93 
    96 
    94 @{ML_response [display,gray] 
    97 @{ML_response [display,gray] 
    99 in
   102 in
   100   (hw input1, hw input2)
   103   (hw input1, hw input2)
   101 end"
   104 end"
   102     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   105     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   103 
   106 
   104   Two parsers can be connected in sequence by using the function @{ML "--"}. 
   107   Two parsers can be connected in sequence by using the function @{ML [index] "--"}. 
   105   For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this 
   108   For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this 
   106   order) you can achieve by:
   109   order) you can achieve by:
   107 
   110 
   108   @{ML_response [display,gray] 
   111   @{ML_response [display,gray] 
   109   "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")"
   112   "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")"
   110   "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
   113   "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
   111 
   114 
   112   Note how the result of consumed strings builds up on the left as nested pairs.  
   115   Note how the result of consumed strings builds up on the left as nested pairs.  
   113 
   116 
   114   If, as in the previous example, you want to parse a particular string, 
   117   If, as in the previous example, you want to parse a particular string, 
   115   then you should use the function @{ML Scan.this_string}:
   118   then you should use the function @{ML [index] this_string in Scan}:
   116 
   119 
   117   @{ML_response [display,gray] 
   120   @{ML_response [display,gray] 
   118   "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
   121   "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
   119   "(\"hell\", [\"o\"])"}
   122   "(\"hell\", [\"o\"])"}
   120 
   123 
   121   Parsers that explore alternatives can be constructed using the function @{ML
   124   Parsers that explore alternatives can be constructed using the function 
   122   "||"}. The parser @{ML "(p || q)" for p q} returns the
   125   @{ML [index] "||"}. The parser @{ML "(p || q)" for p q} returns the
   123   result of @{text "p"}, in case it succeeds, otherwise it returns the
   126   result of @{text "p"}, in case it succeeds, otherwise it returns the
   124   result of @{text "q"}. For example:
   127   result of @{text "q"}. For example:
   125 
   128 
   126 
   129 
   127 @{ML_response [display,gray] 
   130 @{ML_response [display,gray] 
   132 in
   135 in
   133   (hw input1, hw input2)
   136   (hw input1, hw input2)
   134 end"
   137 end"
   135   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   138   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   136 
   139 
   137   The functions @{ML "|--"} and @{ML "--|"} work like the sequencing function 
   140   The functions @{ML [index] "|--"} and @{ML [index] "--|"} work like the sequencing function 
   138   for parsers, except that they discard the item being parsed by the first (respectively second)
   141   for parsers, except that they discard the item being parsed by the first (respectively second)
   139   parser. For example:
   142   parser. For example:
   140   
   143   
   141 @{ML_response [display,gray]
   144 @{ML_response [display,gray]
   142 "let 
   145 "let 
   146 in 
   149 in 
   147   (just_e input, just_h input)
   150   (just_e input, just_h input)
   148 end"
   151 end"
   149   "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"}
   152   "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"}
   150 
   153 
       
   154   \indexdef{optional@ {\tt\slshape{optional}}}{in {\tt\slshape Scan}}
   151   The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
   155   The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
   152   @{text "p"}, if it succeeds; otherwise it returns 
   156   @{text "p"}, if it succeeds; otherwise it returns 
   153   the default value @{text "x"}. For example:
   157   the default value @{text "x"}. For example:
   154 
   158 
   155 @{ML_response [display,gray]
   159 @{ML_response [display,gray]
   160 in 
   164 in 
   161   (p input1, p input2)
   165   (p input1, p input2)
   162 end" 
   166 end" 
   163  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   167  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   164 
   168 
   165   The function @{ML Scan.option} works similarly, except no default value can
   169   The function @{ML [index] option in Scan} works similarly, except no default value can
   166   be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
   170   be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
   167 
   171 
   168 @{ML_response [display,gray]
   172 @{ML_response [display,gray]
   169 "let 
   173 "let 
   170   val p = Scan.option ($$ \"h\")
   174   val p = Scan.option ($$ \"h\")
   172   val input2 = Symbol.explode \"world\"  
   176   val input2 = Symbol.explode \"world\"  
   173 in 
   177 in 
   174   (p input1, p input2)
   178   (p input1, p input2)
   175 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   179 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   176 
   180 
   177   The function @{ML "!!"} helps to produce appropriate error messages
   181   The function @{ML [index] "!!"} helps to produce appropriate error messages
   178   for parsing. For example if you want to parse @{text p} immediately 
   182   for parsing. For example if you want to parse @{text p} immediately 
   179   followed by @{text q}, or start a completely different parser @{text r},
   183   followed by @{text q}, or start a completely different parser @{text r},
   180   you might write:
   184   you might write:
   181 
   185 
   182   @{ML [display,gray] "(p -- q) || r" for p q r}
   186   @{ML [display,gray] "(p -- q) || r" for p q r}
   208   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
   212   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
   209                                "Exception ABORT raised"}
   213                                "Exception ABORT raised"}
   210 
   214 
   211   then the parsing aborts and the error message @{text "foo"} is printed. In order to
   215   then the parsing aborts and the error message @{text "foo"} is printed. In order to
   212   see the error message properly, you need to prefix the parser with the function 
   216   see the error message properly, you need to prefix the parser with the function 
   213   @{ML "Scan.error"}. For example:
   217   @{ML [index] error in Scan}. For example:
   214 
   218 
   215   @{ML_response_fake [display,gray] 
   219   @{ML_response_fake [display,gray] 
   216   "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
   220   "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
   217   "Exception Error \"foo\" raised"}
   221   "Exception Error \"foo\" raised"}
   218 
   222 
   219   This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.local_theory"} 
   223   This ``prefixing'' is usually done by wrappers such as @{ML [index] local_theory in OuterSyntax} 
   220   (see Section~\ref{sec:newcommand} which explains this function in more detail). 
   224   (see Section~\ref{sec:newcommand} which explains this function in more detail). 
   221   
   225   
   222   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
   226   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
   223   r}. If you want to generate the correct error message for 
   227   r}. If you want to generate the correct error message for 
   224   @{text "p"}-followed-by-@{text "q"}, then you have to write:
   228   @{text "p"}-followed-by-@{text "q"}, then you have to write:
   251   often as it succeeds. For example:
   255   often as it succeeds. For example:
   252   
   256   
   253   @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" 
   257   @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" 
   254                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
   258                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
   255   
   259   
   256   Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
   260   Note that @{ML [index] repeat in Scan} stores the parsed items in a list. The function
   257   @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} 
   261   @{ML [index] repeat1 in Scan} is similar, but requires that the parser @{text "p"} 
   258   succeeds at least once.
   262   succeeds at least once.
   259 
   263 
   260   Also note that the parser would have aborted with the exception @{text MORE}, if
   264   Also note that the parser would have aborted with the exception @{text MORE}, if
   261   you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using
   265   you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using
   262   the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
   266   the wrapper @{ML [index] finite in Scan} and the ``stopper-token'' 
   263   them you can write:
   267   @{ML [index] stopper in Symbol}. With them you can write:
   264   
   268   
   265   @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" 
   269   @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" 
   266                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   270                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   267 
   271 
   268   @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
   272   @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
   269   other stoppers need to be used when parsing, for example, tokens. However, this kind of 
   273   other stoppers need to be used when parsing, for example, tokens. However, this kind of 
   270   manually wrapping is often already done by the surrounding infrastructure. 
   274   manually wrapping is often already done by the surrounding infrastructure. 
   271 
   275 
   272   The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any 
   276   The function @{ML [index] repeat in Scan} can be used with @{ML [index] one in Scan} to read any 
   273   string as in
   277   string as in
   274 
   278 
   275   @{ML_response [display,gray] 
   279   @{ML_response [display,gray] 
   276 "let 
   280 "let 
   277    val p = Scan.repeat (Scan.one Symbol.not_eof)
   281    val p = Scan.repeat (Scan.one Symbol.not_eof)
   279 in
   283 in
   280    Scan.finite Symbol.stopper p input
   284    Scan.finite Symbol.stopper p input
   281 end" 
   285 end" 
   282 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
   286 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
   283 
   287 
   284   where the function @{ML Symbol.not_eof} ensures that we do not read beyond the 
   288   where the function @{ML [index] not_eof in Symbol} ensures that we do not read beyond the 
   285   end of the input string (i.e.~stopper symbol).
   289   end of the input string (i.e.~stopper symbol).
   286 
   290 
       
   291   \indexdef{unless@ {\tt\slshape{unless}}}{in {\tt\slshape Scan}}
   287   The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
   292   The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
   288   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   293   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   289   
   294   
   290   @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")"
   295   @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")"
   291                                "Exception FAIL raised"}
   296                                "Exception FAIL raised"}
   295   @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")"
   300   @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")"
   296                           "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
   301                           "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
   297 
   302 
   298   succeeds. 
   303   succeeds. 
   299 
   304 
   300   The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any
   305   The functions @{ML [index] repeat in Scan} and @{ML [index] unless in Scan} can 
   301   input until a certain marker symbol is reached. In the example below the marker
   306   be combined to read any input until a certain marker symbol is reached. In the 
   302   symbol is a @{text [quotes] "*"}.
   307   example below the marker symbol is a @{text [quotes] "*"}.
   303 
   308 
   304   @{ML_response [display,gray]
   309   @{ML_response [display,gray]
   305 "let 
   310 "let 
   306   val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
   311   val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
   307   val input1 = Symbol.explode \"fooooo\"
   312   val input1 = Symbol.explode \"fooooo\"
   311    Scan.finite Symbol.stopper p input2)
   316    Scan.finite Symbol.stopper p input2)
   312 end"
   317 end"
   313 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   318 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   314  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   319  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   315 
   320 
       
   321   
   316   After parsing is done, you almost always want to apply a function to the parsed 
   322   After parsing is done, you almost always want to apply a function to the parsed 
   317   items. One way to do this is the function @{ML "(p >> f)" for p f}, which runs 
   323   items. One way to do this is the function @{ML [index]">>"} where 
       
   324   @{ML "(p >> f)" for p f} runs 
   318   first the parser @{text p} and upon successful completion applies the 
   325   first the parser @{text p} and upon successful completion applies the 
   319   function @{text f} to the result. For example
   326   function @{text f} to the result. For example
   320 
   327 
   321 @{ML_response [display,gray]
   328 @{ML_response [display,gray]
   322 "let 
   329 "let 
   340   where the single-character strings in the parsed output are transformed
   347   where the single-character strings in the parsed output are transformed
   341   back into one string.
   348   back into one string.
   342  
   349  
   343   (FIXME:  move to an earlier place)
   350   (FIXME:  move to an earlier place)
   344 
   351 
   345   The function @{ML Scan.ahead} parses some input, but leaves the original
   352   The function @{ML [index] ahead in Scan} parses some input, but leaves the original
   346   input unchanged. For example:
   353   input unchanged. For example:
   347 
   354 
   348   @{ML_response [display,gray]
   355   @{ML_response [display,gray]
   349   "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" 
   356   "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" 
   350   "(\"foo\", [\"f\", \"o\", \"o\"])"} 
   357   "(\"foo\", [\"f\", \"o\", \"o\"])"} 
   351 
   358 
   352   The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
   359   The function @{ML [index] lift in Scan} takes a parser and a pair as arguments. This function applies
   353   the given parser to the second component of the pair and leaves the  first component 
   360   the given parser to the second component of the pair and leaves the  first component 
   354   untouched. For example
   361   untouched. For example
   355 
   362 
   356 @{ML_response [display,gray]
   363 @{ML_response [display,gray]
   357 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   364 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   387   The parser functions for the theory syntax are contained in the structure
   394   The parser functions for the theory syntax are contained in the structure
   388   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   395   @{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
   389   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   396   The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
   390   \end{readmore}
   397   \end{readmore}
   391 
   398 
   392   The structure @{ML_struct OuterLex} defines several kinds of tokens (for
   399   The structure @{ML_struct [index] OuterLex} defines several kinds of tokens (for
   393   example @{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in
   400   example @{ML [index] Ident in OuterLex} for identifiers, @{ML Keyword in
   394   OuterLex} for keywords and @{ML "Command" in OuterLex} for commands). Some
   401   OuterLex} for keywords and @{ML [index] Command in OuterLex} for commands). Some
   395   token parsers take into account the kind of tokens. The first example shows
   402   token parsers take into account the kind of tokens. The first example shows
   396   how to generate a token list out of a string using the function @{ML
   403   how to generate a token list out of a string using the function 
   397   "OuterSyntax.scan"}. It is given the argument @{ML "Position.none"} since,
   404   @{ML [index] scan in OuterSyntax}. It is given the argument 
       
   405   @{ML "Position.none"} since,
   398   at the moment, we are not interested in generating precise error
   406   at the moment, we are not interested in generating precise error
   399   messages. The following code\footnote{Note that because of a possible bug in
   407   messages. The following code\footnote{Note that because of a possible bug in
   400   the PolyML runtime system, the result is printed as @{text [quotes] "?"},
   408   the PolyML runtime system, the result is printed as @{text [quotes] "?"},
   401   instead of the tokens.}
   409   instead of the tokens.}
   402 
   410 
   409   produces three tokens where the first and the last are identifiers, since
   417   produces three tokens where the first and the last are identifiers, since
   410   @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
   418   @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
   411   other syntactic category. The second indicates a space.
   419   other syntactic category. The second indicates a space.
   412 
   420 
   413   We can easily change what is recognised as a keyword with 
   421   We can easily change what is recognised as a keyword with 
   414   @{ML OuterKeyword.keyword}. For example calling this function 
   422   @{ML [index] keyword in OuterKeyword}. For example calling this function 
   415 *}
   423 *}
   416 
   424 
   417 ML{*val _ = OuterKeyword.keyword "hello"*}
   425 ML{*val _ = OuterKeyword.keyword "hello"*}
   418 
   426 
   419 text {*
   427 text {*
   424  Token (\<dots>,(Space, \" \"),\<dots>), 
   432  Token (\<dots>,(Space, \" \"),\<dots>), 
   425  Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
   433  Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
   426 
   434 
   427   Many parsing functions later on will require white space, comments and the like
   435   Many parsing functions later on will require white space, comments and the like
   428   to have already been filtered out.  So from now on we are going to use the 
   436   to have already been filtered out.  So from now on we are going to use the 
   429   functions @{ML filter} and @{ML OuterLex.is_proper} to do this. For example:
   437   functions @{ML filter} and @{ML [index] is_proper in OuterLex} to do this. 
       
   438   For example:
   430 
   439 
   431 @{ML_response_fake [display,gray]
   440 @{ML_response_fake [display,gray]
   432 "let
   441 "let
   433    val input = OuterSyntax.scan Position.none \"hello world\"
   442    val input = OuterSyntax.scan Position.none \"hello world\"
   434 in
   443 in
   461 in 
   470 in 
   462   (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
   471   (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
   463 end" 
   472 end" 
   464 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}
   473 "([\"}\", \"{\", \<dots>], [\"\<rightleftharpoons>\", \"\<leftharpoondown>\", \<dots>])"}
   465 
   474 
   466   You might have to adjust the @{ML print_depth} in order to
   475   You might have to adjust the @{ML [index] print_depth} in order to
   467   see the complete list.
   476   see the complete list.
   468 
   477 
   469   The parser @{ML "OuterParse.$$$"} parses a single keyword. For example:
   478   The parser @{ML [index] "$$$" in OuterParse} parses a single keyword. For example:
   470 
   479 
   471 @{ML_response [display,gray]
   480 @{ML_response [display,gray]
   472 "let 
   481 "let 
   473   val input1 = filtered_input \"where for\"
   482   val input1 = filtered_input \"where for\"
   474   val input2 = filtered_input \"| in\"
   483   val input2 = filtered_input \"| in\"
   475 in 
   484 in 
   476   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
   485   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
   477 end"
   486 end"
   478 "((\"where\",\<dots>), (\"|\",\<dots>))"}
   487 "((\"where\",\<dots>), (\"|\",\<dots>))"}
   479 
   488 
   480   Any non-keyword string can be parsed with the function @{ML OuterParse.reserved}.
   489   Any non-keyword string can be parsed with the function @{ML [index] reserved in OuterParse}.
   481   For example:
   490   For example:
   482 
   491 
   483   @{ML_response [display,gray]
   492   @{ML_response [display,gray]
   484 "let 
   493 "let 
   485   val p = OuterParse.reserved \"bar\"
   494   val p = OuterParse.reserved \"bar\"
   487 in
   496 in
   488   p input
   497   p input
   489 end"
   498 end"
   490   "(\"bar\",[])"}
   499   "(\"bar\",[])"}
   491 
   500 
   492   Like before, you can sequentially connect parsers with @{ML "--"}. For example: 
   501   Like before, you can sequentially connect parsers with @{ML [index] "--"}. For example: 
   493 
   502 
   494 @{ML_response [display,gray]
   503 @{ML_response [display,gray]
   495 "let 
   504 "let 
   496   val input = filtered_input \"| in\"
   505   val input = filtered_input \"| in\"
   497 in 
   506 in 
   498   (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
   507   (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
   499 end"
   508 end"
   500 "((\"|\", \"in\"), [])"}
   509 "((\"|\", \"in\"), [])"}
   501 
   510 
       
   511   \indexdef{enum@ {\tt\slshape{enum}}}{in {\tt\slshape OuterParse}}
   502   The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty 
   512   The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty 
   503   list of items recognised by the parser @{text p}, where the items being parsed
   513   list of items recognised by the parser @{text p}, where the items being parsed
   504   are separated by the string @{text s}. For example:
   514   are separated by the string @{text s}. For example:
   505 
   515 
   506 @{ML_response [display,gray]
   516 @{ML_response [display,gray]
   509 in 
   519 in 
   510   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
   520   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
   511 end" 
   521 end" 
   512 "([\"in\", \"in\", \"in\"], [\<dots>])"}
   522 "([\"in\", \"in\", \"in\"], [\<dots>])"}
   513 
   523 
   514   @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
   524   @{ML [index] enum1 in OuterParse} works similarly, except that the parsed list must
   515   be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
   525   be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
   516   end of the parsed string, otherwise the parser would have consumed all
   526   end of the parsed string, otherwise the parser would have consumed all
   517   tokens and then failed with the exception @{text "MORE"}. Like in the
   527   tokens and then failed with the exception @{text "MORE"}. Like in the
   518   previous section, we can avoid this exception using the wrapper @{ML
   528   previous section, we can avoid this exception using the wrapper @{ML
   519   Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
   529   Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
   533 
   543 
   534 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}
   544 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}
   535 
   545 
   536 text {*
   546 text {*
   537 
   547 
   538   The function @{ML "OuterParse.!!!"} can be used to force termination of the
   548   The function @{ML [index] "!!!" in OuterParse} can be used to force termination of the
   539   parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). 
   549   parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). 
   540   Except that the error message of @{ML "OuterParse.!!!"} is fixed to be 
   550   Except that the error message of @{ML "OuterParse.!!!"} is fixed to be 
   541   @{text [quotes] "Outer syntax error"}
   551   @{text [quotes] "Outer syntax error"}
   542   together with a relatively precise description of the failure. For example:
   552   together with a relatively precise description of the failure. For example:
   543 
   553 
   552 but keyword in was found\" raised"
   562 but keyword in was found\" raised"
   553 }
   563 }
   554 
   564 
   555   \begin{exercise} (FIXME)
   565   \begin{exercise} (FIXME)
   556   A type-identifier, for example @{typ "'a"}, is a token of 
   566   A type-identifier, for example @{typ "'a"}, is a token of 
   557   kind @{ML "Keyword" in OuterLex}. It can be parsed using 
   567   kind @{ML [index] Keyword in OuterLex}. It can be parsed using 
   558   the function @{ML OuterParse.type_ident}.
   568   the function @{ML type_ident in OuterParse}.
   559   \end{exercise}
   569   \end{exercise}
   560 
   570 
   561   (FIXME: or give parser for numbers)
   571   (FIXME: or give parser for numbers)
   562 
   572 
   563   Whenever there is a possibility that the processing of user input can fail, 
   573   Whenever there is a possibility that the processing of user input can fail, 
   711 
   721 
   712   As you see, the result is a pair consisting of a list of
   722   As you see, the result is a pair consisting of a list of
   713   variables with optional type-annotation and syntax-annotation, and a list of
   723   variables with optional type-annotation and syntax-annotation, and a list of
   714   rules where every rule has optionally a name and an attribute.
   724   rules where every rule has optionally a name and an attribute.
   715 
   725 
   716   The function @{ML OuterParse.fixes} in Line 2 of the parser reads an 
   726   The function @{ML [index] "fixes" in OuterParse} in Line 2 of the parser reads an 
   717   \isacommand{and}-separated 
   727   \isacommand{and}-separated 
   718   list of variables that can include optional type annotations and syntax translations. 
   728   list of variables that can include optional type annotations and syntax translations. 
   719   For example:\footnote{Note that in the code we need to write 
   729   For example:\footnote{Note that in the code we need to write 
   720   @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
   730   @{text "\\\"int \<Rightarrow> bool\\\""} in order to properly escape the double quotes
   721   in the compound type.}
   731   in the compound type.}
   735 text {*
   745 text {*
   736   Whenever types are given, they are stored in the @{ML SOME}s. The types are
   746   Whenever types are given, they are stored in the @{ML SOME}s. The types are
   737   not yet used to type the variables: this must be done by type-inference later
   747   not yet used to type the variables: this must be done by type-inference later
   738   on. Since types are part of the inner syntax they are strings with some
   748   on. Since types are part of the inner syntax they are strings with some
   739   encoded information (see previous section). If a mixfix-syntax is
   749   encoded information (see previous section). If a mixfix-syntax is
   740   present for a variable, then it is stored in the @{ML Mixfix} data structure;
   750   present for a variable, then it is stored in the @{ML [index] Mixfix} data structure;
   741   no syntax translation is indicated by @{ML NoSyn}.
   751   no syntax translation is indicated by @{ML [index] NoSyn}.
   742 
   752 
   743   \begin{readmore}
   753   \begin{readmore}
   744   The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
   754   The data structure for mixfix annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
   745   \end{readmore}
   755   \end{readmore}
   746 
   756 
   747   Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
   757   Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
   748   list of introduction rules, that is propositions with theorem annotations
   758   list of introduction rules, that is propositions with theorem annotations
   749   such as rule names and attributes. The introduction rules are propositions
   759   such as rule names and attributes. The introduction rules are propositions
   750   parsed by @{ML OuterParse.prop}. However, they can include an optional
   760   parsed by @{ML [index] prop  in OuterParse}. However, they can include an optional
   751   theorem name plus some attributes. For example
   761   theorem name plus some attributes. For example
   752 
   762 
   753 @{ML_response [display,gray] "let 
   763 @{ML_response [display,gray] "let 
   754   val input = filtered_input \"foo_lemma[intro,dest!]:\"
   764   val input = filtered_input \"foo_lemma[intro,dest!]:\"
   755   val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input 
   765   val ((name, attrib), _) = parse (SpecParse.thm_name \":\") input 
   756 in 
   766 in 
   757   (name, map Args.dest_src attrib)
   767   (name, map Args.dest_src attrib)
   758 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
   768 end" "(foo_lemma, [((\"intro\", []), \<dots>), ((\"dest\", [\<dots>]), \<dots>)])"}
   759  
   769  
   760   The function @{ML opt_thm_name in SpecParse} is the ``optional'' variant of
   770   The function @{ML [index] opt_thm_name in SpecParse} is the ``optional'' variant of
   761   @{ML thm_name in SpecParse}. Theorem names can contain attributes. The name 
   771   @{ML [index] thm_name in SpecParse}. Theorem names can contain attributes. The name 
   762   has to end with @{text [quotes] ":"}---see the argument of 
   772   has to end with @{text [quotes] ":"}---see the argument of 
   763   the function @{ML SpecParse.opt_thm_name} in Line 7.
   773   the function @{ML SpecParse.opt_thm_name} in Line 7.
   764 
   774 
   765   \begin{readmore}
   775   \begin{readmore}
   766   Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
   776   Attributes and arguments are implemented in the files @{ML_file "Pure/Isar/attrib.ML"} 
   771 text_raw {*
   781 text_raw {*
   772   \begin{exercise}
   782   \begin{exercise}
   773   Have a look at how the parser @{ML SpecParse.where_alt_specs} is implemented
   783   Have a look at how the parser @{ML SpecParse.where_alt_specs} is implemented
   774   in file @{ML_file "Pure/Isar/spec_parse.ML"}. This parser corresponds
   784   in file @{ML_file "Pure/Isar/spec_parse.ML"}. This parser corresponds
   775   to the ``where-part'' of the introduction rules given above. Below
   785   to the ``where-part'' of the introduction rules given above. Below
   776   we paraphrase the code of @{ML SpecParse.where_alt_specs} adapted to our
   786   we paraphrase the code of @{ML [index] where_alt_specs in SpecParse} adapted to our
   777   purposes. 
   787   purposes. 
   778   \begin{isabelle}
   788   \begin{isabelle}
   779 *}
   789 *}
   780 ML %linenosgray{*val spec_parser' = 
   790 ML %linenosgray{*val spec_parser' = 
   781      OuterParse.fixes -- 
   791      OuterParse.fixes -- 
   820 in
   830 in
   821   OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing
   831   OuterSyntax.local_theory "foobar" "description of foobar" kind do_nothing
   822 end *}
   832 end *}
   823 
   833 
   824 text {*
   834 text {*
   825   The crucial function @{ML OuterSyntax.local_theory} expects a name for the command, a
   835   The crucial function @{ML [index] local_theory in OuterSyntax} expects a name for the command, a
   826   short description, a kind indicator (which we will explain later more thoroughly) and a
   836   short description, a kind indicator (which we will explain later more thoroughly) and a
   827   parser producing a local theory transition (its purpose will also explained
   837   parser producing a local theory transition (its purpose will also explained
   828   later). 
   838   later). 
   829 
   839 
   830   While this is everything you have to do on the ML-level, you need a keyword
   840   While this is everything you have to do on the ML-level, you need a keyword
   960   next by letting it take a proposition as argument and printing this proposition 
   970   next by letting it take a proposition as argument and printing this proposition 
   961   inside the tracing buffer. 
   971   inside the tracing buffer. 
   962 
   972 
   963   The crucial part of a command is the function that determines the behaviour
   973   The crucial part of a command is the function that determines the behaviour
   964   of the command. In the code above we used a ``do-nothing''-function, which
   974   of the command. In the code above we used a ``do-nothing''-function, which
   965   because of @{ML Scan.succeed} does not parse any argument, but immediately
   975   because of @{ML [index] succeed in Scan} does not parse any argument, but immediately
   966   returns the simple function @{ML "LocalTheory.theory I"}. We can
   976   returns the simple function @{ML "LocalTheory.theory I"}. We can
   967   replace this code by a function that first parses a proposition (using the
   977   replace this code by a function that first parses a proposition (using the
   968   parser @{ML OuterParse.prop}), then prints out the tracing
   978   parser @{ML OuterParse.prop}), then prints out the tracing
   969   information (using a new function @{text trace_prop}) and 
   979   information (using a new function @{text trace_prop}) and 
   970   finally does nothing. For this you can write:
   980   finally does nothing. For this you can write:
   990   @{text "> \"True \<and> False\""}
  1000   @{text "> \"True \<and> False\""}
   991   \end{isabelle}
  1001   \end{isabelle}
   992   
  1002   
   993   and see the proposition in the tracing buffer.  
  1003   and see the proposition in the tracing buffer.  
   994 
  1004 
   995   Note that so far we used @{ML thy_decl in OuterKeyword} as the kind
  1005   Note that so far we used @{ML [index] thy_decl in OuterKeyword} as the kind
   996   indicator for the command.  This means that the command finishes as soon as
  1006   indicator for the command.  This means that the command finishes as soon as
   997   the arguments are processed. Examples of this kind of commands are
  1007   the arguments are processed. Examples of this kind of commands are
   998   \isacommand{definition} and \isacommand{declare}.  In other cases, commands
  1008   \isacommand{definition} and \isacommand{declare}.  In other cases, commands
   999   are expected to parse some arguments, for example a proposition, and then
  1009   are expected to parse some arguments, for example a proposition, and then
  1000   ``open up'' a proof in order to prove the proposition (for example
  1010   ``open up'' a proof in order to prove the proposition (for example
  1001   \isacommand{lemma}) or prove some other properties (for example
  1011   \isacommand{lemma}) or prove some other properties (for example
  1002   \isacommand{function}). To achieve this kind of behaviour, you have to use
  1012   \isacommand{function}). To achieve this kind of behaviour, you have to use
  1003   the kind indicator @{ML thy_goal in OuterKeyword} and the function @{ML
  1013   the kind indicator @{ML [index] thy_goal in OuterKeyword} and the function @{ML
  1004   "local_theory_to_proof" in OuterSyntax} to set up the command.  Note,
  1014   "local_theory_to_proof" in OuterSyntax} to set up the command.  Note,
  1005   however, once you change the ``kind'' of a command from @{ML thy_decl in
  1015   however, once you change the ``kind'' of a command from @{ML thy_decl in
  1006   OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs
  1016   OuterKeyword} to @{ML thy_goal in OuterKeyword} then the keyword file needs
  1007   to be re-created!
  1017   to be re-created!
  1008 
  1018 
  1027 end *}
  1037 end *}
  1028 
  1038 
  1029 text {*
  1039 text {*
  1030   The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be
  1040   The function @{text prove_prop} in Lines 2 to 7 takes a string (the proposition to be
  1031   proved) and a context as argument.  The context is necessary in order to be able to use
  1041   proved) and a context as argument.  The context is necessary in order to be able to use
  1032   @{ML Syntax.read_prop}, which converts a string into a proper proposition.
  1042   @{ML [index] read_prop in Syntax}, which converts a string into a proper proposition.
  1033   In Line 6 the function @{ML Proof.theorem_i} starts the proof for the
  1043   In Line 6 the function @{ML [index] theorem_i in Proof} starts the proof for the
  1034   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  1044   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  1035   omit); the argument @{ML "(K I)"} stands for a function that determines what
  1045   omit); the argument @{ML "(K I)"} stands for a function that determines what
  1036   should be done with the theorem once it is proved (we chose to just forget
  1046   should be done with the theorem once it is proved (we chose to just forget
  1037   about it). Line 9 contains the parser for the proposition.
  1047   about it). Line 9 contains the parser for the proposition.
  1038 
  1048 
  1052   \isacommand{apply}@{text "(rule conjI)"}\\
  1062   \isacommand{apply}@{text "(rule conjI)"}\\
  1053   \isacommand{apply}@{text "(rule TrueI)+"}\\
  1063   \isacommand{apply}@{text "(rule TrueI)+"}\\
  1054   \isacommand{done}
  1064   \isacommand{done}
  1055   \end{isabelle}
  1065   \end{isabelle}
  1056 
  1066 
  1057   (FIXME: read a name and show how to store theorems; see @{ML LocalTheory.note})
  1067   (FIXME: read a name and show how to store theorems; see @{ML [index] note in LocalTheory})
  1058   
  1068   
  1059 *}
  1069 *}
  1060 
  1070 
  1061 section {* Methods (TBD) *}
  1071 section {* Methods (TBD) *}
  1062 
  1072 
  1084          "foo method for conjE and conjI"
  1094          "foo method for conjE and conjI"
  1085 
  1095 
  1086 text {*
  1096 text {*
  1087   It defines the method @{text foobar}, which takes no arguments (therefore the
  1097   It defines the method @{text foobar}, which takes no arguments (therefore the
  1088   parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which 
  1098   parser @{ML Scan.succeed}) and only applies a single tactic, namely the tactic which 
  1089   applies @{thm [source] conjE} and then @{thm [source] conjI}. The function @{ML SIMPLE_METHOD}
  1099   applies @{thm [source] conjE} and then @{thm [source] conjI}. The function 
       
  1100   @{ML [index] SIMPLE_METHOD}
  1090   turns such a tactic into a method. The method @{text "foobar"} can be used as follows
  1101   turns such a tactic into a method. The method @{text "foobar"} can be used as follows
  1091 *}
  1102 *}
  1092 
  1103 
  1093 lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
  1104 lemma shows "A \<and> B \<Longrightarrow> C \<and> D"
  1094 apply(foo)
  1105 apply(foo)