ProgTutorial/Parsing.thy
changeset 569 f875a25aa72d
parent 567 f7c97e64cc2a
child 572 438703674711
equal deleted inserted replaced
568:be23597e81db 569:f875a25aa72d
    58   ``consume'' this string from a given input list of strings. ``Consume'' in
    58   ``consume'' this string from a given input list of strings. ``Consume'' in
    59   this context means that it will return a pair consisting of this string and
    59   this context means that it will return a pair consisting of this string and
    60   the rest of the input list. For example:
    60   the rest of the input list. For example:
    61 
    61 
    62   @{ML_matchresult [display,gray] 
    62   @{ML_matchresult [display,gray] 
    63   "($$ \"h\") (Symbol.explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
    63   \<open>($$ "h") (Symbol.explode "hello")\<close> \<open>("h", ["e", "l", "l", "o"])\<close>}
    64 
    64 
    65   @{ML_matchresult [display,gray] 
    65   @{ML_matchresult [display,gray] 
    66   "($$ \"w\") (Symbol.explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}
    66   \<open>($$ "w") (Symbol.explode "world")\<close> \<open>("w", ["o", "r", "l", "d"])\<close>}
    67 
    67 
    68   The function @{ML "$$"} will either succeed (as in the two examples above)
    68   The function @{ML \<open>$$\<close>} will either succeed (as in the two examples above)
    69   or raise the exception \<open>FAIL\<close> if no string can be consumed. For
    69   or raise the exception \<open>FAIL\<close> if no string can be consumed. For
    70   example trying to parse
    70   example trying to parse
    71 
    71 
    72   @{ML_matchresult_fake [display,gray] 
    72   @{ML_matchresult_fake [display,gray] 
    73   "($$ \"x\") (Symbol.explode \"world\")" 
    73   \<open>($$ "x") (Symbol.explode "world")\<close> 
    74   "Exception FAIL raised"}
    74   \<open>Exception FAIL raised\<close>}
    75   
    75   
    76   will raise the exception \<open>FAIL\<close>. The function @{ML_ind "$$" in Scan} will also
    76   will raise the exception \<open>FAIL\<close>. The function @{ML_ind "$$" in Scan} will also
    77   fail if you try to consume more than a single character.
    77   fail if you try to consume more than a single character.
    78 
    78 
    79   There are three exceptions that are raised by the parsing combinators:
    79   There are three exceptions that are raised by the parsing combinators:
    82   \item \<open>FAIL\<close> indicates that alternative routes of parsing 
    82   \item \<open>FAIL\<close> indicates that alternative routes of parsing 
    83   might be explored. 
    83   might be explored. 
    84   \item \<open>MORE\<close> indicates that there is not enough input for the parser. For example 
    84   \item \<open>MORE\<close> indicates that there is not enough input for the parser. For example 
    85   in \<open>($$ "h") []\<close>.
    85   in \<open>($$ "h") []\<close>.
    86   \item \<open>ABORT\<close> indicates that a dead end is reached. 
    86   \item \<open>ABORT\<close> indicates that a dead end is reached. 
    87   It is used for example in the function @{ML "!!"} (see below).
    87   It is used for example in the function @{ML \<open>!!\<close>} (see below).
    88   \end{itemize}
    88   \end{itemize}
    89 
    89 
    90   However, note that these exceptions are private to the parser and cannot be accessed
    90   However, note that these exceptions are private to the parser and cannot be accessed
    91   by the programmer (for example to handle them).
    91   by the programmer (for example to handle them).
    92 
    92 
    96   that @{ML explode in Symbol} is aware of character
    96   that @{ML explode in Symbol} is aware of character
    97   sequences, for example \<open>\<foo>\<close>, that have a special meaning in
    97   sequences, for example \<open>\<foo>\<close>, that have a special meaning in
    98   Isabelle. To see the difference consider
    98   Isabelle. To see the difference consider
    99 
    99 
   100 @{ML_matchresult_fake [display,gray]
   100 @{ML_matchresult_fake [display,gray]
   101 "let 
   101 \<open>let 
   102   val input = \"\<foo> bar\"
   102   val input = "\<foo> bar"
   103 in
   103 in
   104   (String.explode input, Symbol.explode input)
   104   (String.explode input, Symbol.explode input)
   105 end"
   105 end\<close>
   106 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"],
   106 \<open>(["\\", "<", "f", "o", "o", ">", " ", "b", "a", "r"],
   107  [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"}
   107  ["\<foo>", " ", "b", "a", "r"])\<close>}
   108 
   108 
   109   Slightly more general than the parser @{ML "$$"} is the function 
   109   Slightly more general than the parser @{ML \<open>$$\<close>} is the function 
   110   @{ML_ind one in Scan}, in that it takes a predicate as argument and 
   110   @{ML_ind one in Scan}, in that it takes a predicate as argument and 
   111   then parses exactly
   111   then parses exactly
   112   one item from the input list satisfying this predicate. For example the
   112   one item from the input list satisfying this predicate. For example the
   113   following parser either consumes an @{text [quotes] "h"} or a @{text
   113   following parser either consumes an @{text [quotes] "h"} or a @{text
   114   [quotes] "w"}:
   114   [quotes] "w"}:
   115 
   115 
   116 @{ML_matchresult [display,gray] 
   116 @{ML_matchresult [display,gray] 
   117 "let 
   117 \<open>let 
   118   val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
   118   val hw = Scan.one (fn x => x = "h" orelse x = "w")
   119   val input1 = Symbol.explode \"hello\"
   119   val input1 = Symbol.explode "hello"
   120   val input2 = Symbol.explode \"world\"
   120   val input2 = Symbol.explode "world"
   121 in
   121 in
   122   (hw input1, hw input2)
   122   (hw input1, hw input2)
   123 end"
   123 end\<close>
   124     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   124     \<open>(("h", ["e", "l", "l", "o"]),("w", ["o", "r", "l", "d"]))\<close>}
   125 
   125 
   126   Two parsers can be connected in sequence by using the function @{ML_ind "--" in Scan}. 
   126   Two parsers can be connected in sequence by using the function @{ML_ind "--" in Scan}. 
   127   For example parsing \<open>h\<close>, \<open>e\<close> and \<open>l\<close> (in this 
   127   For example parsing \<open>h\<close>, \<open>e\<close> and \<open>l\<close> (in this 
   128   order) you can achieve by:
   128   order) you can achieve by:
   129 
   129 
   130   @{ML_matchresult [display,gray] 
   130   @{ML_matchresult [display,gray] 
   131   "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")"
   131   \<open>($$ "h" -- $$ "e" -- $$ "l") (Symbol.explode "hello")\<close>
   132   "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
   132   \<open>((("h", "e"), "l"), ["l", "o"])\<close>}
   133 
   133 
   134   Note how the result of consumed strings builds up on the left as nested pairs.  
   134   Note how the result of consumed strings builds up on the left as nested pairs.  
   135 
   135 
   136   If, as in the previous example, you want to parse a particular string, 
   136   If, as in the previous example, you want to parse a particular string, 
   137   then you can use the function @{ML_ind this_string in Scan}.
   137   then you can use the function @{ML_ind this_string in Scan}.
   138 
   138 
   139   @{ML_matchresult [display,gray] 
   139   @{ML_matchresult [display,gray] 
   140   "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
   140   \<open>Scan.this_string "hell" (Symbol.explode "hello")\<close>
   141   "(\"hell\", [\"o\"])"}
   141   \<open>("hell", ["o"])\<close>}
   142 
   142 
   143   Parsers that explore alternatives can be constructed using the function 
   143   Parsers that explore alternatives can be constructed using the function 
   144   @{ML_ind  "||" in Scan}. The parser @{ML "(p || q)" for p q} returns the
   144   @{ML_ind  "||" in Scan}. The parser @{ML \<open>(p || q)\<close> for p q} returns the
   145   result of \<open>p\<close>, in case it succeeds; otherwise it returns the
   145   result of \<open>p\<close>, in case it succeeds; otherwise it returns the
   146   result of \<open>q\<close>. For example:
   146   result of \<open>q\<close>. For example:
   147 
   147 
   148 
   148 
   149 @{ML_matchresult [display,gray] 
   149 @{ML_matchresult [display,gray] 
   150 "let 
   150 \<open>let 
   151   val hw = $$ \"h\" || $$ \"w\"
   151   val hw = $$ "h" || $$ "w"
   152   val input1 = Symbol.explode \"hello\"
   152   val input1 = Symbol.explode "hello"
   153   val input2 = Symbol.explode \"world\"
   153   val input2 = Symbol.explode "world"
   154 in
   154 in
   155   (hw input1, hw input2)
   155   (hw input1, hw input2)
   156 end"
   156 end\<close>
   157   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   157   \<open>(("h", ["e", "l", "l", "o"]), ("w", ["o", "r", "l", "d"]))\<close>}
   158 
   158 
   159   The functions @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} work like the sequencing
   159   The functions @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} work like the sequencing
   160   function for parsers, except that they discard the item being parsed by the
   160   function for parsers, except that they discard the item being parsed by the
   161   first (respectively second) parser. That means the item being dropped is the 
   161   first (respectively second) parser. That means the item being dropped is the 
   162   one that @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} ``point'' away.
   162   one that @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} ``point'' away.
   163   For example:
   163   For example:
   164   
   164   
   165 @{ML_matchresult [display,gray]
   165 @{ML_matchresult [display,gray]
   166 "let 
   166 \<open>let 
   167   val just_e = $$ \"h\" |-- $$ \"e\" 
   167   val just_e = $$ "h" |-- $$ "e" 
   168   val just_h = $$ \"h\" --| $$ \"e\" 
   168   val just_h = $$ "h" --| $$ "e" 
   169   val input = Symbol.explode \"hello\"  
   169   val input = Symbol.explode "hello"  
   170 in 
   170 in 
   171   (just_e input, just_h input)
   171   (just_e input, just_h input)
   172 end"
   172 end\<close>
   173   "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"}
   173   \<open>(("e", ["l", "l", "o"]), ("h", ["l", "l", "o"]))\<close>}
   174 
   174 
   175   The parser @{ML "Scan.optional p x" for p x} returns the result of the parser 
   175   The parser @{ML \<open>Scan.optional p x\<close> for p x} returns the result of the parser 
   176   \<open>p\<close>, in case it succeeds; otherwise it returns 
   176   \<open>p\<close>, in case it succeeds; otherwise it returns 
   177   the default value \<open>x\<close>. For example:
   177   the default value \<open>x\<close>. For example:
   178 
   178 
   179 @{ML_matchresult [display,gray]
   179 @{ML_matchresult [display,gray]
   180 "let 
   180 \<open>let 
   181   val p = Scan.optional ($$ \"h\") \"x\"
   181   val p = Scan.optional ($$ "h") "x"
   182   val input1 = Symbol.explode \"hello\"
   182   val input1 = Symbol.explode "hello"
   183   val input2 = Symbol.explode \"world\"  
   183   val input2 = Symbol.explode "world"  
   184 in 
   184 in 
   185   (p input1, p input2)
   185   (p input1, p input2)
   186 end" 
   186 end\<close> 
   187  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   187  \<open>(("h", ["e", "l", "l", "o"]), ("x", ["w", "o", "r", "l", "d"]))\<close>}
   188 
   188 
   189   The function @{ML_ind option in Scan} works similarly, except no default value can
   189   The function @{ML_ind option in Scan} works similarly, except no default value can
   190   be given. Instead, the result is wrapped as an \<open>option\<close>-type. For example:
   190   be given. Instead, the result is wrapped as an \<open>option\<close>-type. For example:
   191 
   191 
   192 @{ML_matchresult [display,gray]
   192 @{ML_matchresult [display,gray]
   193 "let 
   193 \<open>let 
   194   val p = Scan.option ($$ \"h\")
   194   val p = Scan.option ($$ "h")
   195   val input1 = Symbol.explode \"hello\"
   195   val input1 = Symbol.explode "hello"
   196   val input2 = Symbol.explode \"world\"  
   196   val input2 = Symbol.explode "world"  
   197 in 
   197 in 
   198   (p input1, p input2)
   198   (p input1, p input2)
   199 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   199 end\<close> \<open>((SOME "h", ["e", "l", "l", "o"]), (NONE, ["w", "o", "r", "l", "d"]))\<close>}
   200 
   200 
   201   The function @{ML_ind ahead in Scan} parses some input, but leaves the original
   201   The function @{ML_ind ahead in Scan} parses some input, but leaves the original
   202   input unchanged. For example:
   202   input unchanged. For example:
   203 
   203 
   204   @{ML_matchresult [display,gray]
   204   @{ML_matchresult [display,gray]
   205   "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" 
   205   \<open>Scan.ahead (Scan.this_string "foo") (Symbol.explode "foo")\<close> 
   206   "(\"foo\", [\"f\", \"o\", \"o\"])"} 
   206   \<open>("foo", ["f", "o", "o"])\<close>} 
   207 
   207 
   208   The function @{ML_ind "!!" in Scan} helps with producing appropriate error messages
   208   The function @{ML_ind "!!" in Scan} helps with producing appropriate error messages
   209   during parsing. For example if you want to parse \<open>p\<close> immediately 
   209   during parsing. For example if you want to parse \<open>p\<close> immediately 
   210   followed by \<open>q\<close>, or start a completely different parser \<open>r\<close>,
   210   followed by \<open>q\<close>, or start a completely different parser \<open>r\<close>,
   211   you might write:
   211   you might write:
   212 
   212 
   213   @{ML [display,gray] "(p -- q) || r" for p q r}
   213   @{ML [display,gray] \<open>(p -- q) || r\<close> for p q r}
   214 
   214 
   215   However, this parser is problematic for producing a useful error
   215   However, this parser is problematic for producing a useful error
   216   message, if the parsing of @{ML "(p -- q)" for p q} fails. Because with the
   216   message, if the parsing of @{ML \<open>(p -- q)\<close> for p q} fails. Because with the
   217   parser above you lose the information that \<open>p\<close> should be followed by \<open>q\<close>.
   217   parser above you lose the information that \<open>p\<close> should be followed by \<open>q\<close>.
   218   To see this assume that \<open>p\<close> is present in the input, but it is not
   218   To see this assume that \<open>p\<close> is present in the input, but it is not
   219   followed by \<open>q\<close>. That means @{ML "(p -- q)" for p q} will fail and
   219   followed by \<open>q\<close>. That means @{ML \<open>(p -- q)\<close> for p q} will fail and
   220   hence the alternative parser \<open>r\<close> will be tried. However, in many
   220   hence the alternative parser \<open>r\<close> will be tried. However, in many
   221   circumstances this will be the wrong parser for the input ``\<open>p\<close>-followed-by-something''
   221   circumstances this will be the wrong parser for the input ``\<open>p\<close>-followed-by-something''
   222   and therefore will also fail. The error message is then caused by the failure
   222   and therefore will also fail. The error message is then caused by the failure
   223   of \<open>r\<close>, not by the absence of \<open>q\<close> in the input. Such
   223   of \<open>r\<close>, not by the absence of \<open>q\<close> in the input. Such
   224   situation can be avoided when using the function @{ML "!!"}.  This function
   224   situation can be avoided when using the function @{ML \<open>!!\<close>}.  This function
   225   aborts the whole process of parsing in case of a failure and prints an error
   225   aborts the whole process of parsing in case of a failure and prints an error
   226   message. For example if you invoke the parser
   226   message. For example if you invoke the parser
   227 
   227 
   228 
   228 
   229   @{ML [display,gray] "!! (fn _ => fn _ =>\"foo\") ($$ \"h\")"}
   229   @{ML [display,gray] \<open>!! (fn _ => fn _ =>"foo") ($$ "h")\<close>}
   230 
   230 
   231   on @{text [quotes] "hello"}, the parsing succeeds
   231   on @{text [quotes] "hello"}, the parsing succeeds
   232 
   232 
   233   @{ML_matchresult [display,gray] 
   233   @{ML_matchresult [display,gray] 
   234   "(!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"hello\")" 
   234   \<open>(!! (fn _ => fn _ => "foo") ($$ "h")) (Symbol.explode "hello")\<close> 
   235   "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
   235   \<open>("h", ["e", "l", "l", "o"])\<close>}
   236 
   236 
   237   but if you invoke it on @{text [quotes] "world"}
   237   but if you invoke it on @{text [quotes] "world"}
   238 
   238 
   239   @{ML_matchresult_fake [display,gray] "(!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
   239   @{ML_matchresult_fake [display,gray] \<open>(!! (fn _ => fn _ => "foo") ($$ "h")) (Symbol.explode "world")\<close>
   240                                "Exception ABORT raised"}
   240                                \<open>Exception ABORT raised\<close>}
   241 
   241 
   242   then the parsing aborts and the error message \<open>foo\<close> is printed. In order to
   242   then the parsing aborts and the error message \<open>foo\<close> is printed. In order to
   243   see the error message properly, you need to prefix the parser with the function 
   243   see the error message properly, you need to prefix the parser with the function 
   244   @{ML_ind error in Scan}. For example:
   244   @{ML_ind error in Scan}. For example:
   245 
   245 
   246   @{ML_matchresult_fake [display,gray] 
   246   @{ML_matchresult_fake [display,gray] 
   247   "Scan.error (!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")"
   247   \<open>Scan.error (!! (fn _ => fn _ => "foo") ($$ "h")) (Symbol.explode "world")\<close>
   248   "Exception Error \"foo\" raised"}
   248   \<open>Exception Error "foo" raised\<close>}
   249 
   249 
   250   This kind of ``prefixing'' to see the correct error message is usually done by wrappers 
   250   This kind of ``prefixing'' to see the correct error message is usually done by wrappers 
   251   such as @{ML_ind local_theory in Outer_Syntax} 
   251   such as @{ML_ind local_theory in Outer_Syntax} 
   252   (see Section~\ref{sec:newcommand} which explains this function in more detail). 
   252   (see Section~\ref{sec:newcommand} which explains this function in more detail). 
   253   
   253   
   254   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
   254   Let us now return to our example of parsing @{ML \<open>(p -- q) || r\<close> for p q
   255   r}. If you want to generate the correct error message for failure
   255   r}. If you want to generate the correct error message for failure
   256   of parsing \<open>p\<close>-followed-by-\<open>q\<close>, then you have to write:
   256   of parsing \<open>p\<close>-followed-by-\<open>q\<close>, then you have to write:
   257 \<close>
   257 \<close>
   258 
   258 
   259 ML %grayML\<open>fun p_followed_by_q p q r =
   259 ML %grayML\<open>fun p_followed_by_q p q r =
   267 text \<open>
   267 text \<open>
   268   Running this parser with the arguments
   268   Running this parser with the arguments
   269   @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and 
   269   @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and 
   270   the input @{text [quotes] "holle"} 
   270   the input @{text [quotes] "holle"} 
   271 
   271 
   272   @{ML_matchresult_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"holle\")"
   272   @{ML_matchresult_fake [display,gray] \<open>Scan.error (p_followed_by_q "h" "e" "w") (Symbol.explode "holle")\<close>
   273                                "Exception ERROR \"h is not followed by e\" raised"} 
   273                                \<open>Exception ERROR "h is not followed by e" raised\<close>} 
   274 
   274 
   275   produces the correct error message. Running it with
   275   produces the correct error message. Running it with
   276  
   276  
   277   @{ML_matchresult [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"wworld\")"
   277   @{ML_matchresult [display,gray] \<open>Scan.error (p_followed_by_q "h" "e" "w") (Symbol.explode "wworld")\<close>
   278                           "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
   278                           \<open>(("w", "w"), ["o", "r", "l", "d"])\<close>}
   279   
   279   
   280   yields the expected parsing. 
   280   yields the expected parsing. 
   281 
   281 
   282   The function @{ML "Scan.repeat p" for p} will apply a parser \<open>p\<close> as 
   282   The function @{ML \<open>Scan.repeat p\<close> for p} will apply a parser \<open>p\<close> as 
   283   long as it succeeds. For example:
   283   long as it succeeds. For example:
   284   
   284   
   285   @{ML_matchresult [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" 
   285   @{ML_matchresult [display,gray] \<open>Scan.repeat ($$ "h") (Symbol.explode "hhhhello")\<close> 
   286                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
   286                 \<open>(["h", "h", "h", "h"], ["e", "l", "l", "o"])\<close>}
   287   
   287   
   288   Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function
   288   Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function
   289   @{ML_ind repeat1 in Scan} is similar, but requires that the parser \<open>p\<close> 
   289   @{ML_ind repeat1 in Scan} is similar, but requires that the parser \<open>p\<close> 
   290   succeeds at least once.
   290   succeeds at least once.
   291 
   291 
   292   Also note that the parser would have aborted with the exception \<open>MORE\<close>, if
   292   Also note that the parser would have aborted with the exception \<open>MORE\<close>, if
   293   you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using
   293   you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using
   294   the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' 
   294   the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' 
   295   @{ML_ind stopper in Symbol}. With them you can write:
   295   @{ML_ind stopper in Symbol}. With them you can write:
   296   
   296   
   297   @{ML_matchresult [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" 
   297   @{ML_matchresult [display,gray] \<open>Scan.finite Symbol.stopper (Scan.repeat ($$ "h")) (Symbol.explode "hhhh")\<close> 
   298                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   298                 \<open>(["h", "h", "h", "h"], [])\<close>}
   299 
   299 
   300   The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings;
   300   The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings;
   301   other stoppers need to be used when parsing, for example, tokens. However, this kind of 
   301   other stoppers need to be used when parsing, for example, tokens. However, this kind of 
   302   manually wrapping is often already done by the surrounding infrastructure. 
   302   manually wrapping is often already done by the surrounding infrastructure. 
   303 
   303 
   304   The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any 
   304   The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any 
   305   string as in
   305   string as in
   306 
   306 
   307   @{ML_matchresult [display,gray] 
   307   @{ML_matchresult [display,gray] 
   308 "let 
   308 \<open>let 
   309    val p = Scan.repeat (Scan.one Symbol.not_eof)
   309    val p = Scan.repeat (Scan.one Symbol.not_eof)
   310    val input = Symbol.explode \"foo bar foo\"
   310    val input = Symbol.explode "foo bar foo"
   311 in
   311 in
   312    Scan.finite Symbol.stopper p input
   312    Scan.finite Symbol.stopper p input
   313 end" 
   313 end\<close> 
   314 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
   314 \<open>(["f", "o", "o", " ", "b", "a", "r", " ", "f", "o", "o"], [])\<close>}
   315 
   315 
   316   where the function @{ML_ind not_eof in Symbol} ensures that we do not read beyond the 
   316   where the function @{ML_ind not_eof in Symbol} ensures that we do not read beyond the 
   317   end of the input string (i.e.~stopper symbol).
   317   end of the input string (i.e.~stopper symbol).
   318 
   318 
   319   The function @{ML_ind unless in Scan} takes two parsers: if the first one can 
   319   The function @{ML_ind unless in Scan} takes two parsers: if the first one can 
   320   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   320   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   321   
   321   
   322   @{ML_matchresult_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")"
   322   @{ML_matchresult_fake_both [display,gray] \<open>Scan.unless ($$ "h") ($$ "w") (Symbol.explode "hello")\<close>
   323                                "Exception FAIL raised"}
   323                                \<open>Exception FAIL raised\<close>}
   324 
   324 
   325   fails, while
   325   fails, while
   326 
   326 
   327   @{ML_matchresult [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")"
   327   @{ML_matchresult [display,gray] \<open>Scan.unless ($$ "h") ($$ "w") (Symbol.explode "world")\<close>
   328                           "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
   328                           \<open>("w",["o", "r", "l", "d"])\<close>}
   329 
   329 
   330   succeeds. 
   330   succeeds. 
   331 
   331 
   332   The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can 
   332   The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can 
   333   be combined to read any input until a certain marker symbol is reached. In the 
   333   be combined to read any input until a certain marker symbol is reached. In the 
   334   example below the marker symbol is a @{text [quotes] "*"}.
   334   example below the marker symbol is a @{text [quotes] "*"}.
   335 
   335 
   336   @{ML_matchresult [display,gray]
   336   @{ML_matchresult [display,gray]
   337 "let 
   337 \<open>let 
   338   val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
   338   val p = Scan.repeat (Scan.unless ($$ "*") (Scan.one Symbol.not_eof))
   339   val input1 = Symbol.explode \"fooooo\"
   339   val input1 = Symbol.explode "fooooo"
   340   val input2 = Symbol.explode \"foo*ooo\"
   340   val input2 = Symbol.explode "foo*ooo"
   341 in
   341 in
   342   (Scan.finite Symbol.stopper p input1, 
   342   (Scan.finite Symbol.stopper p input1, 
   343    Scan.finite Symbol.stopper p input2)
   343    Scan.finite Symbol.stopper p input2)
   344 end"
   344 end\<close>
   345 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   345 \<open>((["f", "o", "o", "o", "o", "o"], []),
   346  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   346  (["f", "o", "o"], ["*", "o", "o", "o"]))\<close>}
   347 
   347 
   348   
   348   
   349   After parsing is done, you almost always want to apply a function to the parsed 
   349   After parsing is done, you almost always want to apply a function to the parsed 
   350   items. One way to do this is the function @{ML_ind ">>" in Scan} where 
   350   items. One way to do this is the function @{ML_ind ">>" in Scan} where 
   351   @{ML "(p >> f)" for p f} runs 
   351   @{ML \<open>(p >> f)\<close> for p f} runs 
   352   first the parser \<open>p\<close> and upon successful completion applies the 
   352   first the parser \<open>p\<close> and upon successful completion applies the 
   353   function \<open>f\<close> to the result. For example
   353   function \<open>f\<close> to the result. For example
   354 
   354 
   355 @{ML_matchresult [display,gray]
   355 @{ML_matchresult [display,gray]
   356 "let 
   356 \<open>let 
   357   fun double (x, y) = (x ^ x, y ^ y) 
   357   fun double (x, y) = (x ^ x, y ^ y) 
   358   val parser = $$ \"h\" -- $$ \"e\"
   358   val parser = $$ "h" -- $$ "e"
   359 in
   359 in
   360   (parser >> double) (Symbol.explode \"hello\")
   360   (parser >> double) (Symbol.explode "hello")
   361 end"
   361 end\<close>
   362 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
   362 \<open>(("hh", "ee"), ["l", "l", "o"])\<close>}
   363 
   363 
   364   doubles the two parsed input strings; or
   364   doubles the two parsed input strings; or
   365 
   365 
   366   @{ML_matchresult [display,gray] 
   366   @{ML_matchresult [display,gray] 
   367 "let 
   367 \<open>let 
   368    val p = Scan.repeat (Scan.one Symbol.not_eof)
   368    val p = Scan.repeat (Scan.one Symbol.not_eof)
   369    val input = Symbol.explode \"foo bar foo\" 
   369    val input = Symbol.explode "foo bar foo" 
   370 in
   370 in
   371    Scan.finite Symbol.stopper (p >> implode) input
   371    Scan.finite Symbol.stopper (p >> implode) input
   372 end" 
   372 end\<close> 
   373 "(\"foo bar foo\",[])"}
   373 \<open>("foo bar foo",[])\<close>}
   374 
   374 
   375   where the single-character strings in the parsed output are transformed
   375   where the single-character strings in the parsed output are transformed
   376   back into one string.
   376   back into one string.
   377 
   377 
   378   The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies
   378   The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies
   379   the given parser to the second component of the pair and leaves the  first component 
   379   the given parser to the second component of the pair and leaves the  first component 
   380   untouched. For example
   380   untouched. For example
   381 
   381 
   382 @{ML_matchresult [display,gray]
   382 @{ML_matchresult [display,gray]
   383 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")"
   383 \<open>Scan.lift ($$ "h" -- $$ "e") (1, Symbol.explode "hello")\<close>
   384 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
   384 \<open>(("h", "e"), (1, ["l", "l", "o"]))\<close>}
   385 
   385 
   386   \footnote{\bf FIXME: In which situations is \<open>lift\<close> useful? Give examples.} 
   386   \footnote{\bf FIXME: In which situations is \<open>lift\<close> useful? Give examples.} 
   387 
   387 
   388   Be aware of recursive parsers. Suppose you want to read strings separated by
   388   Be aware of recursive parsers. Suppose you want to read strings separated by
   389   commas and by parentheses into a tree datastructure; for example, generating
   389   commas and by parentheses into a tree datastructure; for example, generating
   424   separated by a comma, or acts like @{ML parse_basic}. Unfortunately,
   424   separated by a comma, or acts like @{ML parse_basic}. Unfortunately,
   425   because of the mutual recursion, this parser will immediately run into a
   425   because of the mutual recursion, this parser will immediately run into a
   426   loop, even if it is called without any input. For example
   426   loop, even if it is called without any input. For example
   427 
   427 
   428   @{ML_matchresult_fake_both [display, gray]
   428   @{ML_matchresult_fake_both [display, gray]
   429   "parse_tree \"A\""
   429   \<open>parse_tree "A"\<close>
   430   "*** Exception- TOPLEVEL_ERROR raised"}
   430   \<open>*** Exception- TOPLEVEL_ERROR raised\<close>}
   431 
   431 
   432   raises an exception indicating that the stack limit is reached. Such
   432   raises an exception indicating that the stack limit is reached. Such
   433   looping parser are not useful, because of ML's strict evaluation of
   433   looping parser are not useful, because of ML's strict evaluation of
   434   arguments. Therefore we need to delay the execution of the
   434   arguments. Therefore we need to delay the execution of the
   435   parser until an input is given. This can be done by adding the parsed
   435   parser until an input is given. This can be done by adding the parsed
   448   changed: with this version of the parser the execution is delayed until 
   448   changed: with this version of the parser the execution is delayed until 
   449   some string is  applied for the argument \<open>xs\<close>. This gives us 
   449   some string is  applied for the argument \<open>xs\<close>. This gives us 
   450   exactly the parser what we wanted. An example is as follows:
   450   exactly the parser what we wanted. An example is as follows:
   451 
   451 
   452   @{ML_matchresult [display, gray]
   452   @{ML_matchresult [display, gray]
   453   "let
   453   \<open>let
   454   val input = Symbol.explode \"(A,((A))),A\"
   454   val input = Symbol.explode "(A,((A))),A"
   455 in
   455 in
   456   Scan.finite Symbol.stopper (parse_tree \"A\") input
   456   Scan.finite Symbol.stopper (parse_tree "A") input
   457 end"
   457 end\<close>
   458   "(Br (Br (Lf \"A\", Lf \"A\"), Lf \"A\"), [])"}
   458   \<open>(Br (Br (Lf "A", Lf "A"), Lf "A"), [])\<close>}
   459 
   459 
   460 
   460 
   461   \begin{exercise}\label{ex:scancmts}
   461   \begin{exercise}\label{ex:scancmts}
   462   Write a parser that parses an input string so that any comment enclosed
   462   Write a parser that parses an input string so that any comment enclosed
   463   within \<open>(*\<dots>*)\<close> is replaced by the same comment but enclosed within
   463   within \<open>(*\<dots>*)\<close> is replaced by the same comment but enclosed within
   464   \<open>(**\<dots>**)\<close> in the output string. To enclose a string, you can use the
   464   \<open>(**\<dots>**)\<close> in the output string. To enclose a string, you can use the
   465   function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
   465   function @{ML \<open>enclose s1 s2 s\<close> for s1 s2 s} which produces the string @{ML \<open>s1 ^ s ^ s2\<close> for s1 s2 s}. Hint: To simplify the task ignore the proper 
   466   "s1 ^ s ^ s2" for s1 s2 s}. Hint: To simplify the task ignore the proper 
       
   467   nesting of comments.
   466   nesting of comments.
   468   \end{exercise}
   467   \end{exercise}
   469 \<close>
   468 \<close>
   470 
   469 
   471 section \<open>Parsing Theory Syntax\<close>
   470 section \<open>Parsing Theory Syntax\<close>
   495   example @{ML_ind Ident in Token} for identifiers, @{ML Keyword in
   494   example @{ML_ind Ident in Token} for identifiers, @{ML Keyword in
   496   Token} for keywords and @{ML_ind Command in Token} for commands). Some
   495   Token} for keywords and @{ML_ind Command in Token} for commands). Some
   497   token parsers take into account the kind of tokens. The first example shows
   496   token parsers take into account the kind of tokens. The first example shows
   498   how to generate a token list out of a string using the function 
   497   how to generate a token list out of a string using the function 
   499   @{ML_ind explode in Token}. It is given the argument 
   498   @{ML_ind explode in Token}. It is given the argument 
   500   @{ML "Position.none"} since,
   499   @{ML \<open>Position.none\<close>} since,
   501   at the moment, we are not interested in generating precise error
   500   at the moment, we are not interested in generating precise error
   502   messages. The following code
   501   messages. The following code
   503 
   502 
   504 
   503 
   505 @{ML_matchresult_fake [display,gray] \<open>Token.explode 
   504 @{ML_matchresult_fake [display,gray] \<open>Token.explode 
   522 
   521 
   523 
   522 
   524 text \<open>
   523 text \<open>
   525   then lexing @{text [quotes] "hello world"} will produce
   524   then lexing @{text [quotes] "hello world"} will produce
   526 
   525 
   527   @{ML_matchresult_fake [display,gray] "Token.explode 
   526   @{ML_matchresult_fake [display,gray] \<open>Token.explode 
   528   (Thy_Header.get_keywords' @{context}) Position.none \"hello world\"" 
   527   (Thy_Header.get_keywords' @{context}) Position.none "hello world"\<close> 
   529 "[Token (_,(Keyword, \"hello\"),_), 
   528 \<open>[Token (_,(Keyword, "hello"),_), 
   530  Token (_,(Space, \" \"),_), 
   529  Token (_,(Space, " "),_), 
   531  Token (_,(Ident, \"world\"),_)]"}
   530  Token (_,(Ident, "world"),_)]\<close>}
   532 
   531 
   533   Many parsing functions later on will require white space, comments and the like
   532   Many parsing functions later on will require white space, comments and the like
   534   to have already been filtered out.  So from now on we are going to use the 
   533   to have already been filtered out.  So from now on we are going to use the 
   535   functions @{ML filter} and @{ML_ind is_proper in Token} to do this. 
   534   functions @{ML filter} and @{ML_ind is_proper in Token} to do this. 
   536   For example:
   535   For example:
   537 
   536 
   538 @{ML_matchresult_fake [display,gray]
   537 @{ML_matchresult_fake [display,gray]
   539 "let
   538 \<open>let
   540    val input = Token.explode (Thy_Header.get_keywords' @{context}) Position.none \"hello world\"
   539    val input = Token.explode (Thy_Header.get_keywords' @{context}) Position.none "hello world"
   541 in
   540 in
   542    filter Token.is_proper input
   541    filter Token.is_proper input
   543 end" 
   542 end\<close> 
   544 "[Token (_,(Ident, \"hello\"), _), Token (_,(Ident, \"world\"), _)]"}
   543 \<open>[Token (_,(Ident, "hello"), _), Token (_,(Ident, "world"), _)]\<close>}
   545 
   544 
   546   For convenience we define the function:
   545   For convenience we define the function:
   547 \<close>
   546 \<close>
   548 
   547 
   549 ML %grayML\<open>fun filtered_input str = 
   548 ML %grayML\<open>fun filtered_input str = 
   552 ML \<open>filtered_input "inductive | for"\<close>
   551 ML \<open>filtered_input "inductive | for"\<close>
   553 text \<open>
   552 text \<open>
   554   If you now parse
   553   If you now parse
   555 
   554 
   556 @{ML_matchresult_fake [display,gray] 
   555 @{ML_matchresult_fake [display,gray] 
   557 "filtered_input \"inductive | for\"" 
   556 \<open>filtered_input "inductive | for"\<close> 
   558 "[Token (_,(Command, \"inductive\"),_), 
   557 \<open>[Token (_,(Command, "inductive"),_), 
   559  Token (_,(Keyword, \"|\"),_), 
   558  Token (_,(Keyword, "|"),_), 
   560  Token (_,(Keyword, \"for\"),_)]"}
   559  Token (_,(Keyword, "for"),_)]\<close>}
   561 
   560 
   562   you obtain a list consisting of only one command and two keyword tokens.
   561   you obtain a list consisting of only one command and two keyword tokens.
   563   If you want to see which keywords and commands are currently known to Isabelle, 
   562   If you want to see which keywords and commands are currently known to Isabelle, 
   564   use the function @{ML_ind get_keywords' in Thy_Header}:
   563   use the function @{ML_ind get_keywords' in Thy_Header}:
   565 
   564 
   567   see the complete list.
   566   see the complete list.
   568 
   567 
   569   The parser @{ML_ind "$$$" in Parse} parses a single keyword. For example:
   568   The parser @{ML_ind "$$$" in Parse} parses a single keyword. For example:
   570 
   569 
   571 @{ML_matchresult [display,gray]
   570 @{ML_matchresult [display,gray]
   572 "let 
   571 \<open>let 
   573   val input1 = filtered_input \"where for\"
   572   val input1 = filtered_input "where for"
   574   val input2 = filtered_input \"| in\"
   573   val input2 = filtered_input "| in"
   575 in 
   574 in 
   576   (Parse.$$$ \"where\" input1, Parse.$$$ \"|\" input2)
   575   (Parse.$$$ "where" input1, Parse.$$$ "|" input2)
   577 end"
   576 end\<close>
   578 "((\"where\",_), (\"|\",_))"}
   577 \<open>(("where",_), ("|",_))\<close>}
   579 
   578 
   580   Any non-keyword string can be parsed with the function @{ML_ind reserved in Parse}.
   579   Any non-keyword string can be parsed with the function @{ML_ind reserved in Parse}.
   581   For example:
   580   For example:
   582 
   581 
   583   @{ML_matchresult [display,gray]
   582   @{ML_matchresult [display,gray]
   584 "let 
   583 \<open>let 
   585   val p = Parse.reserved \"bar\"
   584   val p = Parse.reserved "bar"
   586   val input = filtered_input \"bar\"
   585   val input = filtered_input "bar"
   587 in
   586 in
   588   p input
   587   p input
   589 end"
   588 end\<close>
   590   "(\"bar\",[])"}
   589   \<open>("bar",[])\<close>}
   591 
   590 
   592   Like before, you can sequentially connect parsers with @{ML "--"}. For example: 
   591   Like before, you can sequentially connect parsers with @{ML \<open>--\<close>}. For example: 
   593 
   592 
   594 @{ML_matchresult [display,gray]
   593 @{ML_matchresult [display,gray]
   595 "let 
   594 \<open>let 
   596   val input = filtered_input \"| in\"
   595   val input = filtered_input "| in"
   597 in 
   596 in 
   598   (Parse.$$$ \"|\" -- Parse.$$$ \"in\") input
   597   (Parse.$$$ "|" -- Parse.$$$ "in") input
   599 end"
   598 end\<close>
   600 "((\"|\", \"in\"), [])"}
   599 \<open>(("|", "in"), [])\<close>}
   601 
   600 
   602   The parser @{ML "Parse.enum s p" for s p} parses a possibly empty 
   601   The parser @{ML \<open>Parse.enum s p\<close> for s p} parses a possibly empty 
   603   list of items recognised by the parser \<open>p\<close>, where the items being parsed
   602   list of items recognised by the parser \<open>p\<close>, where the items being parsed
   604   are separated by the string \<open>s\<close>. For example:
   603   are separated by the string \<open>s\<close>. For example:
   605 
   604 
   606 @{ML_matchresult [display,gray]
   605 @{ML_matchresult [display,gray]
   607 "let 
   606 \<open>let 
   608   val input = filtered_input \"in | in | in foo\"
   607   val input = filtered_input "in | in | in foo"
   609 in 
   608 in 
   610   (Parse.enum \"|\" (Parse.$$$ \"in\")) input
   609   (Parse.enum "|" (Parse.$$$ "in")) input
   611 end" 
   610 end\<close> 
   612 "([\"in\", \"in\", \"in\"], [_])"}
   611 \<open>(["in", "in", "in"], [_])\<close>}
   613 
   612 
   614   The function @{ML_ind enum1 in Parse} works similarly, except that the
   613   The function @{ML_ind enum1 in Parse} works similarly, except that the
   615   parsed list must be non-empty. Note that we had to add a string @{text
   614   parsed list must be non-empty. Note that we had to add a string @{text
   616   [quotes] "foo"} at the end of the parsed string, otherwise the parser would
   615   [quotes] "foo"} at the end of the parsed string, otherwise the parser would
   617   have consumed all tokens and then failed with the exception \<open>MORE\<close>. Like in the previous section, we can avoid this exception using the
   616   have consumed all tokens and then failed with the exception \<open>MORE\<close>. Like in the previous section, we can avoid this exception using the
   618   wrapper @{ML Scan.finite}. This time, however, we have to use the
   617   wrapper @{ML Scan.finite}. This time, however, we have to use the
   619   ``stopper-token'' @{ML Token.stopper}. We can write:
   618   ``stopper-token'' @{ML Token.stopper}. We can write:
   620 
   619 
   621 @{ML_matchresult [display,gray]
   620 @{ML_matchresult [display,gray]
   622 "let 
   621 \<open>let 
   623   val input = filtered_input \"in | in | in\"
   622   val input = filtered_input "in | in | in"
   624   val p = Parse.enum \"|\" (Parse.$$$ \"in\")
   623   val p = Parse.enum "|" (Parse.$$$ "in")
   625 in 
   624 in 
   626   Scan.finite Token.stopper p input
   625   Scan.finite Token.stopper p input
   627 end" 
   626 end\<close> 
   628 "([\"in\", \"in\", \"in\"], [])"}
   627 \<open>(["in", "in", "in"], [])\<close>}
   629 
   628 
   630   The following function will help to run examples.
   629   The following function will help to run examples.
   631 \<close>
   630 \<close>
   632 
   631 
   633 ML %grayML\<open>fun parse p input = Scan.finite Token.stopper (Scan.error p) input\<close>
   632 ML %grayML\<open>fun parse p input = Scan.finite Token.stopper (Scan.error p) input\<close>
   634 
   633 
   635 text \<open>
   634 text \<open>
   636   The function @{ML_ind "!!!" in Parse} can be used to force termination
   635   The function @{ML_ind "!!!" in Parse} can be used to force termination
   637   of the parser in case of a dead end, just like @{ML "Scan.!!"} (see previous
   636   of the parser in case of a dead end, just like @{ML \<open>Scan.!!\<close>} (see previous
   638   section). A difference, however, is that the error message of @{ML
   637   section). A difference, however, is that the error message of @{ML \<open>Parse.!!!\<close>} is fixed to be @{text [quotes] "Outer syntax error"}
   639   "Parse.!!!"} is fixed to be @{text [quotes] "Outer syntax error"}
       
   640   together with a relatively precise description of the failure. For example:
   638   together with a relatively precise description of the failure. For example:
   641 
   639 
   642 @{ML_matchresult_fake [display,gray]
   640 @{ML_matchresult_fake [display,gray]
   643 "let 
   641 \<open>let 
   644   val input = filtered_input \"in |\"
   642   val input = filtered_input "in |"
   645   val parse_bar_then_in = Parse.$$$ \"|\" -- Parse.$$$ \"in\"
   643   val parse_bar_then_in = Parse.$$$ "|" -- Parse.$$$ "in"
   646 in 
   644 in 
   647   parse (Parse.!!! parse_bar_then_in) input 
   645   parse (Parse.!!! parse_bar_then_in) input 
   648 end"
   646 end\<close>
   649 "Exception ERROR \"Outer syntax error: keyword \"|\" expected, 
   647 \<open>Exception ERROR "Outer syntax error: keyword "|" expected, 
   650 but keyword in was found\" raised"
   648 but keyword in was found" raised\<close>
   651 }
   649 }
   652 
   650 
   653   \begin{exercise} (FIXME)
   651   \begin{exercise} (FIXME)
   654   A type-identifier, for example @{typ "'a"}, is a token of 
   652   A type-identifier, for example @{typ "'a"}, is a token of 
   655   kind @{ML_ind Keyword in Token}. It can be parsed using 
   653   kind @{ML_ind Keyword in Token}. It can be parsed using 
   670 
   668 
   671 text \<open>
   669 text \<open>
   672   where we pretend the parsed string starts on line 7. An example is
   670   where we pretend the parsed string starts on line 7. An example is
   673 
   671 
   674 @{ML_matchresult_fake [display,gray]
   672 @{ML_matchresult_fake [display,gray]
   675 "filtered_input' \"foo \\n bar\""
   673 \<open>filtered_input' "foo \\n bar"\<close>
   676 "[Token ((\"foo\", ({line=7, end_line=7}, {line=7})), (Ident, \"foo\"), _),
   674 \<open>[Token (("foo", ({line=7, end_line=7}, {line=7})), (Ident, "foo"), _),
   677  Token ((\"bar\", ({line=8, end_line=8}, {line=8})), (Ident, \"bar\"), _)]"}
   675  Token (("bar", ({line=8, end_line=8}, {line=8})), (Ident, "bar"), _)]\<close>}
   678 
   676 
   679   in which the @{text [quotes] "\\n"} causes the second token to be in 
   677   in which the @{text [quotes] "\\n"} causes the second token to be in 
   680   line 8.
   678   line 8.
   681 
   679 
   682   By using the parser @{ML position in Parse} you can access the token 
   680   By using the parser @{ML position in Parse} you can access the token 
   683   position and return it as part of the parser result. For example
   681   position and return it as part of the parser result. For example
   684 
   682 
   685 @{ML_matchresult_fake [display,gray]
   683 @{ML_matchresult_fake [display,gray]
   686 "let
   684 \<open>let
   687   val input = filtered_input' \"where\"
   685   val input = filtered_input' "where"
   688 in 
   686 in 
   689   parse (Parse.position (Parse.$$$ \"where\")) input
   687   parse (Parse.position (Parse.$$$ "where")) input
   690 end"
   688 end\<close>
   691 "((\"where\", {line=7, end_line=7}), [])"}
   689 \<open>(("where", {line=7, end_line=7}), [])\<close>}
   692 
   690 
   693   \begin{readmore}
   691   \begin{readmore}
   694   The functions related to positions are implemented in the file
   692   The functions related to positions are implemented in the file
   695   @{ML_file "Pure/General/position.ML"}.
   693   @{ML_file "Pure/General/position.ML"}.
   696   \end{readmore}
   694   \end{readmore}
   752   There is usually no need to write your own parser for parsing inner syntax, that is 
   750   There is usually no need to write your own parser for parsing inner syntax, that is 
   753   for terms and  types: you can just call the predefined parsers. Terms can 
   751   for terms and  types: you can just call the predefined parsers. Terms can 
   754   be parsed using the function @{ML_ind term in Parse}. For example:
   752   be parsed using the function @{ML_ind term in Parse}. For example:
   755 
   753 
   756 @{ML_matchresult_fake [display,gray]
   754 @{ML_matchresult_fake [display,gray]
   757 "let 
   755 \<open>let 
   758   val input = Token.explode (Thy_Header.get_keywords' @{context}) Position.none \"foo\"
   756   val input = Token.explode (Thy_Header.get_keywords' @{context}) Position.none "foo"
   759 in 
   757 in 
   760   Parse.term input
   758   Parse.term input
   761 end"
   759 end\<close>
   762 "(\"<markup>\", [])"}
   760 \<open>("<markup>", [])\<close>}
   763 
   761 
   764 
   762 
   765   The function @{ML_ind prop in Parse} is similar, except that it gives a different
   763   The function @{ML_ind prop in Parse} is similar, except that it gives a different
   766   error message, when parsing fails. As you can see, the parser not just returns 
   764   error message, when parsing fails. As you can see, the parser not just returns 
   767   the parsed string, but also some markup information. You can decode the
   765   the parsed string, but also some markup information. You can decode the
   768   information with the function @{ML_ind parse in YXML} in @{ML_structure YXML}. 
   766   information with the function @{ML_ind parse in YXML} in @{ML_structure YXML}. 
   769   The result of the decoding is an XML-tree. You can see better what is going on if
   767   The result of the decoding is an XML-tree. You can see better what is going on if
   770   you replace @{ML Position.none} by @{ML "Position.line 42"}, say:
   768   you replace @{ML Position.none} by @{ML \<open>Position.line 42\<close>}, say:
   771 
   769 
   772 @{ML_matchresult_fake [display,gray]
   770 @{ML_matchresult_fake [display,gray]
   773 "let 
   771 \<open>let 
   774   val input = Token.explode (Thy_Header.get_keywords' @{context}) (Position.line 42) \"foo\"
   772   val input = Token.explode (Thy_Header.get_keywords' @{context}) (Position.line 42) "foo"
   775 in 
   773 in 
   776   YXML.parse (fst (Parse.term input))
   774   YXML.parse (fst (Parse.term input))
   777 end"
   775 end\<close>
   778 "Elem (\"token\", [(\"line\", \"42\"), (\"end_line\", \"42\")], [XML.Text \"foo\"])"}
   776 \<open>Elem ("token", [("line", "42"), ("end_line", "42")], [XML.Text "foo"])\<close>}
   779 
   777 
   780   The positional information is stored as part of an XML-tree so that code 
   778   The positional information is stored as part of an XML-tree so that code 
   781   called later on will be able to give more precise error messages. 
   779   called later on will be able to give more precise error messages. 
   782 
   780 
   783   \begin{readmore}
   781   \begin{readmore}
   831 
   829 
   832   To see what the parser returns, let us parse the string corresponding to the 
   830   To see what the parser returns, let us parse the string corresponding to the 
   833   definition of @{term even} and @{term odd}:
   831   definition of @{term even} and @{term odd}:
   834 
   832 
   835 @{ML_matchresult [display,gray]
   833 @{ML_matchresult [display,gray]
   836 "let
   834 \<open>let
   837   val input = filtered_input
   835   val input = filtered_input
   838      (\"even and odd \" ^  
   836      ("even and odd " ^  
   839       \"where \" ^
   837       "where " ^
   840       \"  even0[intro]: \\\"even 0\\\" \" ^ 
   838       "  even0[intro]: \"even 0\" " ^ 
   841       \"| evenS[intro]: \\\"odd n \<Longrightarrow> even (Suc n)\\\" \" ^ 
   839       "| evenS[intro]: \"odd n \<Longrightarrow> even (Suc n)\" " ^ 
   842       \"| oddS[intro]:  \\\"even n \<Longrightarrow> odd (Suc n)\\\"\")
   840       "| oddS[intro]:  \"even n \<Longrightarrow> odd (Suc n)\"")
   843 in
   841 in
   844   parse spec_parser input
   842   parse spec_parser input
   845 end"
   843 end\<close>
   846 "(([(even, NONE, NoSyn), (odd, NONE, NoSyn)],
   844 \<open>(([(even, NONE, NoSyn), (odd, NONE, NoSyn)],
   847    [((even0,_),_),
   845    [((even0,_),_),
   848     ((evenS,_),_),
   846     ((evenS,_),_),
   849     ((oddS,_),_)]), [])"}
   847     ((oddS,_),_)]), [])\<close>}
   850 \<close>
   848 \<close>
   851 
   849 
   852 ML \<open>let
   850 ML \<open>let
   853   val input = filtered_input 
   851   val input = filtered_input 
   854         "foo::\"int \<Rightarrow> bool\" and bar::nat (\"BAR\" 100) and blonk"
   852         "foo::\"int \<Rightarrow> bool\" and bar::nat (\"BAR\" 100) and blonk"
   863 
   861 
   864   The function @{ML_ind "vars" in Parse} in Line 2 of the parser reads an 
   862   The function @{ML_ind "vars" in Parse} in Line 2 of the parser reads an 
   865   \isacommand{and}-separated 
   863   \isacommand{and}-separated 
   866   list of variables that can include optional type annotations and syntax translations. 
   864   list of variables that can include optional type annotations and syntax translations. 
   867   For example:\footnote{Note that in the code we need to write 
   865   For example:\footnote{Note that in the code we need to write 
   868   \<open>\"int \<Rightarrow> bool\"\<close> in order to properly escape the double quotes
   866   \<open>"int \<Rightarrow> bool"\<close> in order to properly escape the double quotes
   869   in the compound type.}
   867   in the compound type.}
   870 
   868 
   871 @{ML_matchresult_fake [display,gray]
   869 @{ML_matchresult_fake [display,gray]
   872 "let
   870 \<open>let
   873   val input = filtered_input 
   871   val input = filtered_input 
   874         \"foo::\\\"int \<Rightarrow> bool\\\" and bar::nat (\\\"BAR\\\" 100) and blonk\"
   872         "foo::\"int \<Rightarrow> bool\" and bar::nat (\"BAR\" 100) and blonk"
   875 in
   873 in
   876   parse Parse.vars input
   874   parse Parse.vars input
   877 end"
   875 end\<close>
   878 "([(foo, SOME _, NoSyn), 
   876 \<open>([(foo, SOME _, NoSyn), 
   879   (bar, SOME _, Mixfix (Source {text=\"BAR\",...}, [], 100, _)), 
   877   (bar, SOME _, Mixfix (Source {text="BAR",...}, [], 100, _)), 
   880   (blonk, NONE, NoSyn)],[])"}  
   878   (blonk, NONE, NoSyn)],[])\<close>}  
   881 \<close>
   879 \<close>
   882 
   880 
   883 text \<open>
   881 text \<open>
   884   Whenever types are given, they are stored in the @{ML SOME}s. The types are
   882   Whenever types are given, they are stored in the @{ML SOME}s. The types are
   885   not yet used to type the variables: this must be done by type-inference later
   883   not yet used to type the variables: this must be done by type-inference later
   897   list of introduction rules, that is propositions with theorem annotations
   895   list of introduction rules, that is propositions with theorem annotations
   898   such as rule names and attributes. The introduction rules are propositions
   896   such as rule names and attributes. The introduction rules are propositions
   899   parsed by @{ML_ind prop in Parse}. However, they can include an optional
   897   parsed by @{ML_ind prop in Parse}. However, they can include an optional
   900   theorem name plus some attributes. For example
   898   theorem name plus some attributes. For example
   901 
   899 
   902 @{ML_matchresult [display,gray] "let 
   900 @{ML_matchresult [display,gray] \<open>let 
   903   val input = filtered_input \"foo_lemma[intro,dest!]:\"
   901   val input = filtered_input "foo_lemma[intro,dest!]:"
   904   val ((name, attrib), _) = parse (Parse_Spec.thm_name \":\") input 
   902   val ((name, attrib), _) = parse (Parse_Spec.thm_name ":") input 
   905 in 
   903 in 
   906   (name, map Token.name_of_src attrib)
   904   (name, map Token.name_of_src attrib)
   907 end" "(foo_lemma, [(\"intro\", _), (\"dest\", _)])"}
   905 end\<close> \<open>(foo_lemma, [("intro", _), ("dest", _)])\<close>}
   908  
   906  
   909   The function @{ML_ind opt_thm_name in Parse_Spec} is the ``optional'' variant of
   907   The function @{ML_ind opt_thm_name in Parse_Spec} is the ``optional'' variant of
   910   @{ML_ind thm_name in Parse_Spec}. Theorem names can contain attributes. The name 
   908   @{ML_ind thm_name in Parse_Spec}. Theorem names can contain attributes. The name 
   911   has to end with @{text [quotes] ":"}---see the argument of 
   909   has to end with @{text [quotes] ":"}---see the argument of 
   912   the function @{ML Parse_Spec.opt_thm_name} in Line 7.
   910   the function @{ML Parse_Spec.opt_thm_name} in Line 7.
  1017 
  1015 
  1018   The crucial part of a command is the function that determines the
  1016   The crucial part of a command is the function that determines the
  1019   behaviour of the command. In the code above we used a
  1017   behaviour of the command. In the code above we used a
  1020   ``do-nothing''-function, which because of the parser @{ML_ind succeed in Scan}
  1018   ``do-nothing''-function, which because of the parser @{ML_ind succeed in Scan}
  1021   does not parse any argument, but immediately returns the simple
  1019   does not parse any argument, but immediately returns the simple
  1022   function @{ML "Local_Theory.background_theory I"}. We can replace
  1020   function @{ML \<open>Local_Theory.background_theory I\<close>}. We can replace
  1023   this code by a function that first parses a proposition (using the
  1021   this code by a function that first parses a proposition (using the
  1024   parser @{ML Parse.prop}), then prints out some tracing information
  1022   parser @{ML Parse.prop}), then prints out some tracing information
  1025   (using the function \<open>trace_prop\<close>) and finally does
  1023   (using the function \<open>trace_prop\<close>) and finally does
  1026   nothing. For this you can write:
  1024   nothing. For this you can write:
  1027 \<close>
  1025 \<close>
  1049   \isacommand{definition} and \isacommand{declare}.  In other cases, commands
  1047   \isacommand{definition} and \isacommand{declare}.  In other cases, commands
  1050   are expected to parse some arguments, for example a proposition, and then
  1048   are expected to parse some arguments, for example a proposition, and then
  1051   ``open up'' a proof in order to prove the proposition (for example
  1049   ``open up'' a proof in order to prove the proposition (for example
  1052   \isacommand{lemma}) or prove some other properties (for example
  1050   \isacommand{lemma}) or prove some other properties (for example
  1053   \isacommand{function}). To achieve this kind of behaviour, you have to use
  1051   \isacommand{function}). To achieve this kind of behaviour, you have to use
  1054   the kind indicator @{ML_ind thy_goal in Keyword} and the function @{ML
  1052   the kind indicator @{ML_ind thy_goal in Keyword} and the function @{ML \<open>local_theory_to_proof\<close> in Outer_Syntax} to set up the command. 
  1055   "local_theory_to_proof" in Outer_Syntax} to set up the command. 
       
  1056   Below we show the command \isacommand{foobar\_goal} which takes a
  1053   Below we show the command \isacommand{foobar\_goal} which takes a
  1057   proposition as argument and then starts a proof in order to prove
  1054   proposition as argument and then starts a proof in order to prove
  1058   it. Therefore, we need to announce this command in the header 
  1055   it. Therefore, we need to announce this command in the header 
  1059   as \<open>thy_goal\<close>.
  1056   as \<open>thy_goal\<close>.
  1060 
  1057 
  1082   The function \<open>goal_prop\<close> in Lines 2 to 7 takes a string (the proposition to be
  1079   The function \<open>goal_prop\<close> in Lines 2 to 7 takes a string (the proposition to be
  1083   proved) and a context as argument.  The context is necessary in order to be able to use
  1080   proved) and a context as argument.  The context is necessary in order to be able to use
  1084   @{ML_ind read_prop in Syntax}, which converts a string into a proper proposition.
  1081   @{ML_ind read_prop in Syntax}, which converts a string into a proper proposition.
  1085   In Line 6 the function @{ML_ind theorem in Proof} starts the proof for the
  1082   In Line 6 the function @{ML_ind theorem in Proof} starts the proof for the
  1086   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  1083   proposition. Its argument @{ML NONE} stands for a locale (which we chose to
  1087   omit); the argument @{ML "(K I)"} stands for a function that determines what
  1084   omit); the argument @{ML \<open>(K I)\<close>} stands for a function that determines what
  1088   should be done with the theorem once it is proved (we chose to just forget
  1085   should be done with the theorem once it is proved (we chose to just forget
  1089   about it). 
  1086   about it). 
  1090 
  1087 
  1091   If you now type \isacommand{foobar\_goal}~@{text [quotes] "True \<and> True"},
  1088   If you now type \isacommand{foobar\_goal}~@{text [quotes] "True \<and> True"},
  1092   you obtain the following proof state:
  1089   you obtain the following proof state: