diff -r b11653b11bd3 -r cd612b489504 CookBook/Parsing.thy --- a/CookBook/Parsing.thy Mon Oct 20 06:22:11 2008 +0000 +++ b/CookBook/Parsing.thy Mon Oct 27 18:48:52 2008 +0100 @@ -3,6 +3,7 @@ begin + chapter {* Parsing *} text {* @@ -15,16 +16,19 @@ is driven by priority grammars. Parsers for outer syntax are built up by functional parsing combinators. These combinators are a well-established technique for parsing, which has, for example, been described in Paulson's classic book \cite{paulson-ml2}. - - Isabelle developers are usually concerned with writing parsers for outer - syntax, either for new definitional packages or for calling tactics. The library - for writing such parsers in can roughly be split up into two parts. + Isabelle developers are usually concerned with writing these outer syntax parsers, + either for new definitional packages or for calling tactics with specific arguments. + + \begin{readmore} + The library + for writing parser combinators can be split up, roughly, into two parts. The first part consists of a collection of generic parser combinators defined in the structure @{ML_struct Scan} in the file @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of combinators for dealing with specific token types, which are defined in the structure @{ML_struct OuterParse} in the file @{ML_file "Pure/Isar/outer_parse.ML"}. + \end{readmore} *} @@ -50,6 +54,8 @@ will raise the exception @{ML_text "FAIL"}. There are three exceptions used in the parsing combinators: + (FIXME: do the exception need to be explained, because the user cannot use them from ``outside''?) + \begin{itemize} \item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing might be explored. @@ -86,7 +92,7 @@ Parsers that explore alternatives can be constructed using the function @{ML "(op ||)"}. For example, the - parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, if it succeeds, + parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds, otherwise it returns the result of @{ML_text "q"}. For example @{ML_response [display] @@ -135,24 +141,27 @@ @{ML_open [display] "(p -- q) || r" for p q r} However, this parser is problematic for producing an appropriate error message, in case - the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information - that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p} - is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the + the parsing of @{ML_open "(p -- q)" for p q} fails. Because in that case one loses with the parser + above the information + that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in + which @{ML_text p} + is present in the input, but not @{ML_text q}. That means @{ML_open "(p -- q)" for p q} will fail + and the alternative parser @{ML_text r} will be tried. However in many circumstanes this will be the wrong - parser for the input and therefore will fail. The error message is then caused by the - failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations - can be avoided using the funtion @{ML "(op !!)"}. This function aborts the whole process of + parser for the input ``p-followed-by-q'' and therefore will fail. The error message is then caused by the + failure of @{ML_text r}, not by the absense of @{ML_text q} in the input. This kind of situation + can be avoided by using the funtion @{ML "(op !!)"}. This function aborts the whole process of parsing in case of failure and invokes an error message. For example if we invoke the parser @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} - on @{ML_text "hello"}, the parsing succeeds + on @{ML_text [quotes] "hello"}, the parsing succeeds @{ML_response [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} - but if we invoke it on @{ML_text "world"} + but if we invoke it on @{ML_text [quotes] "world"} @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" "Exception ABORT raised"} @@ -165,14 +174,14 @@ "Exception Error \"foo\" raised"} This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} - (FIXME: give reference to late). + (FIXME: give reference to later place). Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want - to generate the correct error message for @{ML_text q} not following @{ML_text p}, then + to generate the correct error message for p-followed-by-q, then we have to write *} -ML {* +ML {* fun p_followed_by_q p q r = let val err = (fn _ => p ^ " is not followed by " ^ q) @@ -240,12 +249,43 @@ This is because the parsers for the theory syntax, as well as the parsers for the argument syntax of proof methods and attributes, use the type @{ML_type OuterParse.token} (which is identical to the type @{ML_type OuterLex.token}). + + \begin{readmore} The parser functions for the theory syntax are contained in the structure - @{ML_struct OuterParse} defined in the file\linebreak @{ML_file "Pure/Isar/outer_parse.ML"}. + @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. + The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. + \end{readmore} +*} + +ML {* OuterLex.mk_text "hello" *} + +ML {* print_depth 50 *} + +ML {* + let open OuterLex in + OuterSyntax.scan Position.none "for" + end; + *} ML {* map OuterLex.mk_text (explode "hello") *} +ML {* + +fun my_scan lex pos str = + Source.of_string str + |> Symbol.source {do_recover = false} + |> OuterLex.source {do_recover = SOME false} + (fn () => pairself (Scan.make_lexicon o map Symbol.explode) lex) pos + |> Source.exhaust; + +*} + +ML {* +let val toks = my_scan (["hello"], []) Position.none "hello" +in (OuterParse.$$$ "hello") toks end +*} + text {* @{ML_response_fake [display] "map OuterLex.mk_text (explode \"hello\")"