diff -r 35e1dff0d9bb -r b11653b11bd3 CookBook/Parsing.thy --- a/CookBook/Parsing.thy Fri Oct 17 17:41:34 2008 -0400 +++ b/CookBook/Parsing.thy Mon Oct 20 06:22:11 2008 +0000 @@ -30,7 +30,6 @@ section {* Building Up Generic Parsers *} - text {* Let us first have a look at parsing strings using generic parsing combinators. @@ -43,22 +42,25 @@ @{ML_response [display] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} This function will either succeed (as in the two examples above) or raise the exeption - @{ML_text "FAIL"} if no string can be consumed. For example in the next example - try to parse: + @{ML_text "FAIL"} if no string can be consumed. For example trying to parse - @{ML_text [display] "($$ \"x\") (explode \"world\")"} - + @{ML_response_fake [display] "($$ \"x\") (explode \"world\")" + "Exception FAIL raised"} + + will raise the exception @{ML_text "FAIL"}. There are three exceptions used in the parsing combinators: - (FIXME: describe exceptions) - \begin{itemize} - \item @{ML_text "FAIL"} - \item @{ML_text "MORE"} @{ML_text "($$ \"h\") []"} - \item @{ML_text "ABORT"} dead end + \item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing + might be explored. + \item @{ML_text "MORE"} indicates that there is not enough input for the parser. For example + in @{ML_text "($$ \"h\") []"}. + \item @{ML_text "ABORT"} is the exception which is raised when a dead end is reached. + It is used for example in the function @{ML "(op !!)"} (see below). \end{itemize} - Slightly more general than @{ML "(op $$)"} is the function @{ML Scan.one} in that it + + Slightly more general than the parser @{ML "(op $$)"} is the function @{ML Scan.one}, in that it takes a predicate as argument and then parses exactly one item from the input list satisfying this prediate. For example the following parser either consumes an @{ML_text "h"} or a @{ML_text "w"}: @@ -74,7 +76,8 @@ "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} Two parser can be connected in sequence by using the funtion @{ML "(op --)"}. - For example + For example parsing @{ML_text "h"}, @{ML_text "e"} and @{ML_text "l"} in this + sequence can be achieved by @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} @@ -83,7 +86,7 @@ Parsers that explore alternatives can be constructed using the function @{ML "(op ||)"}. For example, the - parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, + parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, if it succeeds, otherwise it returns the result of @{ML_text "q"}. For example @{ML_response [display] @@ -131,15 +134,15 @@ @{ML_open [display] "(p -- q) || r" for p q r} - However, this way is problematic for producing an appropriate error message, in case + However, this parser is problematic for producing an appropriate error message, in case the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p} is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the - alternative parser @{ML_text r} will be tried. In many circumstances this will be the wrong - parser for the input and therefore probably fail. However, the error message is then caused by the + alternative parser @{ML_text r} will be tried. However in many circumstanes this will be the wrong + parser for the input and therefore will fail. The error message is then caused by the failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations - can be avoided using the funtion @{ML "(op !!)"}, which aborts the whole process of - parsing and invokes an error message. For example if we invoke the parser + can be avoided using the funtion @{ML "(op !!)"}. This function aborts the whole process of + parsing in case of failure and invokes an error message. For example if we invoke the parser @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} @@ -149,20 +152,22 @@ "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} - In contrast if we invoke it on @{ML_text "world"} + but if we invoke it on @{ML_text "world"} - @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"} + @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" + "Exception ABORT raised"} the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to see the error message properly, we need to prefix the parser with the function @{ML "Scan.error"}. For example - @{ML [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"} + @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))" + "Exception Error \"foo\" raised"} This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} - (FIXME: see below). + (FIXME: give reference to late). - Lets return to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want + Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want to generate the correct error message for @{ML_text q} not following @{ML_text p}, then we have to write *} @@ -170,39 +175,42 @@ ML {* fun p_followed_by_q p q r = let - val err = (fn _ => p ^ " is not followed by " ^ q) + val err = (fn _ => p ^ " is not followed by " ^ q) in - (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r)) + (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r)) end *} + text {* Running this parser with - @{ML_text [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"} + @{ML_response_fake [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")" + "Exception ERROR \"h is not followed by e\" raised"} - gives the correct error message and running it with + gives the correct error message. Running it with @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")" "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} yields the expected parsing. - The function @{ML "Scan.repeat"} will apply a parser as often as it succeeds. For examle + The function @{ML_open "Scan.repeat p" for p} will apply a parser @{ML_text p} as + often as it succeeds. For example - @{ML_response "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" + @{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function - @{ML "Scan.repeat1"} is similar, but requires that in @{ML_open "Scan.repeat1 p" for p} - the parse @{ML_text "p"} succeeds at least once. + @{ML "Scan.repeat1"} is similar, but requires that the parser @{ML_text "p"} + succeeds at least once. *} text {* - After parsing succeeded, one wants to apply functions on the parsed items. This is - done using the function @{ML_open "(p >> f)" for p f} which applies first the - parser @{ML_text p} upon successful completion applies the function @{ML_text f}. - For example + After parsing succeeded, one nearly always wants to apply a function on the parsed + items. This is done using the function @{ML_open "(p >> f)" for p f} which runs + first the parser @{ML_text p} and upon successful completion applies the + function @{ML_text f} to the result. For example @{ML_response [display] "let @@ -212,6 +220,8 @@ end" "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} + doubles the two parsed input strings. + The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies the given parser to the second component of the pair and leaves the first component untouched. For example @@ -223,17 +233,38 @@ (FIXME: In which situations is this useful?) *} -section {* Parsing Tokens *} +section {* Parsing Theory Syntax *} text {* - Most of the time, however, we will have to deal with tokens that are not just strings. - The parsers for the theory syntax, as well as the parsers for the argument syntax - of proof methods and attributes use the token type @{ML_type OuterParse.token}, - which is identical to @{ML_type OuterLex.token}. + Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings. + This is because the parsers for the theory syntax, as well as the parsers for the + argument syntax of proof methods and attributes, use the type + @{ML_type OuterParse.token} (which is identical to the type @{ML_type OuterLex.token}). The parser functions for the theory syntax are contained in the structure - @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. + @{ML_struct OuterParse} defined in the file\linebreak @{ML_file "Pure/Isar/outer_parse.ML"}. +*} + +ML {* map OuterLex.mk_text (explode "hello") *} + +text {* + + @{ML_response_fake [display] "map OuterLex.mk_text (explode \"hello\")" + "[Token (\), Token (\), Token (\), Token (\), Token (\)]"} + *} +ML {* + OuterLex.mk_text "hello" +*} + +ML {* + + let val input = [OuterLex.mk_text "hello", OuterLex.mk_text "world"]; + in (Scan.one (fn _ => true)) input end + +*} + + chapter {* Parsing *}