diff -r a04bdee4fb1e -r 0c3580c831a4 CookBook/Parsing.thy --- a/CookBook/Parsing.thy Fri Nov 28 05:56:28 2008 +0100 +++ b/CookBook/Parsing.thy Sat Nov 29 21:20:18 2008 +0000 @@ -94,7 +94,7 @@ Parsers that explore alternatives can be constructed using the function @{ML "(op ||)"}. For example, the - parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds, + parser @{ML "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds, otherwise it returns the result of @{ML_text "q"}. For example @{ML_response [display] @@ -121,7 +121,7 @@ end" "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"} - The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser + The parser @{ML "Scan.optional p x" for p x} returns the result of the parser @{ML_text "p"}, if it succeeds; otherwise it returns the default value @{ML_text "x"}. For example @@ -152,14 +152,14 @@ followed by @{ML_text q}, or start a completely different parser @{ML_text r}, one might write - @{ML_open [display] "(p -- q) || r" for p q r} + @{ML [display] "(p -- q) || r" for p q r} However, this parser is problematic for producing an appropriate error message, in case - the parsing of @{ML_open "(p -- q)" for p q} fails. Because in that case one loses with the parser + the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses with the parser above the information that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in which @{ML_text p} - is present in the input, but not @{ML_text q}. That means @{ML_open "(p -- q)" for p q} will fail + is present in the input, but not @{ML_text q}. That means @{ML "(p -- q)" for p q} will fail and the alternative parser @{ML_text r} will be tried. However in many circumstance this will be the wrong parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then @@ -191,7 +191,7 @@ This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} (FIXME: give reference to later place). - Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want + Returning to our example of parsing @{ML "(p -- q) || r" for p q r}. If we want to generate the correct error message for p-followed-by-q, then we have to write: *} @@ -219,7 +219,7 @@ yields the expected parsing. - The function @{ML_open "Scan.repeat p" for p} will apply a parser @{ML_text p} as + The function @{ML "Scan.repeat p" for p} will apply a parser @{ML_text p} as often as it succeeds. For example @{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" @@ -241,7 +241,7 @@ in practise, because it is already done by the infrastructure for you. After parsing succeeded, one nearly always wants to apply a function on the parsed - items. This is done using the function @{ML_open "(p >> f)" for p f} which runs + items. This is done using the function @{ML "(p >> f)" for p f} which runs first the parser @{ML_text p} and upon successful completion applies the function @{ML_text f} to the result. For example @@ -281,14 +281,17 @@ \end{readmore} The structure @{ML_struct OuterLex} defines several kinds of token (for example - @{ML "OuterLex.Ident"} for identifiers, @{ML "OuterLex.Keyword"} for keywords and - @{ML "OuterLex.Command"} for commands). Some token parsers take into account the + @{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in OuterLex} for keywords and + @{ML "Command" in OuterLex} for commands). Some token parsers take into account the kind of token. - - We can generate a token list out of a string using the function @{ML - "OuterSyntax.scan"}, which we give below @{ML "Position.none"} as argument - since, at the moment, we are not interested in generating precise error - messages. The following +*} + +text {* + For the examples below, we can generate a token list out of a string using + the function @{ML "OuterSyntax.scan"}, which we give below @{ML + "Position.none"} as argument since, at the moment, we are not interested in + generating precise error messages. The following + @{ML_response_fake [display] "OuterSyntax.scan Position.none \"hello world\"" "[Token (\,(Ident, \"hello\"),\), @@ -302,7 +305,7 @@ the token.} The second indicates a space. Many parsing functions later on will require spaces, comments and the like - to have been filtered out. In what follows below, we are going to use the + to have already been filtered out. So from now on we are going to use the functions @{ML filter} and @{ML OuterLex.is_proper} do this. For example @@ -367,20 +370,20 @@ end" "((\"|\",\"in\"),[])"} - The parser @{ML_open "OuterParse.enum s p" for s p} parses a possibly empty - list of items recognised by the parser @{ML_text p}, where the items are - separated by the string @{ML_text s}. For example + The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty + list of items recognised by the parser @{ML_text p}, where the items being parsed + are separated by the string @{ML_text s}. For example @{ML_response [display] "let - val input = filtered_input \"in | in | in end\" + val input = filtered_input \"in | in | in foo\" in (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input end" "([\"in\",\"in\",\"in\"],[\])"} @{ML "OuterParse.enum1"} works similarly, except that the parsed list must - be non-empty. Note that we had to add a @{ML_text [quotes] "end"} at the end + be non-empty. Note that we had to add an @{ML_text [quotes] "foo"} at the end of the parsed string, otherwise the parser would have consumed all tokens and then failed with the exception @{ML_text "MORE"}. Like in the previous section, we can avoid this exception using the wrapper @{ML @@ -391,10 +394,21 @@ "let val input = filtered_input \"in | in | in\" in - Scan.finite OuterLex.stopper (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input + Scan.finite OuterLex.stopper + (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input end" "([\"in\",\"in\",\"in\"],[])"} + The following function will help us to run examples + +*} + +ML {* +fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input +*} + +text {* + The function @{ML "OuterParse.!!!"} can be used to force termination of the parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section), except that the error message is fixed to be @{text [quotes] "Outer syntax error"} @@ -405,14 +419,22 @@ val input = filtered_input \"in |\" val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\" in - Scan.error (OuterParse.!!! parse_bar_then_in) input + parse (OuterParse.!!! parse_bar_then_in) input end" "Exception ERROR \"Outer syntax error: keyword \"|\" expected, but keyword in was found\" raised" } + \begin{exercise} + A type-identifier, for example @{typ "'a"}, is a token of + kind @{ML "Keyword" in OuterLex} can be parsed + + \end{exercise} + + *} + section {* Positional Information *} text {*