diff -r a0edabf14457 -r 3d4b49921cdb CookBook/Parsing.thy --- a/CookBook/Parsing.thy Mon Nov 24 02:51:08 2008 +0100 +++ b/CookBook/Parsing.thy Tue Nov 25 05:19:27 2008 +0000 @@ -21,7 +21,7 @@ \begin{readmore} The library - for writing parser combinators can be split up, roughly, into two parts. + for writing parser combinators is split up, roughly, into two parts. The first part consists of a collection of generic parser combinators defined in the structure @{ML_struct Scan} in the file @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of @@ -37,7 +37,7 @@ text {* Let us first have a look at parsing strings using generic parsing combinators. - The function @{ML "(op $$)"} takes a string and will ``consume'' this string from + The function @{ML "(op $$)"} takes a string as argument and will ``consume'' this string from a given input list of strings. ``Consume'' in this context means that it will return a pair consisting of this string and the rest of the input list. For example: @@ -63,7 +63,7 @@ It is used for example in the function @{ML "(op !!)"} (see below). \end{itemize} - However, note that these exception private to the parser and cannot be accessed + However, note that these exceptions are private to the parser and cannot be accessed by the programmer (for example to handle them). Slightly more general than the parser @{ML "(op $$)"} is the function @{ML @@ -108,7 +108,7 @@ "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion - for parsers, except that they discard the item parsed by the first (respectively second) + for parsers, except that they discard the item being parsed by the first (respectively second) parser. For example @{ML_response [display] @@ -136,7 +136,16 @@ "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} The function @{ML Scan.option} works similarly, except no default value can - be given and in the failure case this parser does nothing. + be given. Instead, the result is wrapped as an @{text "option"}-type. For example: + +@{ML_response [display] +"let + val p = Scan.option ($$ \"h\") + val input1 = (explode \"hello\") + val input2 = (explode \"world\") +in + (p input1, p input2) +end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} The function @{ML "(op !!)"} helps to produce appropriate error messages during parsing. For example if one wants to parse that @{ML_text p} is immediately @@ -157,7 +166,7 @@ caused by the failure of @{ML_text r}, not by the absense of @{ML_text q} in the input. This kind of situation can be avoided by using the funtion @{ML "(op !!)"}. This function aborts the whole process of - parsing in case of failure and invokes an error message. For example if we invoke the parser + parsing in case of a failure and invokes an error message. For example if we invoke the parser @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} @@ -184,7 +193,7 @@ Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want to generate the correct error message for p-followed-by-q, then - we have to write, for example + we have to write: *} ML {* @@ -221,14 +230,15 @@ succeeds at least once. Also note that the parser would have aborted with the exception @{ML_text MORE}, if - we had run it only on just @{ML_text [quotes] "hhhh"}. This can be awoided using + we had run it only on just @{ML_text [quotes] "hhhh"}. This can be avoided using the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With them we can write @{ML_response [display] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" "([\"h\", \"h\", \"h\", \"h\"], [])"} - However, the Isabelle-develloper rarely needs to do this kind of wrapping manually. + However, this kind of manually wrapping needs to be done only very rarely + in practise, because it is already done by the infrastructure for you. After parsing succeeded, one nearly always wants to apply a function on the parsed items. This is done using the function @{ML_open "(p >> f)" for p f} which runs @@ -272,31 +282,58 @@ The structure @{ML_struct OuterLex} defines several kinds of token (for example @{ML "OuterLex.Ident"} for identifiers, @{ML "OuterLex.Keyword"} for keywords and - @{ML "OuterLex.Command"} for commands). Some parsers take into account the + @{ML "OuterLex.Command"} for commands). Some token parsers take into account the kind of token. - We can generate a token list using the function @{ML "OuterSyntax.scan"}, which we give - below @{ML "Position.none"} as argument since, at the moment, we are not interested in - generating precise error messages. The following\footnote{There is something funny - going on with the pretty printing of the result token list.} + We can generate a token list out of a string using the function @{ML + "OuterSyntax.scan"}, which we give below @{ML "Position.none"} as argument + since, at the moment, we are not interested in generating precise error + messages. The following @{ML_response_fake [display] "OuterSyntax.scan Position.none \"hello world\"" -"[Token (\,(OuterLex.Ident, \"hello\"),\), - Token (\,(OuterLex.Space, \" \"),\), - Token (\,(OuterLex.Ident, \"world\"),\)]"} +"[Token (\,(Ident, \"hello\"),\), + Token (\,(Space, \" \"),\), + Token (\,(Ident, \"world\"),\)]"} + + produces three tokens where the first and the last are identifiers, since + @{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match any + other syntactic category.\footnote{Note that because of a possible a bug in + the PolyML runtime system the result is printed as @{text "?"}, instead of + the token.} The second indicates a space. + + Many parsing functions later on will require spaces, comments and the like + to have been filtered out. In what follows below, we are going to use the + functions @{ML filter} and @{ML OuterLex.is_proper} do this. For example + - produces three tokens where the first and the last are identifiers, since - @{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match - any other category. The second indicates a space. Many parsing functions - later on will require spaces, comments and the like to have been filtered out. +@{ML_response_fake [display] +"let + val input = OuterSyntax.scan Position.none \"hello world\" +in + filter OuterLex.is_proper input +end" +"[Token (\,(Ident, \"hello\"), \), Token (\,(Ident, \"world\"), \)]"} + + For convenience we are going to use the function + +*} + +ML {* +fun filtered_input str = + filter OuterLex.is_proper (OuterSyntax.scan Position.none str) +*} + +text {* + If we parse -@{ML_response_fake [display] "OuterSyntax.scan Position.none \"inductive|for\"" -"[Token (\,(OuterLex.Command, \"inductive\"),\), - Token (\,(OuterLex.Keyword, \"|\"),\), - Token (\,(OuterLex.Keyword, \"for\"),\)]"} +@{ML_response_fake [display] +"filtered_input \"inductive | for\"" +"[Token (\,(Command, \"inductive\"),\), + Token (\,(Keyword, \"|\"),\), + Token (\,(Keyword, \"for\"),\)]"} - we obtain a list consiting of only command and keyword tokens. + we obtain a list consiting of only a command and two keyword tokens. If you want to see which keywords and commands are currently known, use the following (you might have to adjust the @{ML print_depth} in order to see the complete list): @@ -310,11 +347,11 @@ "([\"}\",\"{\",\],[\"\\",\"\\",\])"} Now the parser @{ML "OuterParse.$$$"} parses a single keyword. For example - + @{ML_response [display] "let - val input1 = OuterSyntax.scan Position.none \"where for\" - val input2 = OuterSyntax.scan Position.none \"|in\" + val input1 = filtered_input \"where for\" + val input2 = filtered_input \"| in\" in (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2) end" @@ -324,7 +361,7 @@ @{ML_response [display] "let - val input = OuterSyntax.scan Position.none \"|in\" + val input = filtered_input \"| in\" in (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input end" @@ -336,24 +373,23 @@ @{ML_response [display] "let - val input = OuterSyntax.scan Position.none \"in|in|in\\n\" + val input = filtered_input \"in | in | in end\" in (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input end" "([\"in\",\"in\",\"in\"],[\])"} - @{ML "OuterParse.enum1"} works similarly, except that the parsed list must - be non-empty. - - Note that we had to add a @{ML_text [quotes] "\\n"} at the end of the parsed - string, otherwise the parser would have consumed all tokens and then failed with - the exception @{ML_text "MORE"}. Like in the previous section, we can avoid - this exception using the wrapper @{ML Scan.finite}. This time, however, we - have to use the ``stopper-token'' @{ML OuterLex.stopper}. We can write + @{ML "OuterParse.enum1"} works similarly, except that the parsed list must + be non-empty. Note that we had to add a @{ML_text [quotes] "end"} at the end + of the parsed string, otherwise the parser would have consumed all tokens + and then failed with the exception @{ML_text "MORE"}. Like in the previous + section, we can avoid this exception using the wrapper @{ML + Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML + OuterLex.stopper}. We can write @{ML_response [display] "let - val input = OuterSyntax.scan Position.none \"in|in|in\" + val input = filtered_input \"in | in | in\" in Scan.finite OuterLex.stopper (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input end" @@ -366,7 +402,7 @@ @{ML_response_fake [display] "let - val input = OuterSyntax.scan Position.none \"in|\" + val input = filtered_input \"in |\" val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\" in Scan.error (OuterParse.!!! parse_bar_then_in) input @@ -377,15 +413,6 @@ *} - -ML {* -let - val input = filter OuterLex.is_proper (OuterSyntax.scan Position.none "(in foo)") -in - OuterParse.target input -end -*} - section {* Positional Information *} text {* @@ -410,6 +437,19 @@ *} +ML {* + OuterParse.opt_target +*} + +ML {* + OuterParse.opt_target -- + OuterParse.fixes -- + OuterParse.for_fixes -- + Scan.optional (OuterParse.$$$ "where" |-- + OuterParse.!!! (OuterParse.enum1 "|" (SpecParse.opt_thm_name ":" -- OuterParse.prop))) [] + +*} + text {* (FIXME funny output for a proposition) *}