diff -r 352e31d9dacc -r f76135c6c527 ProgTutorial/Parsing.thy
--- a/ProgTutorial/Parsing.thy	Tue Sep 29 22:10:48 2009 +0200
+++ b/ProgTutorial/Parsing.thy	Thu Oct 01 10:19:21 2009 +0200
@@ -20,14 +20,14 @@
   \begin{readmore}
   The library for writing parser combinators is split up, roughly, into two
-  parts. The first part consists of a collection of generic parser combinators
+  parts: The first part consists of a collection of generic parser combinators
   defined in the structure @{ML_struct Scan} in the file @{ML_file
   "Pure/General/scan.ML"}. The second part of the library consists of combinators
   for dealing with specific token types, which are defined in the structure
   @{ML_struct OuterParse} in the file @{ML_file
-  "Pure/Isar/outer_parse.ML"}. Specific parsers for packages are defined in
-  @{ML_file "Pure/Isar/spec_parse.ML"}. Parsers for method arguments are
-  defined in @{ML_file "Pure/Isar/args.ML"}.
+  "Pure/Isar/outer_parse.ML"}. In addition, specific parsers for packages are
+  defined in @{ML_file "Pure/Isar/spec_parse.ML"}. Parsers for method arguments
+  are defined in @{ML_file "Pure/Isar/args.ML"}.
   \end{readmore}
 *}
@@ -73,7 +73,7 @@
   In the examples above we use the function @{ML_ind Symbol.explode}, instead of the
   more standard library function @{ML_ind explode}, for obtaining an input list for
-  the parser. The reason is that @{ML_ind Symbol.explode} is aware of character sequences,
+  the parser. The reason is that @{ML_ind Symbol.explode} is aware of character sequences,
   for example @{text "\"}, that have a special meaning in Isabelle. To see
   the difference consider
@@ -114,7 +114,7 @@
   Note how the result of consumed strings builds up on the left as nested pairs.
 
   If, as in the previous example, you want to parse a particular string,
-  then you should use the function @{ML_ind this_string in Scan}:
+  then you can use the function @{ML_ind this_string in Scan}.
 
   @{ML_response [display,gray]
   "Scan.this_string \"hell\" (Symbol.explode \"hello\")"
@@ -176,16 +176,23 @@
     (p input1, p input2)
 end"
 "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
 
-  The function @{ML_ind "!!"} helps to produce appropriate error messages
-  for parsing. For example if you want to parse @{text p} immediately
+  The function @{ML_ind ahead in Scan} parses some input, but leaves the original
+  input unchanged. For example:
+
+  @{ML_response [display,gray]
+  "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")"
+  "(\"foo\", [\"f\", \"o\", \"o\"])"}
+
+  The function @{ML_ind "!!"} helps with producing appropriate error messages
+  during parsing. For example, if you want to parse @{text p} immediately
   followed by @{text q}, or start a completely different parser @{text r},
   you might write:
 
   @{ML [display,gray] "(p -- q) || r" for p q r}
 
-  However, this parser is problematic for producing an appropriate error
-  message, if the parsing of @{ML "(p -- q)" for p q} fails. Because in that
-  case you lose the information that @{text p} should be followed by @{text q}.
+  However, this parser is problematic for producing a useful error
+  message if the parsing of @{ML "(p -- q)" for p q} fails, because with the
+  parser above you lose the information that @{text p} should be followed by @{text q}.
   To see this assume that @{text p} is present in the input, but it is not
   followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and
   hence the alternative parser @{text r} will be tried.
   However, in many
@@ -222,8 +229,8 @@
   (see Section~\ref{sec:newcommand} which explains this function in more detail).
 
   Let us now return to our example of parsing @{ML "(p -- q) || r" for p q
-  r}. If you want to generate the correct error message for
-  @{text "p"}-followed-by-@{text "q"}, then you have to write:
+  r}. If you want to generate the correct error message for failure
+  of parsing @{text "p"}-followed-by-@{text "q"}, then you have to write:
 *}
 
 ML{*fun p_followed_by_q p q r =
@@ -260,14 +267,14 @@
   succeeds at least once.
 
   Also note that the parser would have aborted with the exception @{text MORE}, if
-  you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using
+  you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using
   the wrapper @{ML_ind finite in Scan} and the ``stopper-token''
   @{ML_ind stopper in Symbol}. With them you can write:
 
   @{ML_response [display,gray]
   "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")"
   "([\"h\", \"h\", \"h\", \"h\"], [])"}
 
-  @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
+  The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings;
   other stoppers need to be used when parsing, for example, tokens. However, this kind of
   manually wrapping is often already done by the surrounding infrastructure.
@@ -325,8 +332,9 @@
   @{ML_response [display,gray]
 "let
    fun double (x, y) = (x ^ x, y ^ y)
+   val parser = $$ \"h\" -- $$ \"e\"
 in
-   (($$ \"h\") -- ($$ \"e\") >> double) (Symbol.explode \"hello\")
+   (parser >> double) (Symbol.explode \"hello\")
 end"
 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
@@ -343,15 +351,6 @@
   where the single-character strings in the parsed output are transformed
   back into one string.
-
-  (FIXME: move to an earlier place)
-
-  The function @{ML_ind ahead in Scan} parses some input, but leaves the original
-  input unchanged. For example:
-
-  @{ML_response [display,gray]
-  "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")"
-  "(\"foo\", [\"f\", \"o\", \"o\"])"}
 
   The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies
   the given parser to the second component of the pair and leaves the first component
@@ -415,8 +414,8 @@
   @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
   other syntactic category. The second indicates a space.
 
-  We can easily change what is recognised as a keyword with
-  @{ML_ind keyword in OuterKeyword}. For example calling this function
+  We can easily change what is recognised as a keyword with the function
+  @{ML_ind keyword in OuterKeyword}. For example, calling it with
 *}
 
 ML{*val _ = OuterKeyword.keyword "hello"*}
@@ -517,20 +516,20 @@
 end"
 "([\"in\", \"in\", \"in\"], [\])"}
 
-  @{ML_ind enum1 in OuterParse} works similarly, except that the parsed list must
-  be non-empty. Note that we had to add a string @{text [quotes] "foo"} at the
-  end of the parsed string, otherwise the parser would have consumed all
-  tokens and then failed with the exception @{text "MORE"}. Like in the
-  previous section, we can avoid this exception using the wrapper @{ML
-  Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
-  OuterLex.stopper}. We can write:
+  The function @{ML_ind enum1 in OuterParse} works similarly, except that the
+  parsed list must be non-empty.
+  Note that we had to add a string @{text [quotes] "foo"} at the end of the
+  parsed string, otherwise the parser would have consumed all tokens and then
+  failed with the exception @{text "MORE"}. Like in the previous section, we
+  can avoid this exception using the wrapper @{ML Scan.finite}. This time,
+  however, we have to use the ``stopper-token'' @{ML OuterLex.stopper}. We
+  can write:
 
   @{ML_response [display,gray]
 "let
   val input = filtered_input \"in | in | in\"
+  val p = OuterParse.enum \"|\" (OuterParse.$$$ \"in\")
 in
-  Scan.finite OuterLex.stopper
-    (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
+  Scan.finite OuterLex.stopper p input
 end"
 "([\"in\", \"in\", \"in\"], [])"}
@@ -540,11 +539,10 @@
 ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}
 
 text {*
-
-  The function @{ML_ind "!!!" in OuterParse} can be used to force termination of the
-  parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section).
-  Except that the error message of @{ML "OuterParse.!!!"} is fixed to be
-  @{text [quotes] "Outer syntax error"}
+  The function @{ML_ind "!!!" in OuterParse} can be used to force termination
+  of the parser in case of a dead end, just like @{ML "Scan.!!"} (see previous
+  section). A difference, however, is that the error message of @{ML
+  "OuterParse.!!!"} is fixed to be @{text [quotes] "Outer syntax error"}
   together with a relatively precise description of the failure. For example:
 
   @{ML_response_fake [display,gray]
@@ -569,8 +567,8 @@
   Whenever there is a possibility that the processing of user input can fail,
   it is a good idea to give all available information about where the error
   occurred. For this Isabelle can attach positional information to tokens
-  and then thread this information up the processing chain. To see this,
-  modify the function @{ML filtered_input} described earlier to
+  and then thread this information up the ``processing chain''. To see this,
+  modify the function @{ML filtered_input}, described earlier, as follows
 *}
 
 ML{*fun filtered_input' str =
@@ -587,8 +585,8 @@
   in which the @{text [quotes] "\\n"} causes the second token to be in line 8.
 
-  By using the parser @{ML OuterParse.position} you can access the token position
-  and return it as part of the parser result. For example
+  By using the parser @{ML position in OuterParse} you can access the token
+  position and return it as part of the parser result. For example
 
   @{ML_response_fake [display,gray]
 "let
@@ -605,13 +603,34 @@
 *}
 
+section {* Parsers for ML-Code (TBD) *}
+
 text {*
-  (FIXME: there are also handy parsers for ML-expressions and ML-files)
+  @{ML_ind ML_source in OuterParse}
 *}
 
 section {* Context Parser (TBD) *}
 
 text {*
+  @{ML_ind Args.context}
+*}
+(*
+ML {*
+let
+  val parser = Args.context -- Scan.lift Args.name_source
+
+  fun term_pat (ctxt, str) =
+    str |> Syntax.read_prop ctxt
+in
+  (parser >> term_pat) (Context.Proof @{context}, filtered_input "f (a::nat)")
+  |> fst
+end
+*}
+*)
+
+text {*
+  @{ML_ind Args.context}
+
   Used for example in \isacommand{attribute\_setup} and \isacommand{method\_setup}.
 *}
@@ -622,7 +641,7 @@
 text {*
   There is usually no need to write your own parser for parsing inner syntax, that is
   for terms and types: you can just call the predefined parsers. Terms can
-  be parsed using the function @{ML_ind term in OuterParse}. For example:
 
   @{ML_response [display,gray]
 "let
   val input = OuterSyntax.scan Position.none \"foo\"
 in
   OuterParse.term input
 end"
 "(\"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\", [])"}
 
-  The function @{ML_ind prop in OuterParse} is similar, except that it gives a different
+  The function @{ML_ind prop in OuterParse} is similar, except that it gives a different
   error message, when parsing fails. As you can see, the parser not just returns
   the parsed string, but also some encoded information. You can decode the
-  information with the function @{ML_ind parse in YXML}. For example
+  information with the function @{ML_ind parse in YXML} in @{ML_struct YXML}. For example
 
   @{ML_response [display,gray]
   "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
@@ -954,16 +973,17 @@
   @{text [display] "$ isabelle emacs -k foobar a_theory_file"}
 
   If you now build a theory on top of @{text "Command.thy"},
-  then the command \isacommand{foobar} can be used. You can just write
+  then you can use the command \isacommand{foobar}. You can just write
 *}
 
 foobar
 
 text {*
   but you will not see any action as we chose to implement this command to do
-  nothing. A similarly procedure has to be done with any other new command, and
-  also any new keyword that is introduced with @{ML_ind OuterKeyword.keyword}.
-
+  nothing. The point of this command is only to show how to interact with
+  ProofGeneral. A similar procedure has to be done with any other new command,
+  and also any new keyword that is introduced with
+  @{ML_ind OuterKeyword.keyword}.
 *}
 
 ML{*val _ = OuterKeyword.keyword "blink" *}
@@ -1081,7 +1101,6 @@
 end
 
 val parser = SpecParse.opt_thm_name ":" -- OuterParse.ML_source
-
 in
   OuterSyntax.local_theory_to_proof "foobar_prove" "proving a proposition"
     OuterKeyword.thy_goal (parser >> setup_proof)