diff -r dc955603d813 -r 7b6d81ff9d9a ProgTutorial/Parsing.thy --- a/ProgTutorial/Parsing.thy Wed Apr 08 22:47:39 2009 +0100 +++ b/ProgTutorial/Parsing.thy Thu Apr 09 18:11:35 2009 +0000 @@ -7,9 +7,9 @@ text {* Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. - Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so - on, belong to the outer syntax, whereas items inside double quotation marks, such - as terms, types and so on, belong to the inner syntax. For parsing inner syntax, + Commands, such as \isacommand{definition}, \isacommand{inductive} and so + on, belong to the outer syntax, whereas terms, types and so on belong to the + inner syntax. For parsing inner syntax, Isabelle uses a rather general and sophisticated algorithm, which is driven by priority grammars. Parsers for outer syntax are built up by functional parsing combinators. These combinators are a well-established technique for parsing, @@ -18,14 +18,15 @@ either for new definitional packages or for calling methods with specific arguments. \begin{readmore} - The library - for writing parser combinators is split up, roughly, into two parts. - The first part consists of a collection of generic parser combinators defined - in the structure @{ML_struct Scan} in the file - @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of - combinators for dealing with specific token types, which are defined in the - structure @{ML_struct OuterParse} in the file - @{ML_file "Pure/Isar/outer_parse.ML"}. + The library for writing parser combinators is split up, roughly, into two + parts. The first part consists of a collection of generic parser combinators + defined in the structure @{ML_struct Scan} in the file @{ML_file + "Pure/General/scan.ML"}. The second part of the library consists of + combinators for dealing with specific token types, which are defined in the + structure @{ML_struct OuterParse} in the file @{ML_file + "Pure/Isar/outer_parse.ML"}. Specific parsers for packages are defined + in @{ML_file "Pure/Isar/spec_parse.ML"}. Parsers for method arguments + are defined in @{ML_file "Pure/Isar/args.ML"}. \end{readmore} *} @@ -75,10 +76,10 @@ @{ML_response [display,gray] "let val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\") - val input1 = (explode \"hello\") - val input2 = (explode \"world\") + val input1 = explode \"hello\" + val input2 = explode \"world\" in - (hw input1, hw input2) + (hw input1, hw input2) end" "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} @@ -86,16 +87,18 @@ For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this order) you can achieve by: - @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" - "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} + @{ML_response [display,gray] + "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (explode \"hello\")" + "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} Note how the result of consumed strings builds up on the left as nested pairs. If, as in the previous example, you want to parse a particular string, then you should use the function @{ML Scan.this_string}: - @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")" - "(\"hell\", [\"o\"])"} + @{ML_response [display,gray] + "Scan.this_string \"hell\" (explode \"hello\")" + "(\"hell\", [\"o\"])"} Parsers that explore alternatives can be constructed using the function @{ML "||"}. The parser @{ML "(p || q)" for p q} returns the @@ -105,9 +108,9 @@ @{ML_response [display,gray] "let - val hw = ($$ \"h\") || ($$ \"w\") - val input1 = (explode \"hello\") - val input2 = (explode \"world\") + val hw = $$ \"h\" || $$ \"w\" + val input1 = explode \"hello\" + val input2 = explode \"world\" in (hw input1, hw input2) end" @@ -119,9 +122,9 @@ @{ML_response [display,gray] "let - val just_e = ($$ \"h\") |-- ($$ \"e\") - val just_h = ($$ \"h\") --| ($$ \"e\") - val input = (explode \"hello\") + val just_e = $$ \"h\" |-- $$ \"e\" + val just_h = $$ \"h\" --| $$ \"e\" + val input = explode \"hello\" in (just_e input, just_h input) end" @@ -134,8 +137,8 @@ @{ML_response [display,gray] "let val p = Scan.optional ($$ \"h\") \"x\" - val input1 = (explode \"hello\") - val input2 = (explode \"world\") + val input1 = explode \"hello\" + val input2 = explode \"world\" in (p input1, p input2) end" @@ -147,8 +150,8 @@ @{ML_response [display,gray] "let val p = Scan.option ($$ \"h\") - val input1 = (explode \"hello\") - val input2 = (explode \"world\") + val input1 = explode \"hello\" + val input2 = explode \"world\" in (p input1, p input2) end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} @@ -166,7 +169,7 @@ To see this assume that @{text p} is present in the input, but it is not followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and hence the alternative parser @{text r} will be tried. However, in many - circumstances this will be the wrong parser for the input ``p-followed-by-something'' + circumstances this will be the wrong parser for the input ``@{text "p"}-followed-by-something'' and therefore will also fail. The error message is then caused by the failure of @{text r}, not by the absence of @{text q} in the input. This kind of situation can be avoided when using the function @{ML "!!"}. This function @@ -174,13 +177,13 @@ message. For example if you invoke the parser - @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"} + @{ML [display,gray] "!! (fn _ => \"foo\") ($$ \"h\")"} on @{text [quotes] "hello"}, the parsing succeeds @{ML_response [display,gray] - "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" - "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} + "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" + "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} but if you invoke it on @{text [quotes] "world"} @@ -191,20 +194,21 @@ see the error message properly, you need to prefix the parser with the function @{ML "Scan.error"}. For example: - @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" - "Exception Error \"foo\" raised"} + @{ML_response_fake [display,gray] + "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" + "Exception Error \"foo\" raised"} This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.local_theory"} (see Section~\ref{sec:newcommand} which explains this function in more detail). Let us now return to our example of parsing @{ML "(p -- q) || r" for p q - r}. If you want to generate the correct error message for p-followed-by-q, - then you have to write: + r}. If you want to generate the correct error message for + @{text "p"}-followed-by-@{text "q"}, then you have to write: *} ML{*fun p_followed_by_q p q r = let - val err_msg = (fn _ => p ^ " is not followed by " ^ q) + val err_msg = fn _ => p ^ " is not followed by " ^ q in ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r) end *} @@ -253,7 +257,7 @@ @{ML_response [display,gray] "let val p = Scan.repeat (Scan.one Symbol.not_eof) - val input = (explode \"foo bar foo\") + val input = explode \"foo bar foo\" in Scan.finite Symbol.stopper p input end" @@ -282,8 +286,8 @@ @{ML_response [display,gray] "let val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof)) - val input1 = (explode \"fooooo\") - val input2 = (explode \"foo*ooo\") + val input1 = explode \"fooooo\" + val input2 = explode \"foo*ooo\" in (Scan.finite Symbol.stopper p input1, Scan.finite Symbol.stopper p input2) @@ -309,7 +313,7 @@ @{ML_response [display,gray] "let val p = Scan.repeat (Scan.one Symbol.not_eof) - val input = (explode \"foo bar foo\") + val input = explode \"foo bar foo\" in Scan.finite Symbol.stopper (p >> implode) input end" @@ -332,7 +336,7 @@ untouched. For example @{ML_response [display,gray] -"Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))" +"Scan.lift ($$ \"h\" -- $$ \"e\") (1, explode \"hello\")" "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} (FIXME: In which situations is this useful? Give examples.) @@ -342,7 +346,8 @@ within @{text "(*\*)"} is replaced by the same comment but enclosed within @{text "(**\**)"} in the output string. To enclose a string, you can use the function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML - "s1 ^ s ^ s2" for s1 s2 s}. + "s1 ^ s ^ s2" for s1 s2 s}. Hint: To simplify the task ignore the proper + nesting of comments. \end{exercise} *}