diff -r 403d4e0ad712 -r e21b2f888fa2 CookBook/Parsing.thy --- a/CookBook/Parsing.thy Mon Oct 13 17:53:13 2008 +0200 +++ b/CookBook/Parsing.thy Tue Oct 14 01:33:55 2008 -0400 @@ -1,5 +1,5 @@ theory Parsing -imports Main +imports Base begin @@ -7,6 +7,139 @@ text {* + Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. + Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so + on, belong to the outer syntax, whereas items inside double quotation marks, such + as terms and types, belong to the inner syntax. For parsing inner syntax, + Isabelle uses a rather general and sophisticated algorithm due to Earley, which + is driven by priority grammars. Parsers for outer syntax are built up by functional + parsing combinators. These combinators are a well-established technique for parsing, + which has, for example, been described in Paulson's classic book \cite{paulson-ml2}. + + Isabelle developers are usually concerned with writing parsers for outer + syntax, either for new definitional packages or for calling tactics. The library + for writing such parsers in can roughly be split up into two parts. + The first part consists of a collection of generic parser combinators defined + in the structure @{ML_struct Scan} in the file + @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of + combinators for dealing with specific token types, which are defined in the + structure @{ML_struct OuterParse} in the file + @{ML_file "Pure/Isar/outer_parse.ML"}. + +*} + +section {* Building Up Generic Parsers *} + +text {* + Let us first have a look at parsing strings using generic parsing combinators. + The function @{ML "$$"} takes a string and will ``consume'' this string from + a given input list of strings. Consume in this context means that it will + return a pair consisting of this string and the rest of the list. + For example: +*} + +ML {* ($$ "h") (explode "hello"); + ($$ "w") (explode "world") *} + +text {* + returns @{ML "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} and then + @{ML "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}. This function will either succeed (as in + the two examples above) or raise the exeption @{ML_text "FAIL"} if no string + can be consumed: for example @{ML_text "($$ \"x\") (explode \"world\")"}. + + Two such parser can be connected in sequence using the funtion @{ML "(op --)"}. + For example +*} + +ML {* (($$ "h") -- ($$ "e") -- ($$ "l")) (explode "hello") *} + +text {* + returns @{ML "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}. Note how the result of + consumed strings builds up on the left as nested pairs. + + Parsers that explore + alternatives can be constructed using the function @{ML "(op ||)"}. For example, the + parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, + otherwise it returns the result of @{ML_text "q"}. For example +*} + +ML {* + let val hw = ($$ "h") || ($$ "w") + val input1 = (explode "hello") + val input2 = (explode "world") + in + (hw input1, hw input2) + end +*} + +text {* + will in the first case consume the @{ML_text "h"} and in the second the @{ML_text "w"}. + The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion + for parsers, except that they discard the item parsed by the first (respectively second) + parser. For example +*} + +ML {* + let val just_h = ($$ "h") --| ($$ "e") + val just_e = ($$ "h") |-- ($$ "e") + val input = (explode "hello") + in + (just_h input, just_e input) + end +*} + +text {* + produces @{ML "(\"h\", [\"l\", \"l\", \"o\"])"} and @{ML "(\"e\", [\"l\", \"l\", \"o\"])"}, + respectively. + + (FIXME does it make sense to explain the other functions from @{ML_text BASIC_SCAN}?) + + The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser + @{ML_text "p"}, if it succeeds; otherwise it returns + the default value @{ML_text "x"}. For example +*} + +ML {* + let val p = Scan.optional ($$ "h") "x" + val input1 = (explode "hello") + val input2 = (explode "world") + in + (p input1, p input2) + end +*} + +text {* + returns @{ML "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} in the first case and + in the second case @{ML "(\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"])"}. + +*} + +text {* (FIXME: explain @{ML "op !!"}) *} + +ML {* + + val err_fn = (fn _ => "foo"); + val p = (!! err_fn ($$ "h")) || ($$ "w"); + val input1 = (explode "hello"); + val input2 = (explode "world"); + + p input1; +*} + +text {* (FIXME: why does @{ML_text "p input2"} not do anything with foo?) *} + +text {* (FIXME: explain function application) *} + +ML {* fun parse_fn (x,y) = (x,y^y) *} + +ML {* ((($$ "h") -- ($$ "e")) >> parse_fn) (explode "hello") *} + +text {* (FIXME: explain @{ML_text "lift"}) *} + +chapter {* Parsing *} + +text {* + Lots of Standard ML code is given in this document, for various reasons, including: \begin{itemize}