--- a/CookBook/Parsing.thy Mon Oct 20 06:22:11 2008 +0000
+++ b/CookBook/Parsing.thy Mon Oct 27 18:48:52 2008 +0100
@@ -3,6 +3,7 @@
begin
+
chapter {* Parsing *}
text {*
@@ -15,16 +16,19 @@
is driven by priority grammars. Parsers for outer syntax are built up by functional
parsing combinators. These combinators are a well-established technique for parsing,
which has, for example, been described in Paulson's classic book \cite{paulson-ml2}.
-
- Isabelle developers are usually concerned with writing parsers for outer
- syntax, either for new definitional packages or for calling tactics. The library
- for writing such parsers in can roughly be split up into two parts.
+ Isabelle developers are usually concerned with writing these outer syntax parsers,
+ either for new definitional packages or for calling tactics with specific arguments.
+
+ \begin{readmore}
+ The library
+ for writing parser combinators can be split up, roughly, into two parts.
The first part consists of a collection of generic parser combinators defined
in the structure @{ML_struct Scan} in the file
@{ML_file "Pure/General/scan.ML"}. The second part of the library consists of
combinators for dealing with specific token types, which are defined in the
structure @{ML_struct OuterParse} in the file
@{ML_file "Pure/Isar/outer_parse.ML"}.
+ \end{readmore}
*}
@@ -50,6 +54,8 @@
will raise the exception @{ML_text "FAIL"}.
There are three exceptions used in the parsing combinators:
+ (FIXME: do the exception need to be explained, because the user cannot use them from ``outside''?)
+
\begin{itemize}
\item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing
might be explored.
@@ -86,7 +92,7 @@
Parsers that explore
alternatives can be constructed using the function @{ML "(op ||)"}. For example, the
- parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, if it succeeds,
+ parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds,
otherwise it returns the result of @{ML_text "q"}. For example
@{ML_response [display]
@@ -135,24 +141,27 @@
@{ML_open [display] "(p -- q) || r" for p q r}
However, this parser is problematic for producing an appropriate error message, in case
- the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information
- that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p}
- is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the
+ the parsing of @{ML_open "(p -- q)" for p q} fails. Because in that case one loses with the parser
+ above the information
+ that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in
+ which @{ML_text p}
+ is present in the input, but not @{ML_text q}. That means @{ML_open "(p -- q)" for p q} will fail
+ and the
alternative parser @{ML_text r} will be tried. However in many circumstanes this will be the wrong
- parser for the input and therefore will fail. The error message is then caused by the
- failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations
- can be avoided using the funtion @{ML "(op !!)"}. This function aborts the whole process of
+ parser for the input ``p-followed-by-q'' and therefore will fail. The error message is then caused by the
+ failure of @{ML_text r}, not by the absense of @{ML_text q} in the input. This kind of situation
+ can be avoided by using the funtion @{ML "(op !!)"}. This function aborts the whole process of
parsing in case of failure and invokes an error message. For example if we invoke the parser
@{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
- on @{ML_text "hello"}, the parsing succeeds
+ on @{ML_text [quotes] "hello"}, the parsing succeeds
@{ML_response [display]
"(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")"
"(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
- but if we invoke it on @{ML_text "world"}
+ but if we invoke it on @{ML_text [quotes] "world"}
@{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
"Exception ABORT raised"}
@@ -165,14 +174,14 @@
"Exception Error \"foo\" raised"}
This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"}
- (FIXME: give reference to late).
+ (FIXME: give reference to later place).
Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
- to generate the correct error message for @{ML_text q} not following @{ML_text p}, then
+ to generate the correct error message for p-followed-by-q, then
we have to write
*}
-ML {*
+ML {*
fun p_followed_by_q p q r =
let
val err = (fn _ => p ^ " is not followed by " ^ q)
@@ -240,12 +249,43 @@
This is because the parsers for the theory syntax, as well as the parsers for the
argument syntax of proof methods and attributes, use the type
@{ML_type OuterParse.token} (which is identical to the type @{ML_type OuterLex.token}).
+
+ \begin{readmore}
The parser functions for the theory syntax are contained in the structure
- @{ML_struct OuterParse} defined in the file\linebreak @{ML_file "Pure/Isar/outer_parse.ML"}.
+ @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
+ The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
+ \end{readmore}
+*}
+
+ML {* OuterLex.mk_text "hello" *}
+
+ML {* print_depth 50 *}
+
+ML {*
+ let open OuterLex in
+ OuterSyntax.scan Position.none "for"
+ end;
+
*}
ML {* map OuterLex.mk_text (explode "hello") *}
+ML {*
+
+fun my_scan lex pos str =
+ Source.of_string str
+ |> Symbol.source {do_recover = false}
+ |> OuterLex.source {do_recover = SOME false}
+ (fn () => pairself (Scan.make_lexicon o map Symbol.explode) lex) pos
+ |> Source.exhaust;
+
+*}
+
+ML {*
+let val toks = my_scan (["hello"], []) Position.none "hello"
+in (OuterParse.$$$ "hello") toks end
+*}
+
text {*
@{ML_response_fake [display] "map OuterLex.mk_text (explode \"hello\")"