--- a/CookBook/Parsing.thy Mon Nov 24 02:51:08 2008 +0100
+++ b/CookBook/Parsing.thy Tue Nov 25 05:19:27 2008 +0000
@@ -21,7 +21,7 @@
\begin{readmore}
The library
- for writing parser combinators can be split up, roughly, into two parts.
+ for writing parser combinators is split up, roughly, into two parts.
The first part consists of a collection of generic parser combinators defined
in the structure @{ML_struct Scan} in the file
@{ML_file "Pure/General/scan.ML"}. The second part of the library consists of
@@ -37,7 +37,7 @@
text {*
Let us first have a look at parsing strings using generic parsing combinators.
- The function @{ML "(op $$)"} takes a string and will ``consume'' this string from
+ The function @{ML "(op $$)"} takes a string as argument and will ``consume'' this string from
a given input list of strings. ``Consume'' in this context means that it will
return a pair consisting of this string and the rest of the input list.
For example:
@@ -63,7 +63,7 @@
It is used for example in the function @{ML "(op !!)"} (see below).
\end{itemize}
- However, note that these exception private to the parser and cannot be accessed
+ However, note that these exceptions are private to the parser and cannot be accessed
by the programmer (for example to handle them).
Slightly more general than the parser @{ML "(op $$)"} is the function @{ML
@@ -108,7 +108,7 @@
"((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion
- for parsers, except that they discard the item parsed by the first (respectively second)
+ for parsers, except that they discard the item being parsed by the first (respectively second)
parser. For example
@{ML_response [display]
@@ -136,7 +136,16 @@
"((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
The function @{ML Scan.option} works similarly, except no default value can
- be given and in the failure case this parser does nothing.
+ be given. Instead, the result is wrapped as an @{text "option"}-type. For example:
+
+@{ML_response [display]
+"let
+ val p = Scan.option ($$ \"h\")
+ val input1 = (explode \"hello\")
+ val input2 = (explode \"world\")
+in
+ (p input1, p input2)
+end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
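+
+ Since a parser built with @{ML Scan.option} is just an ordinary parser, it can
+ be combined with the combinators seen so far; a small additional illustration:
+
+@{ML_response [display]
+"(($$ \"h\") -- Scan.option ($$ \"e\")) (explode \"hello\")"
+"((\"h\", SOME \"e\"), [\"l\", \"l\", \"o\"])"}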
The function @{ML "(op !!)"} helps to produce appropriate error messages
during parsing. For example if one wants to parse that @{ML_text p} is immediately
@@ -157,7 +166,7 @@
caused by the
failure of @{ML_text r}, not by the absense of @{ML_text q} in the input. This kind of situation
can be avoided by using the funtion @{ML "(op !!)"}. This function aborts the whole process of
- parsing in case of failure and invokes an error message. For example if we invoke the parser
+ parsing in case of a failure and prints an error message. For example, if we invoke the parser
@{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
@@ -184,7 +193,7 @@
Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
to generate the correct error message for p-followed-by-q, then
- we have to write, for example
+ we have to write:
*}
ML {*
@@ -221,14 +230,15 @@
succeeds at least once.
Also note that the parser would have aborted with the exception @{ML_text MORE}, if
- we had run it only on just @{ML_text [quotes] "hhhh"}. This can be awoided using
+ we had run it only on just @{ML_text [quotes] "hhhh"}. This can be avoided using
the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
them we can write
@{ML_response [display] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")"
"([\"h\", \"h\", \"h\", \"h\"], [])"}
- However, the Isabelle-develloper rarely needs to do this kind of wrapping manually.
+ However, this kind of manual wrapping is needed only very rarely in
+ practice, because the infrastructure already does it for you.
After parsing succeeded, one nearly always wants to apply a function on the parsed
items. This is done using the function @{ML_open "(p >> f)" for p f} which runs
@@ -272,31 +282,58 @@
The structure @{ML_struct OuterLex} defines several kinds of token (for example
@{ML "OuterLex.Ident"} for identifiers, @{ML "OuterLex.Keyword"} for keywords and
- @{ML "OuterLex.Command"} for commands). Some parsers take into account the
+ @{ML "OuterLex.Command"} for commands). Some token parsers take into account the
kind of token.
- We can generate a token list using the function @{ML "OuterSyntax.scan"}, which we give
- below @{ML "Position.none"} as argument since, at the moment, we are not interested in
- generating precise error messages. The following\footnote{There is something funny
- going on with the pretty printing of the result token list.}
+ We can generate a token list out of a string using the function @{ML
+ "OuterSyntax.scan"}, to which we give below @{ML "Position.none"} as
+ argument since, at the moment, we are not interested in generating precise
+ error messages. The following
@{ML_response_fake [display] "OuterSyntax.scan Position.none \"hello world\""
-"[Token (\<dots>,(OuterLex.Ident, \"hello\"),\<dots>),
- Token (\<dots>,(OuterLex.Space, \" \"),\<dots>),
- Token (\<dots>,(OuterLex.Ident, \"world\"),\<dots>)]"}
+"[Token (\<dots>,(Ident, \"hello\"),\<dots>),
+ Token (\<dots>,(Space, \" \"),\<dots>),
+ Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
+
+ produces three tokens where the first and the last are identifiers, since
+ @{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match any
+ other syntactic category.\footnote{Note that because of a possible bug in
+ the PolyML runtime system the result is printed as @{text "?"} instead of
+ the token.} The second indicates a space.
+
+ Many parsing functions later on will require spaces, comments and the like
+ to have been filtered out. In what follows, we are going to use the
+ functions @{ML filter} and @{ML OuterLex.is_proper} to do this. For example:
+
- produces three tokens where the first and the last are identifiers, since
- @{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match
- any other category. The second indicates a space. Many parsing functions
- later on will require spaces, comments and the like to have been filtered out.
+@{ML_response_fake [display]
+"let
+ val input = OuterSyntax.scan Position.none \"hello world\"
+in
+ filter OuterLex.is_proper input
+end"
+"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}
+
+ For convenience we are going to use the following function:
+
+*}
+
+ML {*
+fun filtered_input str =
+ filter OuterLex.is_proper (OuterSyntax.scan Position.none str)
+*}
+
+text {*
+
If we parse
-@{ML_response_fake [display] "OuterSyntax.scan Position.none \"inductive|for\""
-"[Token (\<dots>,(OuterLex.Command, \"inductive\"),\<dots>),
- Token (\<dots>,(OuterLex.Keyword, \"|\"),\<dots>),
- Token (\<dots>,(OuterLex.Keyword, \"for\"),\<dots>)]"}
+@{ML_response_fake [display]
+"filtered_input \"inductive | for\""
+"[Token (\<dots>,(Command, \"inductive\"),\<dots>),
+ Token (\<dots>,(Keyword, \"|\"),\<dots>),
+ Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}
- we obtain a list consiting of only command and keyword tokens.
+ we obtain a list consisting of only a command and two keyword tokens.
If you want to see which keywords and commands are currently known, use
the following (you might have to adjust the @{ML print_depth} in order to
see the complete list):
@@ -310,11 +347,11 @@
"([\"}\",\"{\",\<dots>],[\"\<rightleftharpoons>\",\"\<leftharpoondown>\",\<dots>])"}
Now the parser @{ML "OuterParse.$$$"} parses a single keyword. For example
-
+
@{ML_response [display]
"let
- val input1 = OuterSyntax.scan Position.none \"where for\"
- val input2 = OuterSyntax.scan Position.none \"|in\"
+ val input1 = filtered_input \"where for\"
+ val input2 = filtered_input \"| in\"
in
(OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
end"
@@ -324,7 +361,7 @@
@{ML_response [display]
"let
- val input = OuterSyntax.scan Position.none \"|in\"
+ val input = filtered_input \"| in\"
in
(OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
end"
@@ -336,24 +373,23 @@
@{ML_response [display]
"let
- val input = OuterSyntax.scan Position.none \"in|in|in\\n\"
+ val input = filtered_input \"in | in | in end\"
in
(OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end"
"([\"in\",\"in\",\"in\"],[\<dots>])"}
- @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
- be non-empty.
-
- Note that we had to add a @{ML_text [quotes] "\\n"} at the end of the parsed
- string, otherwise the parser would have consumed all tokens and then failed with
- the exception @{ML_text "MORE"}. Like in the previous section, we can avoid
- this exception using the wrapper @{ML Scan.finite}. This time, however, we
- have to use the ``stopper-token'' @{ML OuterLex.stopper}. We can write
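+
+ If the input does not start with an element at all, then @{ML OuterParse.enum}
+ simply returns the empty list; a small illustration with a made-up identifier:
+
+@{ML_response [display]
+"let
+  (* a made-up identifier, not the keyword *)
+  val input = filtered_input \"foo\"
+in
+  (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
+end"
+"([], [\<dots>])"}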
+ @{ML "OuterParse.enum1"} works similarly, except that the parsed list must
+ be non-empty. Note that we had to add an @{ML_text [quotes] "end"} at the
+ end of the parsed string, otherwise the parser would have consumed all
+ tokens and then failed with the exception @{ML_text "MORE"}. As in the
+ previous section, we can avoid this exception using the wrapper @{ML
+ Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
+ OuterLex.stopper}. We can write
@{ML_response [display]
"let
- val input = OuterSyntax.scan Position.none \"in|in|in\"
+ val input = filtered_input \"in | in | in\"
in
Scan.finite OuterLex.stopper (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end"
@@ -366,7 +402,7 @@
@{ML_response_fake [display]
"let
- val input = OuterSyntax.scan Position.none \"in|\"
+ val input = filtered_input \"in |\"
val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"
in
Scan.error (OuterParse.!!! parse_bar_then_in) input
@@ -377,15 +413,6 @@
*}
-
-ML {*
-let
- val input = filter OuterLex.is_proper (OuterSyntax.scan Position.none "(in foo)")
-in
- OuterParse.target input
-end
-*}
-
section {* Positional Information *}
text {*
@@ -410,6 +437,19 @@
*}
+ML {*
+ OuterParse.opt_target
+*}
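+
+text {*
+  This parser can, for example, be applied to some made-up input; a small
+  sketch (assuming the hypothetical locale name @{text "foo"}):
+*}
+
+ML {*
+let
+  (* made-up input, just for illustration *)
+  val input = filtered_input "(in foo)"
+in
+  OuterParse.opt_target input
+end
+*}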
+
+ML {*
+  (* a parser for the "head" of a specification: an optional target,
+     the fixed variables, the for-fixes and an optional where-clause
+     of (optionally named) propositions *)
+  OuterParse.opt_target --
+  OuterParse.fixes --
+  OuterParse.for_fixes --
+  Scan.optional (OuterParse.$$$ "where" |--
+    OuterParse.!!! (OuterParse.enum1 "|" (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []
+
+*}
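+
+text {*
+  To get an idea of what this parser returns, we can, for example, run it
+  over a made-up specification; a small sketch, wrapped in @{ML Scan.finite}
+  and @{ML OuterLex.stopper} so that the end of the input does not raise the
+  exception @{ML_text "MORE"}:
+*}
+
+ML {*
+let
+  (* a made-up specification, just for illustration *)
+  val input = filtered_input
+    "even and odd where even: \"even 0\" | odd: \"odd 1\""
+  val spec_parser =
+    OuterParse.opt_target --
+    OuterParse.fixes --
+    OuterParse.for_fixes --
+    Scan.optional (OuterParse.$$$ "where" |--
+      OuterParse.!!! (OuterParse.enum1 "|"
+        (SpecParse.opt_thm_name ":" -- OuterParse.prop))) []
+in
+  Scan.finite OuterLex.stopper spec_parser input
+end
+*}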
+
text {* (FIXME funny output for a proposition) *}